In this article, we’ll use open-source privacy engineering tools to code a policy that prohibits applications from sharing data with third parties. This was the data governance issue at stake in a 2019 FTC ruling against Facebook that resulted in a hefty fine.
Our team at Ethyca recently launched Fides, the first open-source developer tools to make trust, respect, and privacy part of any tech stack. The reception has been fantastic, and we’re actively helping some of the world’s best privacy engineers and engineering teams to roll out Fides across their tech stack. (I can’t wait to share more about our design partners in the near future!)
Enthusiasm for the idea of open-source Privacy-as-Code is unmistakable. Nevertheless, how to translate that enthusiasm into practical application is a question we sometimes find ourselves fielding:
“Yes, it sounds intriguing, but what sort of value could something like Fides be delivering right here, right now, in our business?”
Today, as the first part of a larger article series, I want to illustrate how a few lines of fideslang can enforce an important set of data guardrails across a large, distributed system. We’re going to be writing a Fides policy that prohibits an application from sharing data with third parties for purposes other than those specifically agreed to by the user or stated by the organization.
How can this add value for a business? Well, the absence of this exact set of guardrails we’ll be coding today was the catalyst for a $5 billion FTC fine in 2019, levied against Facebook for continued collection of user data by third-party app developers without user consent.
It’s safe to say that this is a situation where privacy engineers using Fides open-source tools could have delivered remarkable value for one of the biggest companies in the world.
Let’s see how…
A quick sidebar to present the context for this piece. In a four-part series, I’m going to demonstrate the power of a Privacy-as-Code approach by answering the following questions:
To answer these questions, I’ll examine some of the world’s most talked-about privacy cases of recent times. I’ll distill them into a summation of what went wrong at a technical level and show, by writing real policies in Fides, how these failures could have been prevented with minimal friction for every engineer.
One necessary disclaimer as we charge into writing some fideslang policies is to observe that, of course, technology does not operate in a vacuum. As part of their investigation, the FTC did identify organizational process failures throughout Facebook’s handling of privacy program management. That is to say, culture is important, and Facebook has some macro issues related to governance that are beyond the scope of this post. Our focus here is to illustrate technical measures that could have been taken to ensure technology systems behaved in accordance with the organization’s stated policy at the time.
With that in mind, I’ve summarized a relevant section of the findings from the FTC with some color related to what went wrong:
Taken directly from the FTC:
“Facebook announced in April 2014 that it would stop allowing third-party developers to collect data about the friends of app users (‘affected friend data’). Despite this promise, the company separately told developers that they could collect this data until April 2015 if they already had an existing app on the platform. The FTC alleges that Facebook waited until at least June 2018 to stop sharing user information with third-party apps used by their Facebook friends.”
From the FTC’s expansive investigation, I’d highlight three core data governance issues:
Let’s talk about what should be technically enforceable.
If we, as privacy engineers, make a promise to our users about how we use their data, we want to be able to keep that promise irrespective of the scale of our infrastructure. Of course, the reality is that systems get built rapidly and incrementally in response to user demand and new business requirements. If you grow from a single transactional database to petabyte-scale distributed data infrastructure, data gets duplicated across multiple locations, often running asynchronously with separate enforcement tools. Existing tech systems don’t have the tools needed to respect users’ data at scale.
At its simplest level, Facebook couldn’t be trusted with this promise because it lacked context on all the data flowing across its systems (the categories of data), what that data was being used for (the categories of use), and the associated limitations. By limitations here, I mean purposes or uses for which the data was not approved.
Fides is built to solve for problems like this. In its current release, you can already draft a policy in YAML using fideslang and enforce that policy to ensure engineers across a team can’t accidentally or intentionally misuse data in a way that deviates from the promises a business or application makes to its users.
(A sidebar: today, Fides supports these enforcements in your CI pipeline for your own engineering teams. In the near future, Fides will extend the same enforcement to your runtime environment as queries are executed, as well as against external APIs. This will ensure you can make a promise to your users and trust that it will be kept across distributed systems, both owned and third-party.)
The management application for Fides, fidesctl (Fides Control), consists of:
In its simplest form, Fides is a language for describing the context of how your codebase handles various categories of data and the purposes for which it uses them. As you can see from the diagram below, the fidesctl server is used to create and store policies that govern what is permitted by your team, your organization, or a given regulation. These policies are automatically checked on commit to provide active control, ensuring the work you’re doing meets the policy’s criteria; if it doesn’t, fidesctl provides helpful notes so you can make changes before re-committing. In short: it avoids the risk of deploying code that might not comply with the promises you’ve made to your users.
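By way of illustration, an engineer describes a system’s data handling in a fideslang manifest along these lines (a sketch; the resource names and taxonomy keys below are illustrative and should be checked against the fideslang taxonomy for your version):

```yaml
# system.yml -- illustrative system manifest
system:
  - fides_key: demo_marketing_service
    name: Demo Marketing Service
    description: Sends promotional messages and shares audience segments with ad partners.
    system_type: Service
    privacy_declarations:
      # Each declaration states what data the system touches and why
      - name: Share audience data with ad partners
        data_categories:
          - user.provided.identifiable
        data_use: third_party_sharing
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```

It is these declarations that fidesctl evaluates against your policies on each commit.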
If you’d like to learn more about Fides, check out the repos and documentation at https://fid.es/ctl.
As you can see in this example, a policy can be extremely detailed and fine-grained, describing specific categories of data or purposes of use. Alternatively it can be wide-sweeping to limit the use of many data types more broadly.
Let’s walk through the policy:
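A policy along these lines might be sketched in fideslang YAML roughly as follows (a sketch based on the fideslang v1 schema; the key names and taxonomy values are illustrative):

```yaml
# policy.yml -- illustrative data-sharing policy
policy:
  - fides_key: data_sharing_policy
    name: Data Sharing Policy
    description: Reject any use of identifiable account or user data for third-party sharing.
    rules:
      - name: Reject Third-Party Sharing
        # Match any identifiable account or user data, whether provided or derived
        data_categories:
          matches: ANY
          values:
            - account
            - user.provided.identifiable
            - user.derived.identifiable
        # ...when used for any purpose under third_party_sharing
        data_uses:
          matches: ANY
          values:
            - third_party_sharing
        data_subjects:
          matches: ANY
          values:
            - customer
        # Applies up to and including fully identified data
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```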
fides_key is the unique identifier for the policy.
name is the human-readable label assigned to the policy.
description is the human-readable description that provides more context on the purpose of the policy.
Policies may contain multiple rules, grouped within the rules: sub-group.
data_categories is the attribute describing the categories of data governed by the policy, as defined in the Fides taxonomy.
data_uses is the attribute that describes the various categories of data processing or purposes for which data may be used in your company.
data_subject describes the types of individuals to whom the data belongs, such as customer, employee, or patient.
data_qualifier describes the acceptable or non-acceptable level of de-identification permitted.
matches is an enumerated list of criteria that describes how you would like the rule to be evaluated. These basic logic gates determine whether the array of privacy attributes will be fully included (ALL), not included at all (NONE), only included if at least 1 item in the array matches (ANY), or excluded with any additional attributes included (OTHER).
As Fides is intended to be lightweight and human-readable, it quickly becomes clear what this policy’s intended outcome is. However, let’s walk through what it’s doing:
In essence, the rule is a conditional statement that can be read as:
“Where any of n categories of data are found to be in use for any of n purposes of use for customer data, reject or block this activity.”
As you can see, we’ve provided some context around the policy’s purpose by providing it with a name and description. From there, we’ve created just one rejection rule which is intended to disallow (or reject) the use of any account or user identifiable data (whether that is provided or derived) for any purpose related to third_party_sharing. Third-party sharing in the Fides taxonomy represents sharing of data to third-party (external) destinations related to marketing or advertising.
This policy can be loaded into fidesctl server and will prevent engineers from merging and deploying code which shares account or user identifiable data of any kind with third parties.
Let’s quickly look at that close up with the diagram below:
I’ve previously written on the benefits of a Privacy-as-Code approach, and these are now becoming a reality with Fides.
Fides can be integrated with the automated CI pipeline checks already in place, ensuring that no engineer can accidentally or intentionally bypass these controls. As a result, you can evaluate every commit or PR: engineers simply declare the privacy and governance characteristics of their code and have those declarations checked into git.
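As a sketch of what that CI step could look like, a workflow might run fidesctl’s evaluation against the repository’s manifests on every pull request (the workflow layout, manifest path, and exact fidesctl invocation here are illustrative assumptions; consult the fidesctl docs for your version):

```yaml
# .github/workflows/fides.yml -- illustrative privacy check
name: Privacy Checks
on: [pull_request]
jobs:
  fidesctl-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install fidesctl
        run: pip install fidesctl
      - name: Evaluate manifests against policies
        # Fails the build if any privacy declaration violates a policy
        run: fidesctl evaluate .fides/
```

A failing evaluation blocks the merge, so non-compliant data uses never reach production.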
This CI integration results in multiple benefits that would directly mitigate some of Facebook’s challenges and the demands the FTC places on Facebook within the consent decree, specifically:
The benefit here is plain to see. A standard, interoperable language to describe governance policies and a set of tools to ensure these can be enforced and observed throughout both software development and production environments all sum up to this vital capability: a business can trust that the promises its systems make are kept.
If you consider your own data infrastructure and its related data footprint, how confident are you in the types of data you’re handling, what you’re using it for, and which systems it’s flowing into? Most teams feel they have an abstract mental model for this, but when you examine the details after a few months or years of creep, this accounting is rarely maintained, so enforcing a user’s personal rights becomes complex or impossible. Ask yourself: do you know all of the systems into which your users’ data flows and what it’s being used for? If you don’t, who does?
The irony is that we obsess over either atomicity or eventual consistency, depending on our database type, yet in parallel we’ve essentially given up on the idea that we can achieve any level of assured, enforceable consistency for an individual user. Given how complex most systems’ data flows are, it seems we’ve neglected to engineer one of the most important components of data infrastructure.
An open standard like Fides can directly answer the healthy demand the FTC is placing on engineering teams at Facebook for data context and control, while also preventing the major issues that got them here in the first place. If you’re building something new or continuing to iterate on your existing systems, adding Fides to your tech stack will reduce complexity and accelerate your development pipeline, all while ensuring your application can be better trusted by users.
In the next installment we’ll showcase additional capabilities of Fides when applied to another one of the biggest privacy cases from the past decade.
Thanks for reading.