Privacy-as-Code is a means of codifying privacy policies in the codebase. Using an example policy on data collection, here’s how to start creating policies in Fides.
Following our recent blog post on annotating Datasets and Systems in Fides, we take the next step in building Privacy-as-Code. Here, we walk through the process of codifying privacy policies so they can drive automated compliance checks. In doing so, your team identifies and roots out noncompliant code before it’s ever shipped.
In Fides, a policy is a collection of rules. Each rule can be thought of intuitively as: “For Specific Condition X, Perform Specific Action Y.” Throughout this post, we’ll work with one example: suppose that we want to build a proactive check into the CI pipeline to confirm that all shipped code complies with this policy: Users’ contact information cannot be collected for the purpose of marketing.
Building this example policy on marketing-related collection of contact information will get us familiar with the necessary components of a Fides policy. As we discussed in the previous post, embedding these policy checks in CI offers substantial savings in time, money, labor, and risk compared with a reactive approach.
In Fides, rules are codified within a YAML file using a handful of straightforward components. First, a `fides_key` uniquely identifies the rule. In this case, we use `reject_direct_marketing` as the value for `fides_key`.
Next, we add a human-friendly name and description for the rule. We choose “Reject Direct Marketing” as the rule’s name. As for the description, we give a human-readable summary of the policy: “Disallow collecting any user contact info for marketing.”
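As a sketch, here’s how these first fields might appear in a policy manifest. The surrounding policy’s `fides_key`, `name`, and `description` are illustrative placeholders, and exact schema details can vary across Fides versions:

```yaml
policy:
  - fides_key: main_privacy_policy        # illustrative parent policy key
    name: Main Privacy Policy
    description: Privacy policies enforced in our CI pipeline.
    rules:
      - fides_key: reject_direct_marketing
        name: Reject Direct Marketing
        description: Disallow collecting any user contact info for marketing.
```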
From here, we describe the four privacy primitives, which you might recall from the annotation process:

- `data_categories`
- `data_uses`
- `data_subjects`
- `data_qualifier`

We use terms from the Fides privacy taxonomy to add values for each primitive.
For `data_categories`, we wish to describe the specific types of sensitive data. Looking back at the policy we aim to enforce in CI, its scope encompasses any contact information gathered from the user, so we add the following value: `user.provided.identifiable.contact`.
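Within the rule, this primitive might be declared as follows; the `matches` line is the rule’s inclusion criterion, which we cover below:

```yaml
data_categories:
  matches: ANY
  values:
    - user.provided.identifiable.contact
```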
For `data_uses`, we give a formal label to the categories of data processing in the organization. The use case under consideration for this policy is advertising, so we add it accordingly: `advertising`.
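The corresponding declaration might look much the same:

```yaml
data_uses:
  matches: ANY
  values:
    - advertising
```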
For `data_subjects`, we define the individual persons whose data the rule pertains to. In this policy, it is customers’ data that we are concerned with, so `customer` is the appropriate value.
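And again in the manifest:

```yaml
data_subjects:
  matches: ANY
  values:
    - customer
```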
And for `data_qualifier`, we indicate the acceptable or unacceptable level of de-identification for this data. The data in question, user-provided contact information, directly identifies an individual, so we add the following value: `aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified`.
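Unlike the other primitives, the qualifier names a single point on the de-identification spectrum, so in the rule it might be a single line rather than a list (this reflects the schema at the time of writing; later versions may differ):

```yaml
data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
```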
Using Fides, we have the power to further refine the semantics for policy enforcement. Inclusion criteria are basic logic gates governing which combinations of data categories, use cases, subjects, and qualifiers should be considered when running automated privacy checks in CI. In particular, the inclusion criteria are:

- `ANY`
- `ALL`
- `NONE`

When specifying values for each of the four privacy primitives, an inclusion criterion indicates whether the given rule should be applied to code with `ANY`, `ALL`, or `NONE` of the values entered.
Our example policy only provides one value for each privacy primitive, so the distinction between `ANY` and `ALL` might look trivial. However, let’s suppose for a moment that we wanted to create another rule that prevented the processing of any contact information or gender for marketing purposes. Then the choice between `ANY` and `ALL` has real consequences for permissible code in the automated CI check. While choosing `ANY` would catch instances of processing contact information and/or gender, `ALL` would only catch instances in which both contact information and gender are processed.
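To make that concrete, here’s how the hypothetical two-value rule might declare its categories. `user.provided.identifiable.gender` is the taxonomy key we’d expect for gender, though you should confirm it against your fideslang version:

```yaml
data_categories:
  matches: ANY   # flags contact info OR gender; ALL would require both
  values:
    - user.provided.identifiable.contact
    - user.provided.identifiable.gender
```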
We have now formalized, in detail, the kind of data processing that falls under the scope of this rule: user-provided contact information collected for the purposes of marketing. Next comes the action we want the automated privacy review to take.
To begin, note that we have framed our policy negatively. That is, we have defined what we don’t want in our shipped code: collection of customers’ contact information for advertising purposes. If our codebase exhibits that undesired behavior, the check should reject it, so we add `REJECT` as the rule’s action.
Using our basic example, we have all of the pieces needed for our policy manifest.
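Putting the pieces together, the complete manifest might look like this (again, the parent policy’s key, name, and description are illustrative placeholders):

```yaml
policy:
  - fides_key: main_privacy_policy        # illustrative parent policy key
    name: Main Privacy Policy
    description: Privacy policies enforced in our CI pipeline.
    rules:
      - fides_key: reject_direct_marketing
        name: Reject Direct Marketing
        description: Disallow collecting any user contact info for marketing.
        data_categories:
          matches: ANY
          values:
            - user.provided.identifiable.contact
        data_uses:
          matches: ANY
          values:
            - advertising
        data_subjects:
          matches: ANY
          values:
            - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        action: REJECT
```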
Let’s look at one more policy. This one demonstrates multiple values for the privacy primitives, so the choice of inclusion criteria (`ANY` versus `ALL`) is not a trivial one. Codifying this policy might look daunting, but it can be summarized in just two plain-language statements, both sketched in the manifest below. First, the policy prohibits the usage of identifiable data for any purpose besides providing the app’s basic functions. Second, the policy prohibits any collection of sensitive data, for any purpose.
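As a sketch, those two statements might be codified as the following pair of rules. The rule keys, the sensitive-category selection, and the list of data uses are illustrative assumptions drawn from the fideslang taxonomy of the time; note how the first rule uses the `NONE` criterion so that it triggers whenever the declared data use is anything other than providing the app’s functions:

```yaml
rules:
  - fides_key: minimize_identifiable_data      # illustrative key
    name: Minimize User Identifiable Data
    description: Reject identifiable data used for anything besides the app's basic functions.
    data_categories:
      matches: ANY
      values:
        - user.provided.identifiable
        - user.derived.identifiable
    data_uses:
      matches: NONE            # i.e., any use other than 'provide'
      values:
        - provide
    data_subjects:
      matches: ANY
      values:
        - customer
    data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    action: REJECT

  - fides_key: reject_sensitive_data           # illustrative key
    name: Reject Sensitive Data
    description: Reject any collection of sensitive user data, for any purpose.
    data_categories:
      matches: ANY
      values:
        - user.provided.identifiable.biometric
        - user.provided.identifiable.health_and_medical
        - user.provided.identifiable.political_opinion
        - user.provided.identifiable.race
        - user.provided.identifiable.religious_belief
        - user.provided.identifiable.sexual_orientation
    data_uses:
      matches: ANY             # covers "any purpose" via the taxonomy's top-level uses
      values:
        - provide
        - improve
        - personalize
        - advertising
        - third_party_sharing
        - collect
        - train_ai_system
    data_subjects:
      matches: ANY
      values:
        - customer
    data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    action: REJECT
```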
We’ll revisit this policy in the next blog post, where we will execute a policy evaluation in CI.
As with resource annotations in Fides, policies must be kept up to date with your in-house privacy policies as well as the relevant regulations that affect your company. By embedding Fides policy reviews into your team’s development processes, you maintain an accurate and powerful method of enforcing privacy compliance in the CI pipeline, before code ever handles PII out in the wild.
For the next and final installment in this three-part series, we dive into policy evaluation.
Explore the rest of this three-part blog series to get acquainted with Fides:
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources: