Following our recent blog post on annotating Datasets and Systems in Fides, we take the next step in building Privacy-as-Code. Here, we walk through the process of codifying privacy policies so that they can be used in automated compliance checks. In doing so, your team can identify and root out noncompliant code before it’s ever shipped.
Anatomy of a Policy
In Fides, a policy is a collection of rules. Each rule can be thought of intuitively as: “For Specific Condition X, Perform Specific Action Y.” Suppose that we want to build a proactive check into the CI pipeline to confirm that all shipped code complies with this policy: Users’ contact information cannot be collected for the purpose of marketing.
In this blog post, we’ll build an example policy around this marketing-related collection of contact information, and along the way we’ll get familiar with the necessary components of a Fides policy. As we discussed in the previous post, embedding these policy checks in CI offers substantial savings in time, money, labor, and risk when contrasted with a reactive approach.
Naming and Describing a Policy
In Fides, rules are codified within a YAML file using a handful of straightforward components. First, a `fides_key` uniquely identifies the rule. In this case, we use `reject_direct_marketing` as the value for `fides_key`.
Next, we add a human-friendly name and description for the rule. We choose “Reject Direct Marketing” as the rule’s name. As for the description, we give a human-readable summary of the policy: “Disallow collecting any user contact info for marketing.”
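Put together, the opening of the rule definition might look like the following sketch (field names follow the Fides manifest format; treat the exact layout as illustrative):

```yaml
- fides_key: reject_direct_marketing
  name: Reject Direct Marketing
  description: Disallow collecting any user contact info for marketing.
```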
From here, we describe the four privacy primitives, which you might recall from the annotation process: `data_categories`, `data_uses`, `data_subjects`, and `data_qualifier`.
We use terms from the Fides privacy taxonomy to add values for each primitive.
For `data_categories`, we describe the specific types of sensitive data. Looking back to the policy we aim to enforce in CI, its scope encompasses any contact information gathered from the user, so we add the following value:
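A sketch of this fragment, assuming a Fides taxonomy key for user-provided contact information:

```yaml
data_categories:
  values:
    - user.provided.identifiable.contact  # illustrative taxonomy key for user contact info
```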
For `data_uses`, we give a formal label to the categories of data processing in the organization. The use case under consideration for this policy is advertising, so we add it accordingly:
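Under the same assumption about taxonomy keys, the fragment might look like:

```yaml
data_uses:
  values:
    - advertising  # illustrative taxonomy key for marketing/advertising use
```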
For `data_subjects`, we define the individuals whose data the rule pertains to. In this policy, we are concerned with customers’ data, so `customer` is the appropriate value:
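Sketched in the manifest:

```yaml
data_subjects:
  values:
    - customer  # the individuals this rule protects
```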
For the `data_qualifier`, we indicate the acceptable or unacceptable level of de-identification for this data. The data in question, user-provided contact information, directly identifies an individual, so we add the following value:
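A sketch, assuming a taxonomy key for fully identified data:

```yaml
data_qualifier: identified_data  # illustrative key: data that directly identifies a person
```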
Using Fides, we have the power to further refine the semantics of policy enforcement. Inclusion criteria are basic logic gates on which data categories, use cases, subjects, and qualifiers should be considered when running automated privacy checks in CI. When specifying values for each of the four privacy primitives, an inclusion criterion indicates whether the given rule should be applied to code that matches ANY of the values entered, ALL of the values entered, or NONE of the values entered.
Our example policy only provides one value for each privacy primitive, so the distinction between ANY and ALL might look trivial. However, let’s suppose for a moment that we wanted to create another rule that prevented the processing of any contact information or gender for marketing purposes. Then the choice between ANY and ALL has real consequences for permissible code in the automated CI check. While choosing ANY would catch instances of processing contact information and/or gender, ALL would only catch instances in which both contact information and gender are processed.
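The contrast can be sketched as follows (the `inclusion` field name and taxonomy keys are assumptions for illustration):

```yaml
data_categories:
  inclusion: ANY  # flags code that processes contact info, gender, or both;
                  # ALL would flag only code that processes both together
  values:
    - user.provided.identifiable.contact
    - user.provided.identifiable.gender
```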
We have now formalized, in detail, the kind of data that falls under the scope of this processing: user-provided contact information for the purposes of marketing. Next comes the action we want from the automated privacy review.
To begin, note that we have framed our policy negatively. That is, we have defined what we don’t want in our shipped code: collection of customers’ contact information for advertising purposes. So if our codebase demonstrates that undesired behavior, we should reject it, and we add a rejecting action to the rule:
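Assuming the rule’s action is expressed as a simple field, this might look like:

```yaml
action: REJECT  # fail the automated CI check whenever this rule matches
```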
A Full-Fledged YAML Policy
Using our basic example, we have all of the pieces needed for our policy manifest.
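Piecing the components together, a complete manifest for this policy might look like the following sketch (field names follow the Fides manifest format as described above; the policy-level `fides_key`, names, and taxonomy keys are illustrative):

```yaml
policy:
  - fides_key: demo_privacy_policy  # hypothetical policy identifier
    name: Demo Privacy Policy
    description: Policies governing marketing-related data collection
    rules:
      - fides_key: reject_direct_marketing
        name: Reject Direct Marketing
        description: Disallow collecting any user contact info for marketing.
        data_categories:
          inclusion: ANY
          values:
            - user.provided.identifiable.contact
        data_uses:
          inclusion: ANY
          values:
            - advertising
        data_subjects:
          inclusion: ANY
          values:
            - customer
        data_qualifier: identified_data
        action: REJECT
```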
Let’s look at one more policy. This one demonstrates multiple values for the privacy primitives, so the choice of inclusion criteria, ANY versus ALL, is not a trivial one. Codifying this policy might look daunting, but it can be summarized in just two plain-language statements. First, the policy prohibits the use of identifiable data for any purpose besides providing the app’s basic functions.
Second, the policy prohibits any collection of sensitive data, for any purpose.
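Under the same assumptions as before (field names, identifiers, and taxonomy keys are illustrative), these two plain-language statements might be sketched as a two-rule policy:

```yaml
policy:
  - fides_key: strict_privacy_policy  # hypothetical policy identifier
    name: Strict Privacy Policy
    description: Limit identifiable data to core app functions and ban sensitive data
    rules:
      - fides_key: reject_non_essential_identifiable_use
        name: Reject Non-Essential Use of Identifiable Data
        description: Identifiable data may only be used to provide the app's basic functions.
        data_categories:
          inclusion: ANY
          values:
            - user.provided.identifiable
        data_uses:
          inclusion: NONE  # matches any use *other than* the values below
          values:
            - provide  # illustrative key for the app's basic functions
        data_subjects:
          inclusion: ANY
          values:
            - customer
        data_qualifier: identified_data
        action: REJECT
      - fides_key: reject_sensitive_data
        name: Reject Sensitive Data Collection
        description: Disallow collecting sensitive user data for any purpose.
        data_categories:
          inclusion: ANY
          values:
            - user.provided.identifiable.biometric
            - user.provided.identifiable.health_and_medical
            - user.provided.identifiable.credentials
        data_uses:
          inclusion: ANY  # combined with broad values, this covers any purpose
          values:
            - provide
            - improve
            - advertising
        data_subjects:
          inclusion: ANY
          values:
            - customer
        data_qualifier: identified_data
        action: REJECT
```

Note how the first rule pairs ANY on categories with NONE on uses to express “identifiable data for anything other than core functionality,” while the second uses ANY throughout to catch sensitive data wherever it appears.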
We’ll revisit this policy in the next blog post, where we will execute a policy evaluation in CI.
As with resource annotations in Fides, policies must be kept up to date with in-house privacy policies as well as the relevant regulations that affect your company. By embedding Fides policy reviews into your team’s development processes, you maintain an accurate and powerful method of enforcing privacy compliance in the CI pipeline, before code ever handles PII out in the wild.
For the next and final installment in this three-part series, we dive into policy evaluation.
Learn More and Get Involved
Explore the rest of this three-part blog series to get acquainted with Fides:
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources: