Unlocking the Privacy-as-Code power of the Fides developer tools starts with understanding the basics of annotations: descriptions of privacy behaviors in the tech stack.
In this three-part blog series, we get to the why and the how of building Privacy-as-Code. That is, we’ll explore the process by which you can describe privacy behaviors in your code, and we’ll highlight the underlying reasons for making these proactive design choices.
In this blog, we’ll discuss annotation, the means by which you can describe the privacy behaviors in your codebase, such as:
Before jumping into the annotation process, it’s important to grasp the multiple reasons for annotating with fidesctl in the first place.
By annotating your Dataset and System resources with their privacy behaviors, you enable automated checks against privacy policies within CI pipelines. This means that you and your team will be notified of policy violations and privacy risks before code is ever deployed.
For instance, the application might need to abide by an in-house policy or regulation that prohibits the use of users’ identifiable personal data for advertising purposes. Suppose you notice code that violates this rule only after it has been deployed and is already processing users’ data. It would have been far simpler, cheaper, and less risky to address the issue back in development. That’s where annotations come in. They enable you to build and verify privacy from the start, a design choice that will persist into runtime.
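To make the advertising rule concrete, here is a minimal sketch in plain Python. It is not the fidesctl evaluation engine: the `Declaration` class, the `violations` function, and the dotted category and use labels are all assumptions invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    """One hypothetical statement of how a system uses data."""
    system: str
    data_category: str  # dotted label, illustrative only
    data_use: str       # dotted label, illustrative only


def violations(declarations, banned_category, banned_use):
    """Return declarations that pair a banned category with a banned use.

    A label matches when it equals the banned label or sits beneath it
    in the dotted hierarchy.
    """
    def under(label, parent):
        return label == parent or label.startswith(parent + ".")

    return [
        d for d in declarations
        if under(d.data_category, banned_category)
        and under(d.data_use, banned_use)
    ]


# Hypothetical declarations for a single web application.
declared = [
    Declaration("web_app", "user.identifiable.contact.email", "advertising.first_party"),
    Declaration("web_app", "user.identifiable.contact.email", "provide.service"),
    Declaration("web_app", "user.nonidentifiable", "advertising"),
]

found = violations(declared, "user.identifiable", "advertising")
for d in found:
    print(f"VIOLATION: {d.system} uses {d.data_category} for {d.data_use}")
```

Run in a CI job, a check of this shape would fail the build on the first declaration only, since the other two either use the data for a permitted purpose or do not involve identifiable data.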
Respect isn’t just an ideal for a system to reflect for its end-users. It’s also a business choice with distinct benefits. Users want to do business with trustworthy companies. From the EU’s GDPR to California’s CCPA, privacy regulations are imposing strict requirements on technical systems, with fines and reputational damage on the line. An estimated 65% of the world’s population will be covered by a general privacy law by 2023.
Data annotation using the open-source Fides toolset is an important first step towards deriving the benefits of true Privacy-by-Design in your infrastructure. Let’s get annotating!
Throughout this series, we’ll use a simple example to illustrate the function and power of Privacy-as-Code. Suppose you have an e-commerce application with the following functions:
With this app, your application infrastructure includes:
fidesctl to declare privacy manifests and to conduct policy evaluations
The schema has the following structure:
Fides organizes data infrastructure into a hierarchy of resources. At the root is the Organization, which contains all parts of a company or enterprise and the policies the Organization’s sub-resources must follow. While a company could have multiple Organizations, perhaps to categorize their data processing into EU and non-EU countries, the Organizations do not share their sub-resources. So policies in one Organization cannot refer to policies in another Organization.
> Registry (optional)
One level below the root, there is an optional level called Registry, which is a collection of Systems. We’ll discuss Systems in detail shortly. They describe data processing activities and the role of Datasets in those activities. A Dataset combines a database schema with Fides privacy categorizations. Each Dataset is represented as a set of Collections (tables), which themselves contain Fields (columns). Datasets support granular annotations of privacy behaviors: what data you are storing and where that data is stored.
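One way to picture the Dataset, Collection, and Field nesting is as a small set of Python dataclasses. This is only a sketch of the shape described above, not the Fides data model itself: the class names, the `field_paths` helper, and the example table, column, and category names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DatasetField:
    """A column, plus the privacy categories assigned to it."""
    name: str
    data_categories: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """A table: a named group of fields."""
    name: str
    fields: List[DatasetField] = field(default_factory=list)


@dataclass
class Dataset:
    """A database schema combined with privacy categorizations."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def field_paths(dataset: Dataset) -> Iterator[str]:
    """Yield a dotted dataset.collection.field path for every field."""
    for collection in dataset.collections:
        for f in collection.fields:
            yield f"{dataset.name}.{collection.name}.{f.name}"


# Hypothetical schema for the example app; names are illustrative.
dataset = Dataset(
    name="flaskr_postgres",
    collections=[
        Collection("users", [
            DatasetField("email", ["user.contact.email"]),
            DatasetField("created_at"),
        ]),
        Collection("purchases", [DatasetField("total")]),
    ],
)

paths = list(field_paths(dataset))
print(paths)
```

Because every Field has a stable path, annotations can be attached at the finest level (a single column) without tying the description to any one database engine.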
In Fides, Datasets abstract away any database-specific details, so they are agnostic to whether the e-commerce app uses Postgres, MongoDB, or other databases. This way, Datasets can contain privacy categorizations for a variety of databases, feeding privacy metadata into other tooling. We’ll get to the System resource shortly. First, let’s annotate a Dataset.
With our example e-commerce app, we have our PostgreSQL database. In order to make annotations, we must create a Dataset resource. We add a YAML file to our directory of Fides resources to describe this Dataset and its internal structure. Fides has a feature to automatically create a YAML file for you to annotate, according to the database schema (more on this in the support documentation).
We are then guided by a basic question: “In this dataset, what data categories are stored?” From there we ask, “How is the data protected?” For every Field (column), use the Fides language (fideslang) taxonomy to describe the data categories and data qualifiers.
For instance, a password has the data category
and the data qualifier
We are using Fides syntax to categorize data within an organized hierarchy.
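The idea of an organized hierarchy can be sketched with dotted labels, where each segment narrows the one before it. The snippet below is an illustration only: the `ancestors` helper and the label `user.contact.email` are assumptions for this example and may not match the current fideslang taxonomy.

```python
from typing import List


def ancestors(label: str) -> List[str]:
    """Expand a dotted label into itself and every broader parent.

    The list runs from the broadest category to the most specific one.
    """
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical category label for an email column.
chain = ancestors("user.contact.email")
print(chain)
```

Annotating a field with the most specific label therefore also places it under each broader category, which is what allows a policy written against a broad category to cover every field beneath it.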
Continue with this process for your other databases: creating Dataset resources and providing the appropriate attributes to give a robust layer of privacy metadata throughout your data infrastructure.
A System describes anything that processes data. This could be third-party APIs, services, etc. In a System resource, we can describe a rich layer of privacy metadata. Like Datasets, we describe data categories and data qualifiers. Furthermore, we describe use cases and data subjects according to the Fides taxonomy:
Returning to our example e-commerce app, we have a single Flaskr Web Application system, and that calls for a System resource. We add a YAML file to the directory of Fides resources with a privacy declaration that describes all four of the above primitives.
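As a rough sketch, a privacy declaration can be thought of as a record carrying all four primitives: data categories, data qualifier, data use, and data subjects. The dictionary below mirrors what such a manifest might look like once loaded into Python. The key names, the label values, and the `missing_primitives` helper are assumptions for illustration and should be checked against the Fides documentation before use.

```python
# The four primitives a declaration is expected to describe (assumed key names).
REQUIRED_PRIMITIVES = (
    "data_categories",
    "data_qualifier",
    "data_use",
    "data_subjects",
)


def missing_primitives(declaration: dict) -> list:
    """List the primitives that are absent or empty in a declaration."""
    return [key for key in REQUIRED_PRIMITIVES if not declaration.get(key)]


# Hypothetical System resource for the example web application.
system = {
    "name": "flaskr_web_application",
    "privacy_declarations": [
        {
            "name": "process_orders",
            "data_categories": ["user.contact.email"],
            "data_qualifier": "identified",
            "data_use": "provide.service",
            "data_subjects": ["customer"],
        },
        {
            # Deliberately incomplete, to show what the helper reports.
            "name": "draft_marketing",
            "data_categories": ["user.contact.email"],
            "data_use": "advertising",
        },
    ],
}

report = {
    d["name"]: missing_primitives(d)
    for d in system["privacy_declarations"]
}
print(report)
```

A check like this is a cheap guard to run before policy evaluation, since a declaration that omits a primitive cannot be meaningfully compared against a policy.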
As your business activities evolve, so will your data infrastructure and your use cases for processing personal data. Incorporate annotation updates, at both the Dataset and System level, into your team’s routine development processes. In doing so, you ensure that the automated privacy checks run in the CI pipeline reflect the most up-to-date data infrastructure. As maintainers of Fides, we at Ethyca have made tooling that speeds up the annotation phase an important part of our roadmap.
With your Datasets and Systems annotated, you can then start to formalize policies that can be enforced in CI. The next blog post explores the policy creation process.
Explore the rest of this three-part blog series to get acquainted with Fides:
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources: