Fides enables developers to check for privacy compliance directly in the CI pipeline, proactively addressing risk and compliance according to resource annotations and Fides policies.
This blog post is the third in a three-part series on getting started with Fides Control, the open-source tool for Privacy-as-Code that runs privacy checks directly in the CI pipeline. To recap, we have covered the following topics:
Throughout this blog series, we have used a simple example of a Flask web application for an e-commerce company. We’ll return to this example once again. We will also draw on the Policy resources created in the previous blog post. For a hands-on walkthrough, clone the demo repo and check out our tutorial.
The first two posts in this series have laid the groundwork for policy evaluation: powerful in-CI privacy checks. It’s useful in itself to understand the aspects of PII processed by this app, and to have a firm grasp on the policies that the app must abide by. However, combining these understandings through an automated privacy check is more than useful knowledge; it’s vital to embedding comprehensive privacy into our CI/CD workflow. Let’s explore how our annotations and policies reap great dividends in the policy evaluation stage.
Suppose that we wish to add Google Analytics to our app, to better understand how users interact with the app. Check out the hands-on tutorial to see how to add the Google Analytics script in the Fides demo. Here, we’ll focus on the impact of adding Google Analytics on our codebase’s privacy compliance.
We will create a YAML file for this new System resource and give it the fides_key google_analytics_system. Using what we learned in the first blog post of this series, we populate the file as shown here:
We have added several comments in the file, and we’ve encountered all of the other components in the earlier blog posts. In plain terms, this System annotation tells us the following: First, Google Analytics processes users’ browsing history, cookie IDs, telemetry data, and location data—all of which are identifiable on their own but pseudonymized here—alongside non-identifiable data for the purpose of improving the app. Second, Google Analytics processes the user’s derived IP address for improving the app. Note that this data is not pseudonymized by default.
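To make the two privacy declarations described above concrete, here is a minimal Python sketch that models them as plain data. It is not the YAML file itself and not a Fides API: the class names, the declaration names, the category labels, and the "improve" use label are assumptions made for illustration. Only the fides_key and the two data qualifier strings are taken from this post.

```python
from dataclasses import dataclass
from typing import List

# The two data qualifier values quoted in this post.
PSEUDONYMIZED = "aggregated.anonymized.unlinked_pseudonymized.pseudonymized"
IDENTIFIED = PSEUDONYMIZED + ".identified"


@dataclass
class PrivacyDeclaration:
    name: str                   # hypothetical label, not from the post
    data_categories: List[str]  # plain-language labels, not real taxonomy keys
    data_use: str               # assumed label for "improving the app"
    data_qualifier: str


@dataclass
class System:
    fides_key: str
    privacy_declarations: List[PrivacyDeclaration]


google_analytics_system = System(
    fides_key="google_analytics_system",
    privacy_declarations=[
        # First declaration: identifiable categories, pseudonymized here,
        # alongside non-identifiable data.
        PrivacyDeclaration(
            name="usage_analytics",
            data_categories=[
                "browsing_history",
                "cookie_id",
                "telemetry",
                "location",
                "nonidentifiable",
            ],
            data_use="improve",
            data_qualifier=PSEUDONYMIZED,
        ),
        # Second declaration: the IP address, not pseudonymized by default.
        PrivacyDeclaration(
            name="ip_address_processing",
            data_categories=["ip_address"],
            data_use="improve",
            data_qualifier=IDENTIFIED,
        ),
    ],
)


def describe(system: System) -> List[str]:
    """Return one plain-language line per privacy declaration."""
    lines = []
    for decl in system.privacy_declarations:
        level = decl.data_qualifier.rsplit(".", 1)[-1]
        categories = ", ".join(decl.data_categories)
        lines.append(f"{decl.name}: {categories} for '{decl.data_use}' ({level})")
    return lines
```

Calling describe(google_analytics_system) yields two lines, one per declaration, which makes it easy to see that only the second declaration carries the identified qualifier.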
Suppose that our directory of Fides resources includes the second policy from the previous blog post. Recall that all Fides policies are composed of rules, and this policy’s rules were the following:
We will run our automated privacy check of the Google Analytics System against this policy. From the command line, we execute make fidesctl-evaluate to evaluate this System.
This Google Analytics implementation fails the privacy check! It’s time to dive back into the implementation to correct the noncompliant code.
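A failed check is only useful in CI if it stops the pipeline. The sketch below shows one way a pipeline step could gate on the result. The run_privacy_check helper is hypothetical, and it assumes that the evaluate command signals failure through a non-zero exit status, which is the usual convention for CI tools but is not stated in this post.

```python
import subprocess
from typing import Sequence


def run_privacy_check(command: Sequence[str]) -> bool:
    """Run a check command and report whether it exited with status 0.

    Assumption: the command exits non-zero when the privacy check fails.
    """
    result = subprocess.run(command, check=False)
    return result.returncode == 0
```

A CI step could then call run_privacy_check(["make", "fidesctl-evaluate"]) and exit with a non-zero status of its own when the function returns False, so that the noncompliant change never reaches deployment.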
In the command line, the executed evaluate command tells us which of the privacy declarations in the System YAML file failed the privacy check. It also points out which rule the noncompliant declaration violated.
In our example, the derivation of users’ geographic location violates the first rule in our policy, which prohibits the usage of identifiable user information for purposes besides basic app function. Let’s see where things went awry.
In our annotation of the System resource for Google Analytics, we see that Google Analytics would be processing users’ identifiable information—namely, their devices’ IP addresses and their location. Crucially, as we noted earlier, such data is classified as identifiable for the data_qualifier attribute. We need to update the data qualifier so that it is not identifiable but rather pseudonymized.
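The reasoning above can be modeled as a small matching function. This is a toy sketch, not Fides's evaluation code: the Rule and Declaration classes, the violates helper, the rule and declaration names, the category key, and the "improve" use label are all assumptions for illustration. Only the two data_qualifier strings come from this post. The sketch treats a rule as violated when the declaration's categories, use, and qualifier all fall at or beneath the values the rule names.

```python
from dataclasses import dataclass
from typing import List, Tuple

PSEUDONYMIZED = "aggregated.anonymized.unlinked_pseudonymized.pseudonymized"
IDENTIFIED = PSEUDONYMIZED + ".identified"


@dataclass(frozen=True)
class Declaration:
    name: str
    data_categories: Tuple[str, ...]
    data_use: str
    data_qualifier: str


@dataclass(frozen=True)
class Rule:
    name: str
    data_categories: Tuple[str, ...]  # categories the rule restricts
    data_uses: Tuple[str, ...]        # uses the rule restricts
    data_qualifier: str               # identifiability level the rule flags


def key_hierarchy(key: str) -> List[str]:
    """Return a dotted key followed by each of its ancestors.

    For example, 'a.b.c' gives ['a.b.c', 'a.b', 'a'].
    """
    parts = key.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def violates(rule: Rule, declaration: Declaration) -> bool:
    """True when category, use, and qualifier all match the rule."""
    category_hit = any(
        restricted in key_hierarchy(category)
        for category in declaration.data_categories
        for restricted in rule.data_categories
    )
    use_hit = any(
        restricted in key_hierarchy(declaration.data_use)
        for restricted in rule.data_uses
    )
    qualifier_hit = rule.data_qualifier in key_hierarchy(declaration.data_qualifier)
    return category_hit and use_hit and qualifier_hit


# Hypothetical rule: no identified user data for improving the app.
rule = Rule(
    name="minimize_identifiable_data",
    data_categories=("user.derived.identifiable",),
    data_uses=("improve",),
    data_qualifier=IDENTIFIED,
)

# Hypothetical declaration mirroring the failing one in this post.
ip_declaration = Declaration(
    name="ip_address_processing",
    data_categories=("user.derived.identifiable.device.ip_address",),
    data_use="improve",
    data_qualifier=IDENTIFIED,
)
```

Under these assumptions, violates(rule, ip_declaration) returns True. The same declaration with its qualifier set to the pseudonymized value no longer matches the rule's qualifier, which is why changing the qualifier (and the system behavior behind it) resolves the failure.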
Of course, it’s not enough to change the annotations alone. We need the technical systems to actually behave that way! Visit the tutorial for a brief walkthrough of the process to pseudonymize IP addresses in Google Analytics. Once IP addresses are pseudonymized, we return to the System annotation and update the data_qualifier attribute for the relevant privacy declaration. We replace the old value
aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
with one that reflects the now-pseudonymized nature of that location data:
aggregated.anonymized.unlinked_pseudonymized.pseudonymized
Now, when we execute the evaluate command, it passes! With this simple example, we have described our systems and our policies to ultimately run privacy compliance checks directly within the CI pipeline.
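The qualifier change is easier to see when the dotted key is split into its levels. The short sketch below uses the old and new values from this post. The helper names are hypothetical, and reading the key as a nesting from least to most identifiable is an interpretation of its dotted structure rather than something this post spells out.

```python
from typing import List

OLD_QUALIFIER = "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"
NEW_QUALIFIER = "aggregated.anonymized.unlinked_pseudonymized.pseudonymized"


def levels(qualifier: str) -> List[str]:
    """Split a dotted qualifier key into its nested levels."""
    return qualifier.split(".")


def is_same_or_nested_under(qualifier: str, ancestor: str) -> bool:
    """True when qualifier equals ancestor or sits beneath it in the key."""
    return qualifier == ancestor or qualifier.startswith(ancestor + ".")
```

Here levels(NEW_QUALIFIER) is levels(OLD_QUALIFIER) without its final "identified" level, so the updated declaration stops one level short of the identified qualifier.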
Privacy belongs in the software development life cycle. At Ethyca, we consistently advocate for this approach, and it’s more than just a catchphrase. Looking back at our example e-commerce application, suppose that we did not check our code prior to deployment for privacy compliance. We would have shipped the code, and it would only be once actual people’s PII was flowing through the app that we would be scrambling to correct the privacy violation. The costs are manifold, burdening engineering teams in time and labor as well as the entire company in terms of reputation and risk of costly privacy fines.
The example that we have illustrated in this blog series is a vast simplification of most tech stacks today, where there are many more Datasets and Systems to annotate, and a complex web of policies to codify in Fides. However, the straightforward processes demonstrated in this series scale to cover real-world tech stacks. Fides equips devs with not only a clear, standardized taxonomy of privacy terms to annotate resources, but also a powerful evaluation function to pinpoint noncompliant code. The result is low-friction privacy checks built into teams’ routine CI/CD workflow, meaning companies and end-users can trust the tech that handles personal information.
To dive deeper into the Fides ecosystem and connect with the Fides open-source community, check out these resources: