Datasets

In this section, you will learn how Datasets are used to describe datastores like databases and warehouses in order to process privacy requests.

What are datasets?

A Fides dataset is a YAML configuration file that represents a collection of data such as a database, a warehouse, or an API. A dataset describes a data model or schema by annotating each field with information such as the category of personal data it holds (e.g. a user's IP address or a system ID). Datasets can be used for data inventory (also known as data mapping) and to automate privacy requests for the retrieval or erasure of specific data.

You can think of Datasets as a map for any collection of data in Fides. For this reason, you'll see Datasets for databases as well as third-party SaaS applications.
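To make the description above concrete, here is a minimal sketch of what a dataset file might look like. The dataset key, collection and field names are hypothetical, and the data category labels are assumed to follow the default Fides taxonomy; check them against the taxonomy configured in your own instance.

```yaml
# Illustrative sketch only: all names and categories below are assumptions.
dataset:
  - fides_key: example_app_db          # hypothetical unique key for this dataset
    name: Example App Database
    description: Hypothetical database holding user records.
    collections:
      - name: users                    # a table (or collection) in the datastore
        fields:
          - name: id
            data_categories: [system.operations]      # a system ID
          - name: email
            data_categories: [user.contact.email]     # user contact info
          - name: last_login_ip
            data_categories: [user.device.ip_address] # a user's IP address
```

Each field carries one or more data categories, which is what lets the same file serve both as a data inventory and as input for privacy requests.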

Generating, annotating and managing datasets

This guide walks through the steps for generating, annotating and managing datasets for privacy requests.

Understanding datasets

For details, read about the Structure of datasets.

Generating datasets

To generate a dataset, use the Fides Control administrative UI or the Fides CLI.

Annotating datasets

Annotating a dataset means telling Fides where categories of personal data (e.g. user contact info) can be found and how fields in tables or collections relate to one another, so that Fides can traverse the data to fulfill privacy requests:

  1. Annotating data categories manually
  2. Annotating data categories with the CLI
  3. Defining the identity key for the dataset
  4. Defining relationships between datasets, collections and fields
  5. Skipping collections in privacy requests
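The annotation steps above can be sketched in a single dataset file. This is an assumption-laden illustration rather than a reference: the dataset, collection and field names are hypothetical, and the `fides_meta` keys shown (`identity`, `primary_key`, `references`, `skip_processing`) should be verified against the Structure of datasets guide for your Fides version.

```yaml
# Illustrative sketch only: names are hypothetical; verify fides_meta keys
# against the dataset structure reference before relying on them.
dataset:
  - fides_key: example_app_db
    name: Example App Database
    collections:
      - name: users
        fields:
          - name: id
            data_categories: [user.unique_id]
            fides_meta:
              primary_key: True        # marks the row identifier
          - name: email
            data_categories: [user.contact.email]
            fides_meta:
              identity: email          # identity key: where traversal can start
      - name: orders
        fields:
          - name: user_id
            data_categories: [user.unique_id]
            fides_meta:
              references:              # relationship to another collection's field
                - dataset: example_app_db
                  field: users.id
                  direction: from
      - name: audit_log
        fides_meta:
          skip_processing: true        # exclude this collection from privacy requests
        fields:
          - name: entry
            data_categories: [system.operations]
```

In this sketch, a privacy request that supplies an email address would locate the matching `users` row, follow the reference into `orders`, and leave `audit_log` untouched.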

Uploading / Adding datasets to Fides

You can upload or add your datasets to Fides via the Control administrative UI or the Fides CLI.

Linking datasets

To process privacy requests against a database, its dataset must be linked to the database integration. Complete this final step by following the guide for Linking datasets.