Datasets
In this section, you will learn how Datasets are used to describe datastores like databases and warehouses in order to process privacy requests.
What are datasets?
A Fides dataset is a YAML configuration file that represents a collection of data such as a database, a warehouse, or an API. A dataset describes a data model or schema by annotating each field with information such as the category of personal data it holds (e.g. user IP address or system ID). Datasets can be used for data inventory (also known as data mapping) and to automate privacy requests for the retrieval or erasure of specific data.
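As a rough illustration of the description above, a dataset file might look something like the following. This is a minimal sketch, not a definitive schema: the fides_key value, the collection and field names, and the data category labels are all hypothetical, and the exact set of supported keys should be checked against the Structure of datasets reference.

```yaml
# Illustrative sketch only; every name and category label here is an assumption.
dataset:
  - fides_key: example_app_db          # hypothetical unique identifier for the dataset
    name: Example application database
    description: Hypothetical schema used for illustration only.
    collections:
      - name: users                    # a table (or collection) in the datastore
        fields:
          - name: id
            data_categories:
              - system.operations      # assumed label for a system ID
          - name: ip_address
            data_categories:
              - user.device.ip_address # assumed label for a user IP address
```

Each field carries one or more data categories, which is the annotation that lets the dataset serve both as a data inventory and as the map used when handling privacy requests.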
Generating, annotating and managing datasets
This guide walks through the steps for generating, annotating and managing datasets for privacy requests.
Understanding datasets
For details, read about the Structure of datasets.
Generating datasets
To generate a dataset, use the Fides Control administrative UI or the Fides CLI:
- Generate a dataset in Fides Control administrative UI
- Generate a dataset using the Fides generate CLI command
Annotating datasets
Annotating a dataset tells Fides where categories of personal data (e.g. user contact info) can be found and how fields in tables or collections are related, so that Fides can traverse the data to fulfill privacy requests:
- Annotating data categories manually
- Annotating data categories with the CLI
- Defining the identity key for the dataset
- Defining relationships between datasets, collections and fields
- Skipping collections in privacy requests
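The annotations listed above can be sketched in a single abbreviated dataset fragment. This is a hedged example under stated assumptions: the dataset, collection, and field names are hypothetical, and the fides_meta keys shown (identity, references, skip_processing) should be verified against the linked guides before use.

```yaml
# Abbreviated, illustrative fragment; all names are hypothetical.
dataset:
  - fides_key: example_app_db
    collections:
      - name: users
        fields:
          - name: email
            data_categories: [user.contact.email]
            fides_meta:
              identity: email            # assumed: marks this field as the identity key
          - name: id
            data_categories: [system.operations]
      - name: orders
        fields:
          - name: user_id
            data_categories: [system.operations]
            fides_meta:
              references:                # assumed: relates orders.user_id to users.id
                - dataset: example_app_db
                  field: users.id
                  direction: from
      - name: audit_log
        fides_meta:
          skip_processing: true          # assumed: excludes this collection from privacy requests
        fields:
          - name: entry
            data_categories: [system.operations]
```

In this sketch, a privacy request would start from the email identity on users, follow the reference into orders, and leave audit_log untouched.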
Uploading / Adding datasets to Fides
You can upload or add your datasets to Fides via the Control administrative UI or the Fides CLI.
Linking datasets
To process privacy requests against a database, a dataset must be linked to the database integration. Complete this final step with the guide for Linking datasets.