In this section, you will learn how datasets describe datastores such as databases and warehouses so that Fides can process privacy requests against them.
What are datasets?
A Fides dataset is a YAML configuration file that represents a collection of data such as a database, a warehouse, or an API. A dataset describes a data model or schema by annotating each field with information such as the category of personal data it holds (e.g. a user IP address or a system ID). Datasets can be used for data inventory (also known as data mapping) and to automate privacy requests for retrieval or erasure of specific data.
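As a rough illustration of the description above, a dataset file might look like the following minimal sketch. The dataset key, collection name, and field names are hypothetical, and the data category labels are assumed to come from the default Fides taxonomy; check the Structure of datasets reference for the authoritative schema.

```yaml
# Illustrative sketch only: all names below are placeholders, not a real system.
dataset:
  - fides_key: example_app_db            # hypothetical unique key for this dataset
    name: Example application database
    description: Hypothetical database used to illustrate field annotation.
    collections:
      - name: users                      # a table or collection in the datastore
        fields:
          - name: id
            data_categories: [user.unique_id]
          - name: email
            data_categories: [user.contact.email]
          - name: last_login_ip
            data_categories: [user.device.ip_address]
          - name: internal_job_id
            data_categories: [system.operations]
```

Each field carries one or more data categories, which is what allows the same file to serve both data mapping and privacy request automation.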
This guide walks through the steps for generating, annotating and managing datasets for privacy requests.
For details, see Structure of datasets.
To generate a dataset, use the Fides Control administrative UI or the Fides CLI:
- Generate a dataset in Fides Control administrative UI
- Generate a dataset using the Fides CLI
Annotating a dataset tells Fides where categories of personal data (e.g. user contact info) can be found and how fields in tables or collections are related, so that Fides can traverse the data to fulfill privacy requests:
- Annotating data categories manually
- Annotating data categories with the CLI
- Defining the identity key for the dataset
- Defining relationships between datasets, collections and fields
- Skipping collections in privacy requests
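The annotation topics listed above can be pictured together in one file. The sketch below assumes that field-level and collection-level annotations live under a `fides_meta` key, with `identity`, `references`, and `skip_processing` as the relevant settings; these key names reflect a general understanding of the Fides dataset schema and should be verified against the linked guides. All dataset, collection, and field names are hypothetical.

```yaml
# Illustrative sketch only: key names under fides_meta are assumptions to verify.
dataset:
  - fides_key: example_app_db
    name: Example application database
    collections:
      - name: users
        fields:
          - name: id
            data_categories: [user.unique_id]
            fides_meta:
              primary_key: true          # marks the row identifier
          - name: email
            data_categories: [user.contact.email]
            fides_meta:
              identity: email            # identity key: traversal starts here
      - name: orders
        fields:
          - name: user_id
            data_categories: [user.unique_id]
            fides_meta:
              references:                # relationship: orders.user_id comes from users.id
                - dataset: example_app_db
                  field: users.id
                  direction: from
          - name: shipping_address
            data_categories: [user.contact.address.street]
      - name: audit_log
        fides_meta:
          skip_processing: true          # leave this collection out of privacy requests
        fields:
          - name: message
            data_categories: [system.operations]
```

In this sketch, a privacy request that supplies an email address would locate the matching `users` row, follow the reference into `orders`, and leave `audit_log` untouched.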
You can upload or add your datasets to Fides via the Control administrative UI or the Fides CLI.
To process privacy requests against a database, a dataset must be linked to the database integration. Complete this final step by following the guide for Linking datasets.