In this section, you will learn about the structure of datasets and what information they contain so that you can manage dataset resources in Fides. This guide uses the Cookie House sample project, available on GitHub.
A Fides dataset is a YAML configuration file. For more information on YAML (YAML Ain't Markup Language), visit the official YAML.org site.
A dataset contains the following information:
- Table schemas: the schema, or list of fields, for the database tables or collections of information represented.
- Data categories: which categories of personal data exist and within which fields.
- Identity keys: where to search for individual records when processing privacy requests.
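
As a rough illustration of where these three pieces of information sit in a dataset file, consider the minimal sketch below. The dataset key `example_dataset` is hypothetical, and the `user.contact.email` category label and the `identity` attribute are assumptions about the taxonomy and metadata options available in your Fides version.

```yaml
dataset:
  - fides_key: example_dataset              # hypothetical dataset key
    collections:
      - name: customer                      # table schema: one collection per table
        fields:                             # table schema: the list of fields
          - name: email
            data_categories: [user.contact.email]   # data category (assumed label)
            fides_meta:
              identity: email               # identity key (assumed attribute)
```

The collection and its fields describe the schema, the category label says what kind of personal data the field holds, and the identity entry marks the field as a starting point for locating an individual's records.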
To understand datasets in Fides, we will use the example PostgreSQL dataset for the Cookie House sample project.
Below is a snippet of the attributes used to describe a dataset:
- fides_key: postgres_example_test_dataset
  name: Postgres Example Test Dataset
  description: Example of a Postgres dataset containing a variety of related tables
    like customers, products, addresses, etc.
As you can see from the example above, the general structure of a dataset in Fides is as follows:
| Attribute | Description |
|---|---|
| dataset | Declares the type of resource to Fides, such as a dataset or a policy. This guide focuses on datasets. |
| fides_key | The unique name for your dataset. |
| organization_fides_key | Datasets must have a relationship to an organization. This can be your company, a department, or some other group. |
| tags | Optional tags that provide additional context for labeling and grouping resources. |
| name | The user-friendly label for the dataset, presented in reports and in the Fides Control admin UI. |
| description | A description of the dataset, useful for providing additional context in review and reporting. |
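
To make the dataset-level attributes concrete, here is a minimal sketch that sets each of them on the example dataset. The `organization_fides_key` value and the `tags` values are assumptions chosen for illustration and do not come from the sample project.

```yaml
dataset:                                    # resource type
  - fides_key: postgres_example_test_dataset
    organization_fides_key: default_organization   # assumed organization key
    tags: [example, postgres]                      # hypothetical tags
    name: Postgres Example Test Dataset
    description: Example of a Postgres dataset containing a variety of related tables
      like customers, products, addresses, etc.
```

Only `fides_key` has to be unique. The label and description exist for people reading reports, so they can be changed without affecting how other resources refer to the dataset.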
Below is a snippet representing one collection and its fields in a dataset:
- name: customer
  fields:
    - name: address_id
      fides_meta:
        references:
          - dataset: postgres_example_test_dataset
    - name: created
    - name: email
    - name: id
    - name: name
As you can see from the example above, the general structure of a collection in Fides is as follows:
| Attribute | Description |
|---|---|
| collections | Declares a collection of data in a Fides dataset. |
| name | The collection name, e.g. a database table, a document, or an object of a resource such as an API. |
| fields | The list of fields in the given collection. |
| name | The name of the field. |
| data_categories | The category of personal data in the field. This can be an array of one or more values. |
| fides_meta | Fides metadata attributes, used to define entity relationships between collections and tables for privacy requests. |
| data_type | Specifies a required data type for type checking. |
| length | Sets the string length where a data type requires one. |
| references | Sets references that establish relationships between other datasets and fields for data modeling and privacy requests. |
| dataset | Reference to a dataset with a relationship to this field. |
| field | Reference to the foreign collection and field. |
| direction | Defines the direction of the mapped relationship between fields for the purpose of executing privacy requests between databases, tables, and fields. |
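
The field-level attributes described above can be combined as in the sketch below. It is illustrative rather than a copy of the sample project: the category labels, the `length` value, the dotted `address.id` reference, and the `to` direction are assumptions, and the referenced `address` collection would also need to be defined in the dataset.

```yaml
dataset:
  - fides_key: postgres_example_test_dataset
    collections:
      - name: customer
        fields:
          - name: address_id
            data_categories: [system.operations]   # assumed category label
            fides_meta:
              references:
                - dataset: postgres_example_test_dataset
                  field: address.id     # assumed format: collection name, dot, field name
                  direction: to         # assumed direction: customer points to address
          - name: name
            data_categories: [user.name]           # assumed category label
            fides_meta:
              data_type: string
              length: 40                # assumed maximum string length
```

Declaring the reference on `address_id` is what lets a privacy request that starts in `customer` continue into the related collection, while `data_type` and `length` give Fides enough information to type-check values for the `name` field.
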
Now that you understand Fides datasets, you can learn how to generate a dataset using the Fides Control administrative UI or the Fides CLI: