Understanding datasets
In this section, you will learn about the structure of datasets and what information they contain so that you can manage dataset resources in Fides. For the purpose of this Fides dataset guide we will use the Cookie House sample project available on Github (opens in a new tab).
Structure of Fides datasets
A Fides dataset is a YAML configuration file. For more information on YAML (yet another markup language), visit the official YAML.org (opens in a new tab) site.
A dataset contains the following information:
- Table schemas: the schema, or list of fields, for the database tables or collections of information represented.
- Data categories: which categories of personal data exist and within which fields.
- Identity keys: where to search for individual records when processing privacy requests.
Example: Fides dataset
To understand datasets in Fides we will use the example PostgreSQL dataset for the Cookie House sample project (opens in a new tab).
Dataset attributes
Below is a snippet of the attributes used to describe a dataset:
dataset:
- fides_key: postgres_example_test_dataset
organization_fides_key: default_organization
tags: null
name: Postgres Example Test Dataset
description: Example of a Postgres dataset containing a variety of related tables
like customers, products, addresses, etc.
As you can see from the example above, the general structure of a dataset in Fides is as follows:
Key | Value |
---|---|
dataset | Declares the type of resource to Fides, this can be an organization , system , dataset or policy . This guide focuses on dataset . |
fides_key | The unique name for your dataset. |
organization_fides_key | Datasets must have a relationship to an organization, this can be your company, department or some other group. |
tags | Optional tags to provide additional context or labeling and grouping of resources. |
name | The user-friendly label for the dataset presented in reports and Fides Control admin UI. |
description | A description of the dataset, useful for providing additional context in review/reporting. |
Dataset collections
Below is a snippet representing one collection and two fields in a dataset:
collections:
- name: customer
fields:
- name: address_id
data_categories:
- system.operations
fides_meta:
references:
- dataset: postgres_example_test_dataset
field: address.id
direction: to
- name: created
data_categories:
- system.operations
- name: email
data_categories:
- user.contact.email
fides_meta:
identity: email
data_type: string
- name: id
data_categories:
- user.unique_id
fides_meta:
primary_key: True
- name: name
data_categories:
- user.name
fides_meta:
data_type: string
length: 40
As you can see from the example above, the general structure of a collection in Fides is as follows:
Key | Value |
---|---|
collections | Declares a collection of data in a Fides dataset. |
collections → name | Used to represent the collection name, e.g. a table on a database, a document or an object of a resource such as an API. |
fields | The list of fields in the given collection. |
fields → name | The name of the field. |
data_categories | The category of personal data in the field. This can be an array of one or more values. |
fides_meta | Fides metadata attributes, used for defining entity relationships between collections and tables for privacy requests. |
data_type | Specify a required data type for type checking. |
length | Where a data type may require length, set the string length. |
references | Set references to establish relationships between other datasets and fields for data modeling and privacy requests. |
references → dataset | Reference to a dataset with a relationship to this field. |
references → field | Reference to the foreign collection and field in the format collection.field . |
references → direction | Defines the direction of the mapped relationship between fields for the purpose of executing privacy request between databases, tables and fields. |
Generating datasets
Now that you understand Fides datasets, you can learn how to generate a dataset using Fides Control administrative UI or Fides CLI: