Privacy request annotations

When Fides processes a privacy request, it needs to know how to traverse your datasets to find and process all data related to the user. You describe this traversal by annotating fields with an identity and with references across datasets.

Dataset identity annotation

The dataset identity is the starting point for a privacy request. An identity key marks the field that holds an identifier, such as an email address or phone number, so that Fides can locate the user's records within a database.

Assign the identity key to a field whose values are unique, such as one you would normally use to identify a single record in the database.

Expanding on our sample project, the most suitable identity is the email field, because each value is unique and identifies one customer.

collections:
  - name: customer
    fields:
    - name: email
      fides_meta:           # Add Fides metadata section
        identity: email     # Specify this is the identity key and provide a name
        data_type: string   # Specify the expected data type

Connecting collections

The fundamental component for connecting collections is the references key. This key connects fields across datasets and controls the order of processing.

postgres_dataset.yml

references:
  - dataset: {{dataset_key}}
    field: {{referenced_field_name}}
    direction: from | to

Key	Value
`references`	Declares a reference in a field.
`dataset`	The `key` of the referenced dataset.
`field`	The pointer to a specific `field` in the referenced dataset.
`direction`	`from` indicates the referenced field must be processed before the current field. `to` indicates the current field must be processed before the referenced field.

Note: The field keyword is used only in references. The fields keyword is used to define the fields of a collection.

Examples

The example below demonstrates a reference between customer_id and id. The id field is in the customer collection of a dataset with the key postgres_example_test_dataset.

The direction from indicates that the id field must be processed before the customer_id field.

fields:
  - name: customer_id
    fides_meta:
      references:
        - dataset: postgres_example_test_dataset
          field: customer.id
          direction: from
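
The same relationship could also be declared from the other side. The fragment below is a minimal sketch, not taken from the sample project: it assumes the annotation is placed on the id field of the customer collection, and that the customer_id field shown above belongs to a hypothetical orders collection in the same dataset. With direction to, the current field (id) is processed before the referenced field (customer_id).

```yaml
fields:
  - name: id
    fides_meta:
      references:
        - dataset: postgres_example_test_dataset
          field: orders.customer_id   # "orders" is a hypothetical collection name
          direction: to               # process id first, then orders.customer_id
```

Both forms express the same ordering: id first, then customer_id.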

The example below also demonstrates a reference between customer_id and id. The difference here is that the customer_id field sits in a semi-structured data source, so it is nested within a parent object (comments).

- fides_key: mongo_test
  name: Mongo Example Test Dataset
  collections:
    - name: customer_details
      fields:
        ...
        - name: comments
          fields:
            - name: customer_id
              fides_meta:
                references:
                  - dataset: postgres_example_test_dataset
                    field: customer.id
                    direction: from
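
To see how the two annotations fit together, the sketch below combines an identity and a reference in a single dataset. It is illustrative only: the orders collection name and the dataset name are assumptions, while the annotation keys (identity, data_type, references) are the ones described above.

```yaml
dataset:
  - fides_key: postgres_example_test_dataset
    name: Postgres Example Test Dataset   # assumed display name
    collections:
      - name: customer
        fields:
          - name: id
          - name: email
            fides_meta:
              identity: email             # starting point for the privacy request
              data_type: string
      - name: orders                      # hypothetical collection
        fields:
          - name: customer_id
            fides_meta:
              references:
                - dataset: postgres_example_test_dataset
                  field: customer.id
                  direction: from         # customer.id is processed before orders.customer_id
```

In this sketch, a request keyed by email would begin at the customer collection, and the from reference places customer.id ahead of orders.customer_id in the processing order.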

Skipping collections

Skipping privacy request processing for specific collections or API endpoints is useful in some scenarios, for example when a collection causes a data processing error or is known not to contain personal data.

To skip a collection, set the skip_processing flag in the collection's fides_meta, as shown in this example:

dataset:
  - fides_key: postgres_example_dataset
    name: Postgres Example Dataset
    description: Example of a Postgres dataset containing a variety of related tables like customers, products, addresses, etc.
    collections:
      - name: address
        fides_meta:
          skip_processing: True

To skip an endpoint, set the skip_processing flag on the endpoint, as shown in this example:

saas_config:
  fides_key: saas_connector_example
  name: SaaS Example Config
  type: custom
  description: A sample schema representing a SaaS for Fides
  version: 0.0.1
 
  endpoints:
    - name: skipped_collection
      skip_processing: True
      requests:
        read:
          method: GET
          path: /v1/misc_endpoint/<list_id>
          param_values:
            - name: list_id
              references:
                - dataset: saas_connector_example
                  field: users.list_ids
                  direction: from

To learn more about advanced configuration options and how Fides traverses databases, see our guide to Query execution.
