Skip to content
Annotate complex fields

Annotate Complex Fields

Fides can retrieve and mask data from complex objects and arrays in MongoDB. This involves annotating your dataset files to let Fides know about your complex data.

Declare an object field

To declare an object field, define nested fields underneath that field. In the example below, workplace_info is an object field with two nested fields: employer and position.

Data categories cannot be specified at the object level due to potential conflicts with nested fields. Instead, annotate the scalar fields within the object field. Here, the workplace_info.position field has data_category: user.job_title.

dataset:
  - fides_key: mongo_nested_object_example
    name: Mongo Example with Nested Objects
    description: Example of a Mongo dataset that contains 'details' about customers defined in the 'postgres_example_test_dataset'
    collections:
      - name: customer_details
        fields:
          - ...
          - name: workplace_info
            fidesops_meta:
                data_type: object
            fields:
              - name: employer
                fidesops_meta:
                  data_type: string
              - name: position
                data_categories: [ user.job_title ]
                fidesops_meta:
                  data_type: string
              - name: id

Reference a nested field

To define a relationship between a field on one collection and a nested field on another collection, use dot notation in the fidesops_meta references for as many levels are necessary.

In the example below, this field is denoted by <collection_name>.<field_name>.<sub_field> name, or customer_details.workplace_info.id.

This relationship could also be defined on the customer_details.workplace_info.id field itself, with a direction of from, with field mydatabase.customer.workplace_id, and dataset mydatabase.

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
      - name: customer
        fields:
          - name: workplace_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: mongo_nested_object_example
                  field: customer_details.workplace_info.id
                  direction: to
          ...
 

Declare an array field

There is no official array type. Instead, an array is represented by a [] flag on a field.

Declare an array of scalar values

In this example, the mydatabase:customer collection has a travel_identifiers field that is an string array, described by data_type: string[]. An array of integers would be described by data_type: integer[].

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
      - name: customer
        fields:
          - ...
          - name: travel_identifiers
            fidesops_meta:
              data_type: string[]
              data_categories: [system.operations]

Declare a nested array

In this example, the mydatabase:customer collection has a nested workplace_info.direct_reports string array.

We define direct_reports as a subfield under workplace_info, as well as add the data_type string[] to direct_reports.

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
      - name: customer
        fields:
          - name: workplace_info
            fidesops_meta:
              data_type: object
            fields:
              - name: employer
                fidesops_meta:
                  data_type: string
              - name: position
                data_categories: [ user.job_title ]
                fidesops_meta:
                  data_type: string
              - name: direct_reports
                data_categories: [ user.name ]
                fidesops_meta:
                  data_type: string[]

Declare an array of objects

In this example, the mydatabase:customer collection has an emergency_contacts object array field, or embedded documents, denoted by data_type: object[]. Each object in the emergency_contacts array can contain a name, relationship, and phone field.

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
      - name: customer
        fields:
          - name: emergency_contacts
            fidesops_meta:
              data_type: object[]
            fields:
              - name: name
                data_categories: [ user.name ]
                fidesops_meta:
                  data_type: string
              - name: relationship
                fidesops_meta:
                  data_type: string
              - name: phone
                data_categories: [ user.contact.phone_number ]
                fidesops_meta:
                  data_type: string

Reference an array

Reference an array field as if it is any other field. You cannot currently reference a specific index in an array field, but Fides will search an array field for matches.

In this example, mydatabase:flights.plane is an integer field that will be used to lookup records that match an integer in the mydatabase:aircraft.planes array.

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
     - name: flights
        fields:
          - ...
          - name: passenger_information
            fields:
              - name: passenger_ids
                fidesops_meta:
                  data_type: string[]
          - name: plane
            data_categories: [ system.operations ]
            fidesops_meta:
              data_type: integer
      - name: aircraft
        fields:
          - name: _id
            data_categories: [ system.operations ]
            fidesops_meta:
              primary_key: True
              data_type: object_id
          - name: planes
            data_categories: [ system.operations ]
            fidesops_meta:
              data_type: integer[]
              references:
                - dataset: mydatabase
                  field: flights.plane
                  direction: from
          - name: model
            data_categories: [ system.operations ]
            fidesops_meta:
              data_type: string

In this more complicated example, a field in an array of objects is used to look up a different field in an array of objects in another collection. Potentially multiple values from mydatabase:customer.comments.comment_id can be used to query for corresponding values in mydatabase:conversations.thread.comment. Because this field is in an array of objects, multiple matches may be found.

dataset:
  - fides_key: mydatabase
    name: internal database
    description: our internal database of customer data
    collections:
      - name: customer
        fields:
          - name: comments
            fidesops_meta:
              data_type: object[]
            fields:
              - name: comment_id
                fidesops_meta:
                  data_type: string
                  references:
                    - dataset: mydatabase
                      field: conversations.thread.comment
                      direction: to
      - name: conversations
        fidesops_meta:
          data_type: object[]
        fields:
          - name: thread
            fields:
              - name: comment
                fidesops_meta:
                  data_type: string
              - name: message
                fidesops_meta:
                  data_type: string
              - name: chat_name
                data_categories: [ user.name ]
                fidesops_meta:
                  data_type: string

Query an array

There are some assumptions made with array querying that may or may not fit with how your data is structured. If an array is an entrypoint into a collection (i.e., one collection references its array field), there is ambiguity around how the queries should be built. (e.g., AND versus OR, and whether only the matched indices or matched embedded documents within arrays should be considered).

Assumptions

  1. If an array is the entry point into a node, Fides will search for corresponding matches across the entire array. You cannot specify a certain index.
  2. Fides operates on "OR" queries. Data returned from multiple array fields will be flattened before being passed into the next collection.
    1. For example, Collection A returned values [1, 2, 3], and Collection B returned values [4, 5, 6]. Collection C has an array field that depends on both Collection A and Collection B. Fides will search Collection C's array field to return any record that contains one of the values [1, 2, 3, 4, 5, 6] in the array.
  3. By default, if an array field is an entry point to a node, only matching indices in that array are considered, both for access and erasures, as well as for subsequent queries on dependent collections where applicable.
    1. For example, a query on Collection A only matched indices 0 and 1 in an array. Only the data located at indices 0 and 1 will be returned, and used to query data on dependent collection C.
    2. This can be overridden by specifying return_all_elements: true on an entrypoint array field, in which case, the query will return the entire array and/or mask the entire array.
  4. Individual array elements are masked, not the entire array, e.g. ["MASKED", "MASKED", "MASKED"]

Example query traversal

This is an example traversal created from the test postgres_example and mongo_test datasets. Multiple collections point to or from complex objects and arrays. See the mongo_example_test_dataset.yml for more information.

Postgres and Mongo Query Traversal