Once you’ve decided on the appropriate method for implementing your data map, you’ll need to analyze your company’s data using one of a few different methods. Here’s a guide to how this Data Notation process should be carried out.
If you’re not directly responsible for the technical implementation of your company’s data map, the following article will help you understand some key data map-related terms and processes. We start with some important definitions. These can be used to communicate more clearly with your technical resource on how best to approach the task of building a data map for your business. The rest of this tutorial will be useful for understanding the work that goes into manually mapping the data your company holds.
|Data Model||An organized, visual representation of the data stored in a database and how those data relate to each other.|
|Abstract Data Model||A simplified representation of a data model.|
|Data Object||A region of storage that contains a value or group of values. Each object has a defined data type (e.g., an integer, decimal number, character, or string of characters that make a word or sentence).|
|Data Attribute||The characteristics of a data object or features of a data set. For an order dataset consisting of an ORDER ID, CUSTOMER NAME, and CUSTOMER EMAIL, CUSTOMER NAME and CUSTOMER EMAIL would be the data attributes for that order ID.|
|Data Entity||Representation in a data model of a physical or conceptual object from the real world, such as a CUSTOMER. Entities don’t represent any data themselves but are containers for attributes and relationships between objects.|
|Data Element||Any unit of data defined for processing (e.g., ACCOUNT NUMBER, NAME, ADDRESS, and CITY).|
|Data Schema||A physical implementation of a data model in a specific database management system. It includes all implementation details such as data types, constraints, and foreign or primary keys.|
|Data Dictionary||A set of information describing the contents, format, and structure of a database and the relationship between its elements, used to control access to and manipulation of the database. It is a reference and description of each data element.|
|Domain Specific||Dedicated to or focused on a particular problem domain.|
If you’re the individual responsible for actually implementing the data map, the walkthrough below should get you started on the right track. The key process we examine in this piece is called Data Notation. Put simply, it’s a process that blends computer and manual methods to annotate (and then verify) entity relationships for entities that store personal data in your business. It’s the vital technical process for building a CCPA or GDPR-compliant data map.
A data map (also known as a data flow map, personally identifiable information disclosure under CCPA, or Article 30 inventory assessment under GDPR) is a clear representation of a company’s data infrastructure. It provides a record of all of the personally identifiable data points that your company processes and contains information on that data, such as what type of data it is, why it is collected, and who has access to it.
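To make the idea of a data map record concrete, here is a minimal sketch of one entry that captures what type of data a data point is, why it is collected, and who has access to it. The `DataMapEntry` class, its field names, and the sample systems and teams are assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class DataMapEntry:
    """One personal data point recorded in a data map (hypothetical shape)."""
    system: str       # where the data lives
    attribute: str    # the data point itself
    data_type: str    # what type of data it is
    purpose: str      # why it is collected
    accessible_by: list = field(default_factory=list)  # who has access


def entries_accessible_by(entries, team):
    """Return the entries that a given team can access."""
    return [e for e in entries if team in e.accessible_by]


# Hypothetical sample entries
entries = [
    DataMapEntry("crm", "email", "person.contact.email",
                 "order confirmations", ["support", "marketing"]),
    DataMapEntry("billing", "street", "person.contact.mailing-address.street",
                 "invoicing", ["finance"]),
]

for entry in entries_accessible_by(entries, "support"):
    print(entry.system, entry.attribute, entry.purpose)
```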
Data mapping is the process of creating a data map. It involves identifying the elements of a business’ data model that align with a domain-specific abstract data model. Data maps are created for the purpose of applying domain-specific solutions to a business’ data. In short, it’s about figuring out what types of personal data your organization is processing so that the data can be managed more efficiently and effectively.
As a term closely related to data mapping, database mapping is the process of inventorying a database. Database mapping aims to document the types of personal information stored in a database and the purposes of data collection. A single business can have a variety of databases—some SQL and some NoSQL, for instance. Thus, successful database mapping requires a scalable process for inventorying a variety of database structures. Hereafter, we use the term “data mapping,” as it encompasses “database mapping.”
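As a rough illustration of inventorying a single SQL database, the sketch below lists every table and its columns in a SQLite database using Python’s standard `sqlite3` module. The `inventory_sqlite` helper and the sample `users` table are assumptions made for this example. Other database engines expose their catalogs differently, so a scalable process would need one such reader per engine.

```python
import sqlite3


def inventory_sqlite(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    inventory = {}
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in rows:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal tables
        quoted = table.replace('"', '""')
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        inventory[table] = [(col[1], col[2]) for col in columns]
    return inventory


# Hypothetical in-memory database used only for demonstration
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, f_name TEXT, email TEXT)")
for table, columns in inventory_sqlite(demo).items():
    print(table, columns)
```

The resulting inventory is only the raw material: each listed column still has to be labeled with the type of personal information it holds and the purpose of its collection.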
When visualized, a data map most often contains nodes and links to show how different systems that contain any personally identifiable information link together. An effective data map helps you stay compliant with data privacy law and allows for efficient data management across your organization. You can find out more about mapping the state and flow of personal data in our guide to building a company data map.
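The node-and-link view can be sketched as a small graph structure. In the example below, each node is a system that holds personal data and each link is a flow of data from one system to another. The `DataMap` class and the system names are hypothetical, chosen only to show the shape of the structure.

```python
from dataclasses import dataclass, field


@dataclass
class DataMap:
    # system name -> set of personal data categories held there
    nodes: dict = field(default_factory=dict)
    # (source system, destination system) pairs
    links: list = field(default_factory=list)

    def add_system(self, name, data_categories):
        self.nodes[name] = set(data_categories)

    def add_link(self, source, destination):
        if source not in self.nodes or destination not in self.nodes:
            raise KeyError("both systems must be added before linking them")
        self.links.append((source, destination))

    def systems_holding(self, category):
        """Return the names of all systems that hold a given data category."""
        return sorted(name for name, cats in self.nodes.items() if category in cats)


# Hypothetical systems and flows
data_map = DataMap()
data_map.add_system("web_app", {"email"})
data_map.add_system("crm", {"email", "phone"})
data_map.add_link("web_app", "crm")
print(data_map.systems_holding("email"))
```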
There are a number of tasks involved in the mapping of a company’s data infrastructure for data privacy law compliance. These include:
Each of the above tasks can be performed using the following methods:
Once you have decided on the appropriate method for implementing your data map and have compiled your first iteration, you will need to confirm that the decisions you made while completing the mapping were correct.
Depending on the information used to create your company’s data map, the analysis process can be divided into two different stages:
This step involves using a data attribute’s name to decide which data dictionary element the attribute identifies. For example, an attribute named ‘email_id’ can be identified as a ‘person.contact.email-address’.
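A minimal sketch of name-based labeling, assuming a small hand-written rule list: each rule pairs a regular expression with the data dictionary element it suggests. The `NAME_RULES` table and the `label_by_name` function are hypothetical, and a real rule set would be larger and reviewed by a person.

```python
import re

# Hypothetical rules: pattern on the attribute name -> data dictionary element
NAME_RULES = [
    (re.compile(r"e[-_]?mail"), "person.contact.email-address"),
    (re.compile(r"phone|mobile"), "person.contact.phone"),
    (re.compile(r"zip|postal"), "person.contact.mailing-address.zip"),
]


def label_by_name(attribute_name):
    """Return the first matching data dictionary element, or None if no rule matches."""
    normalized = attribute_name.strip().lower()
    for pattern, element in NAME_RULES:
        if pattern.search(normalized):
            return element
    return None


print(label_by_name("email_id"))
print(label_by_name("created_at"))
```

Attributes that come back as `None` are the ones that need more context or manual review.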
When the attribute’s name alone is not enough, the name of the entity to which the attribute belongs is included in the analysis. For example, an attribute named ‘name’ in an entity named ‘customer’ can be identified as the name of a person, whereas an attribute with the same name in an entity named ‘city_master’ can be identified as the name of a city.
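The entity-aware variant can be sketched the same way. Here the entity name supplies the subject that an ambiguous attribute such as ‘name’ describes. The `ENTITY_HINTS` table and the `location.city` label are assumptions for this example and do not come from any particular taxonomy.

```python
# Hypothetical hints: keyword found in the entity name -> subject the entity describes
ENTITY_HINTS = {
    "customer": "person",
    "user": "person",
    "city": "location.city",
}


def label_name_attribute(entity_name, attribute_name):
    """Resolve an ambiguous 'name' attribute using the entity it belongs to."""
    if attribute_name.strip().lower() != "name":
        return None  # this sketch only handles the ambiguous 'name' case
    entity = entity_name.strip().lower()
    for keyword, subject in ENTITY_HINTS.items():
        if keyword in entity:
            return f"{subject}.name"
    return None


print(label_name_attribute("customer", "name"))
print(label_name_attribute("city_master", "name"))
```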
Sometimes a group of attributes that occur together can be identified more easily than any one of them alone. For example, attributes named ‘lat’ and ‘lng’ can be identified as the latitude and longitude of a location. Similarly, attributes named ‘f_name’ and ‘l_name’ can be identified as parts of a person’s name.
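Group-based identification can be sketched as a check for attribute names that appear together in the same entity. A rule fires only when every name in its group is present. The `GROUP_RULES` list and the `location.*` labels are hypothetical.

```python
# Hypothetical rules: a set of names that must all be present -> labels for each name
GROUP_RULES = [
    ({"lat", "lng"},
     {"lat": "location.latitude", "lng": "location.longitude"}),
    ({"f_name", "l_name"},
     {"f_name": "person.name.first_name", "l_name": "person.name.last_name"}),
]


def label_groups(attribute_names):
    """Label attributes that can be identified because they occur together."""
    present = {name.strip().lower() for name in attribute_names}
    labels = {}
    for required, mapping in GROUP_RULES:
        if required <= present:  # every member of the group is present
            labels.update(mapping)
    return labels


print(label_groups(["id", "lat", "lng"]))
print(label_groups(["id", "lat"]))  # 'lat' alone is not enough to fire the rule
```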
# Example taxonomy - privacy data (yaml file)
person:  # person’s information
  - name:
      - first_name
      - last_name
  - contact:  # person’s contact information
      - mailing-address:  # person’s mailing address details
          - street
          - city
          - state
          - zip
      - phone:  # person’s contact phone numbers
          - mobile
          - home
          - work
      - email
  - identification:  # person’s identification details
      - drivers-license:
          - number
          - state
      - passport:
          - number
          - issuing-country
# Example entity and attributes (yaml file)
entities:
  - name: users
    attributes:
      - name: id
        datatype: number
      - name: f_name
        datatype: text
      - name: l_name
        datatype: text
      - name: email
        datatype: text
      - name: street
        datatype: text
      - name: city
        datatype: text
      - name: phone
        datatype: text
# Example data mapping (yaml file)
entities:
  - name: users
    attributes:
      - name: id
        datatype: number
      - name: f_name
        datatype: text
        dataclass: person.name.first_name
      - name: l_name
        datatype: text
        dataclass: person.name.last_name
      - name: email
        datatype: text
        dataclass: person.contact.email
      - name: street
        datatype: text
        dataclass: person.contact.mailing-address.street
      - name: city
        datatype: text
        dataclass: person.contact.mailing-address.city
      - name: phone
        datatype: text
        dataclass: person.contact.phone.home
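One simple sanity check on a mapping like the one above is to confirm that every `dataclass` value is a path that actually exists in the taxonomy. The sketch below assumes the taxonomy and the mapping have already been loaded into plain Python dictionaries. The helper names, the trimmed-down taxonomy, and the deliberately wrong `middle_name` entry are all hypothetical.

```python
# Hypothetical, trimmed-down taxonomy as nested dictionaries
TAXONOMY = {
    "person": {
        "name": {"first_name": {}, "last_name": {}},
        "contact": {
            "email": {},
            "phone": {"mobile": {}, "home": {}, "work": {}},
        },
    }
}


def taxonomy_paths(tree, prefix=""):
    """Return every dotted path in the taxonomy, including intermediate levels."""
    paths = set()
    for key, children in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.add(path)
        paths |= taxonomy_paths(children, path)
    return paths


def unknown_dataclasses(mapping, taxonomy):
    """Return the mapped dataclass paths that do not exist in the taxonomy."""
    valid = taxonomy_paths(taxonomy)
    return sorted({dc for dc in mapping.values() if dc not in valid})


# attribute name -> dataclass; the last entry is wrong on purpose
mapping = {
    "f_name": "person.name.first_name",
    "email": "person.contact.email",
    "m_name": "person.name.middle_name",
}
print(unknown_dataclasses(mapping, TAXONOMY))
```

Any path the check reports either needs to be corrected in the mapping or added to the taxonomy.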
The second stage of confirming your data map uses the actual data stored in your database to check the accuracy of the data map created during the label analysis stage.
The presence of certain values or patterns in the actual data can confirm certain mappings. For example, if the data for an attribute follows the pattern ‘email@example.com’, it can be confirmed that this attribute captures an email address.
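As a sketch of pattern-based confirmation, the function below measures what share of an attribute’s non-empty sample values look like email addresses. The regular expression is deliberately loose and the 0.9 threshold is an arbitrary assumption. Both would need tuning against real data.

```python
import re

# Loose, illustrative pattern: something@something.something with no spaces
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def match_ratio(values, pattern=EMAIL_PATTERN):
    """Return the share of non-empty string values that fully match the pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    matches = sum(1 for v in non_empty if pattern.fullmatch(v.strip()))
    return matches / len(non_empty)


def looks_like_email_column(values, threshold=0.9):
    """Confirm an email mapping when most sampled values match the pattern."""
    return match_ratio(values) >= threshold


print(looks_like_email_column(["a@example.com", "b@example.org", None]))
print(looks_like_email_column(["Alice", "Bob"]))
```

Using a ratio rather than requiring every value to match leaves room for the occasional malformed or placeholder entry in real data.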
The ranges of values captured in attributes of different tables can be compared to see whether those attributes refer to the same piece of information, such as a user_id, that can be used to link the tables with specific relationships. Similarly, the absence of shared values between two attributes can indicate that they are not related.
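One way to approximate this comparison is to measure how many of the distinct values in one attribute also appear in another. The `overlap_ratio` helper and the sample columns below are hypothetical. A high ratio only suggests a relationship, since unrelated integer columns can overlap by coincidence.

```python
def overlap_ratio(candidate_values, reference_values):
    """Share of distinct non-null candidate values that also occur in the reference column."""
    candidate = {v for v in candidate_values if v is not None}
    reference = {v for v in reference_values if v is not None}
    if not candidate:
        return 0.0
    return len(candidate & reference) / len(candidate)


# Hypothetical sample columns
users_id = [1, 2, 3, 4]
orders_user_id = [1, 1, 3, None]
invoices_number = [1001, 1002]

print(overlap_ratio(orders_user_id, users_id))   # every order refers to a known user
print(overlap_ratio(invoices_number, users_id))  # no shared values
```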
As soon as your data map has been compiled and validated, you should assign an individual or team who will be responsible for its upkeep as it is a constantly evolving and changing document by nature. The challenge now lies in the effective maintenance of your data map, ensuring that it stays up to date, and in compliance with any data protection or privacy regulations that may be applicable to your organization.
If you’d like to make short work of creating your organization’s data map, you can take a look at how Ethyca’s automated data mapping software does so, seamlessly.