Data Catalog Overview

Background

TerraTrue’s Data Catalog allows you to connect directly with your data sources, scan your data tables, and populate a list of data types in use. We’ll then classify those data types and match them to your TerraTrue taxonomy, giving you a clear view into what data your organization is currently using. 

Data Catalog currently supports the following ingestion sources:

  • Amplitude
  • Athena
  • BigQuery
  • Databricks Unity Catalog
  • DynamoDB
  • Elasticsearch
  • Glue
  • Hive (Azure HDInsight)
  • Kafka (Confluent Cloud)
  • MongoDB (Atlas)
  • MySQL
  • Oracle
  • Postgres
  • Redshift
  • Snowflake
  • SQL Server

Please reach out to your Customer Success Manager if you’d like to request any additional sources. 

Data Catalog v1 is available to all customers participating in our beta trial. If you would like to join, please notify your Customer Success Manager. We will be releasing a v2 later this year that combines the results of your scans with the rest of Privacy Central, and bridges the gap between your proactive reviews and your actual data map.

To open Data Catalog, click the data warehouse icon on the left-hand side of TerraTrue. You’ll then see up to four options, depending on your permissions*:

  • Explore
  • Datasets
  • Ingestion
  • Settings

If you’re just getting started with Data Catalog and haven’t scanned anything yet, you can learn how to ingest data by reading our Ingestion Instructions article.

*Note: Data Catalog supports three user roles: Data Catalog Admin, Data Catalog Editor, and Data Catalog Viewer. Each role allows a different level of interaction with Data Catalog, as summarized in the table below:

User Role Catalog visible in main nav Explore Search Datasets Dataset Schema Ingestion Settings
TerraTrue Admin
Data Catalog Admin
Data Catalog Editor
Data Catalog Viewer ✅ **
Observer

** Cannot edit descriptions or data type classifications.

Read more about user permissions here.

Explore

Explore gives you a bird's-eye view of the data in your catalog. From here, you can:

  • Search for a data type, dataset, column, or keyword to drill in and find anything specific in your catalog.
  • View your Data Sources to see where you’ve ingested data from, and click any source to view all of its datasets.
  • Click ‘Data Types’ to see all the data types cataloged across your data sources. Toggle to ‘All’ to see every data type in your existing taxonomy, or to ‘Matched’ to see only the data types detected in your datasets. The number in parentheses indicates how many datasets the type was detected in.

Datasets

Here, you can explore exactly what’s been cataloged in your datasets. Use the filters to narrow results by data source or data type and find the dataset you’re looking for.

You can also click into a dataset to see its schema and take a few actions from this screen:

  1. Add a description to the dataset
  2. See the exact column names and string types that have been detected
  3. See the data types TerraTrue auto-classified. If a data type is classified incorrectly, click it and either type the name of the correct data type or scroll through the list to select it.

Ingestion

The Ingestion page offers three options:

  • Sources - view your active ingestion sources, including the source name, ingestion name, number of times it has been executed, last execution date, and ingestion status. Click the three-dot menu icon on any source to edit or delete it.
  • Secrets - manage your existing secrets or click ‘Create New’ to add one. The use of secrets is explained in more detail in our Ingestion Instructions article.
  • Connect a new source - for detailed instructions on how to ingest from a new source, see our Ingestion Instructions article.

Once configured, ingestions run daily between 9 AM and 5 PM EST so that your catalog always has the most up-to-date information.

Settings

You can manage two settings on this page:

  1. The email address of your Google Cloud service account, which is required to set up ingestion from BigQuery.
  2. The Inbound IP Addresses you'll use to complete ingestions. 

More information about using these settings is available in our Ingestion Instructions article.


Reporting

In Privacy Central, you can view data types collected from launches and data types detected by Data Catalog side by side. To do so, navigate to Privacy Central and select Data Types. From there, you can see the number of data sources and instances for each data type on the right, and the aggregate totals in the bottom left.

To dig into a specific data type, select it from the table. You can then view it in Data Catalog and, if that data type triggered a launch, view that launch in the launchpad.