Data Catalog

The Data Catalog lets you explore and manage the structure of your connected databases. You can view schemas, tables, columns, and relationships, and enrich their metadata to make your data easier to understand.

Features

  • 📊 Structure Exploration: Navigate through schemas, tables, and columns
  • 📝 Metadata: Database and manual descriptions
  • 🔗 Relationships: Foreign keys, inferred relationships, and ERD diagram
  • 🔒 Visibility: Column-level access control

Structure Exploration​

The Data Catalog displays the complete structure of your database:

Schemas and Tables​

  • Lists all schemas available in the connection
  • For each schema, shows contained tables
  • Displays column count per table
  • Indicates tables with primary keys and indexes

Columns​

For each table, you can see:

Information | Description
Name        | Column name
Type        | Data type (VARCHAR, INTEGER, etc.)
Nullable    | Whether it accepts null values
PK          | Whether it's part of the primary key
FK          | Whether it's a foreign key
Default     | Default value, if any

Metadata​

The Data Catalog supports two types of descriptions:

Database Descriptions​

Comments defined directly in the database via COMMENT ON:

COMMENT ON TABLE customers IS 'Active customer registry';
COMMENT ON COLUMN customers.email IS 'Primary contact email';

These descriptions are automatically imported during synchronization.

Manual Descriptions​

Descriptions added by your team through the Console:

  • Complement or override database descriptions
  • Linked to the connection in Console
  • Don't modify the original database
  • Can be edited at any time

AI Enrichment​

Solução42 can automatically suggest descriptions based on:

  • Column and table names
  • Data type
  • Common industry patterns
  • Context from other columns

Review Suggestions

Always review AI-suggested descriptions before applying them. They are based on patterns and may not reflect specific usage in your organization.

Relationships​

Foreign Keys​

The Data Catalog automatically imports FKs defined in the database:

  • Shows source table and column
  • Shows target table and column
  • Indicates cardinality (1:N, N:M)

Inferred Relationships​

For databases without explicit FKs, the system can infer relationships by convention:

  • Columns named *_id are mapped to corresponding tables
  • Example: customer_id → customers table
  • Inferred relationships are marked as "suggested"
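
The naming convention above can be sketched roughly as follows. This is only an illustration, not the product's actual inference logic: the helper name `infer_relationships`, the naive pluralization (appending "s"), and the sample schema are all assumptions.

```python
from typing import Dict, List, Tuple


def infer_relationships(schema: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Suggest (source_table, source_column, target_table) links by name.

    Hypothetical helper: a column named <name>_id is matched to a table
    called <name>s or <name>, if such a table exists in the schema.
    """
    suggestions = []
    for table, columns in schema.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]
            # Naive pluralization; real naming conventions vary.
            for candidate in (stem + "s", stem):
                if candidate in schema and candidate != table:
                    suggestions.append((table, column, candidate))
                    break
    return suggestions


# Hypothetical schema: table name -> column names.
schema = {
    "customers": ["id", "email"],
    "orders": ["id", "customer_id", "warehouse_id"],
}
suggested = infer_relationships(schema)
print(suggested)
```

In this sketch `warehouse_id` produces no suggestion because no matching table exists, which is one reason inferred relationships are worth reviewing rather than trusting blindly.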

ERD Diagram​

Visualize relationships graphically:

  1. On the connection page, click ERD
  2. The diagram shows all tables and their relationships
  3. Use zoom and pan to navigate
  4. Click on a table to highlight its relationships
  5. Filter by schema to focus on specific areas

Data Samples​

The Data Catalog can display data samples to facilitate understanding:

  • Limit: Up to 10 rows per table
  • Visibility: Respects visibility settings
  • Updates: Data is fetched on demand, not stored

Sensitive Data

Columns configured as restricted or pseudonymized appear masked in samples, even for administrators.

Data Visibility​

Control which data in your organization can be viewed in queries, visualizations, and dashboards. Visibility settings are automatically applied to all queries, ensuring sensitive data is never accidentally exposed.

Why Use It?​

  • PII Protection: Hide personal data such as email addresses, SSNs, and phone numbers
  • Compliance: Meet LGPD, GDPR, and HIPAA requirements
  • Security: Prevent accidental exposure of sensitive data
  • Safe Analytics: Enable analysis without exposing raw data

Visibility Levels​

Table Visibility​

Level      | Description
Public     | Table can be queried. Individual column visibility is respected.
Restricted | All table columns are hidden, regardless of individual settings.

Column Visibility​

Level         | What appears in query
Public        | Original data value
Restricted    | [RESTRICTED]
Pseudonymized | SHA-256 hash of value (allows anonymous JOINs)
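
As a rough mental model, the three column levels behave like the function below. It is a sketch under stated assumptions, not the product's implementation: the function name `apply_column_visibility`, the lowercase level strings, and the use of an unsalted SHA-256 over UTF-8 bytes are all hypothetical details.

```python
import hashlib


def apply_column_visibility(value: str, level: str) -> str:
    """Return what a query would show for `value` at the given level."""
    if level == "public":
        return value
    if level == "restricted":
        return "[RESTRICTED]"
    if level == "pseudonymized":
        # Assumption: plain SHA-256 of the UTF-8 bytes, no salt or key.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"unknown visibility level: {level}")


for level in ("public", "restricted", "pseudonymized"):
    print(level, apply_column_visibility("ana@example.com", level))
```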

How Data Appears​

Public Column:

│ email                       │
├─────────────────────────────┤
│ [email protected]           │
│ [email protected]           │

Restricted Column:

│ email                       │
├─────────────────────────────┤
│ [RESTRICTED]                │
│ [RESTRICTED]                │

Pseudonymized Column:

│ email                                                           │
├─────────────────────────────────────────────────────────────────┤
│ a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef12345 │
│ b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456  │

Pseudonymization and JOINs

The hash is deterministic: the same value always generates the same hash. This allows JOINs between tables using pseudonymized columns without revealing the original data.
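
The following sketch shows why determinism makes such JOINs possible. It assumes an unsalted SHA-256 over the UTF-8 bytes of the value; the helper `pseudonymize`, the two in-memory tables, and their contents are hypothetical.

```python
import hashlib


def pseudonymize(value: str) -> str:
    # Assumption: unsalted SHA-256 over the UTF-8 bytes of the value.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two hypothetical tables that share an email column.
customers = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "bo@example.com", "plan": "free"},
]
orders = [
    {"email": "ana@example.com", "total": 120},
    {"email": "ana@example.com", "total": 35},
]

# What each table would look like once the column is pseudonymized.
masked_customers = [{**row, "email": pseudonymize(row["email"])} for row in customers]
masked_orders = [{**row, "email": pseudonymize(row["email"])} for row in orders]

# Join on the hashed key: equal inputs always hash to equal outputs.
plan_by_hash = {row["email"]: row["plan"] for row in masked_customers}
joined = [(plan_by_hash[order["email"]], order["total"]) for order in masked_orders]
print(joined)
```

The join matches rows correctly even though neither masked table contains a readable email address.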

Configuring Visibility​

By Column​

  1. Access the connection's Data Catalog
  2. Navigate to the desired table
  3. Click on the column you want to configure
  4. In Visibility, select the desired level
  5. Click Save

By Table​

  1. Access the connection's Data Catalog
  2. Click on the desired table
  3. In the details panel, locate Table Visibility
  4. Select Public or Restricted
  5. Click Save

Priority

Table visibility takes precedence over column visibility. If a table is restricted, all its columns will also be restricted.
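
The precedence rule can be expressed as a small function. This is an illustrative sketch only; the name `effective_visibility` and the lowercase level strings are assumptions.

```python
def effective_visibility(table_level: str, column_level: str) -> str:
    """Resolve the level applied to a column, given its table's level."""
    # A restricted table overrides every column setting.
    if table_level == "restricted":
        return "restricted"
    # For a public table, the column's own setting applies.
    return column_level


print(effective_visibility("restricted", "public"))
print(effective_visibility("public", "pseudonymized"))
```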

Visibility Validation​

Columns can have two validation states:

State         | Icon  | Description
Not validated | Gray  | Default or AI-suggested configuration
Validated     | Green | Configuration reviewed and confirmed by a user

Recommendation

Review and validate visibility for all sensitive columns after connecting a new database.

AI Suggestions for Visibility​

Solução42 can automatically suggest appropriate visibility based on:

  • Column name: email, cpf, password, ssn, etc.
  • Data type: Long text fields may contain PII
  • Industry patterns: Common conventions for sensitive data
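
To give a feel for name-based suggestions, here is a deliberately crude sketch. It is not how Solução42 generates suggestions: the keyword map, the substring matching, and the function name `suggest_visibility` are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical keyword-to-level map; the product's real rules are not shown here.
SENSITIVE_KEYWORDS = {
    "email": "pseudonymized",
    "cpf": "restricted",
    "ssn": "restricted",
    "password": "restricted",
}


def suggest_visibility(column_name: str) -> Optional[str]:
    """Return a suggested level, or None when the name gives no signal."""
    name = column_name.lower()
    for keyword, level in SENSITIVE_KEYWORDS.items():
        # Substring matching is crude and can produce false positives.
        if keyword in name:
            return level
    return None


for column in ("customer_email", "CPF", "created_at"):
    print(column, suggest_visibility(column))
```

A heuristic like this can easily miss sensitive columns with unusual names, which is why suggestions should be reviewed rather than applied automatically.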

To apply suggestions:

  1. In the Data Catalog, look for columns with a suggestion icon (lightbulb)
  2. Click on the column to see the suggestion
  3. Review the recommendation
  4. Click Apply Suggestion or adjust manually

Visibility Use Cases​

Personal Data (PII)​

Column     | Recommendation              | Justification
Email      | Pseudonymized               | Allows cohort analysis without exposing identity
SSN/Tax ID | Restricted                  | Unique identifier, should not be exposed
Phone      | Restricted                  | Sensitive personal data
Full name  | Restricted or Pseudonymized | Depends on analysis needs

Financial Data​

Column      | Recommendation | Justification
Card number | Restricted     | Should never be exposed
CVV         | Restricted     | Should never be stored visibly
Balance     | Restricted     | Sensitive financial data

Health Data (HIPAA)​

Column      | Recommendation | Justification
Patient ID  | Pseudonymized  | Allows analysis without identification
Diagnosis   | Restricted     | Protected medical information
Medications | Restricted     | Protected medical information

Automatic Enforcement​

Visibility is automatically enforced in:

  • SQL Queries: Results respect configured visibility
  • Visualizations and Dashboards: Charts and filters don't expose restricted values
  • AI Analytics: The AI assistant doesn't access restricted values
  • Exports: All exports apply the same rules

Visibility Auditing​

All visibility changes are recorded:

  • Who made the change
  • When the change was made
  • Previous value
  • New value
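
One way to picture an audit entry is as an immutable record carrying those four fields. The class below is a hypothetical model, not the product's storage format; the `column` field and all field names are assumptions added so the example is self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VisibilityChange:
    """Hypothetical audit record for a single visibility change."""

    changed_by: str        # who made the change
    changed_at: datetime   # when the change was made
    column: str            # assumption: which column was affected
    previous_value: str
    new_value: str


entry = VisibilityChange(
    changed_by="ana",
    changed_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    column="customers.email",
    previous_value="public",
    new_value="pseudonymized",
)
print(entry)
```

Freezing the dataclass reflects the idea that audit entries are appended, never edited.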

To generate compliance reports, go to Data Catalog → Export Report → Visibility Report.

How to Use​

Accessing the Data Catalog​

  1. In the sidebar menu, click Connections
  2. Select the desired connection
  3. Click Data Catalog

To navigate the catalog:

  1. Use the sidebar tree to navigate through schemas
  2. Expand a schema to see its tables
  3. Click on a table to see its columns
  4. Use search to find specific tables or columns

Adding Descriptions​

  1. Navigate to the desired table or column
  2. In the details panel, click Edit description
  3. Enter the description
  4. Click Save

Syncing Metadata​

Metadata is synced automatically when you configure a connection. To trigger a sync manually:

  1. Access the connection page
  2. Click Settings
  3. Click Sync Metadata

Incremental Sync

Synchronization detects only changes since the last run, making the process fast even for large databases.
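
How changes are detected is not described here. One plausible approach, shown purely as an assumption, is to compare a snapshot of column definitions from the previous run with the current one and process only the differences. The `Snapshot` type, the `diff_snapshots` helper, and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: "table.column" -> data type.
Snapshot = Dict[str, str]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) column keys between two snapshots."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    changed = {
        key
        for key in set(previous) & set(current)
        if previous[key] != current[key]
    }
    return added, removed, changed


previous = {"customers.id": "INTEGER", "customers.email": "VARCHAR"}
current = {
    "customers.id": "BIGINT",
    "customers.email": "VARCHAR",
    "customers.phone": "VARCHAR",
}
added, removed, changed = diff_snapshots(previous, current)
print(added, removed, changed)
```

Only the added, removed, and changed entries would need further processing, which is what keeps an incremental run cheap compared with re-importing everything.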

Best Practices​

Documentation​

  • Add descriptions for all main tables
  • Document columns with technical or abbreviated names
  • Use AI as a starting point, then refine manually

Visibility​

  • ✅ Configure visibility before granting data access
  • ✅ Use pseudonymization for columns used in JOINs
  • ✅ Review visibility after each sync
  • ✅ Validate all sensitive columns before granting access
  • ❌ Don't leave sensitive columns as public
  • ❌ Don't ignore columns in staging/temp tables
  • ❌ Don't apply AI suggestions without review

Maintenance​

  • Sync metadata after schema changes
  • Review inferred relationships periodically
  • Keep descriptions updated with business changes

Additional Resources​

  • Security - Security and compliance practices