Data Catalog
The Data Catalog lets you explore and manage the structure of your connected databases. You can view schemas, tables, columns, and relationships, and enrich metadata to make your data easier to understand.
Features
- Structure Exploration: Navigate through schemas, tables, and columns
- Metadata: Database and manual descriptions
- Relationships: Foreign keys, inferred relationships, and ERD diagram
- Visibility: Column-level access control
Structure Exploration
The Data Catalog displays the complete structure of your database:
Schemas and Tables
- Lists all schemas available in the connection
- For each schema, shows contained tables
- Displays column count per table
- Indicates tables with primary keys and indexes
Columns
For each table, you can see:
| Information | Description |
|---|---|
| Name | Column name |
| Type | Data type (VARCHAR, INTEGER, etc.) |
| Nullable | Whether it accepts null values |
| PK | Whether it's part of the primary key |
| FK | Whether it's a foreign key |
| Default | Default value, if any |
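As a rough illustration of where these fields come from, the sketch below reads the same attributes from an in-memory SQLite database with Python's standard `sqlite3` module. SQLite, the `describe_columns` helper, and the sample `customers` and `orders` tables are assumptions chosen for illustration; this is not how the Data Catalog itself is implemented.

```python
import sqlite3


def describe_columns(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return name, type, nullable, pk, fk, and default for each column."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fk_columns = {row[3] for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
    columns = []
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, notnull, default, pk in conn.execute(
        f'PRAGMA table_info("{table}")'
    ):
        columns.append({
            "name": name,
            "type": col_type,
            "nullable": not notnull,
            "pk": pk > 0,
            "fk": name in fk_columns,
            "default": default,
        })
    return columns


# Hypothetical schema used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email VARCHAR(255) NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        status VARCHAR(20) DEFAULT 'new'
    );
""")
for column in describe_columns(conn, "orders"):
    print(column)
```

Other database engines expose the same information through their own system catalogs, so the queries would differ while the resulting fields stay the same.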
Metadata
The Data Catalog supports two types of descriptions:
Database Descriptions
Comments defined directly in the database via COMMENT ON:
COMMENT ON TABLE customers IS 'Active customer registry';
COMMENT ON COLUMN customers.email IS 'Primary contact email';
These descriptions are automatically imported during synchronization.
Manual Descriptions
Descriptions added by your team through the Console:
- Complement or override database descriptions
- Linked to the connection in Console
- Don't modify the original database
- Can be edited at any time
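One way to picture how the two description types combine is a simple fallback rule. The sketch below assumes that a non-empty manual description wins over the database comment; the `effective_description` function is hypothetical and only models that assumption.

```python
from typing import Optional


def effective_description(database: Optional[str], manual: Optional[str]) -> Optional[str]:
    """Prefer the manual description; fall back to the database comment."""
    if manual and manual.strip():
        return manual.strip()
    if database and database.strip():
        return database.strip()
    # Neither source has a usable description.
    return None


print(effective_description("Active customer registry", None))
print(effective_description("Active customer registry", "Customers with a paid plan"))
```

Because the manual text lives alongside the connection rather than in the database, clearing it in this model simply brings the database comment back.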
AI Enrichment
Solução42 can automatically suggest descriptions based on:
- Column and table names
- Data type
- Common industry patterns
- Context from other columns
Always review AI-suggested descriptions before applying them. They are based on patterns and may not reflect specific usage in your organization.
Relationships
Foreign Keys
The Data Catalog automatically imports FKs defined in the database:
- Shows source table and column
- Shows target table and column
- Indicates cardinality (1:N, N:M)
Inferred Relationships
For databases without explicit FKs, the system can infer relationships by convention:
- Columns named *_id are mapped to corresponding tables
- Example: customer_id → customers table
- Inferred relationships are marked as "suggested"
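The naming convention above can be sketched in a few lines. The `infer_relationships` function, its input shape, and the naive "add an s" pluralization are assumptions for illustration; the actual matching rules used by the product are not documented here.

```python
def infer_relationships(tables: dict[str, list[str]]) -> list[dict]:
    """Suggest relationships for *_id columns that match a known table name."""
    suggestions = []
    for table, columns in tables.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]
            # Try the plural form first (customer -> customers), then the stem itself.
            for candidate in (f"{stem}s", stem):
                if candidate in tables and candidate != table:
                    suggestions.append({
                        "source_table": table,
                        "source_column": column,
                        "target_table": candidate,
                        "status": "suggested",
                    })
                    break
    return suggestions


# Hypothetical catalog: table name -> column names.
catalog = {
    "customers": ["id", "email"],
    "orders": ["id", "customer_id", "warehouse_id"],
}
for suggestion in infer_relationships(catalog):
    print(suggestion)
```

Here `warehouse_id` produces no suggestion because no matching table exists, which is one reason inferred relationships are worth reviewing rather than trusting blindly.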
ERD Diagram
Visualize relationships graphically:
- On the connection page, click ERD
- The diagram shows all tables and their relationships
- Use zoom and pan to navigate
- Click on a table to highlight its relationships
- Filter by schema to focus on specific areas
Data Samples
The Data Catalog can display data samples to facilitate understanding:
- Limit: Up to 10 rows per table
- Visibility: Respects visibility settings
- Updates: Data is fetched on demand, not stored
Columns configured as restricted or pseudonymized appear masked in samples, even for administrators.
Data Visibility
Control which data in your organization can be viewed in queries, visualizations, and dashboards. Visibility settings are automatically applied to all queries, ensuring sensitive data is never accidentally exposed.
Why Use It?
- PII Protection: Hide personal data like emails, SSN, and phone numbers
- Compliance: Meet LGPD, GDPR, and HIPAA requirements
- Security: Prevent accidental exposure of sensitive data
- Safe Analytics: Enable analysis without exposing raw data
Visibility Levels
Table Visibility
| Level | Description |
|---|---|
| Public | Table can be queried. Individual column visibility is respected. |
| Restricted | All table columns are hidden, regardless of individual settings. |
Column Visibility
| Level | What appears in query |
|---|---|
| Public | Original data value |
| Restricted | [RESTRICTED] |
| Pseudonymized | SHA-256 hash of value (allows anonymous JOINs) |
How Data Appears
Public Column:
| email |
|---|
| [email protected] |
| [email protected] |
Restricted Column:
| email |
|---|
| [RESTRICTED] |
| [RESTRICTED] |
Pseudonymized Column:
| email |
|---|
| a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef12345 |
| b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456 |
The hash is deterministic: the same value always generates the same hash. This allows JOINs between tables using pseudonymized columns without revealing the original data.
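The determinism described above is easy to see with Python's standard `hashlib`. This is a minimal sketch: the `pseudonymize` helper and the sample rows are hypothetical, and it assumes the raw UTF-8 value is hashed with no salt or normalization, which may differ from what Solução42 actually does. A SHA-256 digest rendered as hexadecimal is always 64 characters long.

```python
import hashlib


def pseudonymize(value: str) -> str:
    """Return the SHA-256 hex digest of a value (64 hexadecimal characters)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical rows from two tables that share an email column.
customers = [{"email": "ana@example.com", "plan": "pro"}]
tickets = [{"email": "ana@example.com", "subject": "Billing question"}]

# Pseudonymize both sides, then join on the hash instead of the raw value.
plans_by_hash = {pseudonymize(c["email"]): c["plan"] for c in customers}
joined = [
    {
        "email": pseudonymize(t["email"]),
        "subject": t["subject"],
        "plan": plans_by_hash.get(pseudonymize(t["email"])),
    }
    for t in tickets
]
print(joined)
```

The join succeeds because both tables hash the same input to the same digest. Note that values differing only in case or whitespace would hash differently under this assumption.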
Configuring Visibility
By Column
- Access the connection's Data Catalog
- Navigate to the desired table
- Click on the column you want to configure
- In Visibility, select the desired level
- Click Save
By Table
- Access the connection's Data Catalog
- Click on the desired table
- In the details panel, locate Table Visibility
- Select Public or Restricted
- Click Save
Table visibility takes precedence over column visibility. If a table is restricted, all its columns will also be restricted.
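The precedence rule can be modeled as a small function that resolves the effective level before a value is rendered. The names below (`effective_visibility`, `mask_value`, and the level constants) are hypothetical, and the unsalted SHA-256 rendering is an assumption; the sketch only mirrors the behavior described in the visibility tables.

```python
import hashlib

PUBLIC, RESTRICTED, PSEUDONYMIZED = "public", "restricted", "pseudonymized"


def effective_visibility(table_level: str, column_level: str) -> str:
    """A restricted table overrides whatever its columns are set to."""
    return RESTRICTED if table_level == RESTRICTED else column_level


def mask_value(value, level: str):
    """Render a value according to its effective visibility level."""
    if level == RESTRICTED:
        return "[RESTRICTED]"
    if level == PSEUDONYMIZED:
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return value


# A public column inside a restricted table is still hidden.
level = effective_visibility(RESTRICTED, PUBLIC)
print(mask_value("ana@example.com", level))
```

Resolving the table level first means a single table-level change hides every column without touching the individual column settings.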
Visibility Validation
Columns can have two validation states:
| State | Icon | Description |
|---|---|---|
| Not validated | Gray | Default or AI-suggested configuration |
| Validated | Green | Configuration reviewed and confirmed by a user |
Review and validate visibility for all sensitive columns after connecting a new database.
AI Suggestions for Visibility
Solução42 can automatically suggest appropriate visibility based on:
- Column name: email, cpf, password, ssn, etc.
- Data type: Long text fields may contain PII
- Industry patterns: Common conventions for sensitive data
To apply suggestions:
- In the Data Catalog, look for columns with the suggestion icon (lightbulb)
- Click on the column to see the suggestion
- Review the recommendation
- Click Apply Suggestion or adjust manually
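To give a feel for the column-name signal, the sketch below matches names against a short pattern list. It is a deliberately simple stand-in, not the product's AI: the `suggest_visibility` function, the patterns, and the level assigned to each pattern are all assumptions, and real suggestions also draw on data types and industry conventions.

```python
import re

# Hypothetical name patterns, checked in order; the first match wins.
SENSITIVE_PATTERNS = [
    ("restricted", re.compile(r"password|ssn|cpf|card_number|cvv", re.IGNORECASE)),
    ("pseudonymized", re.compile(r"email", re.IGNORECASE)),
]


def suggest_visibility(column_name: str) -> str:
    """Suggest a visibility level from the column name alone."""
    for level, pattern in SENSITIVE_PATTERNS:
        if pattern.search(column_name):
            return level
    # No pattern matched, so nothing marks the column as sensitive.
    return "public"


for name in ("email", "user_password", "cpf", "created_at"):
    print(name, "->", suggest_visibility(name))
```

A heuristic like this misses sensitive columns with unusual names and flags harmless ones, which is why each suggestion should be reviewed before it is applied.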
Visibility Use Cases
Personal Data (PII)
| Column | Recommendation | Justification |
|---|---|---|
| Email | Pseudonymized | Allows cohort analysis without exposing identity |
| SSN/Tax ID | Restricted | Unique identifier, should not be exposed |
| Phone | Restricted | Sensitive personal data |
| Full name | Restricted or Pseudonymized | Depends on analysis needs |
Financial Data
| Column | Recommendation | Justification |
|---|---|---|
| Card number | Restricted | Should never be exposed |
| CVV | Restricted | Should never be stored visibly |
| Balance | Restricted | Sensitive financial data |
Health Data (HIPAA)
| Column | Recommendation | Justification |
|---|---|---|
| Patient ID | Pseudonymized | Allows analysis without identification |
| Diagnosis | Restricted | Protected medical information |
| Medications | Restricted | Protected medical information |
Automatic Enforcement
Visibility is automatically enforced in:
- SQL Queries: Results respect configured visibility
- Visualizations and Dashboards: Charts and filters don't expose restricted values
- AI Analytics: The AI assistant doesn't access restricted values
- Exports: All exports apply the same rules
Visibility Auditing
All visibility changes are recorded:
- Who made the change
- When the change was made
- Previous value
- New value
To generate compliance reports, go to Data Catalog → Export Report → Visibility Report.
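The four recorded fields map naturally onto an immutable record. The sketch below is a hypothetical model of such an entry: the `VisibilityChange` class, the `record_change` helper, and the extra `column` field identifying what was changed are assumptions, not the product's actual audit schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class VisibilityChange:
    column: str          # what was changed, e.g. "customers.email" (assumed field)
    changed_by: str      # who made the change
    previous_value: str  # visibility level before the change
    new_value: str       # visibility level after the change
    # When the change was made, stored as a timezone-aware UTC timestamp.
    changed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_change(log: list, column: str, changed_by: str,
                  previous_value: str, new_value: str) -> VisibilityChange:
    """Append an audit entry to the log and return it."""
    entry = VisibilityChange(column, changed_by, previous_value, new_value)
    log.append(entry)
    return entry


audit_log: list[VisibilityChange] = []
record_change(audit_log, "customers.email", "ana", "public", "pseudonymized")
print(audit_log[0])
```

Freezing the dataclass keeps entries from being edited after the fact, which is the property an audit trail usually needs.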
How to Use
Accessing the Data Catalog
- In the sidebar menu, click Connections
- Select the desired connection
- Click Data Catalog
Navigating the Structure
- Use the sidebar tree to navigate through schemas
- Expand a schema to see its tables
- Click on a table to see its columns
- Use search to find specific tables or columns
Adding Descriptions
- Navigate to the desired table or column
- In the details panel, click Edit description
- Enter the description
- Click Save
Syncing Metadata
Metadata is synced automatically when you configure a connection. To sync manually:
- Access the connection page
- Click Settings
- Click Sync Metadata
Synchronization detects only changes since the last run, making the process fast even for large databases.
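One way to detect only what changed is to compare the current schema with a snapshot kept from the previous run. The sketch below assumes that approach; the `diff_schema` function and the `{table: {column: type}}` snapshot shape are hypothetical, and the product may track changes differently.

```python
def diff_schema(
    previous: dict[str, dict[str, str]],
    current: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Compare two {table: {column: type}} snapshots and list what changed."""
    changes: dict[str, list[str]] = {"added": [], "removed": [], "changed": []}
    for table in current.keys() | previous.keys():
        old_cols = previous.get(table, {})
        new_cols = current.get(table, {})
        for column in new_cols.keys() | old_cols.keys():
            name = f"{table}.{column}"
            if column not in old_cols:
                changes["added"].append(name)
            elif column not in new_cols:
                changes["removed"].append(name)
            elif old_cols[column] != new_cols[column]:
                changes["changed"].append(name)
    # Sets iterate in arbitrary order, so sort for stable output.
    for names in changes.values():
        names.sort()
    return changes


last_run = {"customers": {"id": "INTEGER", "email": "VARCHAR"}}
this_run = {"customers": {"id": "BIGINT", "email": "VARCHAR", "phone": "VARCHAR"}}
print(diff_schema(last_run, this_run))
```

Newly added columns are the ones most likely to need a visibility review, since they have not been validated yet.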
Best Practices
Documentation
- Add descriptions for all main tables
- Document columns with technical or abbreviated names
- Use AI as a starting point, then refine manually
Visibility
- Configure visibility before granting data access
- Use pseudonymization for columns used in JOINs
- Review visibility after each sync
- Validate all sensitive columns before granting access
- Don't leave sensitive columns as public
- Don't ignore columns in staging/temp tables
- Don't apply AI suggestions without review
Maintenance
- Sync metadata after schema changes
- Review inferred relationships periodically
- Keep descriptions updated with business changes
Additional Resources
- Security - Security and compliance practices