
Database Sources

This guide covers how to connect relational databases and cloud data warehouses as data sources in Datalinx AI. Database connections let you ingest data from your operational systems for transformation and analysis.

Supported Databases

Datalinx AI supports the following database systems:

Database     Versions        Connection Method
PostgreSQL   11+             Direct, SSH tunnel
MySQL        5.7+, 8.x       Direct, SSH tunnel
Snowflake    Current         Native connector
Databricks   Unity Catalog   Native connector
SQL Server   2016+           Direct
Oracle       12c+            Direct
BigQuery     Current         Service account

Adding a Database Source

Step 1: Navigate to Sources

  1. Open your workspace
  2. Click Sources in the sidebar
  3. Click Add Source
  4. Select Database as the source type

Step 2: Choose Database Type

Select your database type from the list. Each type has specific connection requirements.

Step 3: Configure Connection

PostgreSQL / MySQL Example

Connection Details:
Host: db.example.com
Port: 5432
Database: production
Username: datalinx_reader
Password: ********

Advanced Options:
SSL Mode: require
Connection Timeout: 30s
Read Timeout: 300s

Snowflake Example

Connection Details:
Account: acme.us-east-1
Warehouse: COMPUTE_WH
Database: ANALYTICS
Schema: RAW_DATA
Role: DATA_READER

Authentication:
Method: Key Pair
Private Key: [uploaded]
Passphrase: ********
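
On the Snowflake side, key-pair authentication requires attaching the public key to the user Datalinx AI connects as, and the role needs read access to the source schema. A minimal sketch in Snowflake SQL, assuming a hypothetical user DATALINX_USER (the key value is truncated here):

-- Attach the RSA public key to the connecting user
-- (DATALINX_USER and the key value are illustrative)
ALTER USER DATALINX_USER SET RSA_PUBLIC_KEY = 'MIIBIjANBgkq...';

-- Give the reader role access to the source database and schema
GRANT USAGE ON DATABASE ANALYTICS TO ROLE DATA_READER;
GRANT USAGE ON SCHEMA ANALYTICS.RAW_DATA TO ROLE DATA_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.RAW_DATA TO ROLE DATA_READER;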

Databricks Example

Connection Details:
Host: acme.cloud.databricks.com
HTTP Path: /sql/1.0/warehouses/abc123
Catalog: main
Schema: raw

Authentication:
Method: Service Principal
Client ID: ********
Client Secret: ********
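
On the Databricks side, the service principal needs read access to the catalog and schema in Unity Catalog. A sketch of the grants, assuming a hypothetical principal identifier datalinx-sp (Unity Catalog typically uses the principal's application ID here):

-- Unity Catalog grants for the service principal
-- (`datalinx-sp` is an illustrative principal identifier)
GRANT USE CATALOG ON CATALOG main TO `datalinx-sp`;
GRANT USE SCHEMA ON SCHEMA main.raw TO `datalinx-sp`;
GRANT SELECT ON SCHEMA main.raw TO `datalinx-sp`;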

Step 4: Test Connection

Click Test Connection to verify:

Testing connection to PostgreSQL...
✓ Network connectivity
✓ Authentication successful
✓ Schema access verified
✓ Found 24 tables

Connection successful!

Step 5: Configure Schema Discovery

Choose which tables to include:

  • All Tables: Include all accessible tables
  • Select Tables: Choose specific tables
  • Pattern Match: Use wildcards (e.g., orders_*, customer_*)

Connection Security

Network Security

Direct Connection

For databases accessible from the internet:

  1. Whitelist Datalinx AI IP addresses
  2. Enable SSL/TLS encryption
  3. Use strong credentials

# Required IP whitelist
Datalinx AI IPs:
- 52.1.2.3/32
- 52.1.2.4/32
- 52.1.2.5/32

SSH Tunnel

For databases in private networks:

SSH Tunnel:
Enabled: true
SSH Host: bastion.example.com
SSH Port: 22
SSH User: datalinx
SSH Key: [uploaded private key]

Database Host: internal-db.private
Database Port: 5432

Credential Storage

All credentials are encrypted:

  • At Rest: AES-256 encryption
  • In Transit: TLS 1.3
  • Access: Only decrypted during pipeline execution
Tip: Use read-only database users for source connections. Datalinx AI never needs write access to your source databases.

Schema Discovery

Automatic Discovery

When you add a database source, Datalinx AI automatically:

  1. Scans available schemas
  2. Lists all tables
  3. Discovers column names and types (see the query sketch after this list)
  4. Identifies primary keys and relationships
  5. Samples data for profiling
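
On most supported databases, column discovery amounts to queries against the standard information_schema views. A sketch of roughly what runs against a PostgreSQL source (where 'public' is the default schema):

-- List tables and columns visible to the reader role
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;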

Schema Refresh

Source schemas change over time. Configure when Datalinx AI refreshes its view of the schema:

Option       Description
On Connect   Refresh each time the pipeline runs
Scheduled    Refresh daily or weekly
Manual       Refresh only when you click refresh

Handling Schema Changes

When source schema changes are detected:

Schema Change Detected:
┌────────────────────────────────────────────┐
│ Table: customers                           │
│                                            │
│ + Added columns:                           │
│     - loyalty_tier (VARCHAR)               │
│     - signup_source (VARCHAR)              │
│                                            │
│ ~ Modified columns:                        │
│     - email: VARCHAR(100) → VARCHAR(255)   │
│                                            │
│ - Removed columns:                         │
│     - legacy_id                            │
└────────────────────────────────────────────┘

[Update Mappings] [Ignore] [View Details]

Performance Optimization

Query Pushdown

Datalinx AI pushes filters to the source database when possible:

-- Instead of fetching all data:
SELECT * FROM orders;

-- Pushes filter to source:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
AND status = 'completed';

Incremental Loading

Configure incremental loading to avoid full table scans:

Incremental Config:
Cursor Column: updated_at
Cursor Type: TIMESTAMP
Initial Value: 2024-01-01T00:00:00Z
Batch Size: 10000
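
With this configuration, each sync reads only rows whose cursor value is newer than the last one saved. The query issued against the source is roughly equivalent to the following sketch (:last_cursor stands in for the stored cursor value):

-- Incremental batch read; :last_cursor is the value saved
-- at the end of the previous sync (illustrative placeholder)
SELECT *
FROM orders
WHERE updated_at > :last_cursor
ORDER BY updated_at
LIMIT 10000;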

Connection Pooling

For high-frequency pipelines:

Pool Settings:
Min Connections: 2
Max Connections: 10
Idle Timeout: 300s

Troubleshooting

Connection Failures

Error                  Cause               Solution
Connection refused     Network/firewall    Check the IP whitelist
Authentication failed  Wrong credentials   Verify username and password
SSL handshake failed   Certificate issue   Check SSL configuration
Timeout                Network latency     Increase timeout values

Permission Errors

-- Required permissions for source user
GRANT CONNECT ON DATABASE mydb TO datalinx_reader;
GRANT USAGE ON SCHEMA public TO datalinx_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO datalinx_reader;
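
Note that GRANT ... ON ALL TABLES only covers tables that exist when it runs. On PostgreSQL, you can extend read access to tables created later with default privileges:

-- Also grant SELECT on future tables
-- (run as the role that owns/creates the tables)
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO datalinx_reader;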

Slow Discovery

If schema discovery is slow:

  1. Limit tables with pattern matching
  2. Exclude large tables from initial discovery
  3. Schedule discovery during off-peak hours

Best Practices

Security

  1. Create dedicated users: Don't use admin accounts (see the sketch after this list)
  2. Grant minimal permissions: Read-only access only
  3. Use SSL/TLS: Always encrypt connections
  4. Rotate credentials: Update passwords regularly
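
As a PostgreSQL sketch of the first two practices (the role name and password are illustrative; keep the real password in a secrets manager):

-- A dedicated, read-only role for Datalinx AI
-- (password is a placeholder; never commit real credentials)
CREATE ROLE datalinx_reader LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE mydb TO datalinx_reader;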

Performance

  1. Use incremental loading: Avoid full table scans
  2. Index cursor columns: Speed up incremental queries (see the example after this list)
  3. Limit concurrent connections: Respect database limits
  4. Schedule off-peak: Run large syncs during low-traffic periods
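
For the second item, an index on the cursor column keeps incremental queries fast. Matching the incremental configuration shown earlier:

-- Index the cursor column used for incremental loading
CREATE INDEX idx_orders_updated_at ON orders (updated_at);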

Maintenance

  1. Monitor schema changes: Enable change detection
  2. Review connection logs: Check for errors
  3. Test connections regularly: Catch issues early
  4. Document sources: Maintain connection documentation