Core Concepts
This page explains the fundamental concepts and terminology used throughout Datalinx AI.
Organization
An Organization is the top-level container in Datalinx AI, representing your company or team. Each organization has:
- Isolated data and configurations
- Its own users and roles
- Separate billing and usage tracking
Workspace
A Workspace is an isolated environment within an organization for a specific project or use case. Workspaces contain:
- Data source configurations
- Schema definitions and mappings
- Pipeline definitions
- Monitoring rules
Use separate workspaces for development, staging, and production to prevent accidental changes to live data.
Data Sources
Data Sources are the origins of your raw data. Datalinx AI supports three types:
Database Sources
Direct connections to databases:
- PostgreSQL
- Snowflake
- Databricks
- BigQuery
API Sources
REST API endpoints with schema definitions:
- OpenAPI/Swagger specifications
- Custom REST endpoints
- GraphQL (coming soon)
File Sources
Files stored in cloud storage:
- CSV, JSON, Parquet formats
- AWS S3, Azure Blob, GCS
- SFTP servers
Schemas
Source Schema
The Source Schema describes the structure of your raw data:
- Tables and columns
- Data types
- Relationships (if detected)
Source schemas are automatically discovered when you connect a data source.
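Discovery happens for you, but the metadata it captures is the same kind you could inspect by hand. A minimal sketch against PostgreSQL's standard information_schema (the 'public' schema filter is just the PostgreSQL default):
```sql
-- Inspect the metadata a source schema captures: tables, columns, types.
SELECT
    table_name,
    column_name,
    data_type,
    is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```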
Target Schema
The Target Schema defines the desired output structure:
- Standardized table definitions
- Required and optional fields
- Data type specifications
- Canonical field types (email, phone, currency, etc.)
Datalinx AI provides pre-built target schema templates for common use cases.
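As a concrete illustration, a target table behind such a template might look like the following SQL DDL. All names and canonical-type annotations here are hypothetical:
```sql
-- Hypothetical target table illustrating a template's structure.
CREATE TABLE customers (
    customer_id          TEXT PRIMARY KEY,  -- required identifier
    email                TEXT NOT NULL,     -- canonical type: email (required)
    phone                TEXT,              -- canonical type: phone (optional)
    lifetime_value_cents BIGINT             -- canonical type: currency, in cents
);
```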
Mappings
Mappings define how source data transforms into target data. Each mapping specifies:
- Source field(s)
- Target field
- Transformation logic (if any)
Mapping Types
| Type | Description | Example |
|---|---|---|
| Direct | 1:1 field copy | source.name → target.name |
| Expression | SQL transformation | CONCAT(first_name, ' ', last_name) |
| Decorated | Apply standardization | LOWER(email) |
| Computed | Derived value | SUM(line_items.amount) |
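Conceptually, each mapping type corresponds to a SQL expression over the source. A sketch with hypothetical source tables (customers_raw, orders):
```sql
-- One SELECT expression per mapping type.
SELECT
    c.name                                  AS name,            -- Direct: 1:1 field copy
    CONCAT(c.first_name, ' ', c.last_name)  AS full_name,       -- Expression: SQL transformation
    LOWER(c.email)                          AS email,           -- Decorated: standardization
    (SELECT SUM(o.total)
       FROM orders o
      WHERE o.customer_id = c.customer_id)  AS lifetime_value   -- Computed: derived value
FROM customers_raw c;
```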
CTEs (Common Table Expressions)
CTEs are intermediate transformations that create derived datasets. Use CTEs when you need to:
- Join multiple source tables
- Aggregate data before mapping
- Apply complex business logic
- Create reusable sub-queries
```sql
-- CTE Example: Calculate customer lifetime value
WITH customer_orders AS (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(total) AS lifetime_value
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM customer_orders;
```
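The CTE's output columns (order_count, lifetime_value) can then be mapped to target fields just like columns from any other source table.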
Decorators
Decorators are pre-built transformation functions that standardize data:
| Decorator | Purpose | Example |
|---|---|---|
| lowercase | Normalize case | JOHN@EMAIL.COM → john@email.com |
| phone_format | Standardize phone numbers | (555) 123-4567 → +15551234567 |
| date_parse | Parse date strings | 01/15/2024 → 2024-01-15 |
| currency_cents | Convert to cents | $19.99 → 1999 |
| null_if_empty | Convert empty to NULL | "" → NULL |
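Decorators are applied declaratively in mappings, but their behavior corresponds to familiar SQL functions. An illustrative PostgreSQL approximation, not the product's internals (customers_raw and its columns are hypothetical, price is assumed numeric, and phone_format is omitted because robust phone normalization needs more than a one-line expression):
```sql
-- Approximate SQL equivalents of the decorators above.
SELECT
    LOWER(email)                        AS email,        -- lowercase
    TO_DATE(signup_date, 'MM/DD/YYYY')  AS signup_date,  -- date_parse
    CAST(ROUND(price * 100) AS BIGINT)  AS price_cents,  -- currency_cents
    NULLIF(TRIM(notes), '')             AS notes         -- null_if_empty
FROM customers_raw;
```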
Pipelines
A Pipeline applies your mappings to transform source data into target data. Pipelines can be:
- Ad-hoc: Manually triggered
- Scheduled: Run on a cron schedule
- Triggered: Activated by events
Pipeline Modes
| Mode | Description | Use Case |
|---|---|---|
| Full Refresh | Process all data | Initial load, schema changes |
| Incremental | Process only new/changed | Regular updates |
| Backfill | Process historical range | Gap filling |
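Conceptually, the modes differ only in which slice of the source a run reads. A hand-rolled SQL sketch of the idea (the updated_at column and the stored high-water mark are assumptions, not product internals):
```sql
-- Full Refresh: read everything.
SELECT * FROM orders;
-- Incremental: only rows changed since the last successful run.
SELECT * FROM orders
WHERE updated_at > '2024-01-15 00:00:00';  -- stored high-water mark
-- Backfill: reprocess a historical window to fill a gap.
SELECT * FROM orders
WHERE updated_at BETWEEN '2023-12-01' AND '2024-01-01';
```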
Control Plane vs Data Plane
Datalinx AI uses a split architecture for security and scalability:
Control Plane
The Control Plane handles:
- User interface and API
- Configuration management
- Authentication and authorization
- Command orchestration
Data Plane
The Data Plane executes:
- Data transformations (dbt)
- Pipeline runs
- Query execution
- Monitoring checks
Identity Resolution
Identity Resolution links records from different sources to the same entity:
- Match on email, phone, or custom identifiers
- Build an identity graph
- Create canonical customer records
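As a simplified illustration, the most basic match is an equality join on a normalized identifier; real resolution layers fuzzy matching and graph-building on top. Table names here are hypothetical:
```sql
-- Link CRM and storefront records that share a normalized email; each
-- matched pair is an edge in the identity graph.
SELECT
    crm.id  AS crm_contact_id,
    shop.id AS shop_customer_id
FROM crm_contacts crm
JOIN shop_customers shop
  ON LOWER(crm.email) = LOWER(shop.email);
```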
Audiences
Audiences are segments of entities that match criteria you define:
```sql
-- Example: High-value customers
SELECT customer_id
FROM customers
WHERE lifetime_value > 1000
  AND last_order_date > CURRENT_DATE - INTERVAL '90 days';
```
Audiences can be:
- Exported to marketing platforms
- Used for analysis
- Monitored for changes
Reverse ETL
Reverse ETL pushes transformed data back to operational systems:
- CRM updates (Salesforce, HubSpot)
- Marketing platform syncs (Meta, Google Ads)
- Custom webhook destinations
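The shape of a sync is essentially a query that selects the fields the destination expects, keyed by an identifier the destination can match on. A hypothetical sketch:
```sql
-- Select the fields a CRM destination expects for each member of the
-- high-value audience defined above. Names are hypothetical.
SELECT
    customer_id AS external_id,  -- key the destination matches records on
    email,
    lifetime_value
FROM customers
WHERE lifetime_value > 1000;
```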
Monitoring
Monitoring continuously checks data quality:
Rule Types
- Volume: Row counts within expected range
- Freshness: Data updated within time window
- Quality: Values match expected patterns
- Schema: Structure hasn't changed unexpectedly
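Conceptually, each rule type reduces to a query whose result is compared against a threshold. A PostgreSQL-flavored sketch (table and column names are hypothetical):
```sql
-- Volume: yesterday's row count, compared against an expected range.
SELECT COUNT(*) FROM orders
WHERE CAST(created_at AS DATE) = CURRENT_DATE - 1;
-- Freshness: age of the newest record, compared against a time window.
SELECT MAX(updated_at) FROM orders;
-- Quality: count of values violating a pattern (naive email check).
SELECT COUNT(*) FROM customers
WHERE email NOT LIKE '%@%';
```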
Alerts
When rules fail, alerts notify you via:
- Slack
- Webhooks
Service Accounts
Service Accounts provide programmatic access:
- API key authentication
- Limited permissions
- Used by Data Plane workers
- Audit logged
Roles and Permissions
Datalinx AI uses role-based access control (RBAC):
| Role | Capabilities |
|---|---|
| Admin | Full access to organization |
| Developer | Create/edit workspaces and mappings |
| Operator | Run pipelines, view monitoring |
| Viewer | Read-only access |
Next Steps
- Quick Start - Apply these concepts hands-on
- System Requirements - Prepare your environment
- Architecture Overview - Deep dive into system design