
Architecture Overview

Datalinx AI is a data integration platform designed with security, multi-tenancy, and scalability as core principles. This document provides a high-level overview of the system architecture.

Design Principles

  1. Security First - Multi-tenancy and isolation built into every layer
  2. Separation of Concerns - Clear boundaries between control and data planes
  3. Scalability - Horizontal scaling of data processing workers
  4. Maintainability - Three-layer architecture for clear code organization
  5. Testability - Comprehensive test framework with isolation
  6. Observability - Detailed logging and monitoring throughout

System Architecture

Core Components

Control Plane

The Control Plane handles all interactive operations:

| Component | Responsibility |
| --- | --- |
| Web Interface | User interactions, visual mapping, monitoring |
| REST API | Programmatic access, integrations |
| Authentication | User identity, session management, RBAC |
| Configuration | Workspace, pipeline, and connection settings |
| Command Queue | Queuing and dispatching work to the Data Plane |

The Control Plane runs as a FastAPI server that serves both the REST API and the React frontend.
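The Command Queue queues work on the Control Plane and dispatches it to the Data Plane. A minimal in-memory sketch of that role follows. The class and method names (`CommandQueue`, `enqueue`, `claim`, `complete`) and the status values are hypothetical and are not taken from the Datalinx AI codebase. A production queue would presumably be backed by the system database or a message broker rather than process memory.

```python
from __future__ import annotations

import itertools
import threading
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Command:
    id: int
    workspace_id: str              # every command is scoped to one workspace (assumption)
    kind: str                      # e.g. "dbt_run" (hypothetical command type)
    payload: dict[str, Any]
    status: str = "pending"        # pending -> claimed -> succeeded/failed
    result: Optional[dict[str, Any]] = None


class CommandQueue:
    """Hypothetical Control Plane queue that hands work to polling workers."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: deque[int] = deque()
        self._commands: dict[int, Command] = {}
        self._lock = threading.Lock()  # several API requests may enqueue concurrently

    def enqueue(self, workspace_id: str, kind: str, payload: dict[str, Any]) -> Command:
        with self._lock:
            cmd = Command(next(self._ids), workspace_id, kind, payload)
            self._commands[cmd.id] = cmd
            self._pending.append(cmd.id)
            return cmd

    def claim(self) -> Optional[Command]:
        """Give the oldest pending command to a worker, or None if nothing is queued."""
        with self._lock:
            if not self._pending:
                return None
            cmd = self._commands[self._pending.popleft()]
            cmd.status = "claimed"
            return cmd

    def complete(self, command_id: int, ok: bool, result: dict[str, Any]) -> None:
        """Record the outcome a worker reported back."""
        with self._lock:
            cmd = self._commands[command_id]
            cmd.status = "succeeded" if ok else "failed"
            cmd.result = result

    def get(self, command_id: int) -> Command:
        with self._lock:
            return self._commands[command_id]
```

Keeping claim and complete as separate steps lets the Control Plane see which commands are in flight. It also leaves room to re-queue claimed commands whose worker never reports back.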

Data Plane

The Data Plane executes actual data operations in isolation:

| Component | Responsibility |
| --- | --- |
| Data Workers | Execute transformation and movement jobs |
| dbt Processing | Run dbt models against configured databases |
| Dagster Pipelines | Orchestrate complex data workflows |
| Resource Isolation | Each task runs with minimal privileges |

Data Plane workers poll the Control Plane for tasks, execute them, and report results back.
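A Data Plane worker repeats a poll, execute and report cycle. The sketch below shows roughly what that loop might look like. `Task`, `ControlPlaneClient`, `Worker` and the handler registry keyed by task kind are illustrative assumptions, not the platform's actual classes. `InMemoryClient` is a test double that lets the loop run without a Control Plane.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol


@dataclass
class Task:
    id: int
    kind: str
    payload: dict[str, Any]


class ControlPlaneClient(Protocol):
    """Hypothetical interface a worker uses to talk to the Control Plane."""

    def poll(self) -> Optional[Task]: ...
    def report(self, task_id: int, ok: bool, result: dict[str, Any]) -> None: ...


Handler = Callable[[dict[str, Any]], dict[str, Any]]


class Worker:
    def __init__(self, client: ControlPlaneClient, handlers: dict[str, Handler]) -> None:
        self.client = client
        self.handlers = handlers  # task kind -> function that executes it

    def run_once(self) -> bool:
        """Poll for one task, execute it, and report the result. False if the queue was empty."""
        task = self.client.poll()
        if task is None:
            return False
        try:
            handler = self.handlers.get(task.kind)
            if handler is None:
                raise ValueError(f"no handler for task kind {task.kind!r}")
            ok, result = True, handler(task.payload)
        except Exception as exc:  # report failures instead of crashing the worker
            ok, result = False, {"error": str(exc)}
        self.client.report(task.id, ok, result)
        return True

    def run_forever(self, idle_sleep: float = 2.0) -> None:
        while True:
            if not self.run_once():
                time.sleep(idle_sleep)  # back off while there is no work


class InMemoryClient:
    """Test double standing in for the real Control Plane connection."""

    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = list(tasks)
        self.reports: list[tuple[int, bool, dict[str, Any]]] = []

    def poll(self) -> Optional[Task]:
        return self.tasks.pop(0) if self.tasks else None

    def report(self, task_id: int, ok: bool, result: dict[str, Any]) -> None:
        self.reports.append((task_id, ok, result))
```

Because workers pull work rather than having it pushed to them, adding capacity only means starting more `Worker` processes against the same Control Plane.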

Storage Layer

| Store | Purpose |
| --- | --- |
| System Database | Users, organizations, configurations, audit logs |
| Customer Data | Source and transformed data in customer warehouses |
| File Storage | Schema definitions, mapping configurations, logs |

Communication Flow

Application Layers

Datalinx AI follows a strict three-layer architecture:

Layer 1: Interface Layer (Thin)

External interfaces to the system:

src/datalinx_demo/api/routes/   # FastAPI endpoints
src/datalinx_demo/api/tools/    # MCP tool interfaces
src/datalinx_demo/agents/       # AI agent implementations

Interface layer responsibilities:

  • Input validation using Pydantic models
  • Permission checking via decorators
  • Delegation to business logic layer
  • Response formatting
Warning: The interface layer should never contain business logic, only validation and delegation.
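A thin interface-layer handler has three jobs: check permissions, validate input, and delegate to a manager. The sketch below uses only the standard library. A dataclass with `__post_init__` validation stands in for the Pydantic model the docs mention. `require_role`, `CreateWorkspaceRequest`, `WorkspaceManager.create` and the `admin` role are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import Any, Callable


class PermissionDenied(Exception):
    pass


@dataclass(frozen=True)
class User:
    id: str
    roles: frozenset[str]


def require_role(role: str) -> Callable:
    """Hypothetical RBAC decorator: reject the call unless the user holds `role`."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user: User, *args: Any, **kwargs: Any) -> Any:
            if role not in user.roles:
                raise PermissionDenied(f"{user.id} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@dataclass
class CreateWorkspaceRequest:
    """Stand-in for a Pydantic request model: validates on construction."""
    name: str

    def __post_init__(self) -> None:
        self.name = self.name.strip()
        if not 1 <= len(self.name) <= 64:
            raise ValueError("name must be 1-64 characters")


class WorkspaceManager:
    """Business logic lives in the manager, never in the route."""

    def create(self, owner_id: str, name: str) -> dict[str, str]:
        return {"id": f"ws-{name.lower()}", "name": name, "owner": owner_id}


manager = WorkspaceManager()


@require_role("admin")
def create_workspace_route(user: User, body: dict[str, Any]) -> dict[str, Any]:
    request = CreateWorkspaceRequest(**body)           # 1. validate input
    workspace = manager.create(user.id, request.name)  # 2. delegate to the core layer
    return {"data": workspace}                         # 3. format the response
```

The route body stays a few lines long. If a new rule appears, such as "workspace names must be unique per organization", it belongs in `WorkspaceManager`, not here.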

Layer 2: Business Logic Layer (Core)

All business logic and orchestration:

src/datalinx_demo/core/
├── mapping/ # Mapping manager
├── workspace/ # Workspace manager
├── schema/ # Schema manager
├── auth/ # Authentication manager
├── monitoring/ # Monitoring manager
└── ...

Managers coordinate between DAOs, implement workflows, and handle state management.
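To illustrate "coordinate between DAOs, implement workflows", the sketch below shows a manager that checks workspace state, writes a mapping and records an audit entry. It uses three separate DAOs. The in-memory DAOs, the `create_mapping` signature and the rule that only `active` workspaces accept mappings are all hypothetical.

```python
from __future__ import annotations

from typing import Any


class WorkspaceDAO:
    """In-memory stand-in for a system-database DAO."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    def get(self, workspace_id: str) -> dict[str, Any] | None:
        return self.rows.get(workspace_id)


class MappingDAO:
    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def insert(self, row: dict[str, Any]) -> dict[str, Any]:
        self.rows.append(row)
        return row


class AuditLogDAO:
    def __init__(self) -> None:
        self.entries: list[tuple[str, str, str]] = []

    def record(self, user_id: str, action: str, workspace_id: str) -> None:
        self.entries.append((user_id, action, workspace_id))


class MappingManager:
    """Owns the workflow and rules; the DAOs only store and fetch rows."""

    def __init__(self, workspaces: WorkspaceDAO, mappings: MappingDAO, audit: AuditLogDAO) -> None:
        self.workspaces = workspaces
        self.mappings = mappings
        self.audit = audit

    def create_mapping(self, workspace_id: str, user_id: str, source: str, target: str) -> dict[str, Any]:
        workspace = self.workspaces.get(workspace_id)
        if workspace is None:
            raise LookupError(f"workspace {workspace_id!r} not found")
        if workspace.get("status") != "active":  # state rule enforced here, not in a DAO
            raise ValueError("mappings can only be added to active workspaces")
        row = self.mappings.insert(
            {"workspace_id": workspace_id, "source": source, "target": target}
        )
        self.audit.record(user_id, "mapping.create", workspace_id)
        return row
```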

Layer 3: Data Access Layer (DAO)

Direct data operations:

src/datalinx_demo/core/dao/
├── system/ # System database DAOs
├── customer/ # Customer database DAOs
└── file_dao.py # File system operations

DAOs provide simple CRUD operations and return raw data.
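As an illustration of "simple CRUD operations that return raw data", the sketch below implements a system-database DAO over SQLite. The `ConnectionDAO` name, the `connections` table and its columns are hypothetical. Filtering every query on `workspace_id` is one plausible way to carry the multi-tenancy principle down to this layer; it is not a confirmed detail of the codebase.

```python
from __future__ import annotations

import sqlite3
from typing import Any


class ConnectionDAO:
    """Hypothetical DAO: plain CRUD on one table, returning raw dict rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # lets rows be converted with dict(row)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS connections ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "workspace_id TEXT NOT NULL, "
            "name TEXT NOT NULL)"
        )

    def create(self, workspace_id: str, name: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO connections (workspace_id, name) VALUES (?, ?)",
            (workspace_id, name),
        )
        self.conn.commit()
        return cur.lastrowid

    def get(self, workspace_id: str, connection_id: int) -> dict[str, Any] | None:
        # Every query filters on workspace_id so one tenant cannot read another's rows.
        row = self.conn.execute(
            "SELECT id, workspace_id, name FROM connections WHERE id = ? AND workspace_id = ?",
            (connection_id, workspace_id),
        ).fetchone()
        return dict(row) if row is not None else None

    def list_for_workspace(self, workspace_id: str) -> list[dict[str, Any]]:
        rows = self.conn.execute(
            "SELECT id, workspace_id, name FROM connections WHERE workspace_id = ? ORDER BY id",
            (workspace_id,),
        ).fetchall()
        return [dict(r) for r in rows]

    def rename(self, workspace_id: str, connection_id: int, name: str) -> bool:
        cur = self.conn.execute(
            "UPDATE connections SET name = ? WHERE id = ? AND workspace_id = ?",
            (name, connection_id, workspace_id),
        )
        self.conn.commit()
        return cur.rowcount == 1

    def delete(self, workspace_id: str, connection_id: int) -> bool:
        cur = self.conn.execute(
            "DELETE FROM connections WHERE id = ? AND workspace_id = ?",
            (connection_id, workspace_id),
        )
        self.conn.commit()
        return cur.rowcount == 1
```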

Security Model

Authentication

  • Users: Email/password with JWT tokens
  • Service Accounts: API key authentication
  • Sessions: Secure cookie-based sessions
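The section says users authenticate with JWT tokens but does not specify the signing algorithm or the claims. The sketch below assumes HMAC-SHA256 (HS256) with `sub`, `iat` and `exp` claims, and uses only the standard library to show what signing and verification involve. The function names are hypothetical. A real deployment would more likely rely on a maintained library such as PyJWT than on hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Hypothetical helper: create an HS256 JWT for `user_id` expiring after `ttl_seconds`."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the payload if the signature is valid and the token is unexpired, else None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        # Constant-time comparison avoids leaking signature bytes through timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if payload.get("exp", 0) <= now:
        return None
    return payload
```

Pinning the expected `alg` instead of trusting the header's value guards against algorithm-confusion attacks.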

Authorization

Data Protection

  • At Rest: Database encryption, encrypted credentials
  • In Transit: TLS 1.2+ for all connections
  • Secrets: Encrypted using master key, no plaintext storage

Deployment Architecture

Scalability

Horizontal Scaling

  • Control Plane: Add more server instances behind a load balancer
  • Data Plane: Add more workers to handle increased load
  • Database: Use read replicas for read-heavy workloads
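Using read replicas means the application must decide, for each query, whether to use the primary or a replica. The sketch below shows one simple routing policy: writes go to the primary, and reads rotate round-robin across replicas. Reads that must see the caller's own recent writes are sent to the primary, because replicas can lag. `ReplicaRouter` and the string connection targets are hypothetical placeholders.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Hypothetical router: writes to the primary, reads spread across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self) -> str:
        return self.primary

    def for_read(self, require_fresh: bool = False) -> str:
        # Replicas may lag the primary, so read-your-own-writes paths opt out.
        if require_fresh or self._replicas is None:
            return self.primary
        return next(self._replicas)
```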

Resource Isolation

Each workspace can have isolated:

  • Compute resources (warehouses)
  • Storage quotas
  • Rate limits
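Per-workspace rate limits are often implemented as a token bucket per tenant. The documentation does not say which algorithm Datalinx AI uses, so the sketch below is an assumption. Each workspace gets a bucket holding up to `burst` tokens that refills at `rate_per_sec`. The injectable clock makes the behavior deterministic in tests. `WorkspaceRateLimiter` and its parameters are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Bucket:
    tokens: float
    updated: float


class WorkspaceRateLimiter:
    """Hypothetical token-bucket limiter keyed by workspace id."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self._buckets: dict[str, _Bucket] = {}

    def allow(self, workspace_id: str) -> bool:
        now = self.clock()
        # New workspaces start with a full bucket.
        bucket = self._buckets.setdefault(workspace_id, _Bucket(float(self.burst), now))
        # Refill for the time elapsed since the last call, capped at the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

Because each workspace has its own bucket, one tenant exhausting its allowance does not throttle the others.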
