Architecture Overview
Datalinx AI is a data integration platform designed with security, multi-tenancy, and scalability as core principles. This document provides a high-level overview of the system architecture.
Design Principles
- Security First - Multi-tenancy and isolation built into every layer
- Separation of Concerns - Clear boundaries between control and data planes
- Scalability - Horizontal scaling of data processing workers
- Maintainability - Three-layer architecture for clear code organization
- Testability - Comprehensive test framework with isolation
- Observability - Detailed logging and monitoring throughout
System Architecture
Core Components
Control Plane
The Control Plane handles all interactive operations:
| Component | Responsibility |
|---|---|
| Web Interface | User interactions, visual mapping, monitoring |
| REST API | Programmatic access, integrations |
| Authentication | User identity, session management, RBAC |
| Configuration | Workspace, pipeline, and connection settings |
| Command Queue | Queuing and dispatching work to Data Plane |
The Control Plane runs as a FastAPI server that serves both the API and the React frontend.
Data Plane
The Data Plane executes actual data operations in isolation:
| Component | Responsibility |
|---|---|
| Data Workers | Execute transformation and movement jobs |
| dbt Processing | Run dbt models against configured databases |
| Dagster Pipelines | Orchestrate complex data workflows |
| Resource Isolation | Each task runs with minimal privileges |
Data Plane workers poll the Control Plane for tasks, execute them, and report results back.
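The poll, execute, and report cycle can be sketched as follows. This is a minimal in-process illustration under stated assumptions, not the actual Datalinx protocol: `ControlPlane`, `claim_task`, `report_result`, and `run_worker` are hypothetical names, and an in-memory queue stands in for whatever network calls a real worker would make.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Task:
    task_id: str
    payload: dict


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the Control Plane's command queue."""
    pending: "queue.Queue[Task]" = field(default_factory=queue.Queue)
    results: dict = field(default_factory=dict)

    def enqueue(self, task: Task) -> None:
        self.pending.put(task)

    def claim_task(self) -> Optional[Task]:
        # Non-blocking: the worker receives None when no work is queued.
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def report_result(self, task_id: str, status: str, detail: Any) -> None:
        self.results[task_id] = (status, detail)


def run_worker(cp: ControlPlane, handler: Callable[[dict], Any],
               max_idle_polls: int = 1) -> int:
    """Poll until the queue is empty max_idle_polls times in a row.

    Returns the number of tasks handled (succeeded or failed).
    """
    handled = 0
    idle = 0
    while idle < max_idle_polls:
        task = cp.claim_task()
        if task is None:
            idle += 1  # a long-running worker would sleep or back off here
            continue
        idle = 0
        try:
            cp.report_result(task.task_id, "succeeded", handler(task.payload))
        except Exception as exc:
            # Failures are reported back rather than crashing the worker.
            cp.report_result(task.task_id, "failed", str(exc))
        handled += 1
    return handled
```

Because the worker pulls work rather than receiving pushes, the Control Plane in this sketch never needs to open a connection into the Data Plane.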
Storage Layer
| Store | Purpose |
|---|---|
| System Database | Users, organizations, configurations, audit logs |
| Customer Data | Source and transformed data in customer warehouses |
| File Storage | Schema definitions, mapping configurations, logs |
Communication Flow
Application Layers
Datalinx AI follows a strict three-layer architecture:
Layer 1: Interface Layer (Thin)
External interfaces to the system:
src/datalinx_demo/api/routes/ # FastAPI endpoints
src/datalinx_demo/api/tools/ # MCP tool interfaces
src/datalinx_demo/agents/ # AI agent implementations
Interface layer responsibilities:
- Input validation using Pydantic models
- Permission checking via decorators
- Delegation to business logic layer
- Response formatting
The interface layer should never contain business logic; it only validates input and delegates to the business logic layer.
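A thin handler that follows these responsibilities might look like the sketch below. To stay dependency-free it uses a plain dataclass in place of a Pydantic model and a plain function in place of a FastAPI route; `requires_permission`, `CreateWorkspaceRequest`, `WorkspaceManager`, and the `workspace:create` permission string are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from functools import wraps


class PermissionDenied(Exception):
    """Raised when the caller lacks the required permission."""


def requires_permission(permission: str):
    """Hypothetical decorator: reject callers lacking the named permission."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: dict, *args, **kwargs):
            if permission not in user.get("permissions", ()):
                raise PermissionDenied(permission)
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@dataclass
class CreateWorkspaceRequest:
    """Stand-in for a Pydantic model; validation runs in __post_init__."""
    name: str

    def __post_init__(self):
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")


class WorkspaceManager:
    """Placeholder for the business logic layer."""

    def create(self, owner_id: str, name: str) -> dict:
        return {"id": f"{owner_id}:{name}", "name": name}


@requires_permission("workspace:create")
def create_workspace_endpoint(user: dict, body: dict,
                              manager: WorkspaceManager) -> dict:
    request = CreateWorkspaceRequest(**body)              # 1. validate input
    workspace = manager.create(user["id"], request.name)  # 2. delegate
    return {"status": "created", "workspace": workspace}  # 3. format response
```

The endpoint body makes no decisions of its own: every rule about what a workspace is lives behind `manager.create`.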
Layer 2: Business Logic Layer (Core)
All business logic and orchestration:
src/datalinx_demo/core/
├── mapping/ # Mapping manager
├── workspace/ # Workspace manager
├── schema/ # Schema manager
├── auth/ # Authentication manager
├── monitoring/ # Monitoring manager
└── ...
Managers coordinate between DAOs, implement workflows, and handle state management.
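As an illustration of a manager coordinating DAOs, the sketch below validates a column mapping against two schemas and then persists it. The `SchemaDAO` and `MappingDAO` protocols, their method names, and the `create_mapping` workflow are assumptions made for this example, not the project's actual interfaces.

```python
from __future__ import annotations

from typing import Protocol


class SchemaDAO(Protocol):
    def get_columns(self, table: str) -> list[str]: ...


class MappingDAO(Protocol):
    def save(self, mapping: dict) -> None: ...


class MappingManager:
    """Hypothetical manager: owns the workflow, leaves storage to DAOs."""

    def __init__(self, schema_dao: SchemaDAO, mapping_dao: MappingDAO):
        self._schema = schema_dao
        self._mappings = mapping_dao

    def create_mapping(self, source_table: str, target_table: str,
                       column_map: dict[str, str]) -> dict:
        source_cols = set(self._schema.get_columns(source_table))
        target_cols = set(self._schema.get_columns(target_table))

        # Business rule: every mapped column must exist on both sides.
        missing = sorted(set(column_map) - source_cols)
        missing += sorted(set(column_map.values()) - target_cols)
        if missing:
            raise ValueError(f"unknown columns: {missing}")

        mapping = {"source": source_table, "target": target_table,
                   "columns": dict(column_map)}
        self._mappings.save(mapping)
        return mapping
```

Depending on protocols rather than concrete DAO classes lets the manager be exercised with in-memory fakes.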
Layer 3: Data Access Layer (DAO)
Direct data operations:
src/datalinx_demo/core/dao/
├── system/ # System database DAOs
├── customer/ # Customer database DAOs
└── file_dao.py # File system operations
DAOs provide simple CRUD operations and return raw data; business rules stay in the managers.
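A DAO in this style could be as small as the following sketch, which uses an in-memory SQLite database purely so the example runs on its own. The `WorkspaceDAO` class, its table layout, and its method names are hypothetical; the source does not specify which database or driver the system DAOs use.

```python
import sqlite3
from typing import Optional


class WorkspaceDAO:
    """Hypothetical DAO: plain CRUD, raw rows out, no business rules."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS workspaces ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create(self, name: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO workspaces (name) VALUES (?)", (name,))
        self._conn.commit()
        return cur.lastrowid

    def get(self, workspace_id: int) -> Optional[tuple]:
        # Returns the raw (id, name) row, or None if it does not exist.
        return self._conn.execute(
            "SELECT id, name FROM workspaces WHERE id = ?",
            (workspace_id,)).fetchone()

    def update_name(self, workspace_id: int, name: str) -> bool:
        cur = self._conn.execute(
            "UPDATE workspaces SET name = ? WHERE id = ?",
            (name, workspace_id))
        self._conn.commit()
        return cur.rowcount == 1

    def delete(self, workspace_id: int) -> bool:
        cur = self._conn.execute(
            "DELETE FROM workspaces WHERE id = ?", (workspace_id,))
        self._conn.commit()
        return cur.rowcount == 1
```

All queries are parameterized, and nothing here decides whether an operation is allowed; that question belongs to the layers above.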
Security Model
Authentication
- Users: Email/password with JWT tokens
- Service Accounts: API key authentication
- Sessions: Secure cookie-based sessions
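For the service account case, one common way to check API keys is sketched below. It is an assumption rather than a description of Datalinx internals: the sketch stores only a SHA-256 digest of each key and compares digests in constant time, and `generate_api_key` and `verify_api_key` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext key to show once, hex digest to store)."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_digest)
```

Storing only the digest means a leaked database row cannot be replayed as a credential, and `hmac.compare_digest` avoids leaking information through comparison timing.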
Authorization
Data Protection
- At Rest: Database encryption, encrypted credentials
- In Transit: TLS 1.2+ for all connections
- Secrets: Encrypted with a master key; never stored in plaintext
Deployment Architecture
Scalability
Horizontal Scaling
- Control Plane: Add more server instances behind a load balancer
- Data Plane: Add more workers to handle increased load
- Database: Use read replicas for read-heavy workloads
Resource Isolation
Each workspace can have isolated:
- Compute resources (warehouses)
- Storage quotas
- Rate limits
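Per-workspace rate limits could be enforced in several ways. The sketch below uses a token bucket per workspace, which is a technique chosen for illustration and not one the source names. `TokenBucket`, `WorkspaceRateLimiter`, and their parameters are hypothetical, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Credit tokens for the elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class WorkspaceRateLimiter:
    """One independent bucket per workspace ID."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}

    def allow(self, workspace_id: str) -> bool:
        bucket = self._buckets.get(workspace_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[workspace_id] = bucket
        return bucket.allow()
```

Because each workspace has its own bucket, one tenant exhausting its budget does not affect requests from another.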
Next Steps
- Control vs Data Plane - Deep dive into the split architecture
- Multi-Tenancy - Understand tenant isolation
- Security - Security model details