Control Plane & Data Plane Architecture
Datalinx AI uses a split architecture that separates the interactive user-facing components (Control Plane) from the data processing components (Data Plane). This separation provides enhanced security, scalability, and operational isolation.
Overview
Control Plane
The Control Plane handles all interactive user operations and system configuration.
Responsibilities
| Component | Function |
|---|---|
| Web UI | React-based interface for user interaction |
| FastAPI Server | RESTful API handling requests and responses |
| Authentication | User sessions, SSO, API key management |
| Configuration Manager | Workspace settings, source credentials, mappings |
Key Characteristics
- Stateful sessions: Maintains user context and workspace selection
- Low latency: Optimized for interactive operations
- Request-scoped: Each API call is handled independently
- Access control: Enforces permissions at every endpoint
How It Works
- The user authenticates through the Web UI or the API
- Every request includes the workspace context
- The Control Plane validates permissions
- Configuration is loaded from PostgreSQL
- For data operations, jobs are submitted to the Data Plane
# Example: the Control Plane loads the workspace configuration
workspace_config = workspace_manager.get_workspace_config(workspace_name)

# Submit the job to the Data Plane with the full configuration
dagster_client.submit_job(
    job_name="run_pipeline",
    workspace_config=workspace_config,  # Passed, not fetched
    run_config=pipeline_config,
)
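The five steps above can be sketched end to end as follows. This is a minimal illustration, not the Datalinx implementation: `PERMISSIONS`, `WORKSPACE_CONFIGS`, `submit_to_data_plane`, and `handle_run_request` are hypothetical names, and in-memory dictionaries stand in for the authentication layer and PostgreSQL.

```python
import uuid

# Hypothetical in-memory stand-ins for the permission store and PostgreSQL.
PERMISSIONS = {("alice", "acme"): {"run_pipeline"}}
WORKSPACE_CONFIGS = {
    "acme": {"source": {"type": "postgres"}, "target": {"type": "snowflake"}, "mappings": []}
}
SUBMITTED_JOBS = []  # records what would be handed to the Data Plane


class PermissionDenied(Exception):
    """Raised when the user may not perform the action in the workspace."""


def submit_to_data_plane(job_name, workspace_config, run_config):
    """Stand-in for the real job submission; returns a new job ID."""
    job_id = str(uuid.uuid4())
    SUBMITTED_JOBS.append({
        "job_id": job_id,
        "job_name": job_name,
        "workspace_config": workspace_config,
        "run_config": run_config,
    })
    return job_id


def handle_run_request(user, workspace_name, run_config):
    # Steps 2-3: the request carries workspace context; check permissions first.
    if "run_pipeline" not in PERMISSIONS.get((user, workspace_name), set()):
        raise PermissionDenied(f"{user} cannot run pipelines in {workspace_name}")
    # Step 4: load the workspace configuration.
    workspace_config = WORKSPACE_CONFIGS[workspace_name]
    # Step 5: submit the job with the full configuration attached.
    return submit_to_data_plane("run_pipeline", workspace_config, run_config)
```

Checking permissions before the configuration is loaded means an unauthorized request never touches stored credentials.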
Data Plane
The Data Plane handles all data processing, transformation, and movement operations.
Responsibilities
| Component | Function |
|---|---|
| Dagster Orchestrator | Pipeline scheduling and execution |
| dbt Runner | SQL transformation execution |
| Background Workers | Async tasks (canonicalization, exports) |
| Data Connectors | Source/destination integrations |
Key Characteristics
- Stateless execution: No dependency on user sessions
- Self-contained: Receives all configuration needed at job start
- Horizontally scalable: Can run multiple workers in parallel
- Isolated: Cannot access Control Plane APIs
Configuration Passing
The Data Plane cannot access user sessions or make API calls back to the Control Plane. All configuration must be passed at job submission:
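from dagster import asset

# The Control Plane passes the complete configuration.
# create_connection and apply_mappings are application helpers defined elsewhere.
@asset
def transform_data(context, workspace_config: dict):
    # Note: Dagster resolves a plain parameter such as workspace_config as an
    # upstream asset of the same name unless it is supplied as a resource.
    # workspace_config contains:
    # - Source credentials
    # - Target connection details
    # - Mapping definitions
    # - Transformation rules
    source_conn = create_connection(workspace_config["source"])
    target_conn = create_connection(workspace_config["target"])

    # Process data without calling back to the Control Plane
    data = source_conn.fetch()
    transformed = apply_mappings(data, workspace_config["mappings"])
    target_conn.write(transformed)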
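Because the Data Plane cannot call back, everything a job needs has to survive serialization at submission time. The following is a minimal sketch of that idea, assuming a JSON payload. `JobPayload`, `serialize_job`, and `load_job` are hypothetical names, the host values are placeholders, and the field names mirror the keys used in the example above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class JobPayload:
    """Everything the Data Plane needs, resolved before submission."""
    job_name: str
    source: dict
    target: dict
    mappings: list


def serialize_job(payload: JobPayload) -> str:
    # Control Plane side: produce a self-contained message.
    return json.dumps(asdict(payload), sort_keys=True)


def load_job(message: str) -> JobPayload:
    # Data Plane side: rebuild the payload from the message alone,
    # with no session lookup and no call back to the Control Plane.
    return JobPayload(**json.loads(message))


payload = JobPayload(
    job_name="run_pipeline",
    source={"host": "source.example.internal"},
    target={"host": "target.example.internal"},
    mappings=[{"from": "cust_id", "to": "customer_id"}],
)
message = serialize_job(payload)
restored = load_job(message)
```

If a field cannot be serialized at submission time, that is usually a sign it depends on Control Plane state and should be resolved earlier.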
Communication Patterns
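Job Submission (Control → Data)
The Control Plane submits each job to the Data Plane together with the complete workspace configuration; the Data Plane never requests configuration itself.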
Status Updates (Data → Control)
The Data Plane reports status back through:
- Database updates: Job status written to PostgreSQL
- Event streaming: Real-time updates via WebSocket
- Metrics emission: Observability data to monitoring systems
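The database path is the simplest of the three to illustrate. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the `job_status` table, its columns, and both function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the shared PostgreSQL database
conn.execute("CREATE TABLE job_status (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)")


def report_status(job_id: str, status: str) -> None:
    """Data Plane side: insert or overwrite the current status of a job."""
    conn.execute(
        "INSERT OR REPLACE INTO job_status (job_id, status) VALUES (?, ?)",
        (job_id, status),
    )
    conn.commit()


def read_status(job_id: str):
    """Control Plane side: read the status without contacting the Data Plane."""
    row = conn.execute(
        "SELECT status FROM job_status WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None


report_status("job-1", "RUNNING")
report_status("job-1", "SUCCEEDED")
```

Here `read_status("job-1")` returns the most recently written status, and an unknown job ID yields `None`. The Control Plane only reads from the database, so it never needs a network path into the Data Plane.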
Security Benefits
| Aspect | Benefit |
|---|---|
| Credential isolation | The Data Plane never stores credentials; it receives them per job |
| Network segmentation | The Data Plane can run in an isolated network |
| Blast radius | A compromise of the Data Plane doesn't expose user sessions |
| Audit trail | All job submissions logged with full context |
Deployment Models
Single-Node Development
Both planes run in a single process for local development:
# Starts both the Control Plane and the Data Plane
./run_dev_server.sh
Production Split
In production, planes are typically separated:
# Control Plane
control-plane:
  replicas: 2
  resources:
    memory: 2Gi
    cpu: 1
# Data Plane
data-plane:
  replicas: 4
  resources:
    memory: 8Gi
    cpu: 4
Best Practices
For Control Plane Development
- Never pass user context to the Data Plane: Resolve all configuration before submission
- Validate early: Check permissions and validate configuration before submitting the job
- Handle async: Return the job ID immediately and let the client poll for status
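The "validate early" and "handle async" practices can be sketched with a thread pool standing in for the Data Plane. `submit_job`, `get_status`, `wait_for`, and the `JOBS` registry are hypothetical; a real deployment would hand the work to the orchestrator and read status from the database instead.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # stand-in for Data Plane workers
JOBS = {}  # job_id -> Future


def run_pipeline(workspace_config: dict) -> int:
    time.sleep(0.1)  # simulate work
    return len(workspace_config["mappings"])


def submit_job(workspace_config: dict) -> str:
    """Validate first, then return a job ID without waiting for the result."""
    if "mappings" not in workspace_config:
        raise ValueError("workspace_config is missing 'mappings'")
    job_id = str(uuid.uuid4())
    JOBS[job_id] = executor.submit(run_pipeline, workspace_config)
    return job_id


def get_status(job_id: str) -> str:
    """Report RUNNING until the job finishes (queued jobs count as RUNNING)."""
    future = JOBS[job_id]
    if not future.done():
        return "RUNNING"
    return "FAILED" if future.exception() else "SUCCEEDED"


def wait_for(job_id: str, timeout: float = 5.0, interval: float = 0.05) -> str:
    """Client side: poll until the job leaves the RUNNING state or time runs out."""
    deadline = time.monotonic() + timeout
    while get_status(job_id) == "RUNNING" and time.monotonic() < deadline:
        time.sleep(interval)
    return get_status(job_id)
```

A client would call `submit_job(...)`, keep the returned ID, and call `wait_for(job_id)` or poll `get_status(job_id)` on its own schedule.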
For Data Plane Development
- Be self-contained: Don't assume access to Control Plane APIs or user sessions; use only the configuration passed at job start
- Log extensively: Interactive debugging isn't possible, so logs are the main diagnostic tool
- Fail fast: If configuration is missing, fail immediately with a clear error
- Use resources: Pass configuration through Dagster resources, not global state
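A fail-fast check might look like the sketch below. The required keys are taken from the `workspace_config` example earlier on this page; `ConfigurationError` and `validate_workspace_config` are hypothetical names.

```python
REQUIRED_KEYS = ("source", "target", "mappings")


class ConfigurationError(Exception):
    """Raised at job start when the passed configuration is incomplete."""


def validate_workspace_config(workspace_config: dict) -> dict:
    """Return the config unchanged, or raise an error naming every missing key."""
    missing = [key for key in REQUIRED_KEYS if key not in workspace_config]
    if missing:
        raise ConfigurationError(
            "workspace_config is missing required keys: " + ", ".join(missing)
        )
    return workspace_config
```

Calling this as the first statement of a job surfaces a misconfigured submission before any connection is opened, and listing every missing key at once avoids a fix-and-retry loop.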
Related Documentation
- Architecture Overview - High-level system design
- Multi-Tenancy - How tenant isolation works
- Security Architecture - Security controls and practices