Control Plane & Data Plane Architecture

Datalinx AI uses a split architecture that separates the interactive user-facing components (Control Plane) from the data processing components (Data Plane). This separation provides enhanced security, scalability, and operational isolation.

Overview

Control Plane

The Control Plane handles all interactive user operations and system configuration.

Responsibilities

| Component | Function |
| --- | --- |
| Web UI | React-based interface for user interaction |
| FastAPI Server | RESTful API handling requests and responses |
| Authentication | User sessions, SSO, API key management |
| Configuration Manager | Workspace settings, source credentials, mappings |

Key Characteristics

  • Stateful sessions: Maintains user context and workspace selection
  • Low latency: Optimized for interactive operations
  • Request-scoped: Each API call is handled independently
  • Access control: Enforces permissions at every endpoint
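
The "request-scoped" and "access control" characteristics can be sketched together: every call carries its own workspace context and is authorized independently, with no state carried over from earlier requests. This is an illustrative sketch, not the actual Datalinx AI API; the function and field names are assumptions.

```python
# Minimal sketch of request-scoped access control. Each request is
# validated on its own: the caller's grants and the workspace context
# arrive with the call, and nothing is remembered between calls.

class PermissionDenied(Exception):
    """Raised when the caller lacks the required permission."""

def authorize(grants: dict, workspace: str, action: str) -> None:
    # grants maps workspace name -> set of actions allowed for this caller
    if action not in grants.get(workspace, set()):
        raise PermissionDenied(f"{action!r} not allowed in {workspace!r}")

def handle_request(grants: dict, workspace: str, action: str) -> str:
    # Permissions are enforced before any work happens on every endpoint
    authorize(grants, workspace, action)
    return f"ok: {action} in {workspace}"
```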

How It Works

  1. User authenticates through the Web UI or API
  2. All requests include workspace context
  3. Control Plane validates permissions
  4. Configuration is loaded from PostgreSQL
  5. For data operations, jobs are submitted to the Data Plane
```python
# Example: Control Plane loads workspace config
workspace_config = workspace_manager.get_workspace_config(workspace_name)

# Submit job to Data Plane with full configuration
dagster_client.submit_job(
    job_name="run_pipeline",
    workspace_config=workspace_config,  # Passed, not fetched
    run_config=pipeline_config,
)
```

Data Plane

The Data Plane handles all data processing, transformation, and movement operations.

Responsibilities

| Component | Function |
| --- | --- |
| Dagster Orchestrator | Pipeline scheduling and execution |
| dbt Runner | SQL transformation execution |
| Background Workers | Async tasks (canonicalization, exports) |
| Data Connectors | Source/destination integrations |

Key Characteristics

  • Stateless execution: No dependency on user sessions
  • Self-contained: Receives all configuration needed at job start
  • Horizontally scalable: Can run multiple workers in parallel
  • Isolated: Cannot access Control Plane APIs

Configuration Passing

The Data Plane cannot access user sessions or make API calls back to the Control Plane. All configuration must be passed at job submission:

```python
# Control Plane passes complete configuration
@asset
def transform_data(context, workspace_config: dict):
    # workspace_config contains:
    # - Source credentials
    # - Target connection details
    # - Mapping definitions
    # - Transformation rules
    source_conn = create_connection(workspace_config["source"])
    target_conn = create_connection(workspace_config["target"])

    # Process data without calling back to the Control Plane
    data = source_conn.fetch()
    transformed = apply_mappings(data, workspace_config["mappings"])
    target_conn.write(transformed)
```

Communication Patterns

Job Submission (Control → Data)

The Control Plane submits jobs with the complete, resolved workspace configuration attached, so the Data Plane never needs to call back for it.

Status Updates (Data → Control)

The Data Plane reports status back through:

  1. Database updates: Job status written to PostgreSQL
  2. Event streaming: Real-time updates via WebSocket
  3. Metrics emission: Observability data to monitoring systems
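
The "database updates" path can be sketched as follows: the Data Plane appends status rows that the Control Plane reads when clients poll. This is a hedged sketch using SQLite as a stand-in for PostgreSQL to stay self-contained; the table and column names are assumptions, not the real schema.

```python
import sqlite3  # stand-in for PostgreSQL so the example runs anywhere

def record_status(conn, job_id: str, status: str) -> None:
    # Data Plane side: append the latest status for a job
    conn.execute(
        "INSERT INTO job_status (job_id, status) VALUES (?, ?)",
        (job_id, status),
    )

def latest_status(conn, job_id: str) -> str:
    # Control Plane side: read the most recent status when a client polls
    row = conn.execute(
        "SELECT status FROM job_status WHERE job_id = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (job_id,),
    ).fetchone()
    return row[0] if row else "unknown"
```

Because the two planes share only the database, the Data Plane never needs network access to the Control Plane's APIs.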

Security Benefits

| Aspect | Benefit |
| --- | --- |
| Credential isolation | The Data Plane never stores credentials; it receives them per job |
| Network segmentation | The Data Plane can run in an isolated network |
| Blast radius | Compromising the Data Plane does not expose user sessions |
| Audit trail | All job submissions are logged with full context |

Deployment Models

Single-Node Development

Both planes run in a single process for local development:

```bash
# Starts both Control and Data Plane
./run_dev_server.sh
```

Production Split

In production, planes are typically separated:

```yaml
# Control Plane
control-plane:
  replicas: 2
  resources:
    memory: 2Gi
    cpu: 1

# Data Plane
data-plane:
  replicas: 4
  resources:
    memory: 8Gi
    cpu: 4
```

Best Practices

For Control Plane Development

  1. Never pass user context to the Data Plane - Resolve all configuration before submission
  2. Validate early - Check permissions and validate configuration before submitting the job
  3. Handle async - Return a job ID immediately and let the client poll for status
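
The "validate early, return a job ID" pattern above can be sketched as a single submission function. The names (`submit_pipeline`, the required config keys, the `submit` callable standing in for the real Dagster submission call) are illustrative assumptions, not the actual API.

```python
import uuid

def submit_pipeline(workspace_config: dict, submit) -> str:
    # Validate early: fail before submission if the config is incomplete
    for key in ("source", "target", "mappings"):
        if key not in workspace_config:
            raise ValueError(f"workspace_config missing {key!r}")

    # Handle async: generate an ID now, submit, and return immediately;
    # the client polls for status with this ID instead of blocking.
    job_id = str(uuid.uuid4())
    submit(job_id=job_id, workspace_config=workspace_config)
    return job_id
```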

For Data Plane Development

  1. Be self-contained - Don't assume access to any external services
  2. Log extensively - Since interactive debugging isn't possible
  3. Fail fast - If configuration is missing, fail immediately with clear error
  4. Use resources - Pass configuration through Dagster resources, not global state
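
The fail-fast and self-contained practices can be combined in one sketch: configuration arrives as an explicit, validated object (in Dagster this would typically be wired in as a resource rather than read from global state), and missing keys abort immediately with a clear error instead of failing deep inside the pipeline. The class and key names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkspaceConfig:
    source: dict
    target: dict
    mappings: list

    @classmethod
    def from_dict(cls, raw: dict) -> "WorkspaceConfig":
        # Fail fast: report every missing key in one actionable message
        missing = [k for k in ("source", "target", "mappings") if k not in raw]
        if missing:
            raise ValueError(f"workspace config missing keys: {missing}")
        return cls(raw["source"], raw["target"], raw["mappings"])
```

Constructing the config object at job start means a malformed submission fails before any connections are opened or data is moved.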