Multi-Tenancy Model

Datalinx AI is designed as a multi-tenant platform where multiple organizations share infrastructure while maintaining complete data isolation. This document explains the tenancy model and isolation boundaries.

Tenancy Hierarchy

Organization

The top-level tenant boundary. Each organization has:

  • Dedicated database schema: All data stored in organization-specific schema
  • Isolated users: User accounts belong to a single organization
  • Separate billing: Usage tracked and billed per organization
  • Admin controls: Organization admins manage their own users and settings

Workspace

A logical grouping within an organization for project isolation:

  • Source configurations: Database connections, API credentials, file sources
  • Target schemas: Definitions of output data structures
  • Mappings: Field-level transformation rules
  • Pipelines: Scheduled data processing jobs
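
These two levels nest directly. A minimal sketch of the hierarchy as Python dataclasses (the field names are illustrative assumptions, not the platform's actual data model):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Workspace:
    id: str
    name: str
    sources: List[str] = field(default_factory=list)    # source configuration ids
    mappings: List[str] = field(default_factory=list)   # mapping ids
    pipelines: List[str] = field(default_factory=list)  # scheduled job ids

@dataclass
class Organization:
    id: str
    schema: str  # dedicated PostgreSQL schema, e.g. "org_a"
    workspaces: List[Workspace] = field(default_factory=list)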

Data Isolation

Database Schema Isolation

Each organization's data is stored in a separate PostgreSQL schema:

-- Organization A's data
CREATE SCHEMA org_a;
CREATE TABLE org_a.workspaces (...);
CREATE TABLE org_a.sources (...);
CREATE TABLE org_a.mappings (...);

-- Organization B's data (completely separate)
CREATE SCHEMA org_b;
CREATE TABLE org_b.workspaces (...);
CREATE TABLE org_b.sources (...);
CREATE TABLE org_b.mappings (...);
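
Binding a connection to one organization's schema is then a matter of setting search_path. A minimal sketch using asyncpg (the driver implied by the $1 placeholder style in the query examples below); connect_for_org is an assumed helper, and the schema name is validated before interpolation because SET cannot take bind parameters:

import asyncpg

async def connect_for_org(dsn: str, org_schema: str) -> asyncpg.Connection:
    """Open a connection that resolves table names in one org's schema."""
    if not org_schema.isidentifier():
        # Schema names come from trusted platform metadata, but guard anyway.
        raise ValueError(f"invalid schema name: {org_schema!r}")
    conn = await asyncpg.connect(dsn)
    await conn.execute(f"SET search_path TO {org_schema}")
    return conn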

Row-Level Security

PostgreSQL row-level security adds a second layer of protection on top of schema isolation:

-- Policy ensures users only see their organization's data
CREATE POLICY org_isolation ON workspaces
FOR ALL
USING (organization_id = current_setting('app.current_org_id')::uuid);
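
For the policy to apply, the application must set app.current_org_id before issuing queries. A minimal sketch with asyncpg, using set_config with is_local=true so the setting is confined to a single transaction (the helper name is an assumption):

from contextlib import asynccontextmanager

@asynccontextmanager
async def org_context(conn, org_id: str):
    """Run queries with app.current_org_id set so RLS policies apply."""
    async with conn.transaction():
        # The third argument (is_local=true) confines the setting to this transaction.
        await conn.execute(
            "SELECT set_config('app.current_org_id', $1, true)", org_id
        )
        yield conn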

Query Context

Every database query includes organization context:

async def get_workspaces(self, org_id: str) -> List[Workspace]:
    """Always scoped to organization."""
    query = """
        SELECT * FROM workspaces
        WHERE organization_id = $1
    """
    return await self.conn.fetch(query, org_id)

Access Control Model

Role-Based Access Control (RBAC)

Permission Levels

Role               | Scope        | Capabilities
Organization Admin | Organization | Full control over org settings, users, and all workspaces
Workspace Admin    | Workspace    | Full control over a specific workspace
Workspace Editor   | Workspace    | Create and modify mappings, run pipelines
Workspace Viewer   | Workspace    | Read-only access to workspace data
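
Each role can be expressed as a set of permission strings like the "workspace:edit" used in the endpoint below. A hypothetical mapping (the exact strings and role keys are assumptions):

ROLE_PERMISSIONS = {
    "organization_admin": {"org:admin", "workspace:admin", "workspace:edit", "workspace:view"},
    "workspace_admin":    {"workspace:admin", "workspace:edit", "workspace:view"},
    "workspace_editor":   {"workspace:edit", "workspace:view"},
    "workspace_viewer":   {"workspace:view"},
}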

Permission Enforcement

Permissions are checked at every API endpoint:

@router.post("/mappings")
@require_permission("workspace:edit")
async def create_mapping(
    request: Request,
    mapping: MappingCreate
):
    # Permission already validated by decorator
    return await mapping_manager.create(mapping)
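
The require_permission decorator itself is not shown above; one plausible shape, assuming an upstream auth middleware has already populated request.state.permissions with the caller's granted permission strings:

from functools import wraps
from fastapi import HTTPException, Request

def require_permission(permission: str):
    """Reject the request unless the caller holds the given permission."""
    def decorator(func):
        @wraps(func)
        async def wrapper(request: Request, *args, **kwargs):
            granted = getattr(request.state, "permissions", set())
            if permission not in granted:
                raise HTTPException(status_code=403, detail="forbidden")
            return await func(request, *args, **kwargs)
        return wrapper
    return decorator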

Workspace Isolation

Configuration Isolation

Each workspace has isolated configuration:

# Workspace A configuration
workspace_a:
  sources:
    - name: customer_db
      type: postgresql
      credentials: encrypted_creds_a

  targets:
    - name: analytics
      warehouse: databricks
      catalog: workspace_a_catalog

# Workspace B (completely separate)
workspace_b:
  sources:
    - name: customer_db  # Same name, different connection
      type: postgresql
      credentials: encrypted_creds_b

Data Processing Isolation

Pipeline execution is workspace-scoped:

# Jobs run with workspace context
@asset(required_resource_keys={"workspace_config"})
def process_data(context):
    workspace = context.resources.workspace_config

    # All operations scoped to this workspace
    source = get_source(workspace["source_id"])
    target = get_target(workspace["target_id"])

    # Data never crosses workspace boundaries
    data = source.fetch()
    transformed = transform(data, workspace["mappings"])
    target.write(transformed)
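
Wiring the resource per workspace is what keeps each run's view of configuration isolated. An illustrative sketch with Dagster's Definitions API (the resource values are assumed placeholders):

from dagster import Definitions, ResourceDefinition

defs = Definitions(
    assets=[process_data],
    resources={
        # One resource instance per workspace deployment; the asset above
        # can only see this workspace's configuration.
        "workspace_config": ResourceDefinition.hardcoded_resource({
            "source_id": "src_123",
            "target_id": "tgt_456",
            "mappings": [],
        })
    },
)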

Cross-Tenant Considerations

What Is NOT Shared

Resource                | Isolation Level
User accounts           | Organization
Source credentials      | Workspace
Mapping definitions     | Workspace
Pipeline configurations | Workspace
Processed data          | Workspace
Audit logs              | Organization

What IS Shared (Platform Level)

Resource                | Why Shared
Target schema templates | Read-only templates for common schemas
Connector definitions   | Generic connector code, not credentials
Platform documentation  | Public documentation
System health metrics   | Aggregated, anonymized

Service Account Isolation

Service accounts enable programmatic access while maintaining isolation:

# Service account scoped to a specific workspace
service_account = ServiceAccount(
    name="etl_pipeline",
    organization_id=org_id,
    workspace_id=workspace_id,  # Scoped to a single workspace
    permissions=["pipeline:run", "mapping:read"]
)

# API calls with the service account are automatically scoped
client = DatalinxClient(api_key=service_account.api_key)
# Can only access the scoped workspace
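
On the server side, every request authenticated with a service-account key can be resolved back to its scope before any query runs. A hypothetical sketch (table and column names are assumptions; keys are stored only as hashes):

import hashlib

async def resolve_scope(db, api_key: str) -> dict:
    """Map an API key to the organization/workspace it is allowed to touch."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    row = await db.fetchrow(
        "SELECT organization_id, workspace_id, permissions "
        "FROM service_accounts WHERE api_key_hash = $1",
        digest,
    )
    if row is None:
        raise PermissionError("unknown API key")
    return dict(row)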

Audit and Compliance

Audit Trail

All operations are logged with tenant context:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "organization_id": "org_123",
  "workspace_id": "ws_456",
  "user_id": "user_789",
  "action": "mapping.create",
  "resource_id": "map_abc",
  "ip_address": "192.168.1.1",
  "user_agent": "Datalinx-SDK/1.0"
}
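
A minimal helper that emits records in this shape might look like the following (the function and logger names are assumptions):

import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("datalinx.audit")

def emit_audit_event(org_id, workspace_id, user_id, action,
                     resource_id, ip_address, user_agent):
    """Serialize one audit record with full tenant context."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "organization_id": org_id,
        "workspace_id": workspace_id,
        "user_id": user_id,
        "action": action,
        "resource_id": resource_id,
        "ip_address": ip_address,
        "user_agent": user_agent,
    }
    audit_logger.info(json.dumps(event))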

Data Residency

Organizations can specify data residency requirements:

  • Region selection: Choose deployment region for data processing
  • Data warehouse location: Control where transformed data is stored
  • Backup locations: Specify regions for backup storage
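
Taken together, an organization's residency settings might be represented as a simple structure like this (keys and region names are illustrative assumptions):

residency_settings = {
    "processing_region": "eu-west-1",  # where pipelines execute
    "warehouse_region": "eu-west-1",   # where transformed data is stored
    "backup_regions": ["eu-west-1", "eu-central-1"],
}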

Best Practices

For Platform Administrators

  1. Never share credentials between organizations
  2. Audit cross-tenant queries: they should never occur in normal operation
  3. Monitor per-organization resource usage to enforce fair use
  4. Rotate encryption keys on a regular schedule

For Organization Administrators

  1. Use workspace separation for different projects and environments
  2. Apply least privilege: grant users the minimum permissions they need
  3. Review access regularly and remove unused accounts
  4. Use service accounts, not personal accounts, for automated processes