# Multi-Tenancy Model
Datalinx AI is designed as a multi-tenant platform where multiple organizations share infrastructure while maintaining complete data isolation. This document explains the tenancy model and isolation boundaries.
## Tenancy Hierarchy

### Organization

The top-level tenant boundary. Each organization has:

- **Dedicated database schema**: all data stored in an organization-specific schema
- **Isolated users**: user accounts belong to a single organization
- **Separate billing**: usage tracked and billed per organization
- **Admin controls**: organization admins manage their own users and settings
### Workspace

A logical grouping within an organization for project isolation:

- **Source configurations**: database connections, API credentials, file sources
- **Target schemas**: definitions of output data structures
- **Mappings**: field-level transformation rules
- **Pipelines**: scheduled data processing jobs
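The two-level hierarchy above can be sketched as a minimal data model. This is an illustrative sketch only; the class and field names are assumptions, not the platform's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of the tenancy hierarchy: organizations own workspaces,
# and a workspace belongs to exactly one organization.
@dataclass
class Workspace:
    id: str
    name: str
    organization_id: str

@dataclass
class Organization:
    id: str
    name: str
    workspaces: List[Workspace] = field(default_factory=list)

    def add_workspace(self, workspace: Workspace) -> None:
        # Reject workspaces created for a different organization.
        if workspace.organization_id != self.id:
            raise ValueError("workspace belongs to another organization")
        self.workspaces.append(workspace)
```

The invariant enforced in `add_workspace` mirrors the isolation rule: a workspace can never be attached to an organization other than the one it was created in.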
## Data Isolation

### Database Schema Isolation

Each organization's data is stored in a separate PostgreSQL schema:

```sql
-- Organization A's data
CREATE SCHEMA org_a;
CREATE TABLE org_a.workspaces (...);
CREATE TABLE org_a.sources (...);
CREATE TABLE org_a.mappings (...);

-- Organization B's data (completely separate)
CREATE SCHEMA org_b;
CREATE TABLE org_b.workspaces (...);
CREATE TABLE org_b.sources (...);
CREATE TABLE org_b.mappings (...);
```
### Row-Level Security

PostgreSQL row-level security adds a second layer of protection on top of schema isolation:

```sql
-- Policy ensures users only see their organization's data
CREATE POLICY org_isolation ON workspaces
    FOR ALL
    USING (org_id = current_setting('app.current_org_id')::uuid);
```
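For the policy to apply, the application must set `app.current_org_id` on the database session before running queries. A minimal sketch, assuming an asyncpg-style connection object (`fetch_scoped` and its signature are hypothetical names, not part of the documented API):

```python
# Sketch: set the per-session org context that the RLS policy reads back
# via current_setting('app.current_org_id'), then run the scoped query.
async def fetch_scoped(conn, org_id: str, query: str, *args):
    # set_config(..., false) keeps the value for the rest of the session;
    # passing true instead would limit it to the current transaction.
    await conn.execute(
        "SELECT set_config('app.current_org_id', $1, false)", org_id
    )
    return await conn.fetch(query, *args)
```

Setting the context and querying on the same connection is essential: `current_setting` is per-session, so pooled connections must set it on every checkout.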
### Query Context

Every database query includes organization context:

```python
async def get_workspaces(self, org_id: str) -> List[Workspace]:
    """Always scoped to the organization."""
    query = """
        SELECT * FROM workspaces
        WHERE organization_id = $1
    """
    return await self.conn.fetch(query, org_id)
```
## Access Control Model

### Role-Based Access Control (RBAC)

#### Permission Levels
| Role | Scope | Capabilities |
|---|---|---|
| Organization Admin | Organization | Full control over org settings, users, and all workspaces |
| Workspace Admin | Workspace | Full control over specific workspace |
| Workspace Editor | Workspace | Create and modify mappings, run pipelines |
| Workspace Viewer | Workspace | Read-only access to workspace data |
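One way to encode the table above is a static role-to-permission map. The role keys and permission strings here are illustrative assumptions; only `workspace:edit` appears elsewhere in this document:

```python
# Illustrative role -> permission map mirroring the table above.
# Each role includes the capabilities of the roles below it.
ROLE_PERMISSIONS = {
    "organization_admin": {"org:manage", "workspace:admin", "workspace:edit", "workspace:view"},
    "workspace_admin": {"workspace:admin", "workspace:edit", "workspace:view"},
    "workspace_editor": {"workspace:edit", "workspace:view"},
    "workspace_viewer": {"workspace:view"},
}

def has_permission(role: str, permission: str) -> bool:
    """True when the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Defaulting unknown roles to an empty set keeps the check fail-closed: a typo in a role name denies access rather than granting it.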
### Permission Enforcement

Permissions are checked at every API endpoint:

```python
@router.post("/mappings")
@require_permission("workspace:edit")
async def create_mapping(
    request: Request,
    mapping: MappingCreate,
):
    # Permission already validated by the decorator
    return await mapping_manager.create(mapping)
```
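The `require_permission` decorator itself is not shown in this document. A minimal sketch of how such a decorator could work, assuming the request object exposes the caller's granted permission set (an assumption, not the platform's actual request model):

```python
import functools

def require_permission(permission: str):
    """Reject the call before the handler runs if the permission is missing."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(request, *args, **kwargs):
            granted = getattr(request, "permissions", set())
            if permission not in granted:
                raise PermissionError(f"missing permission: {permission}")
            return await handler(request, *args, **kwargs)
        return wrapper
    return decorator
```

Failing before the handler body executes means endpoint code never needs its own permission checks, which is what makes the "already validated by the decorator" comment above safe.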
## Workspace Isolation

### Configuration Isolation

Each workspace has its own isolated configuration:

```yaml
# Workspace A configuration
workspace_a:
  sources:
    - name: customer_db
      type: postgresql
      credentials: encrypted_creds_a
  targets:
    - name: analytics
      warehouse: databricks
      catalog: workspace_a_catalog

# Workspace B (completely separate)
workspace_b:
  sources:
    - name: customer_db  # Same name, different connection
      type: postgresql
      credentials: encrypted_creds_b
```
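Because configuration lookup is always keyed by workspace first, the same source name can safely exist in both workspaces. A sketch of that resolution rule (the dictionary structure and function name are illustrative):

```python
# Per-workspace config: the same source name maps to different credentials.
WORKSPACE_SOURCES = {
    "workspace_a": {"customer_db": "encrypted_creds_a"},
    "workspace_b": {"customer_db": "encrypted_creds_b"},
}

def resolve_source(workspace_id: str, source_name: str) -> str:
    # No fallback to other workspaces: a missing key is an error,
    # never a cross-workspace lookup.
    return WORKSPACE_SOURCES[workspace_id][source_name]
```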
### Data Processing Isolation

Pipeline execution is workspace-scoped:

```python
# Jobs run with workspace context
@asset
def process_data(context):
    workspace = context.resources.workspace_config

    # All operations scoped to this workspace
    source = get_source(workspace["source_id"])
    target = get_target(workspace["target_id"])

    # Data never crosses workspace boundaries
    data = source.fetch()
    transformed = transform(data, workspace["mappings"])
    target.write(transformed)
```
## Cross-Tenant Considerations

### What Is NOT Shared
| Resource | Isolation Level |
|---|---|
| User accounts | Organization |
| Source credentials | Workspace |
| Mapping definitions | Workspace |
| Pipeline configurations | Workspace |
| Processed data | Workspace |
| Audit logs | Organization |
### What IS Shared (Platform Level)
| Resource | Why Shared |
|---|---|
| Target schema templates | Read-only templates for common schemas |
| Connector definitions | Generic connector code, not credentials |
| Platform documentation | Public documentation |
| System health metrics | Aggregated, anonymized |
## Service Account Isolation

Service accounts enable programmatic access while maintaining isolation:

```python
# Service account scoped to a specific workspace
service_account = ServiceAccount(
    name="etl_pipeline",
    organization_id=org_id,
    workspace_id=workspace_id,  # Scoped to a single workspace
    permissions=["pipeline:run", "mapping:read"],
)

# API calls made with the service account are automatically scoped
client = DatalinxClient(api_key=service_account.api_key)
# The client can only access the scoped workspace
```
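The scoping rule amounts to a single check at the API boundary. A sketch, using the same field names as the example above (the function itself is hypothetical):

```python
# Sketch: reject any request that targets a workspace other than the one
# the service account was created for.
def authorize_service_account(account: dict, requested_workspace_id: str) -> None:
    if account["workspace_id"] != requested_workspace_id:
        raise PermissionError("service account is scoped to another workspace")
```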
## Audit and Compliance

### Audit Trail

All operations are logged with tenant context:

```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "organization_id": "org_123",
  "workspace_id": "ws_456",
  "user_id": "user_789",
  "action": "mapping.create",
  "resource_id": "map_abc",
  "ip_address": "192.168.1.1",
  "user_agent": "Datalinx-SDK/1.0"
}
```
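A sketch of building such a record in application code. The field names follow the example above; the function name is an assumption:

```python
from datetime import datetime, timezone

def audit_record(org_id, workspace_id, user_id, action, resource_id):
    """Build an audit entry that always carries tenant context."""
    return {
        # ISO 8601 UTC timestamp, matching the example format above.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "organization_id": org_id,
        "workspace_id": workspace_id,
        "user_id": user_id,
        "action": action,
        "resource_id": resource_id,
    }
```

Making tenant fields required function parameters, rather than optional metadata, is what guarantees no audit entry can be written without organization and workspace context.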
### Data Residency

Organizations can specify data residency requirements:

- **Region selection**: choose the deployment region for data processing
- **Data warehouse location**: control where transformed data is stored
- **Backup locations**: specify regions for backup storage
## Best Practices

### For Platform Administrators

- Never share credentials between organizations
- Audit cross-tenant queries: these should never occur in normal operation
- Monitor resource usage per organization to enforce fair use
- Rotate encryption keys on a regular schedule
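The "audit cross-tenant queries" practice can be automated as a scan over audit entries. A hypothetical sketch using the audit fields shown earlier (`user_org` is an assumed lookup of each user's home organization):

```python
def cross_tenant_events(events, user_org):
    """Return audit entries where a user acted outside their own org.

    events: audit records carrying user_id and organization_id fields.
    user_org: map of user_id -> the organization the user belongs to.
    """
    return [
        e for e in events
        if user_org.get(e["user_id"]) != e["organization_id"]
    ]
```

In normal operation this function should always return an empty list; any hit is an isolation violation worth alerting on.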
### For Organization Administrators

- Use workspace separation for different projects and environments
- Apply least privilege: grant users only the permissions they need
- Review access regularly and remove unused accounts
- Use service accounts for automated processes, not personal accounts
## Related Documentation

- Security Architecture: encryption and security controls
- Architecture Overview: system design principles
- User Management: managing users and roles