Foundations: Identity
The Identity tab is where you manage Datalinx's identity resolution system — the engine that stitches together records from different data sources that belong to the same real-world person or entity.
The Problem Identity Resolution Solves
In most organizations, the same customer appears in multiple systems with different identifiers:
- Website analytics knows them by a device ID or cookie
- Email platform knows them by their email address
- Mobile app knows them by a mobile advertising ID
- CRM knows them by a customer ID
- Payment system knows them by a payment account number
Without identity resolution, these look like separate people. With identity resolution, they're connected into a single, unified profile.
How the Identity Graph Works
Datalinx builds an identity graph — a network of linked identifiers. The graph has four main components:
Identity Nodes
Each individual identifier is a node in the graph:
device_id: abc123
email_hash: sha256(john@example.com)
cookie_id: xyz789
mobile_ad_id: maid_456
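As a rough illustration of the node list above, the following Python sketch models each identifier as a (type, value) pair and hashes the email with SHA-256. The `IdentityNode` class and `hash_email` helper are hypothetical names, and trimming and lowercasing the email before hashing is an assumption; Datalinx's actual normalization rules are not described here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityNode:
    """One identifier in the graph: an identity type plus its value."""
    id_type: str
    value: str


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of the email.

    Assumption: the address is trimmed and lowercased first so that
    'John@Example.com' and 'john@example.com' produce the same node.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The four example nodes from the text.
nodes = [
    IdentityNode("device_id", "abc123"),
    IdentityNode("email_hash", hash_email("john@example.com")),
    IdentityNode("cookie_id", "xyz789"),
    IdentityNode("mobile_ad_id", "maid_456"),
]
```

Because the dataclass is frozen, nodes are hashable and can be used as set members or dictionary keys, which is convenient when deduplicating identifiers.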
Identity Clusters
A cluster groups all the nodes that belong to the same person. Each cluster has a canonical ID (the primary identifier for that person) and links to all associated identifiers.
Links
Links connect two identifiers with a confidence score and a link type:
| Link Type | Meaning | Example |
|---|---|---|
| Deterministic | Confirmed exact match | Same email used to log in on two devices |
| Probabilistic | Likely the same person based on patterns | Same device ID and similar timestamps |
| Inferred | Indirect connection through a chain of links | Device A → email → Device B |
Confidence Scores
Every link has a confidence score (0 to 1) indicating how certain the system is that two identifiers belong to the same person. Higher scores mean stronger matches.
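One way to picture a link is as a small record holding two identifiers, a link type, and a score constrained to the 0 to 1 range. The sketch below is illustrative only: the `Link` and `LinkType` names, the `strong_links` helper, and the example scores are assumptions, not Datalinx's internal model.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    DETERMINISTIC = "deterministic"
    PROBABILISTIC = "probabilistic"
    INFERRED = "inferred"


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: LinkType
    confidence: float

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-to-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


def strong_links(links, min_confidence):
    """Keep only links at or above a caller-chosen (hypothetical) threshold."""
    return [link for link in links if link.confidence >= min_confidence]


# Illustrative scores only; real values come from the identity graph.
links = [
    Link("dev_123", "john@example.com", LinkType.DETERMINISTIC, 1.0),
    Link("dev_123", "cookie_xyz", LinkType.PROBABILISTIC, 0.72),
    Link("dev_123", "maid_789", LinkType.INFERRED, 0.55),
]
```

Calling `strong_links(links, 0.7)` keeps the first two links and drops the inferred one, which is the kind of filtering you might apply when reviewing weaker matches.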
What You'll See in the Identity Tab
Identity Statistics
High-level metrics about your identity graph:
- Total identity nodes
- Total clusters (unique people)
- Average identifiers per person
- Link type distribution (deterministic vs. probabilistic)
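To make these metrics concrete, here is a minimal sketch of how they could be derived from a set of clusters and a list of link types. The input shapes (a dictionary of cluster ID to identifiers, and a flat list of link-type strings) and the sample data are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sample data: cluster ID -> identifiers in that cluster.
clusters = {
    "c1": ["dev_123", "john@example.com", "maid_789", "usr_456"],
    "c2": ["dev_999", "jane@example.com"],
}
# One entry per link in the graph.
link_types = ["deterministic", "deterministic", "deterministic", "probabilistic"]


def graph_statistics(clusters, link_types):
    """Compute the four headline metrics from clusters and link types."""
    total_nodes = sum(len(ids) for ids in clusters.values())
    total_clusters = len(clusters)
    average = total_nodes / total_clusters if total_clusters else 0.0
    counts = Counter(link_types)
    total_links = sum(counts.values())
    # Share of each link type among all links.
    distribution = {t: n / total_links for t, n in counts.items()} if total_links else {}
    return {
        "total_nodes": total_nodes,
        "total_clusters": total_clusters,
        "avg_identifiers_per_person": average,
        "link_type_distribution": distribution,
    }


stats = graph_statistics(clusters, link_types)
```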
Canonical ID Tracking
View and manage the primary identifier assigned to each cluster. The canonical ID is typically the most reliable identifier (e.g., an email address or customer ID).
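Choosing "the most reliable identifier" can be thought of as ranking identity types and taking the best one present in the cluster. The sketch below assumes a hypothetical priority order (`CANONICAL_PRIORITY`) with customer ID and email first; the order Datalinx actually applies may differ.

```python
# Hypothetical reliability order, most reliable first.
CANONICAL_PRIORITY = ["customer_id", "email_hash", "mobile_ad_id", "device_id", "cookie_id"]


def pick_canonical(identifiers):
    """Return the (id_type, value) pair with the best-ranked type.

    Unknown types rank last; ties are broken by value so the result is
    deterministic. Raises ValueError if the cluster is empty.
    """
    def rank(item):
        id_type, value = item
        try:
            position = CANONICAL_PRIORITY.index(id_type)
        except ValueError:
            position = len(CANONICAL_PRIORITY)
        return (position, value)

    return min(identifiers, key=rank)


cluster = [
    ("device_id", "dev_123"),
    ("email_hash", "a1b2c3"),
    ("cookie_id", "xyz789"),
]
canonical = pick_canonical(cluster)
```

For this cluster the email hash wins because no customer ID is present.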
Cluster Analysis
Explore individual clusters to see:
- All identifiers linked to a person
- How identifiers were connected (link type and confidence)
- Source systems for each identifier
- Timeline of when identifiers were first seen
Sample Clusters
Browse example clusters to understand how the identity graph is working. This is useful for:
- Verifying that stitching is correct
- Finding cases where identifiers were incorrectly merged
- Understanding the quality of your identity resolution
Supported Identity Types
Datalinx supports these identity types out of the box:
| Type | Description | Common Source |
|---|---|---|
| email_hash | Hashed email addresses | CRM, account systems, email platforms |
| device_id | Browser or app device identifiers | Web analytics, mobile SDKs |
| cookie_id | Browser cookies | Website tracking |
| maID | Mobile advertising IDs (IDFA, GAID) | Mobile apps, ad networks |
Additional identity types can be configured for your specific use case.
Tagging Identity Fields
To feed data into the identity graph, you tag identity columns in your source data using decorators. This is done in the Transform (Mapping) tab or through the Pipeline Agent:
column_name → IDENTITY(device_id) -- Tags as a device identifier
column_name → IDENTITY(email_hash) -- Tags as an email identifier
column_name → IDENTITY(maID) -- Tags as a mobile ad ID
When your pipeline runs, Datalinx automatically:
- Detects the decorated identity columns
- Extracts the identifier values
- Creates nodes in the identity graph
- Links identifiers that appear in the same record (deterministic match)
- Runs probabilistic matching across records
- Groups linked identifiers into clusters
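The first steps in that list (detecting decorated columns and extracting identifier values) can be sketched in a few lines of Python. This is not Datalinx's parser: the regular expression, the function names, the `email_sha` column, and the acceptance of an ASCII `->` arrow are all assumptions made for illustration.

```python
import re

# Matches lines such as "column_name → IDENTITY(device_id) -- comment".
# Accepting "->" as well as "→" is an assumption for convenience.
DECORATOR = re.compile(r"^\s*(\w+)\s*(?:→|->)\s*IDENTITY\((\w+)\)")


def parse_mapping(lines):
    """Return {column_name: identity_type} for lines carrying an IDENTITY decorator."""
    mapping = {}
    for line in lines:
        match = DECORATOR.match(line)
        if match:
            column, id_type = match.groups()
            mapping[column] = id_type
    return mapping


def extract_identifiers(record, mapping):
    """Return (identity_type, value) pairs for the tagged, non-empty columns of a record."""
    return [
        (id_type, record[column])
        for column, id_type in mapping.items()
        if record.get(column)
    ]


mapping_lines = [
    "device_id → IDENTITY(device_id) -- Tags as a device identifier",
    "email_sha → IDENTITY(email_hash) -- Tags as an email identifier",
    "page_url",  # not tagged, so it is ignored
]
mapping = parse_mapping(mapping_lines)
record = {"device_id": "dev_123", "email_sha": "a1b2c3", "page_url": "/pricing"}
identifiers = extract_identifiers(record, mapping)
```

Each extracted pair would then become a node, and pairs extracted from the same record would be linked deterministically.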
A Practical Example
Imagine you have three source tables:
Web Events (from your website analytics):
| device_id | page_url | timestamp |
|---|---|---|
| dev_123 | /pricing | 2024-01-15 |
Email Signups (from your marketing platform):
| email | device_id | campaign |
|---|---|---|
| john@example.com | dev_123 | spring_sale |
App Users (from your mobile app):
| user_id | email | mobile_ad_id |
|---|---|---|
| usr_456 | john@example.com | maid_789 |
After identity resolution, Datalinx connects:
dev_123 ←(deterministic)→ john@example.com ←(deterministic)→ maid_789
                                 ↓
                              usr_456
All four identifiers are now in the same cluster. You can see that the anonymous website visit (dev_123 on /pricing) was the same person who later signed up via email and uses the mobile app.
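The stitching in this example can be reproduced with a union-find (disjoint-set) structure: identifiers that appear in the same record are merged, and the resulting sets are the clusters. This is a minimal sketch of deterministic linking only, under the assumption that co-occurrence in a record is the sole linking rule; it does not model probabilistic matching or confidence scores, and it is not presented as Datalinx's actual algorithm.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by identifier string."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps lookups fast.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Identifier values taken from each row of the three example tables.
records = [
    ["dev_123"],                                  # Web Events
    ["john@example.com", "dev_123"],              # Email Signups
    ["usr_456", "john@example.com", "maid_789"],  # App Users
]

uf = UnionFind()
for identifiers in records:
    first = identifiers[0]
    uf.find(first)  # make sure single-identifier records are registered
    for other in identifiers[1:]:
        uf.union(first, other)  # same record => deterministic link

# Group every identifier under its root to obtain the clusters.
clusters = defaultdict(set)
for identifier in list(uf.parent):
    clusters[uf.find(identifier)].add(identifier)
```

With the three example rows, `clusters` ends up holding a single set containing `dev_123`, `john@example.com`, `maid_789`, and `usr_456`.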
Tips
- Start with deterministic matches (same email, same customer ID) — they're the most reliable
- Tag identity fields early in your mapping process — this ensures the graph builds as your pipeline runs
- Review sample clusters regularly to check for incorrect merges (two different people linked together)
- The more identity types you tag, the more complete your customer view becomes
- Identity resolution is especially powerful for marketing attribution — it connects anonymous pre-signup activity to known customers
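When reviewing clusters for incorrect merges, one simple heuristic is to flag any cluster that contains more than one distinct value of an identity type that normally maps to a single person. The sketch below is only that heuristic: the `PERSON_LEVEL_TYPES` set and the `max_per_type` threshold are assumptions, and a flagged cluster is a candidate for manual review rather than a confirmed error, since one person can legitimately have several email addresses.

```python
from collections import Counter

# Assumed types that usually identify exactly one person.
PERSON_LEVEL_TYPES = {"email_hash", "customer_id"}


def suspicious_clusters(clusters, max_per_type=1):
    """Return {cluster_id: {id_type: count}} for clusters exceeding the threshold.

    clusters maps a cluster ID to a list of (id_type, value) pairs.
    """
    flagged = {}
    for cluster_id, identifiers in clusters.items():
        # Count distinct values per person-level identity type.
        counts = Counter(
            id_type
            for id_type, _ in set(identifiers)
            if id_type in PERSON_LEVEL_TYPES
        )
        over = {t: n for t, n in counts.items() if n > max_per_type}
        if over:
            flagged[cluster_id] = over
    return flagged


clusters = {
    "c1": [("email_hash", "aaa"), ("device_id", "dev_1"), ("device_id", "dev_2")],
    "c2": [("email_hash", "bbb"), ("email_hash", "ccc"), ("cookie_id", "ck_9")],
}
flagged = suspicious_clusters(clusters)
```

Here only `c2` is flagged, because it carries two different email hashes; `c1` has two devices, which is common for a single person.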