Foundations: Identity

The Identity tab is where you manage Datalinx's identity resolution system — the engine that stitches together records from different data sources that belong to the same real-world person or entity.

The Problem Identity Resolution Solves

In most organizations, the same customer appears in multiple systems with different identifiers:

Website analytics knows them by a device ID or cookie
Email platform knows them by their email address
Mobile app knows them by a mobile advertising ID
CRM knows them by a customer ID
Payment system knows them by a payment account number

Without identity resolution, these look like separate people. With identity resolution, they're connected into a single, unified profile.

How the Identity Graph Works

Datalinx builds an identity graph — a network of linked identifiers. The graph has four main components:

Identity Nodes

Each individual identifier is a node in the graph:

device_id: abc123
email_hash: sha256(john@example.com)
cookie_id: xyz789
maID: maid_456
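As a rough illustration of what a node holds, the sketch below models each identifier as a type plus a value and derives an email_hash node with SHA-256. The `IdentityNode` class, the `email_hash_node` helper, and the trim-and-lowercase normalization are assumptions made for this example, not Datalinx's documented behavior.

```python
import hashlib
from typing import NamedTuple


class IdentityNode(NamedTuple):
    """One identifier in the graph: a type plus its value (hypothetical model)."""
    id_type: str   # e.g. "device_id", "email_hash", "cookie_id"
    value: str


def email_hash_node(email: str) -> IdentityNode:
    # Assumption: trim and lowercase before hashing so that
    # " John@Example.com " and "john@example.com" map to the same node.
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return IdentityNode("email_hash", digest)


nodes = [
    IdentityNode("device_id", "abc123"),
    email_hash_node("john@example.com"),
    IdentityNode("cookie_id", "xyz789"),
]
```

Normalizing before hashing matters because a hash comparison is exact: two spellings of the same address that differ only in case would otherwise become two unrelated nodes.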

Identity Clusters

A cluster groups all the nodes that belong to the same person. Each cluster has a canonical ID (the primary identifier for that person) and links to all associated identifiers.
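One common way to group linked identifiers is a union-find (disjoint-set) structure. The sketch below is a minimal version under that assumption; `ClusterIndex` and the `type:value` string keys are hypothetical, and the source does not say which algorithm Datalinx uses internally.

```python
from collections import defaultdict


class ClusterIndex:
    """Union-find over identifier strings (hypothetical helper)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def find(self, node: str) -> str:
        """Return the representative identifier of the node's cluster."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def link(self, a: str, b: str) -> None:
        """Merge the clusters that contain a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def clusters(self) -> list[set[str]]:
        groups: dict[str, set[str]] = defaultdict(set)
        for node in list(self._parent):
            groups[self.find(node)].add(node)
        return list(groups.values())


index = ClusterIndex()
index.link("device_id:abc123", "email_hash:9f2c")
index.link("email_hash:9f2c", "cookie_id:xyz789")
index.find("device_id:def456")  # an identifier with no links forms its own cluster
```

Here the representative returned by `find` stands in for the cluster, which is a different thing from the canonical ID: the representative is an implementation detail, while the canonical ID is chosen for reliability.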

Links

Links connect two identifiers with a confidence score and a link type:

Link Type	Meaning	Example
Deterministic	Confirmed exact match	Same email used to log in on two devices
Probabilistic	Likely the same person based on patterns	Same device ID and similar timestamps
Inferred	Indirect connection through a chain of links	Device A → email → Device B

Confidence Scores

Every link has a confidence score (0 to 1) indicating how certain the system is that two identifiers belong to the same person. Higher scores mean stronger matches.
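The sketch below shows one way to represent a link with its type and confidence, and how an inferred link might be derived from a chain such as Device A → email → Device B. The `Link` and `LinkType` names are hypothetical, and multiplying confidences along the chain is an assumption for illustration; the source does not state how Datalinx scores inferred links.

```python
from dataclasses import dataclass
from enum import Enum
from math import prod


class LinkType(Enum):
    DETERMINISTIC = "deterministic"
    PROBABILISTIC = "probabilistic"
    INFERRED = "inferred"


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: LinkType
    confidence: float

    def __post_init__(self) -> None:
        # Confidence scores are defined on the range 0 to 1.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def infer_link(chain: list[Link]) -> Link:
    """Collapse a chain A -> ... -> B into a single inferred link.

    Assumption: confidences multiply along the chain, so an inferred
    link is never stronger than its weakest hop.
    """
    if not chain:
        raise ValueError("chain must contain at least one link")
    return Link(
        source=chain[0].source,
        target=chain[-1].target,
        link_type=LinkType.INFERRED,
        confidence=prod(link.confidence for link in chain),
    )


hops = [
    Link("device_a", "email", LinkType.DETERMINISTIC, 1.0),
    Link("email", "device_b", LinkType.PROBABILISTIC, 0.8),
]
inferred = infer_link(hops)
```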

What You'll See in the Identity Tab

Identity Statistics

High-level metrics about your identity graph:

Total identity nodes
Total clusters (unique people)
Average identifiers per person
Link type distribution (deterministic vs. probabilistic)
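To make these metrics concrete, the sketch below computes them from a small in-memory snapshot. The `clusters` and `link_types` structures are made up for the example; they are not a Datalinx export format.

```python
from collections import Counter

# Hypothetical snapshot: cluster ID -> identifiers in that cluster.
clusters = {
    "c1": ["device_id:abc123", "email_hash:9f2c", "cookie_id:xyz789"],
    "c2": ["device_id:def456"],
}
# Hypothetical list with one entry per link in the graph.
link_types = ["deterministic", "deterministic", "probabilistic"]

total_nodes = sum(len(ids) for ids in clusters.values())
total_clusters = len(clusters)
avg_identifiers_per_person = total_nodes / total_clusters
link_type_distribution = Counter(link_types)
```

A low average of identifiers per person can suggest that few identity columns are tagged, while an unusually high one is worth checking for incorrect merges.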

Canonical ID Tracking

View and manage the primary identifier assigned to each cluster. The canonical ID is typically the most reliable identifier (e.g., an email address or customer ID).
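One plausible way to pick a canonical ID is to rank identity types by reliability and take the best one present in the cluster. The sketch below assumes such a ranking; the `PRIORITY` order and `pick_canonical` function are illustrative only and do not describe Datalinx's actual selection rule.

```python
# Assumed reliability order, most reliable first (not Datalinx's actual ranking).
PRIORITY = ["customer_id", "email_hash", "device_id", "cookie_id"]


def pick_canonical(identifiers: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (type, value) pair with the highest-priority type."""

    def rank(item: tuple[str, str]) -> tuple[int, str]:
        id_type, value = item
        # Types missing from PRIORITY sort after every listed type.
        position = PRIORITY.index(id_type) if id_type in PRIORITY else len(PRIORITY)
        return (position, value)  # the value breaks ties deterministically

    return min(identifiers, key=rank)


cluster = [
    ("cookie_id", "xyz789"),
    ("email_hash", "9f2c"),
    ("device_id", "abc123"),
]
canonical = pick_canonical(cluster)
```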

Cluster Analysis

Explore individual clusters to see:

All identifiers linked to a person
How identifiers were connected (link type and confidence)
Source systems for each identifier
Timeline of when identifiers were first seen

Sample Clusters

Browse example clusters to understand how the identity graph is working. This is useful for:

Verifying that stitching is correct
Finding cases where identifiers were incorrectly merged
Understanding the quality of your identity resolution

Supported Identity Types

Datalinx supports these identity types out of the box:

Type	Description	Common Source
`email_hash`	Hashed email addresses	CRM, account systems, email platforms
`device_id`	Browser or app device identifiers	Web analytics, mobile SDKs
`cookie_id`	Browser cookies	Website tracking
`maID`	Mobile advertising IDs (IDFA, GAID)	Mobile apps, ad networks

Additional identity types can be configured for your specific use case.

Tagging Identity Fields

To feed data into the identity graph, you tag identity columns in your source data using decorators. This is done in the Transform (Mapping) tab or through the Pipeline Agent:

column_name → IDENTITY(device_id)     -- Tags as a device identifier
column_name → IDENTITY(email_hash)    -- Tags as an email identifier
column_name → IDENTITY(maID)          -- Tags as a mobile ad ID

When your pipeline runs, Datalinx automatically:

Detects the decorated identity columns
Extracts the identifier values
Creates nodes in the identity graph
Links identifiers that appear in the same record (deterministic match)
Runs probabilistic matching across records
Groups linked identifiers into clusters
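The first four steps above can be sketched as follows. This is a simplified illustration, not the pipeline's real implementation: the `tags` mapping, the sample rows, and both function names are hypothetical, and probabilistic matching and clustering are left out.

```python
from itertools import combinations

# Hypothetical tagging: source column name -> identity type.
tags = {"device_id": "device_id", "email_sha256": "email_hash"}

rows = [
    {"device_id": "dev_123", "email_sha256": "9f2c", "campaign": "spring_sale"},
    {"device_id": "dev_999", "email_sha256": None, "campaign": "spring_sale"},
]


def extract_nodes(row: dict, tags: dict) -> list[tuple[str, str]]:
    """Return (identity_type, value) for each tagged, non-empty column."""
    return [(id_type, row[col]) for col, id_type in tags.items() if row.get(col)]


def deterministic_links(rows: list[dict], tags: dict) -> list[tuple]:
    """Link every pair of identifiers that appear in the same record."""
    links = []
    for row in rows:
        links.extend(combinations(extract_nodes(row, tags), 2))
    return links


links = deterministic_links(rows, tags)
```

The second row contributes a node but no link, because a record with a single identifier has nothing to pair it with.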

A Practical Example

Imagine you have three source tables:

Web Events (from your website analytics):

device_id	page_url	timestamp
dev_123	/pricing	2024-01-15

Email Signups (from your marketing platform):

email	device_id	campaign
john@example.com	dev_123	spring_sale

App Users (from your mobile app):

user_id	email	mobile_ad_id
usr_456	john@example.com	maid_789

After identity resolution, Datalinx connects:

dev_123 ←(deterministic)→ john@example.com ←(deterministic)→ maid_789
                                ↓
                            usr_456

All four identifiers are now in the same cluster. You can see that the anonymous website visit (dev_123 on /pricing) came from the same person who signed up via email and uses the mobile app.
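The same result can be reproduced with a short graph traversal. The sketch below hard-codes the identifier pairs that share a row in the three tables and collects everything reachable from dev_123. It uses raw values for readability and is an illustration of the idea, not Datalinx's matching code.

```python
from collections import defaultdict

# Identifiers that share a row in the three tables.
# Web Events contains only dev_123, so it contributes no pair.
pairs = [
    ("dev_123", "john@example.com"),   # Email Signups
    ("usr_456", "john@example.com"),   # App Users
    ("john@example.com", "maid_789"),  # App Users
]

neighbors = defaultdict(set)
for a, b in pairs:
    neighbors[a].add(b)
    neighbors[b].add(a)


def cluster_of(start: str) -> set[str]:
    """Collect every identifier reachable from `start` through links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


cluster = cluster_of("dev_123")
```

The email address is the bridge here: dev_123 and maid_789 never appear in the same row, yet both end up in one cluster because each shares a row with john@example.com.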

Tips

Start with deterministic matches (same email, same customer ID) — they're the most reliable
Tag identity fields early in your mapping process — this ensures the graph builds as your pipeline runs
Review sample clusters regularly to check for incorrect merges (two different people linked together)
The more identity types you tag, the more complete your customer view becomes
Identity resolution is especially powerful for marketing attribution — it connects anonymous pre-signup activity to known customers
