Foundations: Identity

The Identity tab is where you manage Datalinx's identity resolution system — the engine that stitches together records from different data sources that belong to the same real-world person or entity.

The Problem Identity Resolution Solves

In most organizations, the same customer appears in multiple systems with different identifiers:

  • Website analytics knows them by a device ID or cookie
  • Email platform knows them by their email address
  • Mobile app knows them by a mobile advertising ID
  • CRM knows them by a customer ID
  • Payment system knows them by a payment account number

Without identity resolution, these look like separate people. With identity resolution, they're connected into a single, unified profile.

How the Identity Graph Works

Datalinx builds an identity graph — a network of linked identifiers. The graph has four main components:

Identity Nodes

Each individual identifier is a node in the graph:

  • device_id: abc123
  • email_hash: sha256(john@example.com)
  • cookie_id: xyz789
  • mobile_ad_id: maid_456
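As an illustration of how an email_hash value like the one above could be produced, here is a minimal Python sketch. The normalization step (trimming whitespace and lowercasing) and the helper name `email_hash` are assumptions made for this example; the hashing rules Datalinx actually applies are not described here.

```python
import hashlib


def email_hash(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    # Assumption: trim and lowercase before hashing so that differently
    # formatted copies of the same address map to the same node.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# A node is modeled here as an (identity_type, value) pair.
node = ("email_hash", email_hash("John@Example.com "))
```

Hashing the normalized form means the raw address never has to be stored in the graph, while the same address seen in two sources still yields the same node.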

Identity Clusters

A cluster groups all the nodes that belong to the same person. Each cluster has a canonical ID (the primary identifier for that person) and links to all associated identifiers.

Identity Links

Links connect two identifiers with a confidence score and a link type:

| Link Type | Meaning | Example |
| --- | --- | --- |
| Deterministic | Confirmed exact match | Same email used to log in on two devices |
| Probabilistic | Likely the same person based on patterns | Same device ID and similar timestamps |
| Inferred | Indirect connection through a chain of links | Device A → email → Device B |
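The inferred case can be pictured as a path search over direct links. The sketch below is a generic breadth-first search, not Datalinx's implementation; the function name `find_chain` and the edge list are hypothetical.

```python
from collections import deque


def find_chain(edges, start, goal):
    """Return the shortest chain of identifiers from start to goal, or None."""
    # Build an undirected adjacency map from the direct links.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Device A and Device B share no direct link, only a common email.
edges = [("device_a", "email"), ("email", "device_b")]
chain = find_chain(edges, "device_a", "device_b")
```

A chain longer than two identifiers means the connection is indirect, which corresponds to the Inferred row in the table.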

Confidence Scores

Every link has a confidence score (0 to 1) indicating how certain the system is that two identifiers belong to the same person. Higher scores mean stronger matches.
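One way to model a link and its score is sketched below. The class and field names are hypothetical, and the example scores (1.0 for the deterministic link, 0.7 for the probabilistic one) and the 0.9 threshold are assumptions chosen for illustration, not values documented by Datalinx.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    DETERMINISTIC = "deterministic"
    PROBABILISTIC = "probabilistic"
    INFERRED = "inferred"


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: LinkType
    confidence: float  # 0.0 (no certainty) to 1.0 (full certainty)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def strong_links(links, min_confidence):
    """Keep only the links at or above the given confidence threshold."""
    return [link for link in links if link.confidence >= min_confidence]


links = [
    Link("dev_123", "email_1", LinkType.DETERMINISTIC, 1.0),  # assumed score
    Link("dev_123", "cookie_9", LinkType.PROBABILISTIC, 0.7),  # assumed score
]
kept = strong_links(links, min_confidence=0.9)
```

Filtering by a threshold like this is a simple way to inspect only the strongest matches when reviewing a cluster.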

What You'll See in the Identity Tab

Identity Statistics

High-level metrics about your identity graph:

  • Total identity nodes
  • Total clusters (unique people)
  • Average identifiers per person
  • Link type distribution (deterministic vs. probabilistic)
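To make these metrics concrete, the following sketch computes them from a small in-memory graph. The data layout (a dict of clusters and a list of link types) and the function name `graph_stats` are assumptions for this example, not the format Datalinx uses internally.

```python
from collections import Counter


def graph_stats(clusters, link_types):
    """Compute the headline identity-graph metrics.

    clusters: dict mapping a canonical ID to its list of identifiers.
    link_types: list with one link-type string per link in the graph.
    """
    total_nodes = sum(len(ids) for ids in clusters.values())
    total_clusters = len(clusters)
    average = total_nodes / total_clusters if total_clusters else 0.0
    return {
        "total_nodes": total_nodes,
        "total_clusters": total_clusters,
        "avg_identifiers_per_person": average,
        "link_type_distribution": dict(Counter(link_types)),
    }


clusters = {
    "person_1": ["dev_123", "email_1", "maid_789"],
    "person_2": ["dev_999"],
}
link_types = ["deterministic", "deterministic"]
stats = graph_stats(clusters, link_types)
```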

Canonical ID Tracking

View and manage the primary identifier assigned to each cluster. The canonical ID is typically the most reliable identifier (e.g., an email address or customer ID).
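A simple way to think about "most reliable identifier" is a priority order over identity types. The ordering below and the function name `pick_canonical` are assumptions for illustration; the rule Datalinx applies when assigning a canonical ID may differ.

```python
# Assumed reliability order, most reliable first (illustrative only).
PRIORITY = ["customer_id", "email_hash", "mobile_ad_id", "device_id", "cookie_id"]


def pick_canonical(nodes):
    """Pick the node whose identity type ranks highest in PRIORITY.

    nodes: non-empty list of (identity_type, value) pairs.
    """
    def rank(node):
        id_type, value = node
        # Unknown types sort after every known type.
        position = PRIORITY.index(id_type) if id_type in PRIORITY else len(PRIORITY)
        # The value breaks ties so the choice is stable across runs.
        return (position, value)

    return min(nodes, key=rank)


cluster_nodes = [
    ("device_id", "abc123"),
    ("email_hash", "e1"),
    ("cookie_id", "xyz789"),
]
canonical = pick_canonical(cluster_nodes)
```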

Cluster Analysis

Explore individual clusters to see:

  • All identifiers linked to a person
  • How identifiers were connected (link type and confidence)
  • Source systems for each identifier
  • Timeline of when identifiers were first seen

Sample Clusters

Browse example clusters to understand how the identity graph is working. This is useful for:

  • Verifying that stitching is correct
  • Finding cases where identifiers were incorrectly merged
  • Understanding the quality of your identity resolution

Supported Identity Types

Datalinx supports these identity types out of the box:

| Type | Description | Common Source |
| --- | --- | --- |
| email_hash | Hashed email addresses | CRM, account systems, email platforms |
| device_id | Browser or app device identifiers | Web analytics, mobile SDKs |
| cookie_id | Browser cookies | Website tracking |
| maID | Mobile advertising IDs (IDFA, GAID) | Mobile apps, ad networks |

Additional identity types can be configured for your specific use case.

Tagging Identity Fields

To feed data into the identity graph, you tag identity columns in your source data using decorators. This is done in the Transform (Mapping) tab or through the Pipeline Agent:

column_name → IDENTITY(device_id)   -- Tags as a device identifier
column_name → IDENTITY(email_hash)  -- Tags as an email identifier
column_name → IDENTITY(maID)        -- Tags as a mobile ad ID

When your pipeline runs, Datalinx automatically:

  1. Detects the decorated identity columns
  2. Extracts the identifier values
  3. Creates nodes in the identity graph
  4. Links identifiers that appear in the same record (deterministic match)
  5. Runs probabilistic matching across records
  6. Groups linked identifiers into clusters
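The deterministic part of these steps (1 to 4 and 6) can be sketched with a union-find structure. This is a generic technique, not Datalinx's implementation: the function name `resolve`, the column-to-type mapping, and the record layout are hypothetical, and step 5 (probabilistic matching) is left out.

```python
def resolve(records, identity_columns):
    """Group identifiers that co-occur in a record into clusters.

    records: list of dicts (one per source row).
    identity_columns: dict mapping a tagged column name to its identity type.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        # Steps 1-2: detect tagged columns and extract their values.
        ids = [(identity_columns[c], record[c])
               for c in identity_columns if record.get(c)]
        # Step 3: create a node for every identifier.
        for node in ids:
            find(node)
        # Step 4: identifiers in the same record are linked deterministically.
        for other in ids[1:]:
            union(ids[0], other)

    # Step 6: group linked identifiers into clusters.
    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


columns = {"device": "device_id", "email": "email_hash", "maid": "maID"}
records = [
    {"device": "abc123", "email": "e1"},
    {"email": "e1", "maid": "maid_456"},
    {"device": "zzz999"},
]
clusters = resolve(records, columns)
```

Union-find is used here because merging two groups and looking up a node's group are both close to constant time, which matters when the same identifier appears in many records.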

A Practical Example

Imagine you have three source tables:

Web Events (from your website analytics):

| device_id | page_url | timestamp |
| --- | --- | --- |
| dev_123 | /pricing | 2024-01-15 |

Email Signups (from your marketing platform):

| email | device_id | campaign |
| --- | --- | --- |
| john@example.com | dev_123 | spring_sale |

App Users (from your mobile app):

| user_id | email | mobile_ad_id |
| --- | --- | --- |
| usr_456 | john@example.com | maid_789 |

After identity resolution, Datalinx connects:

dev_123 ←(deterministic)→ john@example.com ←(deterministic)→ maid_789
usr_456 ←(deterministic)→ john@example.com

All four identifiers are now in the same cluster. You can see that the anonymous website visit (dev_123 on /pricing) was the same person who later signed up via email and uses the mobile app.
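The same result can be reproduced with a few lines of Python. This sketch hard-codes the three tables above, treats `device_id`, `email`, `user_id`, and `mobile_ad_id` as the tagged identity columns (an assumption for this example), links identifiers that share a row, and then walks the graph from the anonymous device.

```python
from itertools import combinations

web_events = [{"device_id": "dev_123", "page_url": "/pricing", "timestamp": "2024-01-15"}]
email_signups = [{"email": "john@example.com", "device_id": "dev_123", "campaign": "spring_sale"}]
app_users = [{"user_id": "usr_456", "email": "john@example.com", "mobile_ad_id": "maid_789"}]

# Assumed set of tagged identity columns for this example.
IDENTITY_COLUMNS = {"device_id", "email", "user_id", "mobile_ad_id"}

# Link every pair of identifiers that appear in the same row.
adjacency = {}
for table in (web_events, email_signups, app_users):
    for row in table:
        ids = [row[c] for c in row if c in IDENTITY_COLUMNS]
        for value in ids:
            adjacency.setdefault(value, set())
        for a, b in combinations(ids, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)


def cluster_of(start):
    """Collect every identifier reachable from start (depth-first walk)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


cluster = cluster_of("dev_123")
```

Starting from `dev_123`, the walk reaches the email through the signup row and then the user ID and mobile ad ID through the app row.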

Tips

  • Start with deterministic matches (same email, same customer ID) — they're the most reliable
  • Tag identity fields early in your mapping process — this ensures the graph builds as your pipeline runs
  • Review sample clusters regularly to check for incorrect merges (two different people linked together)
  • The more identity types you tag, the more complete your customer view becomes
  • Identity resolution is especially powerful for marketing attribution — it connects anonymous pre-signup activity to known customers
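When reviewing clusters for incorrect merges, a simple heuristic can help narrow the search: flag any cluster that holds more than one distinct value of an identifier that is normally unique per person. The sketch below is an assumption-based illustration, not a Datalinx feature; the function name, the data layout, and the choice of `email_hash` as the check are all hypothetical.

```python
def suspicious_clusters(clusters, id_type="email_hash", max_allowed=1):
    """Return canonical IDs of clusters with too many values of one type.

    clusters: dict mapping a canonical ID to a list of
              (identity_type, value) pairs.
    """
    flagged = []
    for canonical_id, nodes in clusters.items():
        values = {value for node_type, value in nodes if node_type == id_type}
        if len(values) > max_allowed:
            flagged.append(canonical_id)
    return flagged


clusters = {
    "c1": [("email_hash", "e1"), ("device_id", "dev_123")],
    "c2": [("email_hash", "e2"), ("email_hash", "e3"), ("device_id", "dev_777")],
}
flagged = suspicious_clusters(clusters)
```

A flagged cluster is only a candidate for review: one person can legitimately have several email addresses, so the threshold should be tuned to your data.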