Entity Resolution Workflow
End-to-end entity resolution across multiple data sources.
Entity resolution identifies which records across different data sources refer to the same real-world entity. This guide covers the full workflow: ingesting sources, profiling each one, matching across sources, configuring survivorship rules, and reviewing the merged output.
Overview
When your data lives in multiple systems — a CRM, a billing platform, a support tool — the same customer can appear in each with slightly different details. Entity resolution links those records together and produces a single golden record per entity.
1. Ingest Sources
Load each data source into the Golden Suite via the API. Each source gets its own namespace so you can trace every field back to its origin.
```bash
curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@crm_export.csv" \
  -F "name=crm" \
  -F "namespace=source_crm"

curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@billing_export.csv" \
  -F "name=billing" \
  -F "namespace=source_billing"

curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@support_export.csv" \
  -F "name=support" \
  -F "namespace=source_support"
```
2. Profile Each Source
Run GoldenCheck against each source to understand data quality, field coverage, and format differences before attempting to match.
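```bash
# Profile every ingested source, not just the first one
for source in crm billing support; do
  curl -X POST https://api.goldensuite.dev/v1/profile \
    -H "Authorization: Bearer $GOLDEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"source\": \"$source\"}"
done
```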
Compare the profiles side by side. Pay attention to:
- Which fields are shared across sources
- Format differences (e.g., phone number styles, date formats)
- Null rates per field — low-coverage fields are less useful for matching
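Before or alongside a GoldenCheck run, a quick local look at null rates can show which exported fields are worth matching on. The shell sketch below is a minimal illustration, not part of Golden Suite: `null_rates` is a hypothetical helper, and it assumes a comma-separated export with a header row and no quoted values containing commas, since awk splits each line naively on `,`.

```bash
# null_rates: print the fraction of empty values in each column of a CSV file.
# Hypothetical helper; assumes a header row and no quoted fields containing commas.
null_rates() {
  awk -F',' '
    NR == 1 {                         # header row: remember the column names
      for (i = 1; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      rows++
      for (i = 1; i <= ncol; i++)     # a missing trailing field counts as empty
        if ($i == "") empty[i]++
    }
    END {
      for (i = 1; i <= ncol; i++)
        printf "%s %.2f\n", name[i], (rows ? empty[i] / rows : 0)
    }
  ' "$1"
}
```

For example, `null_rates crm_export.csv` prints one `column rate` line per field, where the rate is the share of rows with an empty value.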
3. Cross-Source Matching
Run GoldenMatch in cross-source mode. This compares records between sources rather than within a single source.
```bash
curl -X POST https://api.goldensuite.dev/v1/match \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["crm", "billing", "support"],
    "mode": "cross_source",
    "threshold": 0.80,
    "fields": ["name", "email", "phone", "address"]
  }'
```
Note: Cross-source resolution typically requires lower thresholds than single-source dedup because formatting differences between systems introduce more variation. A threshold of 0.75-0.85 is a good starting range.
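Because formatting differences are what push cross-source thresholds down, one option is to normalize obviously equivalent formats, such as phone number styles, in your exports before ingesting them. The bash function below is a hypothetical preprocessing step on your side, not a Golden Suite feature, and it assumes North American numbers, where a leading country code `1` can be dropped.

```bash
# normalize_phone: reduce a phone number to a comparable digit string.
# Hypothetical preprocessing helper; assumes North American (NANP) numbers.
normalize_phone() {
  local digits="${1//[^0-9]/}"          # keep digits only
  # Drop a leading country code "1" from 11-digit numbers (NANP assumption)
  if [[ ${#digits} -eq 11 && $digits == 1* ]]; then
    digits="${digits:1}"
  fi
  printf '%s\n' "$digits"
}

normalize_phone '+1 (555) 010-0199'   # prints 5550100199
normalize_phone '555.010.0199'        # prints 5550100199
```

Both styles collapse to the same string, so the matcher no longer has to absorb that difference through a lower threshold.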
4. Configure Survivorship
Survivorship rules determine which source's value wins for each field in the final golden record. Define a strategy per field based on your trust in each source.
| Field | Strategy | Rationale |
|---|---|---|
| name | Most complete | Pick the version with the fewest nulls and most detail |
| email | Most recent | Email addresses change, so prefer the latest known value |
| phone | Most recent | Same reasoning as email |
| address | Source priority: billing > crm > support | Billing addresses are verified for payment |
| created_at | Earliest | The true creation date is the first time any system saw this entity |
| lifetime_value | Aggregation (sum) | Combine spend across all sources |
Replace `job_abc123` with the ID of your cross-source match job from step 3.

```bash
curl -X POST https://api.goldensuite.dev/v1/survivorship \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "match_job_id": "job_abc123",
    "rules": {
      "name": {"strategy": "most_complete"},
      "email": {"strategy": "most_recent"},
      "phone": {"strategy": "most_recent"},
      "address": {"strategy": "source_priority", "order": ["billing", "crm", "support"]},
      "created_at": {"strategy": "earliest"},
      "lifetime_value": {"strategy": "aggregate", "function": "sum"}
    }
  }'
```
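To make the strategies in the table concrete, here is a minimal local sketch of how they resolve one entity's contributing records. It is an illustration only, not how Golden Suite implements survivorship. It assumes a hypothetical CSV holding one entity's records with the columns `source,updated_at,name,email,address,created_at,lifetime_value`, ISO-8601 dates (so string order is date order), and no commas inside values. "Most complete" is approximated as the longest non-empty value, and `phone` is omitted because it follows the same rule as `email`.

```bash
# survive: apply the survivorship rules from the table to one entity's records.
# Hypothetical helper; assumed CSV layout:
#   source,updated_at,name,email,address,created_at,lifetime_value
survive() {
  awk -F',' '
    BEGIN { prio["billing"] = 1; prio["crm"] = 2; prio["support"] = 3 }
    NR == 1 { next }                  # skip the header row
    {
      src = $1; updated = $2; name = $3; email = $4
      address = $5; created = $6; ltv = $7

      # name: most complete, approximated as the longest non-empty value
      if (length(name) > length(best_name)) best_name = name

      # email: most recent non-empty value (ISO-8601 dates sort as strings)
      if (email != "" && updated > email_ts) { best_email = email; email_ts = updated }

      # address: source priority billing > crm > support; unknown sources never win
      if (address != "" && (src in prio) && (best_prio == "" || prio[src] < best_prio)) {
        best_address = address; best_prio = prio[src]
      }

      # created_at: earliest non-empty value
      if (created != "" && (best_created == "" || created < best_created)) best_created = created

      # lifetime_value: sum across all sources
      total += ltv
    }
    END {
      print "name=" best_name
      print "email=" best_email
      print "address=" best_address
      print "created_at=" best_created
      printf "lifetime_value=%.2f\n", total
    }
  ' "$1"
}
```

Running `survive entity_records.csv` prints one `field=value` line per golden-record field, which makes it easy to compare against what the survivorship job proposes.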
5. Review and Merge
Inspect the proposed golden records. Each one shows which source records contributed and which field values were selected by the survivorship rules.
```bash
# Quote the URL so shells such as zsh do not treat "?" as a glob character
curl "https://api.goldensuite.dev/v1/golden-records?job_id=job_abc123" \
  -H "Authorization: Bearer $GOLDEN_API_KEY"
```
Review edge cases — entities with low match scores or conflicting field values — before finalizing the merge. Once satisfied, approve the batch to produce your final golden record set.
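If you export the proposed golden records to a file, a small filter can shortlist the edge cases for manual review. The sketch below assumes a hypothetical CSV layout of `entity_id,match_score,conflicting_fields`, which is an illustration and not the golden-records API response format, and it uses 0.85, the top of the suggested starting threshold range, as a default review cutoff. `flag_for_review` is a hypothetical helper.

```bash
# flag_for_review: print IDs of entities with a low match score or any conflict.
# Hypothetical helper; assumed CSV header: entity_id,match_score,conflicting_fields
flag_for_review() {
  local file=$1 cutoff=${2:-0.85}
  awk -F',' -v cutoff="$cutoff" '
    NR == 1 { next }                                      # skip the header row
    ($2 + 0 < cutoff) || ($3 + 0 > 0) { print $1 }        # low score or conflicts
  ' "$file"
}
```

For example, `flag_for_review golden_records.csv 0.80` lists entities scoring below 0.80 or carrying at least one conflicting field.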