Entity Resolution Workflow

End-to-end entity resolution across multiple data sources.

Entity resolution identifies which records across different data sources refer to the same real-world entity. This guide covers the full workflow: ingesting sources, profiling each one, matching across sources, configuring survivorship rules, and reviewing the merged output.

Overview

When your data lives in multiple systems — a CRM, a billing platform, a support tool — the same customer can appear in each with slightly different details. Entity resolution links those records together and produces a single golden record per entity.

1. Ingest Sources

Load each data source into the Golden Suite via the API. Each source gets its own namespace so you can trace every field back to its origin.

# With -F, curl sets the multipart Content-Type header (including the boundary) itself.
curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@crm_export.csv" \
  -F "name=crm" \
  -F "namespace=source_crm"

curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@billing_export.csv" \
  -F "name=billing" \
  -F "namespace=source_billing"

curl -X POST https://api.goldensuite.dev/v1/sources \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -F "file=@support_export.csv" \
  -F "name=support" \
  -F "namespace=source_support"
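
To illustrate why per-source namespaces help with tracing, here is a minimal Python sketch. It is not how the Golden Suite stores data (this guide does not describe that); the helper with_namespace, the dotted key format, and the sample rows are all hypothetical.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[str]]


def with_namespace(namespace: str, row: Row) -> Row:
    """Qualify every field name with the namespace of its source (hypothetical format)."""
    return {f"{namespace}.{field}": value for field, value in row.items()}


# Hypothetical rows describing the same customer in two systems.
crm_row = {"name": "Ada Lovelace", "email": "ada@example.com"}
billing_row = {"name": "A. Lovelace", "address": "12 Main St"}

# Both "name" values survive side by side because their keys differ by namespace.
combined = {
    **with_namespace("source_crm", crm_row),
    **with_namespace("source_billing", billing_row),
}

for key, value in combined.items():
    print(key, "=", value)
```

Because each key carries its namespace, two sources can disagree on a field without one value silently overwriting the other.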

2. Profile Each Source

Run GoldenCheck against each source to understand data quality, field coverage, and format differences before attempting to match. The request below profiles the crm source; repeat it for billing and support.

curl -X POST https://api.goldensuite.dev/v1/profile \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"source": "crm"}'

Compare the profiles side by side. Pay attention to:

  • Which fields are shared across sources
  • Format differences (e.g., phone number styles, date formats)
  • Null rates per field — low-coverage fields are less useful for matching
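
The checks in this list can also be approximated locally. The following Python sketch is independent of GoldenCheck (whose response format this guide does not show): it assumes each source is already loaded as a list of dictionaries, and the helpers null_rates and shared_fields, along with the sample rows, are hypothetical.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]


def null_rates(rows: List[Row]) -> Dict[str, float]:
    """Fraction of rows in which each field is missing or empty."""
    fields: Set[str] = set()
    for row in rows:
        fields.update(row)
    total = len(rows)
    rates = {}
    for field in sorted(fields):
        missing = sum(1 for row in rows if not row.get(field))
        rates[field] = missing / total if total else 0.0
    return rates


def shared_fields(sources: Dict[str, List[Row]]) -> Set[str]:
    """Fields that appear in every source."""
    per_source = [
        {field for row in rows for field in row} for rows in sources.values()
    ]
    return set.intersection(*per_source) if per_source else set()


# Hypothetical sample data, for illustration only.
sources = {
    "crm": [
        {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "(555) 010-0199"},
        {"name": "Alan Turing", "email": None, "phone": "555-010-0142"},
    ],
    "billing": [
        {"name": "A. Lovelace", "email": "ada@example.com", "address": "12 Main St"},
        {"name": "Alan Turing", "email": "alan@example.com", "address": ""},
    ],
}

for name, rows in sources.items():
    print(name, null_rates(rows))
print("shared:", sorted(shared_fields(sources)))
```

In this sample, email and name are the only fields present in both sources, so they would be the natural candidates for matching.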

3. Cross-Source Matching

Run GoldenMatch in cross-source mode. This compares records between sources rather than within a single source.

curl -X POST https://api.goldensuite.dev/v1/match \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["crm", "billing", "support"],
    "mode": "cross_source",
    "threshold": 0.80,
    "fields": ["name", "email", "phone", "address"]
  }'

Note: Cross-source resolution typically needs a lower threshold than single-source deduplication, because formatting differences between systems make records for the same entity look less alike. A threshold between 0.75 and 0.85 is a good starting range; the request above uses 0.80.
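
To show how a threshold interacts with formatting differences, here is a generic Python sketch. It does not reproduce GoldenMatch's scoring, which this guide does not describe: it assumes a simple average of per-field string similarity from the standard library's difflib, and the functions normalize and match_score and all sample records are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def normalize(field: str, value: str) -> str:
    """Reduce formatting noise before comparing values."""
    if field == "phone":
        return re.sub(r"\D", "", value)  # keep digits only
    return " ".join(value.lower().split())  # lowercase, collapse whitespace


def match_score(a: Record, b: Record, fields: List[str]) -> float:
    """Average string similarity over the fields both records populate."""
    scores = []
    for field in fields:
        left, right = a.get(field), b.get(field)
        if not left or not right:
            continue  # skip fields missing on either side
        ratio = SequenceMatcher(
            None, normalize(field, left), normalize(field, right)
        ).ratio()
        scores.append(ratio)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical records: crm and billing describe the same customer.
crm = {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "(555) 010-0199"}
billing = {"name": "Ada  LOVELACE", "email": "Ada@Example.com", "phone": "555.010.0199"}
other = {"name": "Alan Turing", "email": "alan@example.org", "phone": "555-010-0142"}

THRESHOLD = 0.80
fields = ["name", "email", "phone", "address"]

print(match_score(crm, billing, fields) >= THRESHOLD)  # True
print(match_score(crm, other, fields) >= THRESHOLD)    # False
```

Normalizing before comparing removes much of the variation that would otherwise force the threshold down; whatever variation remains is what the lower cross-source threshold has to absorb.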

4. Configure Survivorship

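Survivorship rules determine how each field of the final golden record is filled: usually by choosing one source's value, sometimes by combining values (as with lifetime_value below). Define a strategy per field based on your trust in each source. In the request that follows, job_abc123 is a placeholder for the ID of your own match job.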

Field            Strategy                                    Rationale
name             Most complete                               Pick the version with the fewest nulls and most detail
email            Most recent                                 Email addresses change, so prefer the latest known value
phone            Most recent                                 Same reasoning as email
address          Source priority: billing > crm > support    Billing addresses are verified for payment
created_at       Earliest                                    The true creation date is the first time any system saw this entity
lifetime_value   Aggregation (sum)                           Combine spend across all sources

curl -X POST https://api.goldensuite.dev/v1/survivorship \
  -H "Authorization: Bearer $GOLDEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "match_job_id": "job_abc123",
    "rules": {
      "name": {"strategy": "most_complete"},
      "email": {"strategy": "most_recent"},
      "phone": {"strategy": "most_recent"},
      "address": {"strategy": "source_priority", "order": ["billing", "crm", "support"]},
      "created_at": {"strategy": "earliest"},
      "lifetime_value": {"strategy": "aggregate", "function": "sum"}
    }
  }'
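
The effect of these rules on a single matched cluster can be sketched in Python. This is an illustration under stated assumptions, not the service's implementation: the resolve function, the updated_at field used to decide recency, the length-based approximation of "most complete", and the sample records are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical matched cluster: one record per source.
cluster = [
    {"source": "crm", "updated_at": "2024-03-01", "name": "Ada Lovelace",
     "email": "ada@old.example", "address": "12 Main St",
     "created_at": "2021-06-15", "lifetime_value": 0.0},
    {"source": "billing", "updated_at": "2024-05-10", "name": "A. Lovelace",
     "email": "ada@example.com", "address": "12 Main Street, Apt 4",
     "created_at": "2022-01-03", "lifetime_value": 1250.0},
    {"source": "support", "updated_at": "2024-04-20", "name": "Ada",
     "email": None, "address": None,
     "created_at": "2023-02-11", "lifetime_value": 0.0},
]

rules = {
    "name": {"strategy": "most_complete"},
    "email": {"strategy": "most_recent"},
    "address": {"strategy": "source_priority", "order": ["billing", "crm", "support"]},
    "created_at": {"strategy": "earliest"},
    "lifetime_value": {"strategy": "aggregate", "function": "sum"},
}


def resolve(field: str, rule: Dict[str, Any], records: List[Dict[str, Any]]) -> Any:
    """Pick or combine one field's value across the matched records."""
    candidates = [r for r in records if r.get(field) is not None]
    if not candidates:
        return None
    strategy = rule["strategy"]
    if strategy == "most_complete":
        # Approximation: treat the longest value as the most detailed one.
        return max(candidates, key=lambda r: len(str(r[field])))[field]
    if strategy == "most_recent":
        # ISO 8601 dates sort correctly as plain strings.
        return max(candidates, key=lambda r: r["updated_at"])[field]
    if strategy == "source_priority":
        rank = {name: i for i, name in enumerate(rule["order"])}
        return min(candidates, key=lambda r: rank.get(r["source"], len(rank)))[field]
    if strategy == "earliest":
        return min(r[field] for r in candidates)
    if strategy == "aggregate" and rule.get("function") == "sum":
        return sum(r[field] for r in candidates)
    raise ValueError(f"unsupported strategy: {strategy}")


golden = {field: resolve(field, rule, cluster) for field, rule in rules.items()}
print(golden)
```

Null values are excluded before any strategy runs, so a high-priority or recently updated source with an empty field does not blank out a value that another source supplies.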

5. Review and Merge

Inspect the proposed golden records. Each one shows which source records contributed and which field values were selected by the survivorship rules.

curl "https://api.goldensuite.dev/v1/golden-records?job_id=job_abc123" \
  -H "Authorization: Bearer $GOLDEN_API_KEY"

Review edge cases — entities with low match scores or conflicting field values — before finalizing the merge. Once satisfied, approve the batch to produce your final golden record set.
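
One way to build a review queue for those edge cases is sketched below in Python. The response shape of the golden-records endpoint is not shown in this guide, so the match_score and sources keys, the REVIEW_BELOW cutoff, and the sample records are assumptions made for illustration.

```python
from typing import Any, Dict, List

REVIEW_BELOW = 0.85  # hypothetical cutoff for a "low" match score


def conflicting_fields(sources: List[Dict[str, Any]], fields: List[str]) -> List[str]:
    """Fields where the contributing records disagree on a non-null value."""
    conflicts = []
    for field in fields:
        values = {
            str(rec[field]).strip().lower()
            for rec in sources
            if rec.get(field) is not None
        }
        if len(values) > 1:
            conflicts.append(field)
    return conflicts


def needs_review(golden: Dict[str, Any], fields: List[str]) -> bool:
    """Flag a record with a low score or with any conflicting field."""
    low_score = golden["match_score"] < REVIEW_BELOW
    return low_score or bool(conflicting_fields(golden["sources"], fields))


# Hypothetical proposed golden records; the real response shape may differ.
proposed = [
    {"id": "g1", "match_score": 0.97, "sources": [
        {"email": "ada@example.com"}, {"email": "Ada@Example.com"}]},
    {"id": "g2", "match_score": 0.81, "sources": [
        {"email": "alan@example.com"}, {"email": "alan@example.com"}]},
    {"id": "g3", "match_score": 0.93, "sources": [
        {"email": "grace@example.com"}, {"email": "g.hopper@example.org"}]},
]

review_queue = [g["id"] for g in proposed if needs_review(g, ["email"])]
print(review_queue)  # ['g2', 'g3']
```

Here g2 is queued for its low score and g3 for its conflicting emails, while g1 passes because its two email values differ only in letter case.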