Deduplication vs entity resolution vs record linkage vs MDM
Four overlapping terms with distinct meanings. How dedup, ER, record linkage, and MDM differ in goal, inputs, outputs, and where each fits.
Vendor docs, job postings, and conference talks use these four terms interchangeably. They are not the same thing.
The four terms
| Term | Goal | Input | Output |
|---|---|---|---|
| Deduplication | Remove redundant copies | Single dataset | Same dataset, fewer rows |
| Entity resolution | Identify which records refer to the same real-world entity | One or more datasets (often dirty) | Golden records with lineage to inputs |
| Record linkage | Match records across datasets | Two or more datasets | Cross-dataset match keys or a merged dataset |
| Master Data Management (MDM) | Operate a system of record for canonical entities over time | Ongoing feeds from multiple systems | Continuously curated golden source |
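To make the Input and Output columns concrete, here is a minimal Python sketch contrasting deduplication with record linkage. The field names (`id`, `name`, `email`), the sample records, and the exact-match-after-normalization rule are all assumptions chosen for illustration. Real tools typically use fuzzy scoring rather than exact keys.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def deduplicate(rows: list[dict]) -> list[dict]:
    """Single dataset in, same dataset with fewer rows out (first copy wins)."""
    seen = set()
    kept = []
    for row in rows:
        key = (normalize(row["name"]), normalize(row["email"]))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def link(left: list[dict], right: list[dict]) -> list[tuple[str, str]]:
    """Two datasets in, cross-dataset match keys out (here: matched on email)."""
    right_ids = {normalize(r["email"]): r["id"] for r in right}
    return [
        (l["id"], right_ids[normalize(l["email"])])
        for l in left
        if normalize(l["email"]) in right_ids
    ]


# Hypothetical sample data: one CRM export and one billing export.
crm = [
    {"id": "c1", "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": "c2", "name": "ada  lovelace", "email": "ADA@example.com"},
    {"id": "c3", "name": "Alan Turing", "email": "alan@example.com"},
]
billing = [
    {"id": "b7", "name": "A. Lovelace", "email": "ada@example.com"},
]

deduped = deduplicate(crm)       # still CRM rows, just fewer of them
matches = link(deduped, billing)  # pairs of ids spanning both datasets
```

The return types carry the distinction: `deduplicate` hands back rows from the dataset it was given, while `link` hands back pairs of identifiers that span two datasets.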
Where the lines blur
Deduplication is the subset of entity resolution where the only input is one dataset and the only output is a smaller version of that dataset. Record linkage is entity resolution across two or more datasets. All three share the same core stages: blocking (narrowing the set of candidate pairs), scoring (measuring how similar each candidate pair is), and clustering (grouping matched records into entities).
MDM is the operational and governance layer on top of an ER pipeline. It adds versioning, survivorship rules, stewardship queues, and audit trails. The difference between ER and MDM is scope, not a fundamentally different set of algorithms.
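The blocking, scoring, and clustering stages named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not how any particular product implements them. The blocking key (`zip`), the similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, and the sample records are all assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records.
records = [
    {"id": 1, "name": "Jon Smith", "zip": "10001"},
    {"id": 2, "name": "John Smith", "zip": "10001"},
    {"id": 3, "name": "Jane Doe", "zip": "94105"},
    {"id": 4, "name": "Jane  Doe", "zip": "94105"},
    {"id": 5, "name": "Joan Smythe", "zip": "10001"},
]


def block(records):
    """Blocking: group by a cheap key so only plausible pairs get compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["zip"]].append(rec)
    return blocks


def score(a, b):
    """Scoring: similarity in [0, 1] between whitespace- and case-normalized names."""
    name_a = " ".join(a["name"].lower().split())
    name_b = " ".join(b["name"].lower().split())
    return SequenceMatcher(None, name_a, name_b).ratio()


def cluster(records, threshold=0.85):
    """Clustering: union-find over pairs that score at or above the threshold."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for members in block(records).values():
        for a, b in combinations(members, 2):
            if score(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["id"])].append(rec["id"])
    return sorted(sorted(group) for group in groups.values())


clusters = cluster(records)
```

Run on one dataset, this is deduplication or single-source ER. Concatenating two sources and tagging each record with its origin before calling `cluster` would turn the same pipeline into record linkage. Nothing here covers versioning, stewardship, or audit, which is the part MDM adds.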
Why this matters for scoping
Because the labels are used loosely, a product's name says little about what it covers. Before evaluating any tool, write down which of the four you actually need. A "deduplication tool" rarely handles cross-system linkage or stewardship workflows. An "MDM platform" is overkill for a one-time CSV cleanup. Scope first, then shop.
In Golden Suite
Golden Suite covers entity resolution and record linkage as the core engine, with workbench primitives (review queue, survivorship rules, audit log) that grow into MDM territory as a deployment matures. The free tier handles dedup and ER cleanly. MDM-grade governance (versioned survivorship rules, audit chain, stewardship workflows) lands at the Pro and Enterprise tiers.