Patient matching in healthcare
Resolve patient identities across EHR + lab + claims data. Compliance-aware playbook.
Compliance disclaimer: Patient matching touches PHI. Golden Suite is not currently HIPAA-attested. The platform's controls are SOC2-aligned and the audit chain is cryptographic, but full HIPAA BAA + attestation is on the roadmap. Until then, use Golden Suite for non-PHI use cases (research datasets with stripped identifiers, member-services CRM, supplier MDM) or contact ben@bensevern.dev to discuss BAA timeline before putting real PHI through the platform.
With that out of the way: patient matching (Master Patient Index / Enterprise MPI) is one of the highest-impact MDM use cases in any field. A duplicate patient record can mean missed allergies, redundant testing, or worse. This guide walks through the technical approach; productionizing it requires clearing the compliance gates above.
What patient matching solves
A typical hospital has the same patient under multiple medical record numbers (MRNs):
- Different departments created records independently
- Patient came in under a different name (married, hyphenated, transliterated)
- Patient came in unconscious without ID; a new MRN got assigned
- Inter-system mismatch (Epic vs Cerner, EHR vs lab vs imaging)
Result: clinical decisions made against incomplete history. Resolving this is what an MPI does.
Setup sequence
Identifier sources
- MRN — internal medical record number; unique per source system but not globally
- SSN — strongest deterministic identifier when available; often missing or partial
- Insurance member ID — useful but changes when patient switches plans
- DOB — high-signal blocking field (year + month) but not unique
- Full legal name — fuzzy match; many patients have transliterated, hyphenated, or married names
- Address — moderate signal; patients move
- Phone — moderate signal
- Government ID number (passport, driver's license) — strong when available
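As a rough illustration of how the DOB blocking field from the list above might be used, the sketch below groups records by birth year and month and only pairs up records that land in the same block. The record layout (`source`, `mrn`, `dob`) and the function names are hypothetical and are not a Golden Suite API.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
records = [
    {"source": "ehr", "mrn": "E-1001", "dob": date(1984, 3, 12)},
    {"source": "lab", "mrn": "L-77", "dob": date(1984, 3, 12)},
    {"source": "claims", "mrn": "C-9", "dob": date(1984, 3, 21)},
    {"source": "ehr", "mrn": "E-2002", "dob": date(1990, 7, 1)},
]


def blocking_key(record):
    """Year + month of birth; records without a DOB get no key."""
    dob = record.get("dob")
    return (dob.year, dob.month) if dob else None


def candidate_pairs(records):
    """Yield pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        key = blocking_key(record)
        if key is not None:
            blocks[key].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


pairs = list(candidate_pairs(records))
```

Records with no DOB are simply skipped in this sketch. A real pipeline would likely need a second blocking pass on another field so those records still get compared.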
Matching strategy
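Patient matching benefits from the probabilistic matching approach (Fellegi-Sunter framework, in academic terms) more than other domains do. The reason: every identifier has known error rates from the literature (how often true matches disagree on a field, and how often non-matches agree by chance), and those rates are what the framework uses to weight each field.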
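To make the Fellegi-Sunter idea concrete, here is a minimal sketch of its per-field weights. For each field, `m` is the probability that the field agrees when two records are the same patient, and `u` is the probability that it agrees when they are different patients. The numbers below are placeholders chosen for illustration, not values from any study.

```python
import math

# Hypothetical m/u probabilities, for illustration only.
# m = P(field agrees | same patient), u = P(field agrees | different patients)
FIELD_PROBS = {
    "dob": {"m": 0.97, "u": 0.0001},
    "last_name": {"m": 0.90, "u": 0.005},
    "sex": {"m": 0.99, "u": 0.5},
}


def agreement_weight(m, u):
    """Evidence (in bits) for a match when the field agrees."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Evidence (in bits, usually negative) when the field disagrees."""
    return math.log2((1 - m) / (1 - u))


def match_weight(agreements):
    """Sum per-field weights; `agreements` maps field name -> True/False."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]["m"], FIELD_PROBS[field]["u"]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total
```

With these placeholder values, agreement on sex contributes under one bit of evidence because half of all non-matching pairs agree on it anyway, while agreement on DOB contributes far more.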
Use a weighted scorer per field:
- DOB — exact match strong signal (year + month + day); year-only fallback if month/day missing
- First name — Jaro-Winkler with prefix bias for spelling variants; add a nickname table to link pairs such as "Robert" / "Bob", which string similarity alone will not catch
- Last name — Jaro-Winkler primary; Soundex backup for transliterated names
- SSN — exact match when full; partial-match (last 4) at lower confidence
- Sex — exact match
- Address (zip) — exact-match blocking signal
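The field list above can be sketched as a weighted average of per-field similarities. This is an illustrative sketch, not Golden Suite's scorer: the weights, the 0.5 and 0.6 partial-match values, the tiny nickname table, and the record layout are all assumptions. The Soundex backup for last names is left out to keep the example short.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    hit1, hit2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            out_of_order += ch != s2[k]
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


NICKNAMES = {"bob": "robert", "bill": "william"}  # hypothetical, tiny table


def first_name_sim(a, b):
    a, b = a.lower(), b.lower()
    return jaro_winkler(NICKNAMES.get(a, a), NICKNAMES.get(b, b))


def last_name_sim(a, b):
    return jaro_winkler(a.lower(), b.lower())


def dob_sim(a, b):
    """a, b are (year, month, day) tuples; month/day may be None."""
    if a == b and None not in a:
        return 1.0
    if (None in a or None in b) and a[0] == b[0]:
        return 0.5  # year-only fallback (assumed value)
    return 0.0


def ssn_sim(a, b):
    """a, b are digit strings: either the full 9 digits or the last 4."""
    if len(a) == 9 and len(b) == 9:
        return 1.0 if a == b else 0.0
    return 0.6 if a[-4:] == b[-4:] else 0.0  # partial match (assumed value)


def exact(a, b):
    return 1.0 if a == b else 0.0


# field -> (weight, similarity function); weights are assumptions
SCORERS = {
    "dob": (0.25, dob_sim), "first_name": (0.15, first_name_sim),
    "last_name": (0.20, last_name_sim), "ssn": (0.30, ssn_sim),
    "sex": (0.05, exact), "zip": (0.05, exact),
}


def score(a, b):
    """Weighted average over fields present on both records."""
    total = weight_sum = 0.0
    for field, (weight, sim) in SCORERS.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # missing on either side: skip and renormalize
        total += weight * sim(va, vb)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

One design caveat: renormalizing over the fields that are present means a pair with only two populated fields can still score 1.0. A stricter scorer might require a minimum number of compared fields before a pair is eligible for auto-merge.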
The combined per-pair score places each pair in one of three bands: auto-merge, ambiguous (sent to review), or reject. Healthcare's tolerance for false merges is much lower than in other domains, so set the ambiguous band wider (e.g., 0.85-0.97 instead of 0.85-0.92).
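The three bands can be sketched as a small function. The thresholds are the example values from the paragraph above; whether each boundary is inclusive is an assumption made here, and the function name is hypothetical.

```python
AUTO_MERGE_MIN = 0.97  # example threshold from the text
REVIEW_MIN = 0.85      # example threshold from the text


def classify(score):
    """Map a combined pair score in [0, 1] to a decision band."""
    if score >= AUTO_MERGE_MIN:
        return "auto-merge"
    if score >= REVIEW_MIN:
        return "ambiguous"  # goes to the human review queue
    return "reject"
```

Under the narrower 0.85-0.92 band, a pair scoring 0.93 would have merged automatically. With the wider band it goes to a reviewer instead.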
Stewardship is critical
In healthcare MDM, the review queue is the product. You will never auto-merge 100% of candidate pairs; the cost of a false merge (combining two different patients' records) is high enough that every borderline case needs human review.
Best practice:
- Trained Health Information Management (HIM) staff own the review queue
- Audit trail captures every approve/split/merge decision with the reviewer's identifier
- Reviewer decisions feed back into scorer-weight tuning (gradually raising precision)
Golden Suite's review queue and audit chain are designed for this workflow. The cryptographic chain matters here — HIM regulators sometimes ask for proof of audit-log integrity.
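To show why a hash chain gives proof of audit-log integrity, here is a generic sketch of the technique. It is not Golden Suite's actual log format: the entry fields, the all-zero genesis value, and the function names are assumptions. Each entry's SHA-256 hash covers the previous entry's hash, so editing any earlier decision invalidates every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, reviewer, action, cluster_id):
    """Append a review decision, linking it to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = {"reviewer": reviewer, "action": action, "cluster_id": cluster_id}
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    })


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Timestamps are left out so the sketch stays deterministic. A real log would include them in the hashed payload.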
Privacy-preserving cross-organization matching (PPRL)
The big healthcare use case Golden Suite doesn't yet ship is cross-organization patient matching. Imagine a research consortium where multiple hospitals want to match patients without exchanging raw identifiers: that is privacy-preserving record linkage (PPRL), here using cryptographic longterm key (CLK) encoding.
PPRL is on the roadmap (Phase 11+) and we'll quote build-along delivery dates for Enterprise customers who specifically need it. The architecture proposal lives at /enterprise — Bloom-filter-encoded CLKs exchanged through a stateless linkage broker, with the salt shared between organizations and not known to us.
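Since the feature has not shipped, the following is only a generic sketch of CLK-style encoding, not the proposed architecture itself. Each identifier is split into character bigrams, each bigram is hashed several times with a keyed HMAC into one shared Bloom filter, and two filters are compared with the Dice coefficient. The filter size, the number of hashes per bigram, the use of HMAC-SHA256, and the function names are all assumptions.

```python
import hashlib
import hmac

FILTER_BITS = 1024    # assumed Bloom filter length
HASHES_PER_GRAM = 10  # assumed number of hash positions per bigram


def bigrams(value):
    """Character bigrams of a normalized, padded string."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk_encode(fields, secret):
    """Return the set of bit positions set in the CLK Bloom filter.

    All fields are hashed into the same filter, keyed by `secret`
    (the salt shared between organizations).
    """
    bits = set()
    for value in fields:
        for gram in bigrams(value):
            for i in range(HASHES_PER_GRAM):
                message = f"{i}:{gram}".encode("utf-8")
                mac = hmac.new(secret, message, hashlib.sha256).digest()
                bits.add(int.from_bytes(mac, "big") % FILTER_BITS)
    return bits


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Each hospital would run `clk_encode` locally with the shared secret and send only the resulting bit positions to the broker. Similar names share most bigrams and therefore most bits, so fuzzy comparison still works on the encoded form.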
Compliance landscape today
| Control | Status |
|---|---|
| Encryption at rest | ✓ envelope-encrypted credentials (per-org DEKs, KEK in env) |
| Encryption in transit | ✓ HTTPS everywhere |
| Audit log integrity | ✓ Cryptographic chain (per-org SHA-256) |
| Per-tenant isolation | ✓ Per-org row-level scoping on every query |
| HIPAA BAA | ☐ Not yet — contact ben@bensevern.dev for timeline |
| SOC2 Type 2 | In progress — attestation targeted end of 2026 |
| HITRUST | Not on roadmap |
| Data residency (US-only / EU-only) | Available on Enterprise tier |
Common pitfalls
- Auto-merging too aggressively. False-merges combining different patients are the worst possible failure mode in healthcare MDM. Set the auto-merge threshold high (0.97+) and accept that 5-15% of clusters need human review.
- Treating MRN as cross-system unique. It isn't. Each source system assigns its own MRN.
- Ignoring transliteration patterns. Patient names in immigrant populations vary widely across documents (Latin script transliterations of Cyrillic, Arabic, Chinese names). Add Soundex / Metaphone as fallback blocking signals.
- No process for unmerging. Sometimes auto-merges turn out to be wrong months later. Golden Suite supports split operations on existing clusters; bake the unmerge workflow into your steward training.
- Forgetting deceased status. Deceased patient records should be preserved (clinical history is still relevant) and never merged with a living patient's record, even on an identifier match.
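For the transliteration pitfall above, here is a minimal sketch of American Soundex used as a fallback blocking key. The function name is hypothetical, and any non-ASCII letter is treated like a vowel, which is an assumption of this sketch.

```python
_GROUPS = {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
    "4": "L", "5": "MN", "6": "R",
}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit:
            if digit != prev:  # adjacent letters with the same code collapse
                code += digit
            prev = digit
        elif c not in "HW":
            prev = ""  # vowels separate repeated codes; H and W do not
    return (code + "000")[:4]
```

Here "Mohammed" and "Muhammad" share a code, so both spellings land in the same block. Soundex was designed around English pronunciation and always keeps the first letter, so variants that differ in their initial (for example "Yusupov" and "Jusupov") still end up in different blocks. That is one reason to pair it with a second phonetic key such as Metaphone.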
Next steps
- /enterprise — talk to us about BAA + HIPAA timeline before putting real PHI through the platform
- Concept: lineage — required reading for the audit-chain story
- /glossary/pprl — the cross-organization matching architecture
- /glossary/precision-and-recall — the tradeoff framing for healthcare's tight precision requirement