Patient matching in healthcare

Resolve patient identities across EHR + lab + claims data. Compliance-aware playbook.

Compliance disclaimer: Patient matching touches PHI. Golden Suite is not currently HIPAA-attested. The platform's controls are SOC2-aligned and the audit chain is cryptographic, but full HIPAA BAA + attestation is on the roadmap. Until then, use Golden Suite for non-PHI use cases (research datasets with stripped identifiers, member-services CRM, supplier MDM) or contact ben@bensevern.dev to discuss BAA timeline before putting real PHI through the platform.

With that out of the way: patient matching (Master Patient Index / Enterprise MPI) is one of the highest-impact MDM use cases in any field. A duplicate patient record can mean missed allergies, redundant testing, or worse. This guide walks through the technical approach; productionizing it requires clearing the compliance gates above.

What patient matching solves

A typical hospital has the same patient under multiple medical record numbers (MRNs):

  • Different departments created records independently
  • Patient came in under a different name (married, hyphenated, transliterated)
  • Patient came in unconscious without ID; a new MRN got assigned
  • Inter-system mismatch (Epic vs Cerner, EHR vs lab vs imaging)

Result: clinical decisions made against incomplete history. Resolving this is what an MPI does.

Setup sequence

Identifier sources

  • MRN — internal medical record number; unique per source system but not globally
  • SSN — strongest deterministic identifier when available; often missing or partial
  • Insurance member ID — useful but changes when patient switches plans
  • DOB — high-signal blocking field (year + month) but not unique
  • Full legal name — fuzzy match; many patients have transliterated, hyphenated, or married names
  • Address — moderate signal; patients move
  • Phone — moderate signal
  • Government ID number (passport, driver's license) — strong when available
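
As a minimal sketch of how the DOB blocking field above might be used, the following groups records by birth year and month and only generates candidate pairs inside each block. The record layout (the "id" and "dob" keys) and the function names are hypothetical, not Golden Suite's API.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def dob_block_key(dob):
    """Blocking key from year + month; None when the DOB is missing."""
    if dob is None:
        return None
    return (dob.year, dob.month)


def candidate_pairs(records):
    """Yield pairs of record ids that share a DOB block.

    `records` is an iterable of dicts with hypothetical keys "id" and "dob".
    Records without a DOB are skipped here and would need a separate pass.
    """
    blocks = defaultdict(list)
    for rec in records:
        key = dob_block_key(rec.get("dob"))
        if key is not None:
            blocks[key].append(rec["id"])
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


# Synthetic records only; ids are prefixed with a made-up source system name.
records = [
    {"id": "epic:1001", "dob": date(1980, 3, 14)},
    {"id": "lab:77", "dob": date(1980, 3, 14)},
    {"id": "claims:5", "dob": date(1980, 3, 2)},
    {"id": "epic:2002", "dob": date(1975, 11, 30)},
]
pairs = list(candidate_pairs(records))
```

Blocking on year + month keeps the pair count manageable, but a pair whose DOB was mistyped in either field never becomes a candidate, so a second blocking pass on another field is a common complement.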

Matching strategy

Patient matching benefits from probabilistic matching (the Fellegi-Sunter framework, in academic terms) more than most other domains do. The reason: every identifier has error rates documented in the literature, so each field's agreement or disagreement can be weighted by how reliable that field actually is.
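
To make the Fellegi-Sunter idea concrete, here is a small sketch of its per-field weights. For each field, m is the probability that the field agrees when two records are the same patient, and u is the probability that it agrees when they are different patients. The m/u numbers below are illustrative placeholders, not values from the literature or from Golden Suite; real values should be estimated from your own data.

```python
import math


def agreement_weight(m, u):
    """log2(m / u): evidence added when the field agrees."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """log2((1 - m) / (1 - u)): evidence (usually negative) when it disagrees."""
    return math.log2((1 - m) / (1 - u))


# Illustrative (m, u) pairs only; these are assumptions for the example.
FIELD_PROBS = {
    "dob": (0.97, 0.0001),
    "last_name": (0.95, 0.002),
    "sex": (0.99, 0.5),
}


def match_weight(agreements):
    """Sum per-field weights for one record pair.

    `agreements` maps field name -> True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


all_agree = match_weight({"dob": True, "last_name": True, "sex": True})
dob_differs = match_weight({"dob": False, "last_name": True, "sex": True})
```

Agreement on a field that rarely agrees by chance (DOB) adds far more evidence than agreement on sex, which agrees for roughly half of all unrelated pairs.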

Use a weighted scorer per field:

  • DOB — exact match strong signal (year + month + day); year-only fallback if month/day missing
  • First name — Jaro-Winkler with prefix bias (handles "Robert" / "Bob" if you also have a nickname table)
  • Last name — Jaro-Winkler primary; Soundex backup for transliterated names
  • SSN — exact match when full; partial-match (last 4) at lower confidence
  • Sex — exact match
  • Address (zip) — exact-match blocking signal
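
The per-field scorers above could be combined roughly as follows. This is a sketch under stated assumptions: the weights, the nickname table, the 0.6 score for a last-4 SSN match, and the record keys are all invented for illustration, and the combination is a plain normalized weighted average rather than Golden Suite's actual scorer. Jaro-Winkler is implemented inline so the example has no dependencies.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    hit1, hit2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are half-transpositions.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            out_of_order += ch != s2[k]
            k += 1
    t = out_of_order / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


# Hypothetical, tiny nickname table; a real one would be much larger.
NICKNAMES = {"bob": "robert", "bill": "william"}

# Illustrative weights (they sum to 1.0); tune against reviewed pairs.
WEIGHTS = {"dob": 0.35, "last_name": 0.25, "first_name": 0.15,
           "ssn": 0.15, "sex": 0.05, "zip": 0.05}


def canonical_first(name):
    """Lowercase and map known nicknames to a canonical first name."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)


def ssn_similarity(a, b):
    """1.0 for full 9-digit equality; 0.6 (assumed) when only the last 4 agree."""
    if len(a) == 9 and len(b) == 9:
        return 1.0 if a == b else 0.0
    return 0.6 if a[-4:] == b[-4:] else 0.0


def exact(x, y):
    return 1.0 if x == y else 0.0


COMPARATORS = {
    "dob": exact,
    "first_name": lambda x, y: jaro_winkler(canonical_first(x), canonical_first(y)),
    "last_name": lambda x, y: jaro_winkler(x.lower(), y.lower()),
    "ssn": ssn_similarity,
    "sex": exact,
    "zip": exact,
}


def pair_score(a, b):
    """Weighted average over fields present in both records, in [0, 1]."""
    total = weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        x, y = a.get(field), b.get(field)
        if not x or not y:
            continue  # missing on either side: the field is left out entirely
        total += weight * COMPARATORS[field](x, y)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Synthetic records; the SSN values are obviously fake.
a = {"dob": "1980-03-14", "first_name": "Bob", "last_name": "Smith",
     "ssn": "123456789", "sex": "M", "zip": "02139"}
b = {"dob": "1980-03-14", "first_name": "Robert", "last_name": "Smyth",
     "ssn": "6789", "sex": "M", "zip": "02139"}
score = pair_score(a, b)
```

Note that the "Bob" / "Robert" match comes from the nickname lookup, not from Jaro-Winkler itself. The year-only DOB fallback is left out to keep the sketch short.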

The combined per-pair score determines the outcome for each pair: auto-merge, ambiguous (routed to human review), or reject. Healthcare's tolerance for false merges is much lower than in other domains, so set the ambiguous band wider (e.g., 0.85-0.97 instead of 0.85-0.92).
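
A minimal sketch of that three-way decision, using the example band from the paragraph above as defaults. The function name and the returned labels are hypothetical.

```python
def classify_pair(score, auto_merge=0.97, reject_below=0.85):
    """Map a per-pair score in [0, 1] to a decision.

    Scores at or above `auto_merge` merge automatically, scores below
    `reject_below` are rejected, and everything in between goes to review.
    """
    if score >= auto_merge:
        return "auto_merge"
    if score >= reject_below:
        return "ambiguous"
    return "reject"


healthcare = classify_pair(0.93)                  # wide band: goes to review
narrow_band = classify_pair(0.93, auto_merge=0.92)  # narrower band: merges
```

The same 0.93 pair is merged automatically under the narrower band but lands in the review queue under the wider healthcare band, which is the intended trade: more reviewer work in exchange for fewer false merges.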

Stewardship is critical

In healthcare MDM, the review queue is the product. You will never auto-merge 100%; the cost of a false-merge (combining two different patients' records) is high enough that every borderline case needs human review.

Best practice:

  • Trained Health Information Management (HIM) staff own the review queue
  • Audit trail captures every approve/split/merge decision with the reviewer's identifier
  • Reviewer decisions feed back into scorer-weight tuning (gradually raising precision)

Golden Suite's review queue and audit chain are designed for this workflow. The cryptographic chain matters here — HIM regulators sometimes ask for proof of audit-log integrity.
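
For readers unfamiliar with hash-chained audit logs, the sketch below shows the general idea: each entry's SHA-256 hash covers the previous entry's hash, so editing any past decision invalidates every hash after it. The entry layout and function names are assumptions for illustration and are not Golden Suite's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_decision(chain, reviewer_id, action, cluster_id):
    """Append one steward decision (approve / split / merge) to the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = {"reviewer": reviewer_id, "action": action, "cluster": cluster_id}
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": entry_hash(prev_hash, payload)})


def verify_chain(chain):
    """Recompute every hash; False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_decision(chain, "him-042", "approve", "cluster-1")
append_decision(chain, "him-042", "split", "cluster-2")
append_decision(chain, "him-017", "merge", "cluster-3")
intact = verify_chain(chain)
```

A production log would also carry timestamps and would anchor the latest hash somewhere the log's operator cannot rewrite; both are omitted here.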

Privacy-preserving cross-organization matching (PPRL)

The big healthcare use case Golden Suite doesn't yet ship is cross-organization patient matching. Imagine a research consortium where multiple hospitals want to match patients without exchanging raw identifiers. That is privacy-preserving record linkage (PPRL), here using cryptographic long-term key (CLK) encoding.

PPRL is on the roadmap (Phase 11+) and we'll quote build-along delivery dates for Enterprise customers who specifically need it. The architecture proposal lives at /enterprise — Bloom-filter-encoded CLKs exchanged through a stateless linkage broker, with the salt shared between organizations and not known to us.
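
Since PPRL is not shipped yet, the following is only a generic sketch of Bloom-filter CLK encoding, not the proposed architecture's code. Each organization splits identifiers into bigrams, hashes them with a keyed HMAC using the shared secret, and sets bits in a fixed-size filter; the broker then compares filters with the Dice coefficient. The filter size, hash count, and function names are assumptions, and the filter is represented as a set of bit positions for brevity.

```python
import hashlib
import hmac


def bigrams(value):
    """Padded character bigrams, e.g. "smith" -> {"_s", "sm", ..., "h_"}."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk_encode(fields, secret, num_bits=1024, num_hashes=10):
    """Encode several identifier strings into one Bloom filter.

    Returns the set of bit positions that are set. `secret` is the salt
    shared between organizations (bytes) and never sent to the broker.
    """
    bits = set()
    for value in fields:
        for gram in bigrams(value):
            for k in range(num_hashes):
                msg = f"{k}:{gram}".encode("utf-8")
                digest = hmac.new(secret, msg, hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def dice(a, b):
    """Dice coefficient between two encoded filters, in [0, 1]."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


secret = b"shared-between-hospitals"  # placeholder value for the example
hospital_a = clk_encode(["smith", "1980-03-14"], secret)
hospital_b = clk_encode(["smyth", "1980-03-14"], secret)
unrelated = clk_encode(["nguyen", "1975-11-30"], secret)

similar_score = dice(hospital_a, hospital_b)
unrelated_score = dice(hospital_a, unrelated)
```

Because shared bigrams set the same bits, near-identical names still score high without either side revealing the raw values. Real deployments typically use field-specific keys and hardening against frequency attacks, which this sketch leaves out.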

Compliance landscape today

Control                            | Status
Encryption at rest                 | ✓ Envelope-encrypted credentials (per-org DEKs, KEK in env)
Encryption in transit              | ✓ HTTPS everywhere
Audit log integrity                | ✓ Cryptographic chain (per-org SHA-256)
Per-tenant isolation               | ✓ Per-org row-level scoping on every query
HIPAA BAA                          | ☐ Not yet; contact ben@bensevern.dev for timeline
SOC2 Type 2                        | In progress; attestation targeted end of 2026
HITRUST                            | Not on roadmap
Data residency (US-only / EU-only) | Available on Enterprise tier

Common pitfalls

  • Auto-merging too aggressively. False-merges combining different patients are the worst possible failure mode in healthcare MDM. Set the auto-merge threshold high (0.97+) and accept that 5-15% of clusters need human review.
  • Treating MRN as cross-system unique. It isn't. Each source system assigns its own MRN.
  • Ignoring transliteration patterns. Patient names in immigrant populations vary widely across documents (Latin script transliterations of Cyrillic, Arabic, Chinese names). Add Soundex / Metaphone as fallback blocking signals.
  • No process for unmerging. Sometimes auto-merges turn out to be wrong months later. Golden Suite supports split operations on existing clusters; bake the unmerge workflow into your steward training.
  • Forgetting deceased status. Deceased patient records should be preserved (clinical history is still relevant) and never merged with a living patient's record, even on an identifier match.
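
The transliteration pitfall above mentions Soundex as a fallback blocking signal. A minimal sketch of standard American Soundex follows, so that spelling variants of the same name can land in the same block. Soundex was designed around English surnames, so treat it as a coarse fallback rather than a full answer for transliterated names; the function name is an assumption.

```python
# Letter -> digit table for American Soundex; vowels, h, w, y have no code.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Four-character Soundex code, or "" if the name has no ASCII letters."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # h and w do not separate repeated codes; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


same_block = soundex("Mohammed") == soundex("Muhammad")
```

Used as a second blocking key alongside DOB, this lets variant spellings reach the fuzzy scorer instead of being filtered out before comparison.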

Next steps