Phonetic matching

Matching strings that sound alike but are spelled differently, for example "Cathryn" vs "Katherine" or "Schmidt" vs "Schmitt".

Phonetic algorithms map a string to a short code that approximates how it is pronounced. Two strings with the same phonetic code are treated as potential matches even if their spellings differ significantly.

Common encoders:

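  • Soundex: patented in 1918 by Robert Russell and Margaret Odell and later used to index US census records; codes are one letter + three digits. Crude but extremely fast.
  • Metaphone / Double Metaphone: follow English pronunciation rules more closely than Soundex; Double Metaphone adds better handling of names of non-English origin and can return a secondary code for ambiguous pronunciations. Widely used today.
  • NYSIIS: New York State Identification and Intelligence System (1970), a later alternative to Soundex that emits letter-only codes and aims for better accuracy on Irish/Hispanic surnames.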
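
To make the encoding step concrete, here is a minimal Python sketch of classic American Soundex. The names `soundex` and `_CODES` are illustrative and not taken from any particular library; real projects would normally rely on a maintained implementation rather than this sketch.

```python
# Consonant groups used by American Soundex. Vowels, Y, H and W get no digit.
_CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code (letter + three digits)."""
    # Keep only ASCII letters; this sketch ignores accented characters.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    # Pad with zeros and truncate to one letter + three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Schmidt"), soundex("Schmitt"))    # S530 S530
print(soundex("Cathryn"), soundex("Katherine"))  # C365 K365
```

Because Soundex keeps the first letter verbatim, "Schmidt" and "Schmitt" collide while "Cathryn" and "Katherine" do not; encoders in the Metaphone family treat an initial hard "C" and "K" alike.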

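Phonetic matching is typically used as a *blocking* signal (group records with the same code, then run finer-grained scoring within each block) rather than as a final-decision scorer. The codes are too coarse to merge on directly, but good enough to keep likely matches in the same candidate-pair set. Which pairs survive depends on the encoder: Metaphone codes an initial hard "C" and "K" alike, so "Cathryn" and "Katherine" can share a block, whereas Soundex keeps the first letter verbatim and separates them (C365 vs K365).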
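
The blocking pattern can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `candidate_pairs` and `similarity` are hypothetical helper names, the lookup table of precomputed Soundex codes stands in for a real encoder, and `difflib.SequenceMatcher` is just one convenient stand-in for the finer-grained scorer.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def candidate_pairs(records, encode):
    """Yield every pair of records that share a phonetic code."""
    blocks = defaultdict(list)
    for record in records:
        blocks[encode(record)].append(record)
    # Only compare records inside the same block.
    for members in blocks.values():
        yield from combinations(members, 2)


def similarity(a, b):
    """Finer-grained score in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Precomputed Soundex codes, used here in place of a real encoder.
SOUNDEX_CODES = {
    "Schmidt": "S530",
    "Schmitt": "S530",
    "Smith": "S530",
    "Katherine": "K365",
}

names = list(SOUNDEX_CODES)
pairs = list(candidate_pairs(names, SOUNDEX_CODES.get))
scored = sorted(((similarity(a, b), a, b) for a, b in pairs), reverse=True)

for score, a, b in scored:
    print(f"{a} / {b}: {score:.2f}")
```

Four names would need six comparisons without blocking; here only the three pairs inside the S530 block are scored. "Smith" lands in the same block as "Schmidt", which shows why the code alone is too coarse to merge on, and the second-stage score then ranks "Schmidt" / "Schmitt" highest.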