Fuzzy matching
Comparing two values that are similar but not exactly equal because of typos, differences in capitalization or word order, or missing punctuation.
"Acme Corp" and "Acme, Inc." are not the same string, but they're probably the same company. Fuzzy matching computes a similarity score that reflects how close two values are despite differences.
Common fuzzy scorers:
- **Levenshtein distance** — the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into the other
- **Jaro-Winkler** — similarity favoring matching prefixes; good for names
- **Token-set ratio** — order-independent comparison after tokenization
- **Phonetic codes** like Soundex or Metaphone — match sound-alikes ("Cathryn" vs "Katherine" under Metaphone; Soundex keeps the first letter, so it would not pair a C name with a K name)
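
To make two of these scorers concrete, here is a minimal standard-library sketch. `levenshtein` follows the usual dynamic-programming definition. `token_overlap` is a simplified stand-in for a token-set comparison (Jaccard overlap of lowercased word tokens), not the exact formula any particular library uses. All function names are illustrative.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Scale the distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokens(s):
    """Lowercased alphanumeric tokens; punctuation and word order are discarded."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def token_overlap(a, b):
    """Simplified order-independent score: shared tokens / all tokens (assumption)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

With these definitions, `levenshtein("kitten", "sitting")` is 3, and `token_overlap("Acme Corp", "Corp, Acme")` is 1.0 because word order and punctuation are ignored.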
The right scorer depends on the shape of the field. Names benefit from phonetic codes combined with Jaro-Winkler; free-text addresses suit token-set comparison; ID fields need an exact match. Production matching engines combine multiple scorers per field, giving the most weight to the scorer that carries the most signal.
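
One way to picture per-field weighting is the sketch below. It is an illustration under stated assumptions, not how any specific engine works: the field names, the weights, and the `FIELD_SCORERS` table are hypothetical, and `difflib.SequenceMatcher` stands in for Jaro-Winkler because the Python standard library does not provide one.

```python
import re
from difflib import SequenceMatcher


def exact(a, b):
    """ID-style fields: all or nothing."""
    return 1.0 if a == b else 0.0


def char_ratio(a, b):
    """Character-level similarity; a stand-in for Jaro-Winkler (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a, b):
    """Order-independent overlap of lowercased word tokens (simplified)."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical configuration: each field lists (scorer, weight) pairs.
FIELD_SCORERS = {
    "name": [(char_ratio, 0.7), (token_overlap, 0.3)],
    "address": [(token_overlap, 1.0)],
    "tax_id": [(exact, 1.0)],
}


def field_similarity(field, a, b):
    """Weighted average of every scorer configured for this field."""
    scorers = FIELD_SCORERS[field]
    total_weight = sum(weight for _, weight in scorers)
    return sum(weight * scorer(a, b) for scorer, weight in scorers) / total_weight
```

For example, `field_similarity("tax_id", "12-345", "12-346")` is 0.0 because ID fields use exact matching only, while `field_similarity("address", "12 Main St", "Main St 12")` is 1.0 because the address scorer ignores word order.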