Fuzzy matching

Comparing two values that are similar but not exactly equal, for example values that differ by typos, capitalization, word order, or missing punctuation.

"Acme Corp" and "Acme, Inc." are not the same string, but they're probably the same company. Fuzzy matching computes a similarity score that reflects how close two values are despite differences.
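As a rough illustration of such a score, the sketch below uses `difflib.SequenceMatcher` from Python's standard library. The helper name `similarity` and the case-folding step are assumptions made for this example, not the method of any particular matching engine.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after case-folding."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Exact comparison says "different"; the score says "fairly close".
is_equal = "Acme Corp" == "Acme, Inc."
close = similarity("Acme Corp", "Acme, Inc.")
unrelated = similarity("Acme Corp", "Globex")
```

Here `is_equal` is `False`, while `close` comes out well above `unrelated`, which is the kind of signal a matching step can threshold on.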

Common fuzzy scorers:

  • **Levenshtein distance**: the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other
  • **Jaro-Winkler**: similarity favoring matching prefixes; good for names
  • **Token-set ratio**: order-independent comparison after tokenization
  • **Phonetic codes** like Soundex or Metaphone: match sound-alikes ("Cathryn" vs "Katherine" under Metaphone; classic Soundex keeps the first letter, so it would not pair these two)
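Two of these scorers are small enough to sketch directly. The code below is a minimal, unoptimized illustration: `levenshtein` is the standard dynamic-programming edit distance, while `token_overlap` is a simplified order-independent comparison (Jaccard overlap of token sets), used here as a stand-in for a full token-set ratio. Both function names are hypothetical.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens; ignores order and punctuation."""
    ta = set(re.findall(r"\w+", a.casefold()))
    tb = set(re.findall(r"\w+", b.casefold()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


edits = levenshtein("kitten", "sitting")                        # 3
reordered = token_overlap("12 Main Street", "Main Street 12")   # 1.0
```

An edit distance is a count rather than a similarity; one common way to turn it into a score in [0, 1] is to divide by the length of the longer string and subtract the result from 1.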

The right scorer depends on the shape of the field. Names benefit from phonetic codes combined with Jaro-Winkler; free-text addresses suit token-set comparison; ID fields need an exact match. Production matching engines combine multiple scorers per field, weighting each by how much signal it carries.
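A per-field combination of weighted scorers might look like the sketch below. The field names, weights, and the `FIELD_SCORERS` layout are all hypothetical, and the scorers are simple standard-library stand-ins (a character-level ratio and a token overlap) rather than the Jaro-Winkler or phonetic scorers a real engine would likely use for names.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Order-independent overlap of whitespace-separated tokens."""
    ta, tb = set(a.casefold().split()), set(b.casefold().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def exact(a: str, b: str) -> float:
    """1.0 only when the two values are identical."""
    return 1.0 if a == b else 0.0


# Hypothetical configuration: field -> list of (scorer, weight).
FIELD_SCORERS = {
    "name": [(char_similarity, 2.0), (token_similarity, 1.0)],
    "address": [(token_similarity, 1.0)],
    "customer_id": [(exact, 1.0)],
}


def field_scores(left: dict, right: dict) -> dict:
    """Weighted average of the configured scorers for each field."""
    scores = {}
    for field, scorers in FIELD_SCORERS.items():
        total_weight = sum(weight for _, weight in scorers)
        weighted = sum(weight * fn(left[field], right[field])
                       for fn, weight in scorers)
        scores[field] = weighted / total_weight
    return scores


a = {"name": "Jon Smith", "address": "12 Main Street", "customer_id": "C-1001"}
b = {"name": "Smith Jon", "address": "Main Street 12", "customer_id": "C-1001"}
scores = field_scores(a, b)
```

In this example the reordered address and the identical ID both score 1.0, while the swapped name lands somewhere in between because the character-level scorer penalizes the reordering and the token scorer does not.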