Fuzzy matching

Comparing two values that are similar but not exactly equal, for example because of typos, capitalization, word order, or missing punctuation. It is the workhorse of probabilistic matching.

"Acme Corp" and "Acme, Inc." are not the same string, but they're probably the same company. Fuzzy matching computes a similarity score that reflects how close two values are despite differences.

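As a minimal sketch of what "a similarity score" can look like, the snippet below uses `difflib.SequenceMatcher` from Python's standard library. It is only a convenient stand-in and not one of the scorers discussed here; the `similarity` helper and its case-folding normalization are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    # Assumption: case and surrounding whitespace carry no meaning.
    norm_a = a.casefold().strip()
    norm_b = b.casefold().strip()
    return SequenceMatcher(None, norm_a, norm_b).ratio()


if __name__ == "__main__":
    print(similarity("Acme Corp", "Acme, Inc."))   # related names
    print(similarity("Acme Corp", "Globex LLC"))   # unrelated names
```

The related pair scores noticeably higher than the unrelated pair, which is the only property a downstream threshold needs.
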
Common fuzzy scorers:

**Levenshtein distance** — the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into the other
**Jaro-Winkler** — similarity favoring matching prefixes; good for names
**Token-set ratio** — order-independent comparison after tokenization
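**Phonetic codes** like Soundex or Metaphone — match sound-alikes ("Cathryn" vs "Katherine" under Metaphone; Soundex keeps the first letter, so it pairs "Kathryn" with "Katherine" but not "Cathryn")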
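
To make two of these scorers concrete, here is a dependency-free sketch. `levenshtein` is the standard dynamic-programming edit distance. `token_set_similarity` is a deliberately simplified Jaccard overlap of word sets, not the exact token-set ratio that fuzzy-matching libraries implement. The function names and the `\w+` tokenizer are assumptions for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale edit distance into [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_set_similarity(a: str, b: str) -> float:
    """Order-independent overlap of lowercase word sets (Jaccard index)."""
    tokens_a = set(re.findall(r"\w+", a.casefold()))
    tokens_b = set(re.findall(r"\w+", b.casefold()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0
```

For example, `levenshtein("kitten", "sitting")` is 3, and `token_set_similarity("12 Main Street", "main street 12")` is 1.0 because word order and case are ignored.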

The right scorer depends on the shape of the field. Names benefit from phonetic codes combined with Jaro-Winkler; free-text addresses suit token-set comparison; ID fields need an exact match. Production matching engines combine multiple scorers per field, weighting each by how much signal it carries.
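
One way to picture per-field scorers with weights is the sketch below. It is an illustration under stated assumptions, not how any particular engine works: the field names (`name`, `address`, `tax_id`), the weights, and the use of `difflib.SequenceMatcher` as a stand-in for a name scorer such as Jaro-Winkler are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; stand-in for a name scorer."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def token_overlap(a: str, b: str) -> float:
    """Order-independent word overlap in [0, 1] for free-text fields."""
    tokens_a = set(re.findall(r"\w+", a.casefold()))
    tokens_b = set(re.findall(r"\w+", b.casefold()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def exact(a: str, b: str) -> float:
    """Identifiers either match or they do not."""
    return 1.0 if a == b else 0.0


# Hypothetical fields and weights: the ID carries the most signal here.
FIELD_SCORERS = {
    "name": (string_similarity, 0.3),
    "address": (token_overlap, 0.2),
    "tax_id": (exact, 0.5),
}


def record_score(left: dict, right: dict) -> float:
    """Weighted average of per-field scores over fields present on both sides."""
    weighted_sum = 0.0
    total_weight = 0.0
    for field, (scorer, weight) in FIELD_SCORERS.items():
        if field not in left or field not in right:
            continue  # a missing field contributes no evidence either way
        weighted_sum += weight * scorer(left[field], right[field])
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

Records that agree on every field after normalization score 1.0, and a mismatch on the heavily weighted `tax_id` pulls the score down more than a typo in the name would. In practice the weights would be tuned or learned from labeled match pairs rather than set by hand.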