Levenshtein distance

A string metric that counts the minimum number of single-character edits (insert, delete, substitute) needed to transform one string into another.

Levenshtein distance is the foundational edit-distance metric in fuzzy matching. "Acme" → "Acne" is distance 1 (one substitution). "Acme Corp" → "Acmee Corp" is distance 1 (one insertion).
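A minimal sketch of the classic dynamic-programming computation, in Python. The function name `levenshtein` is illustrative and not taken from any particular library. It keeps only two rows of the DP table, so memory stays proportional to the shorter dimension while time is still proportional to n·m. It reproduces both examples above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions turning a into b."""
    # prev[j] = distance between the first i-1 characters of a and the first j characters of b.
    # Row 0: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca -> cb (free if they already match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Acme", "Acne"))             # 1 (substitute m -> n)
print(levenshtein("Acme Corp", "Acmee Corp"))  # 1 (insert e)
```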

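Computing the distance with the standard dynamic-programming algorithm takes O(n·m) time for strings of length n and m. That is fine for short fields such as names and addresses, but prohibitively slow for long text. Production matchers typically normalize the raw distance to a 0-1 similarity (`1 - distance/max(n,m)`) so that scores are comparable across strings of different lengths.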
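A sketch of that normalization, assuming a hypothetical `levenshtein_similarity` helper built on the same two-row dynamic-programming distance. Treating two empty strings as a perfect match (1.0) is an assumption made here, since the formula would otherwise divide by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1 - distance / max(n, m), giving a score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest


print(levenshtein_similarity("Acme", "Acne"))             # 0.75 (1 edit over 4 characters)
print(levenshtein_similarity("Acme Corp", "Acmee Corp"))  # 0.9  (1 edit over 10 characters)
```

The same single edit costs more on a short string than on a long one, which is the point of dividing by the longer length.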

For most MDM (master data management) use cases, Levenshtein is a backup scorer rather than the primary one. Jaro-Winkler performs better on names because it weights matching prefixes more heavily; tokenization-based scorers perform better on multi-word fields where word order may vary.
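To illustrate the word-order point, here is a sketch of one simple tokenization-based approach, often called token sort: lowercase the text, split it on whitespace, sort the tokens, rejoin them, and only then apply the normalized Levenshtein similarity. The helper names and the lowercasing step are illustrative assumptions, not the API of any specific matching library.

```python
def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1 - distance / max(n, m)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def token_sort_similarity(a: str, b: str) -> float:
    """Hypothetical token-sort scorer: normalize word order before comparing characters."""
    # Assumption: case is irrelevant and whitespace separates tokens.
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return levenshtein_similarity(normalize(a), normalize(b))


# Plain Levenshtein penalizes the swapped words heavily (well below 0.5 here).
print(levenshtein_similarity("Acme Corp", "Corp Acme"))
# After sorting tokens, both strings become "acme corp".
print(token_sort_similarity("Acme Corp", "Corp Acme"))  # 1.0
```

Sorting tokens only removes the effect of order; typos inside a token are still scored character by character.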