Jaro-Winkler similarity

A string similarity score (0 to 1) that favors strings sharing an identical prefix. Well-suited to person and company names, where typos rarely occur at the start.

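Jaro-Winkler is built on Jaro similarity, which counts the characters two strings have in common within a sliding window (a match must sit within roughly half the longer string's length of its counterpart) and penalizes transpositions, that is, matched characters that appear in a different order. The Winkler modification boosts the score when the strings share an identical prefix (up to 4 characters), based on the empirical observation that name typos tend to happen in the middle or at the end, not the start.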
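To make the two stages concrete, here is a minimal pure-Python sketch. It assumes the commonly used conventions: a match window of floor(max_len / 2) - 1, a prefix scaling factor of 0.1, and the boost formula jw = jaro + prefix_len * scale * (1 - jaro). The function and parameter names (jaro, jaro_winkler, prefix_scale, max_prefix) are illustrative only and do not come from any particular library.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matched characters within a window, minus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as a match if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each out-of-order
    # pair is counted twice, so the transposition count is half the mismatches.
    k = 0
    mismatches = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatches += 1
            k += 1
    transpositions = mismatches // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix (capped at max_prefix)."""
    base = jaro(s1, s2)
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    return base + prefix_len * prefix_scale * (1 - base)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))    # 0.9611
    print(round(jaro_winkler("DIXON", "DICKSONX"), 4))   # 0.8133
```

The sketch is case-sensitive and does no normalization; in practice names are usually lowercased and stripped of punctuation before scoring.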

The score ranges from 0 (no similarity) to 1 (identical). As a rule of thumb, 0.85 and above is typically a confident match for names, and 0.7 to 0.85 is the ambiguous middle. Exact thresholds depend on dataset and language.
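One way to apply these bands is a small helper that maps a score to a decision. This is only a sketch: the function name classify_score and the labels "match", "review", and "non-match" are hypothetical, and the default cutoffs simply restate the rule-of-thumb values above, so they should be tuned per dataset.

```python
def classify_score(score: float, confident: float = 0.85, ambiguous: float = 0.70) -> str:
    """Map a similarity score in [0, 1] to a coarse decision band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= confident:
        return "match"       # confident match
    if score >= ambiguous:
        return "review"      # ambiguous middle: send to manual or secondary checks
    return "non-match"


if __name__ == "__main__":
    for s in (0.96, 0.81, 0.42):
        print(s, classify_score(s))
```

Keeping the cutoffs as parameters makes it easy to adjust them per field or per language without touching the scoring code.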

Jaro-Winkler is a common default scorer for name fields in matching libraries because the prefix bias matches how real names get misspelled. It's a poor choice for address or free-text fields, where word order and extra tokens vary; use token-set or fuzzy-substring scorers there.