F1 score

The harmonic mean of precision and recall — the standard single-number quality metric for entity resolution engines.

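F1 = 2 × (precision × recall) / (precision + recall), where precision is the fraction of predicted matches that are correct and recall is the fraction of true matches that are found. F1 ranges from 0 (no correct matches found) to 1 (perfect precision and recall).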

The harmonic mean is used instead of the arithmetic mean because it penalizes imbalance: an engine with 0.95 precision but 0.20 recall has F1 ≈ 0.33, whereas the arithmetic mean would be 0.575. That matches intuition: an engine that's "very precise but misses most matches" isn't actually good.
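The arithmetic in this example can be checked with a short Python sketch. The function name `f1_score` and the convention of returning 0 when precision and recall are both 0 are assumptions made for illustration, not part of Golden Suite.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: treat F1 as 0 when both inputs are 0,
        # since the formula would otherwise divide by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.95, 0.20
harmonic = f1_score(precision, recall)
arithmetic = (precision + recall) / 2

print(f"F1 (harmonic mean): {harmonic:.3f}")    # 0.330
print(f"Arithmetic mean:    {arithmetic:.3f}")  # 0.575
```

The harmonic mean stays close to the smaller of the two inputs, which is why a low recall drags F1 down even when precision is high.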

F1 is the headline metric Golden Suite tracks per benchmark fixture per Suite version. The nightly benchmark workflow runs against a committed Febrl dataset (500 records, 133 true match pairs) and persists F1 to surface regressions before they hit production.
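As a rough illustration of how F1 could be scored against a set of true match pairs, the sketch below compares predicted pairs with ground-truth pairs. It assumes pairwise evaluation with unordered pairs. The function name `pairwise_f1` and the record IDs are hypothetical toy data, not the Febrl fixture or Golden Suite's actual implementation.

```python
def pairwise_f1(predicted, truth):
    """Return (precision, recall, f1) for predicted vs. true match pairs."""
    # Normalize each pair so (a, b) and (b, a) count as the same match.
    pred = {frozenset(p) for p in predicted}
    true = {frozenset(p) for p in truth}

    true_positives = len(pred & true)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0

    if precision + recall == 0:
        # Assumed convention: F1 is 0 when nothing was matched correctly.
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: 4 true pairs, 3 predicted pairs, 2 of them correct.
truth = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]
predicted = [("r2", "r1"), ("r3", "r4"), ("r5", "r9")]

precision, recall, f1 = pairwise_f1(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```

On a real fixture, `truth` would hold the known match pairs and `predicted` the pairs the engine emitted. Tracking the resulting F1 over time is what makes a regression visible.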