← Glossary

Precision and recall (in matching)

Two complementary metrics: precision measures how many of your merges are correct; recall measures how many real duplicates you actually merged.

In an entity-resolution context:

  • Precision = (true matches) / (true matches + false matches). High precision = when the engine says "these are the same," it's right.
  • Recall = (true matches) / (true matches + missed matches). High recall = the engine finds the duplicates that exist.
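The two formulas can be computed directly from the outcome counts of a matching run. The counts below are hypothetical, purely for illustration:

```python
# Hypothetical outcome counts from one matching run:
true_matches = 90    # engine merged the pair, and it really is a duplicate
false_matches = 10   # engine merged the pair, but the records are distinct
missed_matches = 30  # real duplicate the engine failed to merge

precision = true_matches / (true_matches + false_matches)  # 90 / 100
recall = true_matches / (true_matches + missed_matches)    # 90 / 120

print(f"precision = {precision:.2f}")  # 0.90
print(f"recall    = {recall:.2f}")     # 0.75
```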

The trade-off: raising the match threshold improves precision (fewer false merges) but hurts recall (more real duplicates missed). The right operating point depends on the cost of each error; for example, wrongly merging two different customers' records is often costlier than leaving one duplicate pair unmerged.

The standard combined metric is the **F1 score**, the harmonic mean of precision and recall: F1 = 2PR / (P + R). It provides a single number for comparing engine quality across runs, and it rewards balance: an engine with very high precision but poor recall (or vice versa) scores low.
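A minimal sketch of the harmonic-mean computation, continuing the hypothetical precision and recall values from above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# With precision = 0.90 and recall = 0.75:
print(f"{f1_score(0.90, 0.75):.3f}")  # 0.818
```

Note that the harmonic mean is dominated by the smaller of the two values, which is why F1 penalizes lopsided engines more than a simple average would.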