Threshold tuning for entity resolution

Precision and recall trade against each other through the match threshold. How to tune for your data, and what each direction costs.

Every entity-resolution pipeline has at least one threshold. It is the number that separates "merge these records" from "send to review" or "keep separate." Picking it is the single highest-leverage tuning decision in an ER project, and it is almost always done wrong on the first pass.

The precision/recall tradeoff

Threshold direction	Precision	Recall	What you get
Lower (e.g. 0.70)	Drops	Rises	More merges, more false positives, more wrong-entity contamination
Higher (e.g. 0.95)	Rises	Drops	Cleaner golden records, more duplicates surviving in the output

Precision is the fraction of the merges you make that are correct. Recall is the fraction of the true matches in your data that your pipeline actually merges. They trade against each other: on real data, no single threshold value maximizes both.
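In count terms, let TP be correct merges, FP wrong merges, and FN true matches left unmerged. Then precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below is illustrative only. It assumes each candidate pair arrives as a `(score, is_match)` tuple with a ground-truth label (a hypothetical layout) and shows both numbers moving as the threshold moves.

```python
from typing import Iterable, Tuple

def precision_recall(pairs: Iterable[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every pair scoring >= threshold is merged.

    Each pair is a hypothetical (score, is_match) tuple, where is_match
    is the ground-truth label for that pair.
    """
    tp = fp = fn = 0
    for score, is_match in pairs:
        merged = score >= threshold
        if merged and is_match:
            tp += 1  # correct merge
        elif merged and not is_match:
            fp += 1  # wrong-entity merge (false positive)
        elif not merged and is_match:
            fn += 1  # duplicate that survives (false negative)
    # Convention: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Toy labeled sample, invented for illustration only.
sample = [(0.98, True), (0.93, True), (0.91, False), (0.88, True),
          (0.84, False), (0.81, True), (0.76, False), (0.72, True)]

for t in (0.70, 0.95):
    p, r = precision_recall(sample, t)
    print(f"threshold={t:.2f} precision={p:.3f} recall={r:.3f}")
```

On this invented sample, 0.70 merges every pair (precision 5/8, recall 1.0), while 0.95 merges only one (precision 1.0, recall 1/5).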


Tuning in practice

Start with the autoconfig threshold and run a sample of 500 pairs from the middle of the score distribution past a reviewer. The reviewer's judgment becomes your ground truth. Calculate precision and recall at three thresholds (0.80, 0.85, 0.90) against that ground truth and pick the one that matches the cost shape of your downstream use case.
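The workflow above can be sketched in Python under stated assumptions. "Middle of the score distribution" is read here as a score band you choose (0.70 to 0.95 below). `review` is a placeholder for the human reviewer. Every function name is hypothetical and not part of Golden Suite.

```python
import random
from typing import Dict, List, Sequence, Tuple

def sample_middle_band(scored: List[Tuple[str, float]], low: float, high: float,
                       n: int = 500, seed: int = 0) -> List[Tuple[str, float]]:
    """Draw up to n (pair_id, score) pairs whose score lies in [low, high]."""
    band = [pair for pair in scored if low <= pair[1] <= high]
    rng = random.Random(seed)  # fixed seed keeps the review sample reproducible
    return rng.sample(band, min(n, len(band)))

def evaluate(labeled: List[Tuple[float, bool]],
             thresholds: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (precision, recall) against the reviewer's labels."""
    results = {}
    for t in thresholds:
        tp = sum(1 for s, y in labeled if s >= t and y)
        fp = sum(1 for s, y in labeled if s >= t and not y)
        fn = sum(1 for s, y in labeled if s < t and y)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        results[t] = (precision, recall)
    return results

# Hypothetical candidate pairs (pair_id, score), generated only for the demo.
gen = random.Random(42)
scored = [(f"pair-{i}", gen.random()) for i in range(5000)]

def review(pair_id: str, score: float) -> bool:
    """Stand-in for the human reviewer. Replace with real verdicts."""
    return score >= 0.83  # toy rule so the demo runs end to end

sample = sample_middle_band(scored, low=0.70, high=0.95)
labeled = [(score, review(pid, score)) for pid, score in sample]
for t, (p, r) in evaluate(labeled, [0.80, 0.85, 0.90]).items():
    print(f"threshold={t:.2f} precision={p:.3f} recall={r:.3f}")
```

Swap the toy `review` rule for real reviewer verdicts. The evaluation step stays the same.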

Two questions tell you which direction to lean:

What does a false positive cost? If wrong-entity contamination breaks billing, regulatory filings, or patient safety: lean high. The cost of an extra review is small compared to the cost of a bad merge.
What does a false negative cost? If duplicate records cause downstream chaos (duplicate outreach, double-billing, broken analytics): lean low. The cost of reviewing an ambiguous merge is small compared to the cost of letting duplicates through.
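One way to turn those two questions into a number is to pick the candidate threshold that minimizes total error cost on the reviewed sample. This assumes you can put a rough relative cost on each error type. The cost weights, sample, and function name below are hypothetical.

```python
from typing import List, Sequence, Tuple

def pick_threshold(labeled: List[Tuple[float, bool]], candidates: Sequence[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Return the candidate threshold with the lowest total error cost.

    cost_fp: cost of one wrong merge (false positive).
    cost_fn: cost of one missed duplicate (false negative).
    """
    def total_cost(t: float) -> float:
        fp = sum(1 for s, y in labeled if s >= t and not y)
        fn = sum(1 for s, y in labeled if s < t and y)
        return cost_fp * fp + cost_fn * fn
    # min() keeps the first candidate on ties, so order candidates by preference.
    return min(candidates, key=total_cost)

# Invented (score, is_match) labels for illustration only.
labeled = [(0.97, True), (0.92, True), (0.89, False), (0.87, True),
           (0.86, True), (0.83, False), (0.82, True), (0.79, False)]

# Bad merges are expensive (billing, regulatory, patient safety): lean high.
print(pick_threshold(labeled, [0.80, 0.85, 0.90], cost_fp=10.0, cost_fn=1.0))  # → 0.9
# Surviving duplicates are expensive (outreach, analytics): lean low.
print(pick_threshold(labeled, [0.80, 0.85, 0.90], cost_fp=1.0, cost_fn=10.0))  # → 0.8
```

The same sample yields opposite answers depending only on which error you price higher. That is the lean-high versus lean-low decision, made explicit.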

Most real systems run two thresholds: a high one for auto-merge and a lower one for "send to review queue." Pairs between them surface for human judgment.
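Here is a minimal sketch of that two-threshold routing. The 0.95 and 0.80 defaults and the `Route` names are illustrative assumptions, not recommended values.

```python
from enum import Enum

class Route(Enum):
    AUTO_MERGE = "auto_merge"
    REVIEW = "review"
    KEEP_SEPARATE = "keep_separate"

def route_pair(score: float, merge_threshold: float = 0.95,
               review_threshold: float = 0.80) -> Route:
    """Decide what happens to a candidate pair given its match score."""
    if review_threshold > merge_threshold:
        raise ValueError("review_threshold must not exceed merge_threshold")
    if score >= merge_threshold:
        return Route.AUTO_MERGE     # confident enough to merge unattended
    if score >= review_threshold:
        return Route.REVIEW         # ambiguous band: a human decides
    return Route.KEEP_SEPARATE      # treated as distinct entities

for s in (0.97, 0.88, 0.62):
    print(s, route_pair(s).value)
```

Setting both thresholds to the same value removes the review band and reduces this to the single-threshold case.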

In Golden Suite

The workbench exposes the threshold as a slider with live precision/recall feedback against your sample. The postflight report shows the score distribution and where each cluster landed against the threshold. Audit history records every threshold change so you can correlate a quality regression with the day someone moved the slider.
