Blocking

The first stage of entity resolution: narrowing the candidate-pair space so that only records sharing some easy-to-compute signal are compared.

Comparing every pair of N records takes N(N-1)/2 comparisons, which is O(N²). For 50,000 records that's about 1.25 billion comparisons. Blocking trades some recall for a massive speedup: only records that share a blocking signal (first 3 chars of last name, soundex code, email domain, ZIP code, etc.) are considered as candidate pairs.
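
As a rough illustration of the idea above, the sketch below groups records by one blocking signal (the first 3 characters of the last name) and only emits pairs from within each group. The function names, the `last_name` field, and the sample records are hypothetical and chosen for illustration; this is not how any particular library implements blocking.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    # Hypothetical blocking signal: first 3 characters of the last name, lowercased.
    return record["last_name"][:3].lower()


def candidate_pairs(records, key_fn=block_key):
    # Group record indices by their blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key_fn(rec)].append(idx)

    # Only records inside the same block become candidate pairs.
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


# Made-up sample data for illustration only.
records = [
    {"last_name": "Johnson"},
    {"last_name": "Johnsen"},
    {"last_name": "Smith"},
    {"last_name": "Smyth"},
    {"last_name": "Johansson"},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

Here the three "Joh..." records land in one block and are compared with each other, while "Smith" and "Smyth" get different keys ("smi" vs "smy") and are never compared. That is the recall cost of this particular signal; a phonetic key such as soundex would likely put them in the same block.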

Good blocking reduces comparisons by 100-1000× while still catching the matches that matter. Bad blocking either misses real duplicates (signals too tight) or compares too many pairs (signals too loose).
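
One way to make "good" and "bad" concrete is to measure two numbers on a labelled sample: how much of the all-pairs space was eliminated, and what fraction of known duplicate pairs survived blocking. The record-linkage literature commonly calls these reduction ratio and pairs completeness. The sketch below assumes pairs are stored as `(i, j)` index tuples with `i < j`; the function names and sample values are illustrative, not taken from any specific tool.

```python
def reduction_ratio(n_records, n_candidates):
    # Fraction of the full N(N-1)/2 pair space that blocking eliminated.
    total = n_records * (n_records - 1) // 2
    return 1 - n_candidates / total


def pair_recall(candidates, true_matches):
    # Fraction of known duplicate pairs that survived blocking.
    if not true_matches:
        return 1.0
    found = sum(1 for pair in true_matches if pair in candidates)
    return found / len(true_matches)


# Made-up example: 5 records, 3 candidate pairs after blocking,
# and 2 known duplicate pairs, one of which blocking missed.
candidates = {(0, 1), (0, 4), (1, 4)}
true_matches = {(0, 1), (2, 3)}

rr = reduction_ratio(5, len(candidates))
recall = pair_recall(candidates, true_matches)

print(f"reduction ratio: {rr:.2f}, pair recall: {recall:.2f}")
```

Signals that are too tight show up as low pair recall; signals that are too loose show up as a low reduction ratio. Tracking both on the same sample makes the trade-off visible when tuning the blocking signals.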

Modern entity-resolution libraries auto-propose blocking signals from the data shape. The proposed set is reviewable in Golden Suite's autoconfig wizard before committing.