Quickstart

Run your first dedup in under 5 minutes.

Your First Dedup

Given a CSV with potential duplicates, GoldenMatch scores every pair of records and groups matches into clusters.

goldenmatch demo
import goldenmatch
result = goldenmatch.dedupe("customers.csv", threshold=0.85)
print(f"Found {result.cluster_count} clusters from {result.record_count} records")

What Just Happened

  1. Blocking — GoldenMatch grouped similar records to avoid comparing every pair
  2. Scoring — Each candidate pair was scored across multiple fields (name, address, email, etc.)
  3. Clustering — Pairs above the threshold were grouped into clusters of matching records

Next Steps