Quickstart
Run your first dedup in under 5 minutes.
Your First Dedup
Given a CSV with potential duplicates, GoldenMatch scores candidate pairs of records and groups the matches into clusters.
goldenmatch demo
import goldenmatch
result = goldenmatch.dedupe("customers.csv", threshold=0.85)
print(f"Found {result.cluster_count} clusters from {result.record_count} records")
What Just Happened
- Blocking: GoldenMatch grouped similar records into blocks, so only records within the same block were compared instead of every possible pair
- Scoring: Each candidate pair was scored across multiple fields (name, address, email, etc.)
- Clustering: Pairs that scored above the threshold were grouped into clusters of matching records
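To make the three stages above concrete, here is a minimal sketch in plain Python using only the standard library. It is not GoldenMatch's implementation: the sample records, the first-letter blocking key, the `difflib`-based scoring, and the helper names `block_key`, `similarity`, and `dedupe_sketch` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records, made up for this sketch.
records = [
    {"id": 0, "name": "Jane Doe", "email": "jane.doe@example.com"},
    {"id": 1, "name": "Jane  Doe", "email": "jane.doe@example.com"},
    {"id": 2, "name": "John Smith", "email": "john.smith@example.com"},
    {"id": 3, "name": "Jon Smith", "email": "john.smith@example.com"},
    {"id": 4, "name": "Alice Wong", "email": "alice@example.com"},
]


def block_key(record):
    # Blocking (assumed rule): first letter of the lowercased name.
    return record["name"].strip().lower()[:1]


def similarity(a, b):
    # Scoring (assumed rule): mean string similarity over name and email.
    fields = ("name", "email")
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def dedupe_sketch(records, threshold=0.85):
    # Stage 1: put records into blocks so only in-block pairs are compared.
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    # Union-find structure used to merge matching records into clusters.
    parent = {record["id"]: record["id"] for record in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Stage 2 and 3: score each candidate pair, then link pairs that
    # reach the threshold.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for record in records:
        clusters[find(record["id"])].append(record["id"])
    return sorted(sorted(ids) for ids in clusters.values())


clusters = dedupe_sketch(records)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

With these five records, blocking cuts the work from 10 possible pairs to 6 candidate pairs, since the four "J" names share a block and "Alice Wong" sits alone. Raising the threshold to 1.0 in this sketch leaves every record in its own cluster, which shows how the threshold trades recall for precision.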
Next Steps
- Read the GoldenMatch reference for configuration options
- Try the interactive playground to tune thresholds visually
- Follow the Dirty CSV to Golden Records guide for a full walkthrough