GoldenMatch
GoldenMatch performs probabilistic record linkage with configurable field weights and blocking strategies: it compares records across multiple fields, scores their similarity, and groups matches into clusters. This page also covers threshold tuning and cluster output.
Note: GoldenMatch is the matching core of the funnel. When the hosted Workbench dispatches a dedup against your connected sources, this is what runs; the ambiguous clusters it can't auto-merge land in your review queue. You can also `pip install goldenmatch` and run it standalone: same engine, no SaaS.
Basic Usage
```python
import goldenmatch

# Deduplicate the CSV; records scoring below 0.85 stay unmatched.
result = goldenmatch.dedupe("data.csv", threshold=0.85)
print(result.clusters)
```
Configuration
Field Weights
Control how much each field contributes to the overall match score:
```python
result = goldenmatch.dedupe(
    "data.csv",
    threshold=0.85,
    # Relative contribution of each field to the overall match score.
    weights={"name": 0.4, "address": 0.3, "email": 0.2, "phone": 0.1},
)
```
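To make the effect of the weights concrete, here is a minimal sketch of how a weighted match score could be combined from per-field similarities. It is not GoldenMatch's internal scoring: `field_similarity` and `weighted_score` are hypothetical helpers, and `difflib.SequenceMatcher` from the standard library stands in for whatever comparator the engine actually uses.

```python
from difflib import SequenceMatcher

# Illustrative weights, matching the values passed to dedupe() above.
WEIGHTS = {"name": 0.4, "address": 0.3, "email": 0.2, "phone": 0.1}


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] (assumed stand-in comparator)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def weighted_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarities, normalised by total weight."""
    total = sum(weights.values())
    return sum(
        w * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    ) / total
```

Under this scheme, two records that differ only in `phone` can lose at most 0.1 of their score, while a mismatch in `name` can cost up to 0.4, which is why the heavier fields dominate whether a pair clears the threshold.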
Blocking Strategy
Blocking reduces the number of comparisons by only comparing records that share a blocking key:
```python
result = goldenmatch.dedupe(
    "data.csv",
    threshold=0.85,
    # Only records that share a blocking key are compared with each other.
    blocking_keys=["zip_code", "first_letter_last_name"],
)
```
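The idea behind blocking can be sketched in a few lines of plain Python. This is an illustration, not the library's implementation: `blocking_key` and `candidate_pairs` are hypothetical names, the `last_name` column is assumed, and the two keys are assumed to be combined into one composite key (the engine may instead treat each key as a separate pass).

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> tuple:
    # Assumed composite key: zip code plus first letter of the last name.
    return (record["zip_code"], record["last_name"][:1].upper())


def candidate_pairs(records: list[dict]):
    """Yield index pairs for records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    # Compare only within a block instead of across the whole dataset.
    for members in blocks.values():
        yield from combinations(members, 2)
```

Without blocking, n records require n(n-1)/2 comparisons; with blocking, only pairs inside the same block are scored. The trade-off is that true duplicates with differing key values (for example, a mistyped zip code) are never compared.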
Interactive Playground
Tune thresholds and field weights visually. Upload a CSV of contacts, customers, or any other list; the playground reads the file, infers the matching rules, and removes the duplicates, with no setup required.
Output Format
| Field | Type | Description |
|---|---|---|
| cluster_id | string | Unique cluster identifier |
| record_id | string | Original record identifier |
| score | float | Match confidence (0.0 - 1.0) |
| cluster_size | int | Number of records in the cluster |
Note: Records that score below the threshold are assigned to singleton clusters, that is, clusters with a cluster_size of 1.
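As a rough illustration of how thresholded pair scores could turn into rows of this shape, the sketch below groups records with a small union-find. It is an assumption-laden example, not the engine's clustering algorithm: `cluster_records` is a hypothetical helper, the `c1`, `c2`, ... identifiers are invented, the threshold is assumed to be inclusive, and the `score` column is left out because this page does not say how a per-record score is derived.

```python
def cluster_records(record_ids, scored_pairs, threshold=0.85):
    """Group records linked by pairs scoring at or above the threshold."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), []).append(rid)

    rows = []
    for n, members in enumerate(groups.values(), start=1):
        for rid in members:
            rows.append(
                {"cluster_id": f"c{n}", "record_id": rid, "cluster_size": len(members)}
            )
    return rows
```

For example, given records `r1`, `r2`, `r3` and pairs `("r1", "r2", 0.92)` and `("r2", "r3", 0.40)`, `r1` and `r2` share a cluster of size 2, while `r3` ends up in a singleton cluster because its only pair falls below the threshold.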