GoldenMatch

GoldenMatch performs probabilistic record linkage: it compares records across multiple fields, scores their similarity, and groups matches into clusters. Field weights and blocking strategies are configurable. This page covers basic usage, configuration, threshold tuning, and cluster output.

Note: GoldenMatch is the matching core of the funnel. When the hosted Workbench dispatches a dedup against your connected sources, this is what runs; the ambiguous clusters it can't auto-merge land in your review queue. You can also pip install goldenmatch and run it standalone — same engine, no SaaS.

Basic Usage

import goldenmatch

result = goldenmatch.dedupe("data.csv", threshold=0.85)
print(result.clusters)

Configuration

Field Weights

Control how much each field contributes to the overall match score:

result = goldenmatch.dedupe(
    "data.csv",
    threshold=0.85,
    weights={"name": 0.4, "address": 0.3, "email": 0.2, "phone": 0.1}
)
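
To make the role of the weights concrete, here is a minimal sketch of how a weighted match score can be formed from per-field similarities. It is not GoldenMatch's internal scoring: the similarity metric (difflib's SequenceMatcher), the helper names field_similarity and weighted_score, and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Weights copied from the example above; everything else is hypothetical.
WEIGHTS = {"name": 0.4, "address": 0.3, "email": 0.2, "phone": 0.1}


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0.0, 1.0] for two field values.

    Illustrative metric only; a real matcher may use a different
    comparison per field type.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Combine per-field similarities into a single weighted average."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
    return weighted_sum / total_weight


# Hypothetical records that differ slightly in name and address.
a = {"name": "Jon Smith", "address": "12 Main St", "email": "jon@example.com", "phone": "555-0100"}
b = {"name": "John Smith", "address": "12 Main Street", "email": "jon@example.com", "phone": "555-0100"}

score = weighted_score(a, b, WEIGHTS)
print(f"score={score:.3f}", "match" if score >= 0.85 else "no match")
```

Because name carries the largest weight in this example, a disagreement in name lowers the score more than the same disagreement in phone would.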

Blocking Strategy

Blocking reduces the number of comparisons by only comparing records that share a blocking key:

result = goldenmatch.dedupe(
    "data.csv",
    threshold=0.85,
    blocking_keys=["zip_code", "first_letter_last_name"]
)
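
The effect of blocking can be sketched in a few lines. The example below assumes the two keys are combined into one composite key and that first_letter_last_name is derived from a hypothetical last_name column; how GoldenMatch actually combines multiple blocking keys is not specified here, so treat this as an illustration of the general technique.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; column names other than zip_code are assumptions.
records = [
    {"id": 1, "last_name": "Smith", "zip_code": "10001"},
    {"id": 2, "last_name": "Smyth", "zip_code": "10001"},
    {"id": 3, "last_name": "Jones", "zip_code": "94105"},
    {"id": 4, "last_name": "Johnson", "zip_code": "94105"},
    {"id": 5, "last_name": "Sanders", "zip_code": "60601"},
]


def blocking_key(record: dict) -> tuple:
    """Build a composite key: ZIP code plus first letter of the last name."""
    return (record["zip_code"], record["last_name"][:1].upper())


def candidate_pairs(records: list) -> list:
    """Return only the record-id pairs that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs


pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

The trade-off is recall: two true duplicates with different ZIP codes would never be compared under this key, so blocking keys should be fields that duplicates are very likely to share.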

Interactive Playground

The interactive playground lets you tune thresholds and field weights visually. Upload a CSV of contacts, customers, or any other list; the playground reads the file, infers matching rules, and removes the duplicates without manual setup.

Output Format

| Field | Type | Description |
|---|---|---|
| cluster_id | string | Unique cluster identifier |
| record_id | string | Original record identifier |
| score | float | Match confidence (0.0 - 1.0) |
| cluster_size | int | Number of records in the cluster |

Note: A record whose match scores all fall below the threshold is assigned to a singleton cluster (cluster_size of 1).
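
As a rough sketch of consuming this output, the snippet below groups rows by cluster_id and separates duplicate clusters from singletons. It assumes the rows are available as dictionaries with the four fields listed above; the sample values, including the score shown for the singleton, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical output rows using the documented field names.
rows = [
    {"cluster_id": "c1", "record_id": "r1", "score": 0.97, "cluster_size": 2},
    {"cluster_id": "c1", "record_id": "r2", "score": 0.97, "cluster_size": 2},
    {"cluster_id": "c2", "record_id": "r3", "score": 0.0, "cluster_size": 1},
]


def group_by_cluster(rows: list) -> dict:
    """Map each cluster_id to the list of record_ids it contains."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["cluster_id"]].append(row["record_id"])
    return dict(clusters)


def split_singletons(clusters: dict) -> tuple:
    """Separate clusters with several records from singleton clusters."""
    duplicates = {cid: ids for cid, ids in clusters.items() if len(ids) > 1}
    singletons = {cid: ids for cid, ids in clusters.items() if len(ids) == 1}
    return duplicates, singletons


clusters = group_by_cluster(rows)
duplicates, singletons = split_singletons(clusters)
print("duplicate clusters:", duplicates)
print("singleton clusters:", singletons)
```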
