Quickstart
Walk the full funnel: connect a source, run autoconfig + match, clear the review queue, push to a destination. Five minutes end-to-end.
This quickstart walks the full funnel on the hosted SaaS — source in, golden records out. The whole loop is ~5 minutes on a small dataset.
1. Connect a source
Head to Sources and pick a connector: CSV upload, Postgres connection string, or any of the 22 SaaS API connectors (HubSpot, Salesforce, Stripe, Pipedrive, Klaviyo, Shopify, etc.). For SaaS connectors, paste the API key or click through OAuth; credentials are encrypted server-side.
Tip: If two sources share an `entity_type` (e.g. `person`), the matcher pools them automatically. That's the whole point of running this between HubSpot and Salesforce.
2. Let autoconfig propose match rules
Open a workbench notebook against your source(s). goldenmatch 1.18's multi-wave autoconfig inspects your schema and proposes:
- Which columns to block on (high-cardinality identifiers)
- Per-field scorers (name → token Jaccard, email → exact, address → token-sort, etc.)
- A clustering threshold tuned to the data shape
- Survivorship rules for which source wins on each field
You can override anything, but the defaults are usually good for a first useful run. No DSL to learn before you see a result.
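To make the proposal concrete, here is a minimal, self-contained sketch of what blocking and three of the per-field scorers named above do. It is plain Python, not goldenmatch's implementation. `block_key`, `token_jaccard`, and `token_sort_ratio` are hypothetical helper names, the record fields are made up, and the token-sort scorer uses the standard library's `difflib` as a stand-in similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Made-up sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "address": "12 Oak St Springfield"},
    {"id": 2, "name": "Lee Ann", "email": "ann@example.com", "address": "Springfield 12 Oak St"},
    {"id": 3, "name": "Bob Ray", "email": "bob@example.com", "address": "9 Elm Rd Shelbyville"},
]

def block_key(record):
    """Blocking: only records that share this key are compared pairwise."""
    return record["email"].strip().lower()

def token_jaccard(a, b):
    """Fraction of distinct lowercase tokens that the two strings share."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def token_sort_ratio(a, b):
    """Sort tokens first so word order does not matter, then compare strings."""
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sa, sb).ratio()

# Group records by blocking key so we never compare across blocks.
blocks = defaultdict(list)
for rec in records:
    blocks[block_key(rec)].append(rec)

# Score every pair inside each block, one scorer per field.
candidate_pairs = []
for members in blocks.values():
    for left, right in combinations(members, 2):
        scores = {
            "name": token_jaccard(left["name"], right["name"]),
            "email": 1.0 if left["email"].lower() == right["email"].lower() else 0.0,
            "address": token_sort_ratio(left["address"], right["address"]),
        }
        candidate_pairs.append((left["id"], right["id"], scores))

for pair in candidate_pairs:
    print(pair)
```

Blocking is what keeps the comparison count manageable: record 3 never gets scored against records 1 and 2 because it sits in a different block.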
3. Run the matcher
```python
import goldenmatch

result = goldenmatch.dedupe("customers.csv")
print(f"Found {result.cluster_count} clusters from {result.record_count} records")
print(f"Ambiguous (review-queue): {len(result.ambiguous_merges)}")
```

The dispatch returns three things:
- Auto-merged clusters — high-confidence matches that became golden records directly
- Ambiguous clusters — ones the scorer wasn't sure about; these land in your review queue
- A postflight report — cluster IDs, demoted scorers, and confidence histograms
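One way to picture the split between auto-merged and ambiguous clusters is a two-threshold rule. The sketch below is illustrative only: the threshold values, the `triage` function, the cluster confidences, and the third "left unmerged" bucket are all assumptions, not goldenmatch's actual logic or defaults.

```python
# Hypothetical thresholds; real values would come from autoconfig or your overrides.
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.60

def triage(clusters):
    """Split {cluster_id: confidence} into auto-merged, review, and unmerged ids."""
    auto, review, unmerged = [], [], []
    for cluster_id, confidence in sorted(clusters.items()):
        if confidence >= AUTO_MERGE_AT:
            auto.append(cluster_id)      # confident enough to become a golden record
        elif confidence >= REVIEW_AT:
            review.append(cluster_id)    # ambiguous: a human decides
        else:
            unmerged.append(cluster_id)  # too weak to propose as a merge
    return auto, review, unmerged

auto, review, unmerged = triage({"c1": 0.97, "c2": 0.72, "c3": 0.41})
print("auto-merged:", auto)
print("review queue:", review)
print("left unmerged:", unmerged)
```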
4. Clear the review queue
The Review queue shows every ambiguous cluster the matcher punted on. Each decision (approve / split / merge) is recorded in the audit trail and propagates to the next destination push, so your golden records reflect the call you made.
Roadmap: decisions will additionally feed the scorer's field-rules layer so future runs auto-merge similar clusters without you re-deciding. The wiring is tracked in issue #135 — right now decisions stop at the audit log + the next export.
5. Push to a destination
Configure a destination — warehouse table or cloud file drop.
| Family | Targets |
|---|---|
| Warehouses | Postgres · MySQL · Snowflake · BigQuery |
| Cloud file | S3 · GCS · Azure Blob (CSV or Parquet, inferred from URL suffix) |
| Browser | One-shot CSV · Excel · Parquet download |
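For cloud file drops, the output format follows the URL suffix. A rough sketch of that kind of inference is shown here. The `infer_format` helper, the suffix mapping, and the bucket URLs are hypothetical, and the hosted service may handle edge cases (compression suffixes, query strings) differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed mapping; the table above only says CSV or Parquet is inferred from the suffix.
SUFFIX_TO_FORMAT = {".csv": "csv", ".parquet": "parquet"}

def infer_format(url):
    """Return 'csv' or 'parquet' based on the final suffix of the URL path."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer file format from {url!r}") from None

print(infer_format("s3://my-bucket/golden/customers.parquet"))
print(infer_format("gs://my-bucket/golden/customers.CSV"))
```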
Hit Run Now and your golden records land in the target table or object store. `overwrite` truncates first; `append` leaves prior rows in place. We don't keep a copy: your warehouse is the source of truth.
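The difference between the two write modes can be tried locally. This sketch uses an in-memory SQLite database as a stand-in warehouse; the `push` function and the `golden_customers` table are hypothetical, and since SQLite has no `TRUNCATE` statement, the example clears the table with `DELETE` instead.

```python
import sqlite3

def push(conn, rows, mode):
    """Write golden rows to a table; mode is 'overwrite' or 'append'."""
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unknown mode: {mode!r}")
    with conn:  # one transaction: commit on success, roll back on error
        if mode == "overwrite":
            # Stand-in for TRUNCATE, which SQLite does not support.
            conn.execute("DELETE FROM golden_customers")
        conn.executemany(
            "INSERT INTO golden_customers (id, email) VALUES (?, ?)", rows
        )

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM golden_customers").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golden_customers (id INTEGER, email TEXT)")

push(conn, [(1, "ann@example.com")], mode="append")
push(conn, [(2, "bob@example.com")], mode="append")
count_after_append = row_count(conn)      # both rows are kept

push(conn, [(3, "cy@example.com")], mode="overwrite")
count_after_overwrite = row_count(conn)   # only the latest push remains

print(count_after_append, count_after_overwrite)
```

Running the clear and the insert inside one transaction means a failed push does not leave the table empty, which is worth checking for on whatever warehouse you target.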
Or push back to the CRM you pulled from. Instead of a warehouse, reverse-sync UPDATEs the original HubSpot / Salesforce records with your survived golden values — per-record, behind a dry-run + confirm gate.
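A dry-run plus confirm gate generally means computing the per-record changes first and applying them only on explicit confirmation. The following sketch shows that pattern against an in-memory dictionary. `plan_updates`, `reverse_sync`, the `confirm` flag, and the record IDs are hypothetical and do not reflect the product's actual API or the HubSpot or Salesforce client calls.

```python
def plan_updates(current, golden):
    """Return {record_id: {field: (old, new)}} for fields that would change."""
    plan = {}
    for record_id, golden_fields in golden.items():
        if record_id not in current:
            continue  # reverse-sync only updates records that already exist
        existing = current[record_id]
        diff = {
            field: (existing.get(field), value)
            for field, value in golden_fields.items()
            if existing.get(field) != value
        }
        if diff:
            plan[record_id] = diff
    return plan

def reverse_sync(current, golden, confirm=False):
    """Dry run by default; only mutate `current` when confirm=True."""
    plan = plan_updates(current, golden)
    if confirm:
        for record_id, diff in plan.items():
            for field, (_old, new) in diff.items():
                current[record_id][field] = new
    return plan

crm = {"h-1": {"email": "ann@old.example", "phone": "555-0100"}}
golden = {"h-1": {"email": "ann@example.com", "phone": "555-0100"}}

preview = reverse_sync(crm, golden)            # dry run: nothing is written
email_after_dry_run = crm["h-1"]["email"]
print("would change:", preview)

reverse_sync(crm, golden, confirm=True)        # confirmed: changes are applied
email_after_confirm = crm["h-1"]["email"]
print("after confirm:", email_after_confirm)
```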
Next Steps
- GoldenMatch reference — tune blocking + scoring + clustering
- Dirty CSV to Golden Records — full walkthrough
- Interactive dedupe demo — tune thresholds visually