Quickstart

Walk the full funnel: connect a source, run autoconfig + match, clear the review queue, push to a destination. Five minutes end-to-end.

This quickstart runs on the hosted SaaS: source in, golden records out. On a small dataset the whole loop takes about five minutes.

1. Connect a source

Head to Sources and pick a connector: CSV upload, Postgres connection string, or any of the 22 SaaS API connectors (HubSpot, Salesforce, Stripe, Pipedrive, Klaviyo, Shopify, etc.). For SaaS connectors, paste the API key or click through OAuth; credentials are encrypted server-side.

Tip: If two sources share an entity_type (e.g. person), the matcher pools them automatically — that's the whole point of running this between HubSpot and Salesforce.
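To make the pooling concrete, here is a minimal sketch in plain Python. It is not goldenmatch's implementation: the `Source` class, the `pool_by_entity_type` helper, and the `_source` tag are hypothetical names used only to illustrate how records from sources with the same entity_type could end up in one pool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for a connected source."""
    name: str
    entity_type: str
    records: list = field(default_factory=list)


def pool_by_entity_type(sources):
    """Group records from every source under their shared entity_type."""
    pools = defaultdict(list)
    for source in sources:
        for record in source.records:
            # Tag each record with its origin so later steps know where it came from.
            pools[source.entity_type].append({**record, "_source": source.name})
    return dict(pools)


hubspot = Source("hubspot", "person", [{"email": "ada@example.com"}])
salesforce = Source("salesforce", "person", [{"email": "ADA@example.com"}])
stripe = Source("stripe", "invoice", [{"amount": 120}])

pools = pool_by_entity_type([hubspot, salesforce, stripe])
print({entity: len(rows) for entity, rows in pools.items()})
# prints {'person': 2, 'invoice': 1}
```

The two person sources land in one pool, so the matcher can compare HubSpot records against Salesforce records; the invoice source stays separate.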

2. Let autoconfig propose match rules

Open a workbench notebook against your source(s). goldenmatch 1.18's multi-wave autoconfig inspects your schema and proposes:

  • Which columns to block on (high-cardinality identifiers)
  • Per-field scorers (name → token Jaccard, email → exact, address → token-sort, etc.)
  • A clustering threshold tuned to the data shape
  • Survivorship rules for which source wins on each field
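For intuition about the first two items, the sketch below shows one common way to implement blocking and the three scorer families named above, using only the Python standard library. These are textbook versions written for illustration, not goldenmatch's own code; every function name here is an assumption, and the token-sort scorer uses `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def token_jaccard(a, b):
    """Share of distinct lowercase tokens the two strings have in common."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def token_sort(a, b):
    """Similarity after sorting tokens, so word order does not matter."""
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sa, sb).ratio()


def exact(a, b):
    """1.0 when the values match after trimming and lowercasing, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def candidate_pairs(records, block_key):
    """Blocking: only compare records that share the same block_key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[block_key]].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "zip": "10001"},
    {"name": "Lovelace Ada", "email": "ADA@example.com", "zip": "10001"},
    {"name": "Alan Turing", "email": "alan@example.com", "zip": "94105"},
]

for left, right in candidate_pairs(records, block_key="zip"):
    print(token_jaccard(left["name"], right["name"]),
          exact(left["email"], right["email"]))
# prints 1.0 1.0
```

Blocking is what keeps the comparison count manageable: only the two records sharing a zip are scored, and the third is never compared against them.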

You can override anything, but the defaults are usually good for a first useful run. No DSL to learn before you see a result.
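If you do want to override survivorship, it helps to picture what such a rule does. The sketch below assumes a simple per-field source priority with a fallback to the next source when a value is empty. The `survive` function, the `priority` mapping, and the `_source` key are hypothetical; the actual rule format in goldenmatch may differ.

```python
def survive(cluster, priority):
    """Build one golden record from a cluster of matched records.

    cluster:  list of dicts, each carrying a "_source" key.
    priority: field name -> source names ordered from most to least trusted.
    """
    golden = {}
    fields = {key for record in cluster for key in record if key != "_source"}
    for field_name in sorted(fields):
        ranked = priority.get(field_name, [])

        def rank(record):
            # Sources not listed for this field sort after every listed one.
            source = record["_source"]
            return ranked.index(source) if source in ranked else len(ranked)

        for record in sorted(cluster, key=rank):
            if record.get(field_name):  # skip missing or empty values
                golden[field_name] = record[field_name]
                break
    return golden


cluster = [
    {"_source": "hubspot", "email": "ada@example.com", "phone": ""},
    {"_source": "salesforce", "email": "ada@corp.example", "phone": "555-0100"},
]
priority = {
    "email": ["salesforce", "hubspot"],
    "phone": ["hubspot", "salesforce"],
}

golden = survive(cluster, priority)
print(golden)
# prints {'email': 'ada@corp.example', 'phone': '555-0100'}
```

Salesforce wins the email outright. HubSpot is preferred for phone but has no value, so the Salesforce number survives.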

3. Run the matcher

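goldenmatch demo

import goldenmatch

# Deduplicate the CSV using the autoconfigured match rules.
result = goldenmatch.dedupe("customers.csv")
# Summarize the run: clusters found, records read, and review-queue size.
print(f"Found {result.cluster_count} clusters from {result.record_count} records")
print(f"Ambiguous (review-queue): {len(result.ambiguous_merges)}")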

The dispatch returns three things:

  1. Auto-merged clusters — high-confidence matches that became golden records directly
  2. Ambiguous clusters — ones the scorer wasn't sure about; these land in your review queue
  3. A postflight report — cluster IDs, demoted scorers, and confidence histograms
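One way to picture the split between the first two outputs is a pair of confidence cut-offs. The sketch below is an assumption about the general technique, not a description of goldenmatch internals: the `route` function, the `score` key, and the 0.90 / 0.60 thresholds are all made up for illustration.

```python
def route(clusters, merge_at=0.90, review_at=0.60):
    """Split scored clusters into auto-merged, ambiguous, and rejected."""
    auto_merged, ambiguous, rejected = [], [], []
    for cluster in clusters:
        if cluster["score"] >= merge_at:
            auto_merged.append(cluster)   # confident enough to merge directly
        elif cluster["score"] >= review_at:
            ambiguous.append(cluster)     # a human decides in the review queue
        else:
            rejected.append(cluster)      # treated as distinct records
    return auto_merged, ambiguous, rejected


clusters = [
    {"id": "c-1", "score": 0.97},
    {"id": "c-2", "score": 0.72},
    {"id": "c-3", "score": 0.31},
]

auto_merged, ambiguous, rejected = route(clusters)
print([c["id"] for c in auto_merged], [c["id"] for c in ambiguous])
# prints ['c-1'] ['c-2']
```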

4. Clear the review queue

The Review queue shows every ambiguous cluster the matcher punted on. Each decision (approve / split / merge) is recorded in the audit trail and propagates to the next destination push, so your golden records reflect the call you made.
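As a rough mental model of an audit-trail entry, the sketch below records each decision as an append-only JSON line. The `ReviewDecision` fields and the `record_decision` helper are hypothetical; only the three actions (approve, split, merge) come from the text above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

VALID_ACTIONS = {"approve", "split", "merge"}


@dataclass(frozen=True)
class ReviewDecision:
    """Hypothetical shape of one audit-trail entry."""
    cluster_id: str
    action: str
    reviewer: str
    decided_at: str


def record_decision(audit_log, cluster_id, action, reviewer):
    """Validate a decision and append it to the audit log as a JSON line."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    decision = ReviewDecision(
        cluster_id=cluster_id,
        action=action,
        reviewer=reviewer,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(json.dumps(asdict(decision)))
    return decision


audit_log = []
record_decision(audit_log, "c-2", "approve", "ada@example.com")
record_decision(audit_log, "c-7", "split", "ada@example.com")
print(len(audit_log))
# prints 2
```

Keeping the log append-only means a later export can replay decisions in order without losing who decided what.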

Roadmap: decisions will also feed the scorer's field-rules layer so that future runs auto-merge similar clusters without you re-deciding. The wiring is tracked in issue #135; for now, decisions stop at the audit log and the next export.

5. Push to a destination

Configure a destination: a warehouse table, a cloud file drop, or a one-shot browser download.

Family       Targets
Warehouses   Postgres · MySQL · Snowflake · BigQuery
Cloud file   S3 · GCS · Azure Blob (CSV or Parquet, inferred from URL suffix)
Browser      One-shot CSV · Excel · Parquet download
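For cloud file targets, the format is inferred from the URL suffix. A minimal sketch of that kind of inference is below. The `infer_format` function and the example URLs are hypothetical, and the product's exact rules (for instance around compressed files) are not documented here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUFFIX_TO_FORMAT = {".csv": "csv", ".parquet": "parquet"}


def infer_format(url):
    """Return 'csv' or 'parquet' based on the suffix of the URL path."""
    path = urlparse(url).path  # drops scheme, bucket/host, and query string
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUFFIX_TO_FORMAT:
        raise ValueError(f"cannot infer file format from {url!r}")
    return SUFFIX_TO_FORMAT[suffix]


print(infer_format("s3://my-bucket/exports/golden.parquet"))
# prints parquet
print(infer_format("gs://my-bucket/golden.CSV"))
# prints csv
```

Parsing the URL first means a query string such as a signed-URL token does not get mistaken for part of the file name.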

Hit Run Now and your golden records land in the target table or object store. In overwrite mode the target is truncated first; in append mode prior rows stay in place. We don't keep a copy; your warehouse is the source of truth.
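The difference between the two write modes is easy to see against a throwaway database. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a real warehouse; the `push` function and the `golden` table are hypothetical, and SQLite's `DELETE FROM` plays the role of a truncate.

```python
import sqlite3


def push(conn, rows, mode):
    """Write golden records to the target table in overwrite or append mode."""
    if mode not in {"overwrite", "append"}:
        raise ValueError(f"unknown mode: {mode!r}")
    with conn:  # commit on success, roll back on error
        conn.execute("CREATE TABLE IF NOT EXISTS golden (id TEXT, email TEXT)")
        if mode == "overwrite":
            conn.execute("DELETE FROM golden")  # SQLite has no TRUNCATE
        conn.executemany("INSERT INTO golden VALUES (?, ?)", rows)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM golden").fetchone()[0]


conn = sqlite3.connect(":memory:")

push(conn, [("g1", "ada@example.com")], mode="append")
push(conn, [("g2", "alan@example.com")], mode="append")
count_after_append = row_count(conn)

push(conn, [("g3", "grace@example.com")], mode="overwrite")
count_after_overwrite = row_count(conn)

print(count_after_append, count_after_overwrite)
# prints 2 1
```

Two appends accumulate rows; a single overwrite leaves only the rows from that push.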

Or push back to the CRM you pulled from. Instead of writing to a warehouse, reverse-sync UPDATEs the original HubSpot / Salesforce records with your surviving golden values, one record at a time, behind a dry-run + confirm gate.
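A dry-run + confirm gate generally means computing the changes first and applying them only after an explicit opt-in. The sketch below illustrates that pattern under stated assumptions: `plan_updates`, `reverse_sync`, and the `apply_update` callback are hypothetical names, and the callback stands in for whatever CRM API call would perform the real update.

```python
def plan_updates(current, golden):
    """Compute per-record field changes without touching the CRM."""
    plan = {}
    for record_id, golden_values in golden.items():
        existing = current.get(record_id, {})
        changes = {
            field_name: value
            for field_name, value in golden_values.items()
            if existing.get(field_name) != value
        }
        if changes:
            plan[record_id] = changes
    return plan


def reverse_sync(current, golden, apply_update, confirm=False):
    """Return the plan; only call apply_update when confirm is True."""
    plan = plan_updates(current, golden)
    if confirm:
        for record_id, changes in plan.items():
            apply_update(record_id, changes)  # one update per record
    return plan


current = {"hs-1": {"email": "ada@example.com", "phone": ""}}
golden = {"hs-1": {"email": "ada@example.com", "phone": "555-0100"}}
applied = []


def apply_update(record_id, changes):
    applied.append((record_id, changes))


dry_run_plan = reverse_sync(current, golden, apply_update)
print(dry_run_plan, len(applied))
# prints {'hs-1': {'phone': '555-0100'}} 0

reverse_sync(current, golden, apply_update, confirm=True)
print(len(applied))
# prints 1
```

The dry run shows exactly which fields would change, and nothing is written until the same call is repeated with confirmation.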

