Matching funnel · between your SaaS and your warehouse

Pipe your messy SaaS data in.
Push golden records out.

Connect HubSpot, Salesforce, Stripe, Cvent, Bizzabo, or an ad-hoc Postgres database, then pick a destination warehouse or a Parquet drop. We match and cluster your records, then surface the clusters that were not auto-merged in your review queue. You label, the system learns. Your data stays yours: bensevern.dev is the collector and filter, never the source of truth.

Onboarding friction is near-zero: the multi-wave autoconfig (goldenmatch 1.18) proposes match rules, survivorship, and scorer weights from the shape of your data. No thresholds to tune, no DSL to learn before a first useful run.

22+ · source connectors live
7 · warehouse + cloud destinations
B³ 0.95 · B-Cubed F1 on real NC voter data
$99/mo Pro · free tier
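
For readers new to the metric: B-Cubed scores a clustering record by record. For each record, precision is the share of its predicted cluster that truly belongs with it, and recall is the share of its true cluster that the prediction recovered. The sketch below is a minimal, generic implementation of that definition. It is not the benchmark harness behind the figure above, and the function name and input shape are assumptions made for illustration.

```python
from collections import defaultdict


def b_cubed(predicted, truth):
    """Return (precision, recall, f1) for two {record_id: cluster_id} mappings.

    Hypothetical helper: both mappings must cover the same record ids.
    """
    if not predicted or predicted.keys() != truth.keys():
        raise ValueError("predicted and truth must cover the same non-empty id set")

    # Invert each mapping into cluster_id -> set of record ids.
    pred_members = defaultdict(set)
    true_members = defaultdict(set)
    for rec, cluster_id in predicted.items():
        pred_members[cluster_id].add(rec)
    for rec, cluster_id in truth.items():
        true_members[cluster_id].add(rec)

    precision = recall = 0.0
    for rec in predicted:
        pred_set = pred_members[predicted[rec]]
        true_set = true_members[truth[rec]]
        overlap = len(pred_set & true_set)  # always >= 1: the record itself
        precision += overlap / len(pred_set)
        recall += overlap / len(true_set)

    n = len(predicted)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the matcher over-merged record "c" into the first cluster.
predicted = {"a": 1, "b": 1, "c": 1, "d": 2}
truth = {"a": "x", "b": "x", "c": "y", "d": "y"}
p, r, f1 = b_cubed(predicted, truth)  # precision 2/3, recall 3/4, F1 12/17
```

Over-merging pulls precision down and under-merging pulls recall down, so a high F1 requires the matcher to avoid both.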
01 Ingest

HubSpot · Salesforce · Stripe · Pipedrive · Shopify · Klaviyo · S3 · GCS · Azure Blob · SFTP · Postgres · MySQL · Snowflake · BigQuery

02 Match + review

Autoconfig proposes match rules from your schema. Ambiguous clusters surface in a review queue; your labels feed back into the next run's scorer.

03 Push

Postgres · MySQL · Snowflake · BigQuery · S3 / GCS / Azure (CSV or Parquet) · browser-download CSV / Excel / Parquet. Truncate-and-load or append.
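
The difference between the two load modes can be sketched with an in-memory SQLite table standing in for the destination. This is an illustration of the general pattern only: the `push` and `count` helpers, the table name, and the two-column layout are assumptions, not the hosted push implementation.

```python
import sqlite3


def push(conn, table, rows, mode="append"):
    """Write (record_id, name) rows to `table` in one transaction.

    Hypothetical helper. `table` is assumed to be a trusted identifier.
    mode="truncate" replaces the table contents; mode="append" adds to them.
    """
    if mode not in ("append", "truncate"):
        raise ValueError(f"unknown mode: {mode!r}")
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" (record_id TEXT, name TEXT)'
        )
        if mode == "truncate":
            conn.execute(f'DELETE FROM "{table}"')
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)


def count(conn, table):
    """Return the number of rows currently in `table`."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
push(conn, "golden", [("g1", "Jon Smith"), ("g2", "Bob Lee")])
push(conn, "golden", [("g3", "Sara Park")])                 # append: 3 rows
after_append = count(conn, "golden")
push(conn, "golden", [("g1", "Jon Smith")], mode="truncate")  # replace: 1 row
after_truncate = count(conn, "golden")
```

Truncate-and-load keeps the destination an exact mirror of the latest run. Append preserves history, at the cost of leaving it to downstream queries to pick the latest version of each record.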

Built for

  • RevOps at mid-market B2B SaaS, where duplicate accounts double-count the forecast and break attribution
  • The ops analyst who hand-dedupes the campaign or account list in Excel before every send and board deck
  • PE-backed platforms merging customer and vendor records across post-acquisition systems on a synergy clock
  • Data teams who want MDM-grade dedup without standing up an MDM platform

Not built for

  • Fortune 500 with a dedicated MDM team + Reltio license
  • HIPAA / PHI workloads (no BAA yet)
  • Real-time streaming match (batch only)
  • Source-of-truth storage — we filter and pass through, not host

See it find the duplicates

The same matcher the workbench runs, on a sample CRM. Drag the strictness slider and watch the duplicate clusters form. No signup, nothing to install. Then bring your own file.

4 duplicate groups · 10/10 records merged · 0 unique

Strictness: 0.70 (looser means more grouping, stricter means fewer)

Duplicate group · 85% match
  • Jon Smith · jsmith@acme.io · Acme
  • John Smith · j.smith@acme.io · Acme Inc
  • J. Smith · jsmith@acme.io · ACME

Duplicate group · 80% match
  • Bob Lee · blee@initech.com · Initech
  • Robert Lee · rlee@initech.com · Initech LLC
  • Bob Lee · blee@initech.com · Initech

Duplicate group · 80% match
  • Sara Park · spark@hooli.com · Hooli
  • Sarah Park · s.park@hooli.com · Hooli

Duplicate group · 74% match
  • Mary Chen · mchen@globex.com · Globex
  • Mary Chenn · mary.chen@globex.com · Globex Corp
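
How a strictness setting turns pairwise scores into groups can be sketched as follows: score every pair of records, link any pair that scores at or above the threshold, and read the connected groups off a union-find structure. This is a toy under stated assumptions, not the workbench matcher. The scorer is Python's standard-library `difflib.SequenceMatcher` over the joined fields, so its scores will not reproduce the percentages shown in the demo, and the name / email / company split of the sample rows is a best-effort reading.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Sample CRM rows from the demo, as (name, email, company).
RECORDS = [
    ("Jon Smith", "jsmith@acme.io", "Acme"),
    ("John Smith", "j.smith@acme.io", "Acme Inc"),
    ("J. Smith", "jsmith@acme.io", "ACME"),
    ("Bob Lee", "blee@initech.com", "Initech"),
    ("Robert Lee", "rlee@initech.com", "Initech LLC"),
    ("Bob Lee", "blee@initech.com", "Initech"),
    ("Sara Park", "spark@hooli.com", "Hooli"),
    ("Sarah Park", "s.park@hooli.com", "Hooli"),
    ("Mary Chen", "mchen@globex.com", "Globex"),
    ("Mary Chenn", "mary.chen@globex.com", "Globex Corp"),
]


def similarity(a, b):
    """Stand-in scorer: character similarity in [0, 1] over lowercased fields."""
    return SequenceMatcher(None, " ".join(a).lower(), " ".join(b).lower()).ratio()


def cluster(records, strictness):
    """Group records whose pairwise similarity is >= strictness (transitively)."""
    parent = list(range(len(records)))

    def find(i):
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= strictness:
            parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())


groups_at_slider = cluster(RECORDS, 0.70)
```

Lowering `strictness` links more pairs, so groups grow and merge. Raising it leaves only near-identical rows together, such as the two identical Bob Lee records.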

What the SaaS adds on top of the open-source engine

The matching engine is five MIT-licensed Python packages — self-host them if you want to wire your own funnel. The hosted version wraps them in the connectors, review queue, audit trail, and destination push so you can ship without writing the plumbing.

Source connectors

22 connectors on the ingest side: HubSpot, Salesforce, Stripe, Pipedrive, Klaviyo, Shopify, S3, GCS, Azure Blob, Postgres, MySQL, Snowflake, BigQuery, and more. OAuth or API key — both flows handled.

Warehouse + file destinations

7 destinations on the push side: Postgres / MySQL / Snowflake / BigQuery for warehouses; S3 / GCS / Azure Blob (CSV or Parquet) for file drops. Plus browser CSV / Excel / Parquet download.

Review queue + steward UX

Ambiguous clusters surface for a human call. Approve, split, or merge — decisions land in the audit trail and the next destination push.
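
The routing idea behind the queue can be sketched as a three-way split on cluster confidence: confident clusters merge automatically, clearly distinct records stay apart, and the band in between goes to a human. Everything in this sketch is hypothetical, including the `Cluster` shape, the `triage` function, and both cut-off values, which in the hosted product come from autoconfig rather than from hand-set numbers.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    cluster_id: str
    score: float  # match confidence in [0, 1]


def triage(clusters, auto_merge_at=0.90, review_from=0.60):
    """Route each cluster to auto-merge, human review, or no merge.

    Both thresholds are made-up illustration values.
    """
    routed = {"auto_merge": [], "review": [], "keep_separate": []}
    for c in clusters:
        if c.score >= auto_merge_at:
            routed["auto_merge"].append(c.cluster_id)
        elif c.score >= review_from:
            routed["review"].append(c.cluster_id)
        else:
            routed["keep_separate"].append(c.cluster_id)
    return routed


routed = triage([
    Cluster("acme", 0.95),
    Cluster("hooli", 0.80),
    Cluster("globex", 0.74),
    Cluster("unrelated", 0.20),
])
# routed["review"] holds "hooli" and "globex", the two clusters in the middle band.
```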

Autoconfig — no DSL to learn

Multi-wave autoconfig proposes match rules, scorer weights, and survivorship from your schema shape. First useful run before you read the docs.

Multi-tenant + cryptographic audit

Clerk orgs, per-org quotas, plan-aware rate limits. Append-only audit_log with per-org SHA-256 chain. Verify the trail end-to-end.
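
A hash-chained log is what makes an end-to-end check possible: each entry's hash covers the previous entry's hash, so editing or deleting an earlier row breaks every hash after it. The sketch below shows the general technique with one in-memory list per org. The field names, the all-zero genesis value, and the JSON serialization are assumptions for illustration, not the actual audit_log schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an org's empty chain


def entry_hash(prev_hash, org_id, event):
    """SHA-256 over a canonical JSON encoding of the entry and its predecessor."""
    payload = json.dumps(
        {"prev": prev_hash, "org": org_id, "event": event},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, org_id, event):
    """Append an event to one org's log, linking it to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({
        "org": org_id,
        "event": event,
        "prev": prev,
        "hash": entry_hash(prev, org_id, event),
    })


def verify(log):
    """Recompute every hash from the start; False on any break in the chain."""
    prev = GENESIS
    for row in log:
        expected = entry_hash(prev, row["org"], row["event"])
        if row["prev"] != prev or row["hash"] != expected:
            return False
        prev = row["hash"]
    return True


log = []
append(log, "org_1", {"action": "merge", "cluster": "acme"})
append(log, "org_1", {"action": "split", "cluster": "hooli"})
append(log, "org_1", {"action": "push", "destination": "warehouse"})
intact = verify(log)

log[1]["event"]["action"] = "merge"  # tamper with a past decision
tampered_ok = verify(log)
```

Here `intact` is True and `tampered_ok` is False: the edited row no longer matches its stored hash, so verification fails at that entry.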

You keep your data — and the engine

We collect, filter, hand back. The warehouse is your source of truth. The matching engine is MIT-licensed: if we go away, you keep running. Try that with Reltio.

The orchestrator inside the funnel

Sources flow in. Autoconfig proposes the match shape from your schema. The matcher clusters records, the review queue catches the ambiguous ones, golden records materialize. Then the destination push lands them in your warehouse.

Event-sourced notebooks — every tool call is a step you can replay, fork, rewind, or audit. Trust gates classify mutating tools into auto / gated / confirm based on the notebook's trust tier.

Three funnel use-cases we're shipping for

Run the funnel in under five minutes.

Free tier includes 3 source connectors, 2 destinations, the full review queue, and a demo project that walks you through the loop. No DSL to learn first — autoconfig proposes the match shape from your schema.