Open source
The engine, on PyPI. MIT-licensed.
Golden Suite ships a hosted workbench. The matching engine underneath is five separately published Python packages that you can use directly in your own pipeline. No SaaS subscription, no API key, no rate limit: just pip install and write code.
The 5 packages
goldenmatch
pip install goldenmatch
The matching engine — blocking, scoring, clustering. Pinned by Golden Suite SaaS but usable standalone.
goldencheck
pip install goldencheck
Data-quality scan over a CSV: completeness, validity, consistency, uniqueness. Auto-triage suggestions.
goldenflow
pip install goldenflow
Standardization transforms — phone E.164, address USPS-standard, name title-case. Composable per-field.
goldenpipe
pip install goldenpipe
Pipe runner — chains check + flow + match + survivorship into a single callable.
infermap
pip install infermap
Schema inference — proposes a target-schema mapping from column names + value patterns.
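To make the "blocking, scoring, clustering" description of the matching engine concrete, here is a minimal sketch of those three stages using only the Python standard library. It is not goldenmatch's implementation or API: the sample records, the `zip` blocking key, the `difflib` similarity, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Made-up sample records, for illustration only.
records = [
    {"id": 1, "name": "Jane Doe", "zip": "94107"},
    {"id": 2, "name": "Jane  Doe", "zip": "94107"},
    {"id": 3, "name": "John Smith", "zip": "10001"},
    {"id": 4, "name": "Jon Smith", "zip": "10001"},
    {"id": 5, "name": "Alice Wong", "zip": "10001"},
]


def block(records, key):
    """Blocking: group records by a cheap key so only plausible pairs get compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[key]].append(rec)
    return blocks


def score(a, b):
    """Scoring: case-insensitive string similarity of the names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cluster(records, threshold=0.85):
    """Clustering: union-find over pairs that score at or above the threshold."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in block(records, "zip").values():
        for a, b in combinations(group, 2):
            if score(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec["id"])
    return sorted(clusters.values())


clusters = cluster(records)
print(clusters)
```

Blocking keeps the pairwise comparison from growing quadratically over the whole dataset. The trade-off is that two records placed in different blocks are never compared, so the blocking key has to be chosen with care.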
Quick start
A 10-line dedup of a Polars DataFrame:
    import polars as pl
    import goldenmatch

    df = pl.read_csv("customers.csv")
    result = goldenmatch.dedupe_df(df)

    print(f"Found {len(result.clusters)} clusters from {len(df)} records")
    for cluster_id, cluster in result.clusters.items():
        if cluster.get("size", 0) > 1:
            print(f"  Cluster {cluster_id}: {cluster['members']}")

That's the same engine call Golden Suite's workbench makes under the hood. The packaging is identical: Golden Suite pins specific package versions in its backend (see /admin/health), and you run whatever versions you pin in your own pyproject.toml.
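Once you have the clusters, a common next step is to keep only the duplicate groups and to look up which cluster a given record landed in. The sketch below assumes the shape the quick start implies, a dict of cluster ID to a dict with `size` and `members` keys. The stand-in data, the helper names, and the idea that members are row identifiers are assumptions, not documented goldenmatch behavior.

```python
# Stand-in for result.clusters, using the shape implied by the quick start:
# {cluster_id: {"size": int, "members": [...]}}. All values here are made up.
clusters = {
    0: {"size": 2, "members": [0, 3]},
    1: {"size": 1, "members": [1]},
    2: {"size": 3, "members": [2, 4, 5]},
}


def duplicate_groups(clusters):
    """Keep only clusters with more than one member (hypothetical helper)."""
    return {cid: c["members"] for cid, c in clusters.items() if c.get("size", 0) > 1}


def record_to_cluster(clusters):
    """Invert the mapping so each member points at its cluster ID (hypothetical helper)."""
    return {member: cid for cid, c in clusters.items() for member in c["members"]}


dupes = duplicate_groups(clusters)
lookup = record_to_cluster(clusters)

print(dupes)
print(lookup[4])
```

With the inverted lookup you can attach a cluster ID column back onto the original DataFrame and apply whatever survivorship rule suits your pipeline.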
When OSS-direct beats Golden Suite SaaS
- You already have a Python data pipeline. Drop the packages in; no separate auth, no separate billing.
- You need bespoke matching logic. Custom scorers, domain-specific rules, embedding-based blocking. The OSS engine is extensible in ways Golden Suite's SaaS surface deliberately isn't.
- Your data plane is air-gapped. Defense, sovereign workloads, anything with a hard "no external SaaS" constraint. Self-host the engine, keep data local.
- You're a research team. Publishing benchmarks, reproducing results: the open-source path is the right shape.
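As an illustration of what "bespoke matching logic" can mean, here is a sketch of a domain-specific scorer written as a plain Python function. It is hypothetical throughout: the field names, the email rule, and the 0.9 cap are assumptions, and how a function like this would be registered with goldenmatch is not covered on this page.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity of two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def customer_score(rec_a, rec_b):
    """Hypothetical domain rule for customer records.

    An identical, non-empty email is treated as decisive. Otherwise fall back
    to name similarity, capped at 0.9 so a name alone never counts as certain.
    """
    email_a = rec_a.get("email", "").strip().lower()
    email_b = rec_b.get("email", "").strip().lower()
    if email_a and email_a == email_b:
        return 1.0
    return min(name_similarity(rec_a["name"], rec_b["name"]), 0.9)


# Made-up records, for illustration only.
a = {"name": "J. Doe", "email": "jane@example.com"}
b = {"name": "Jane Doe", "email": "JANE@example.com "}
c = {"name": "Jane Doe", "email": ""}

same_email = customer_score(a, b)  # email rule fires despite different names
name_only = customer_score(b, c)   # identical names, but no shared email

print(same_email, name_only)
```

Encoding rules like this in code is the kind of control a hosted surface tends to restrict, which is the point of the bullet above.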
What you give up: the workbench UI, the audit-chain plumbing, the stewardship review queue, the lineage walker, the daily scheduling, the SOC2-aligned controls — all the operational surface that took ~6 months to build around the engine. See /compare/build-vs-buy for the honest math on that tradeoff.
License + support
- MIT-licensed. Use it commercially, modify it, vendor it. No attribution required beyond keeping the LICENSE file.
- No SLA on the open-source packages. Issues and PRs on GitHub are answered as time permits.
- Paid support tier for OSS users. If you're running the engine in production and want guaranteed response SLAs without moving to the SaaS, contact ben@bensevern.dev.