Open source

The engine, on PyPI. MIT-licensed.

Golden Suite ships a hosted workbench. The matching engine underneath is five separately published Python packages you can use directly in your pipeline. No SaaS subscription, no API key, no rate limit: just pip install and code.

The 5 packages

goldenmatch

pip install goldenmatch

The matching engine: blocking, scoring, clustering. Golden Suite SaaS pins a specific version of it, but the package is usable standalone.
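To make "blocking, scoring, clustering" concrete, here is a minimal pure-Python sketch of those three stages. It is not goldenmatch's API or implementation: the field names, the blocking key (first three letters of the last name plus ZIP), the difflib similarity, the 0.85 threshold, and union-find clustering are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record: dict) -> str:
    # Hypothetical blocking key: first three letters of last name plus ZIP.
    return record["last_name"][:3].lower() + "|" + record["zip"]


def score(a: dict, b: dict) -> float:
    # Average string similarity (0.0 to 1.0) over a few assumed fields.
    fields = ("first_name", "last_name", "email")
    sims = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(sims) / len(sims)


def dedupe(records: list[dict], threshold: float = 0.85) -> list[set[int]]:
    # 1. Blocking: group record indices that share a block key.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)

    # Union-find over record indices, used for the clustering step.
    parent = list(range(len(records)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # 2. Scoring: compare pairs only within the same block.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if score(records[i], records[j]) >= threshold:
                # 3. Clustering: merge matched pairs (transitive closure).
                parent[find(i)] = find(j)

    clusters = defaultdict(set)
    for i in range(len(records)):
        clusters[find(i)].add(i)
    return list(clusters.values())


records = [
    {"first_name": "Ann", "last_name": "Smith", "email": "ann@example.com", "zip": "02139"},
    {"first_name": "Anne", "last_name": "Smith", "email": "ann@example.com", "zip": "02139"},
    {"first_name": "Bob", "last_name": "Jones", "email": "bob@example.com", "zip": "10001"},
]
print(dedupe(records))
```

Blocking is what keeps this tractable: pairs are scored only within a block, so records with different ZIP codes are never compared at all.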

goldencheck

pip install goldencheck

Data-quality scan over a CSV: completeness, validity, consistency, uniqueness. Auto-triage suggestions.
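A rough illustration of what three of those four dimensions measure, using only the standard library. This is a hypothetical sketch, not goldencheck's API: profile_csv, the loose email regex, and the per-column metrics are assumptions, and consistency checks and auto-triage are left out.

```python
import csv
import io
import re

# Deliberately loose format rule, used for the validity metric.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_csv(text: str) -> dict[str, dict[str, float]]:
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for col in reader.fieldnames or []:
        values = [(row[col] or "").strip() for row in rows]
        filled = [v for v in values if v]
        stats = {
            # Completeness: share of rows with a non-blank value.
            "completeness": len(filled) / len(values) if values else 0.0,
            # Uniqueness: distinct non-blank values / non-blank values.
            "uniqueness": len(set(filled)) / len(filled) if filled else 0.0,
        }
        if col == "email":
            # Validity: share of non-blank values that pass the format rule.
            hits = sum(bool(EMAIL_RE.match(v)) for v in filled)
            stats["validity"] = hits / len(filled) if filled else 0.0
        report[col] = stats
    return report


sample = """id,name,email
1,Ann,ann@example.com
2,,bob@example
3,Cy,ann@example.com
"""
for column, stats in profile_csv(sample).items():
    print(column, stats)
```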

goldenflow

pip install goldenflow

Standardization transforms: phone numbers to E.164, addresses to USPS standard, names to title case. Composable per field.
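Per-field composition can be pictured as a chain of small string functions attached to each field. The sketch below is hypothetical and is not goldenflow's API: it handles only North American phone numbers, uses Python's naive str.title(), and omits USPS address standardization entirely.

```python
import re
from typing import Callable

Transform = Callable[[str], str]


def strip_ws(value: str) -> str:
    # Trim and collapse internal runs of whitespace.
    return " ".join(value.split())


def title_case(value: str) -> str:
    return value.title()


def us_phone_e164(value: str) -> str:
    # Assumes North American numbers: keep digits, then prefix +1.
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return value  # leave unrecognized input untouched


def compose(*steps: Transform) -> Transform:
    # Apply steps left to right.
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


# Hypothetical per-field plans: each field gets its own transform chain.
PLAN: dict[str, Transform] = {
    "name": compose(strip_ws, title_case),
    "phone": compose(strip_ws, us_phone_e164),
}


def standardize(record: dict[str, str]) -> dict[str, str]:
    out = {}
    for field, value in record.items():
        transform = PLAN.get(field)
        out[field] = transform(value) if transform else value  # untouched if no plan
    return out


print(standardize({"name": "  jane   DOE ", "phone": "(617) 555-0100", "zip": "02139"}))
```

str.title() is a blunt instrument (it turns "mcdonald" into "Mcdonald"), which is part of why dedicated name transforms are worth having.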

goldenpipe

pip install goldenpipe

Pipe runner — chains check + flow + match + survivorship into a single callable.
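The "single callable" idea is ordinary function composition over a shared context. In this hypothetical sketch (not goldenpipe's API), each stage is a toy stand-in: check drops rows without an email, flow lowercases emails, match clusters on exact email, and survivorship keeps the longest value per field.

```python
from typing import Any, Callable

Context = dict[str, Any]
Stage = Callable[[Context], Context]


def check(ctx: Context) -> Context:
    # Toy check: drop rows with a blank email.
    ctx["records"] = [r for r in ctx["records"] if r.get("email")]
    return ctx


def flow(ctx: Context) -> Context:
    # Toy standardization: trim and lowercase emails.
    for r in ctx["records"]:
        r["email"] = r["email"].strip().lower()
    return ctx


def match(ctx: Context) -> Context:
    # Toy matcher: records with identical emails form one cluster.
    clusters: dict[str, list[dict]] = {}
    for r in ctx["records"]:
        clusters.setdefault(r["email"], []).append(r)
    ctx["clusters"] = list(clusters.values())
    return ctx


def survivorship(ctx: Context) -> Context:
    # Toy survivorship rule: per field, keep the longest value in the cluster.
    golden = []
    for members in ctx["clusters"]:
        fields = sorted({k for m in members for k in m})
        golden.append({f: max((m.get(f, "") for m in members), key=len) for f in fields})
    ctx["golden"] = golden
    return ctx


def make_pipe(*stages: Stage) -> Callable[[list[dict]], list[dict]]:
    def run(records: list[dict]) -> list[dict]:
        ctx: Context = {"records": [dict(r) for r in records]}  # copy; input is not mutated
        for stage in stages:
            ctx = stage(ctx)
        return ctx["golden"]
    return run


pipe = make_pipe(check, flow, match, survivorship)
sample = [
    {"name": "Ann", "email": "ANN@example.com "},
    {"name": "Ann Smith", "email": "ann@example.com"},
    {"name": "No Email", "email": ""},
]
print(pipe(sample))
```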

infermap

pip install infermap

Schema inference — proposes a target-schema mapping from column names + value patterns.
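One simple way to combine column names with value patterns is to score every source/target pair on both and then assign greedily. This is a hypothetical sketch, not infermap's algorithm: the target schema, the regexes, the 50/50 weighting, and the 0.5 cutoff are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical target schema: field name -> regex for typical values.
TARGET_SCHEMA = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
    "postal_code": re.compile(r"^\d{5}(-\d{4})?$"),
    "full_name": re.compile(r"^[A-Za-z][A-Za-z .'-]*$"),
}


def normalize(name: str) -> str:
    # "E-mail Addr" -> "emailaddr"
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(src: str, tgt: str) -> float:
    return SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()


def pattern_hit_rate(values: list[str], pattern: re.Pattern) -> float:
    # Share of non-blank sample values that match the target's pattern.
    filled = [v for v in values if v]
    if not filled:
        return 0.0
    return sum(bool(pattern.match(v)) for v in filled) / len(filled)


def propose_mapping(columns: dict[str, list[str]], min_score: float = 0.5) -> dict[str, str]:
    # Score every (source column, target field) pair: half name, half values.
    candidates = []
    for src, values in columns.items():
        for tgt, pattern in TARGET_SCHEMA.items():
            score = 0.5 * name_similarity(src, tgt) + 0.5 * pattern_hit_rate(values, pattern)
            candidates.append((score, src, tgt))
    # Greedy one-to-one assignment, highest-scoring pairs first.
    mapping: dict[str, str] = {}
    for score, src, tgt in sorted(candidates, reverse=True):
        if score >= min_score and src not in mapping and tgt not in mapping.values():
            mapping[src] = tgt
    return mapping


columns = {
    "E-mail Addr": ["ann@example.com", "bob@example.org"],
    "zip": ["02139", "10001-1234"],
    "tel": ["(617) 555-0100", "+1 212 555 0199"],
}
print(propose_mapping(columns))
```

The value patterns carry the weak names here: "zip" and "tel" look little like "postal_code" and "phone" as strings, but their sample values fit the target patterns.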

Quick start

A 10-line dedup of a Polars DataFrame:

import polars as pl
import goldenmatch

df = pl.read_csv("customers.csv")
result = goldenmatch.dedupe_df(df)

print(f"Found {len(result.clusters)} clusters from {len(df)} records")
for cluster_id, cluster in result.clusters.items():
    if cluster.get("size", 0) > 1:
        print(f"  Cluster {cluster_id}: {cluster['members']}")

That's the same engine call Golden Suite's workbench makes under the hood. The packages are identical: Golden Suite's backend pins specific versions of them (see /admin/health), while you run whatever versions you pin in your own pyproject.toml.
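Because you choose the pins yourself, it can help to confirm at runtime which versions are actually installed. A minimal sketch using the standard library's importlib.metadata; it assumes the distribution names match the pip install names listed above, and installed_versions is a hypothetical helper, not part of any Golden Suite package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

SUITE_PACKAGES = ("goldenmatch", "goldencheck", "goldenflow", "goldenpipe", "infermap")


def installed_versions(packages: tuple[str, ...] = SUITE_PACKAGES) -> dict[str, Optional[str]]:
    # Map each distribution name to its installed version, or None if absent.
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```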

When OSS-direct beats Golden Suite SaaS

  • You already have a Python data pipeline. Drop the packages in; no separate auth, no separate billing.
  • You need bespoke matching logic. Custom scorers, domain-specific rules, embedding-based blocking. The OSS engine is extensible in ways Golden Suite's SaaS surface deliberately isn't.
  • Your data plane is air-gapped. Defense, sovereign workloads, anything with a hard "no external SaaS" constraint. Self-host the engine, keep data local.
  • You're a research team. Publishing benchmarks, reproducing results: the open-source path is the right fit.

What you give up: the workbench UI, the audit-chain plumbing, the stewardship review queue, the lineage walker, the daily scheduling, the SOC2-aligned controls — all the operational surface that took ~6 months to build around the engine. See /compare/build-vs-buy for the honest math on that tradeoff.

License + support

  • MIT-licensed. Use it commercially, modify it, vendor it. No attribution required beyond keeping the LICENSE file.
  • No SLA on the open-source packages. Issues and PRs on GitHub are answered as time permits.
  • A paid support tier is available for OSS users. If you're running the engine in production and want guaranteed response SLAs without moving to the SaaS, contact ben@bensevern.dev.