Lineage and provenance

Every golden record points back to the raw rows that built it. Lineage answers "where did this come from?" without leaving the UI.

A golden record without lineage is a vibe. With lineage, it's a defensible artifact.

What's recorded

For every golden record, Golden Suite persists:

  1. Which raw rows contributed to it — the rows from your sources that the resolver decided refer to the same entity.
  2. Which notebook event produced the merge — the resolver run that turned those raw rows into this golden record. Stored as notebook_event_id on the entity.
  3. Which scorers fired and how they voted — captured in the postflight report attached to the event.
  4. Which survivorship rule produced each field's value — every field on the golden record knows which rule wrote it and which source row supplied the value.
  5. Every subsequent change — corrections, splits, merges, manual approvals. Every mutation appends to the audit log.

You can walk the chain from any field on any golden record back to the original source CSV upload, with every decision in between visible.
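
As a rough mental model of that chain, the sketch below links one field to the rule, resolver event, and raw row behind it. The class names, the trace() helper, the evt_example ID, and the pairing of the phone field with stripe_2026-04-22.csv row 142 are illustrative assumptions, not Golden Suite's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RawRowRef:
    """Pointer to one row in an uploaded source file (hypothetical shape)."""
    source_file: str
    row_number: int


@dataclass(frozen=True)
class FieldLineage:
    """Which survivorship rule wrote a field, and which raw row supplied it."""
    value: str
    rule: str
    source_row: RawRowRef


@dataclass
class GoldenRecordLineage:
    entity_id: str
    notebook_event_id: str  # the resolver run that produced the merge
    contributing_rows: List[RawRowRef] = field(default_factory=list)
    fields: Dict[str, FieldLineage] = field(default_factory=dict)

    def trace(self, field_name: str) -> str:
        """Describe the path from one field back to its source row."""
        f = self.fields[field_name]
        return (
            f"{self.entity_id}.{field_name} = {f.value!r} "
            f"<- rule {f.rule} <- event {self.notebook_event_id} "
            f"<- {f.source_row.source_file} row {f.source_row.row_number}"
        )


stripe_row = RawRowRef("stripe_2026-04-22.csv", 142)
record = GoldenRecordLineage(
    entity_id="cust_4f2a",
    notebook_event_id="evt_example",  # placeholder ID, assumed for illustration
    contributing_rows=[RawRowRef("crm_2026-04-15.csv", 7892), stripe_row],
    fields={"phone": FieldLineage("+1-415-555-0100", "most_recent", stripe_row)},
)
print(record.trace("phone"))
```

Keeping the rule and the source row on the field itself, rather than only on the record, is what lets a single field be traced without replaying the whole merge.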

The lineage UI

The Lineage tab on /golden/entities/{id} renders this as a tree:

Golden record: cust_4f2a (Acme Corp)
├── Field "legal_name" = "Acme Corporation, LLC" (rule: source_priority, source: registry_of_incorp)
├── Field "phone" = "+1-415-555-0100" (rule: most_recent, source: stripe, updated 2026-04-22)
├── Field "address" = "100 Main St, San Francisco" (rule: most_complete, source: crm)
└── Contributing raw rows (5)
    ├── crm_2026-04-15.csv row 7892
    ├── stripe_2026-04-22.csv row 142
    └── ...

The export path produces the same data as a JSON sidecar so you can pipe it into a downstream audit system without scraping the UI.
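
Here is a minimal sketch of consuming that sidecar downstream. The JSON layout (entity_id, fields, contributing_rows and their keys) and the flatten_sidecar() helper are assumptions made for illustration; check a real export for the actual field names before relying on them.

```python
import json

# Hypothetical sidecar payload; the real export's key names may differ.
SIDECAR_JSON = """
{
  "entity_id": "cust_4f2a",
  "fields": [
    {"name": "legal_name", "value": "Acme Corporation, LLC",
     "rule": "source_priority", "source": "registry_of_incorp"},
    {"name": "phone", "value": "+1-415-555-0100",
     "rule": "most_recent", "source": "stripe"}
  ],
  "contributing_rows": [
    {"file": "crm_2026-04-15.csv", "row": 7892},
    {"file": "stripe_2026-04-22.csv", "row": 142}
  ]
}
"""


def flatten_sidecar(text: str) -> list:
    """Turn one sidecar document into one flat row per field."""
    doc = json.loads(text)
    return [
        {
            "entity_id": doc["entity_id"],
            "field": f["name"],
            "rule": f["rule"],
            "source": f["source"],
        }
        for f in doc["fields"]
    ]


audit_rows = flatten_sidecar(SIDECAR_JSON)
for row in audit_rows:
    print(row)
```

One flat row per field is usually the easiest shape to load into a downstream audit table, since each row answers "which rule and which source produced this value?" on its own.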

The audit log

Lineage covers "how was this golden record built?" The audit log covers "what changed after that, when, and by whom?"

Every survivorship-rule edit, every correction, every approved cluster merge, every manual split — each appends one row to audit_log with the before/after state.

The audit log has a cryptographic chain: each row stores a SHA-256 hash of its contents plus the previous row's hash. If anyone tampers with an older row, the chain breaks at that point, and Verify Chain in the admin UI shows exactly where.
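
The idea can be sketched in a few lines of Python. This is not the product's trigger or its verify_chain(); the helper names, the all-zero genesis value, the JSON canonicalization, and the choice to hash the previous hash together with the row contents are all assumptions made to show the general technique.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first row


def row_hash(payload: dict, prev_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_row(log: List[dict], payload: dict) -> None:
    """Append a row that is linked to the row before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev,
                "hash": row_hash(payload, prev)})


def first_broken_index(log: List[dict]) -> Optional[int]:
    """Return the index of the first row that fails verification, or None."""
    prev = GENESIS
    for i, row in enumerate(log):
        if row["prev_hash"] != prev or row["hash"] != row_hash(row["payload"], prev):
            return i
        prev = row["hash"]
    return None


log: List[dict] = []
append_row(log, {"action": "correction", "before": "Acme", "after": "Acme Corp"})
append_row(log, {"action": "merge", "before": None, "after": "cust_4f2a"})
append_row(log, {"action": "split", "before": "cust_4f2a", "after": None})
intact = first_broken_index(log)

log[1]["payload"]["after"] = "cust_9999"  # simulate tampering with an older row
broken_at = first_broken_index(log)
print(intact, broken_at)
```

Because each hash depends on the one before it, quietly editing an old row means recomputing every later hash as well, which is what makes the tampering point easy to locate.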

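The audit_log table itself also has UPDATE and DELETE revoked. That revoke is best-effort, since the app role is currently a superuser on Railway and a superuser is not bound by revoked privileges. Combined with the hash chain, the audit log is functionally append-only: edits are discouraged at the table level and detectable if they happen anyway.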

Why this is non-negotiable

Regulated industries (healthcare, financial services, anything touching PII at scale) have to answer "show me how you arrived at this decision" for any individual record, sometimes years after the fact. A "trust me, bro" export of "here's our customer list" is not an answer.

Lineage means: pick any field on any golden record, get the full chain back to source. Audit log means: pick any policy change, see when and why it happened.

The mechanics are described in backend/CLAUDE.md under the Audit section — audit_log (migration 032), the chain trigger (migration 034), verify_chain(), and services/provenance.py:build_provenance() are the load-bearing pieces.