2026-05-15 / Ben Severn

28 seeds, one corroborated lead: an Epstein-network investigation in public data

What an entity-resolution pipeline finds (and misses) when pointed at 28 publicly-sourced seeds from the Epstein corporate-network reporting.

goldenmatch · entity-resolution · icij · investigation · case-study

Read this first. Every claim in this post is sourced to primary public records or to named secondary reporting. Every cluster the matcher produced is a hypothesis, not a fact. Presence in the ICIJ Offshore Leaks Database does not imply wrongdoing — many entities in that dataset are legitimate corporate structures. The one finding I describe as "corroborated" is corroborated by multiple independent public sources cited inline. The other leads remain hypothesis-grade and are noted as such.

This is the third post in a short series on building goldenmatch-shell-company-network. The first post covered the engineering: ingesting ICIJ + GLEIF + OpenSanctions + UK PSC into 4.1M unified company rows on Railway. The second walked one clean cluster from raw leak rows to a GLEIF-anchored finding.

This post is the messy counterpart. I took 28 publicly-named entities from sourced reporting about Jeffrey Epstein's corporate footprint and asked the pipeline a simple question: which of these does the public-leak corpus actually carry, and what does it carry about them?

The honest answer is "not much, but one finding holds up."

The 28 seeds

The seed list came from four public sources:

These are all secondary sources; I didn't have access to the underlying primary records (the USVI Lieutenant Governor's corporate registry isn't published in bulk and isn't carried by any aggregator I had access to). Each seed is a name plus a jurisdiction plus a rough date range. For each seed, I asked the pipeline to find any public-corpus entity that plausibly refers to the same legal entity.
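To make "plausibly refers to the same legal entity" concrete, here is a minimal sketch of seed-to-corpus candidate generation. It is not the pipeline's actual matcher: the `Seed` and `CorpusRow` shapes, the legal-suffix list, the 0.9 threshold, and the use of `difflib` string similarity are all illustrative assumptions, and the example rows are made up.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Illustrative legal-form tokens dropped before comparing names
# (an assumption, not the pipeline's real normalization rules).
LEGAL_SUFFIXES = {"ltd", "limited", "llc", "inc", "corp", "co"}


@dataclass(frozen=True)
class Seed:
    name: str
    jurisdiction: str  # e.g. "VI" for the US Virgin Islands
    first_year: int    # rough date range from the secondary reporting
    last_year: int


@dataclass(frozen=True)
class CorpusRow:
    uid: str
    name: str
    jurisdiction: str
    incorporation_year: Optional[int]  # often missing in leak data


def normalize(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def candidates(seed: Seed, corpus: List[CorpusRow], min_ratio: float = 0.9) -> List[Tuple[str, float]]:
    """Return (uid, similarity) for same-jurisdiction rows whose names are close to the seed's."""
    key = normalize(seed.name)
    out = []
    for row in corpus:
        if row.jurisdiction != seed.jurisdiction:
            continue  # the seed names a jurisdiction; respect it
        if row.incorporation_year is not None and row.incorporation_year > seed.last_year:
            continue  # incorporated after the reported activity window
        ratio = SequenceMatcher(None, key, normalize(row.name)).ratio()
        if ratio >= min_ratio:
            out.append((row.uid, round(ratio, 3)))
    return sorted(out, key=lambda pair: -pair[1])


# Made-up example: only the first row survives all three filters.
seed = Seed("Example Trading Ltd.", "BM", 1998, 2008)
corpus = [
    CorpusRow("toy:1", "EXAMPLE TRADING LIMITED", "BM", 2001),
    CorpusRow("toy:2", "Example Trading Holdings", "BM", None),
    CorpusRow("toy:3", "Example Trading Ltd", "KY", 2001),
    CorpusRow("toy:4", "Example Trading Ltd", "BM", 2012),
]
print(candidates(seed, corpus))  # → [('toy:1', 1.0)]
```

In this sketch, the jurisdiction filter is what separates "no in-jurisdiction match" from a pile of weak cross-jurisdiction name hits.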

Full seed list and per-seed disposition: reports/investigations/ in the repo.

The structural gap: no USVI registry

Eighteen of the 28 seeds are USVI-registered entities. Of those eighteen, the pipeline returned zero in-jurisdiction matches. Not "low-confidence matches" — zero. The names don't appear in ICIJ, they don't appear in OpenSanctions, they don't appear in GLEIF.

That's not a matcher failure. That's a data-coverage failure, and it has a specific cause: the USVI Lieutenant Governor's Corporations and Trademarks Division registry is not in any of the public datasets the pipeline ingests. It's not in ICIJ (which only carries entities that appeared in a specific leak); it's not in GLEIF (USVI entities don't typically register for LEIs); it's not in OpenSanctions (which carries sanctions, PEPs, and re-exports of ICIJ, not raw registry data); and it's not in the UK PSC register, which covers only UK companies.

OpenCorporates probably has some of it. I don't have an OpenCorporates API key wired up yet, and their USVI coverage is incomplete in any case. This is a real, structural gap in public-source corporate transparency, not a defect of any one pipeline.

What this means in practice. If you want to map a corporate footprint that includes USVI-registered entities, no amount of cleverness on the matching side will make up for not having the registry. The seed list told the pipeline where to look; the corpus didn't have what was being looked for.
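One way to make the matcher-failure vs data-coverage distinction mechanical is to ask, before blaming the matcher, whether any source carries the seed's jurisdiction at all. A minimal sketch with made-up rows; the source labels and the "VI" code for the USVI are assumptions about how a unified table might be keyed, not the repo's actual schema.

```python
from collections import Counter
from typing import Dict, Iterable, List

SOURCES = ["icij", "gleif", "opensanctions", "uk_psc"]  # hypothetical source labels


def coverage_by_jurisdiction(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count unified-table rows per (source, jurisdiction) pair."""
    return Counter((r["source"], r["jurisdiction"]) for r in rows)


def diagnose(jurisdiction: str, matches: List[str], coverage: Counter) -> str:
    """Classify a seed's outcome as a match, a coverage gap, or a possible matcher miss."""
    if matches:
        return "matched"
    # Counter returns 0 for missing keys, so an absent jurisdiction counts as zero rows.
    if all(coverage[(source, jurisdiction)] == 0 for source in SOURCES):
        return "coverage gap"  # no source carries this jurisdiction at all
    return "possible matcher miss"  # rows exist in the jurisdiction, but none matched


rows = [
    {"source": "icij", "jurisdiction": "BM"},
    {"source": "icij", "jurisdiction": "BB"},
    {"source": "gleif", "jurisdiction": "GB"},
    {"source": "uk_psc", "jurisdiction": "GB"},
]
coverage = coverage_by_jurisdiction(rows)
print(diagnose("VI", [], coverage))  # → coverage gap
print(diagnose("BM", [], coverage))  # → possible matcher miss
```

Only the "possible matcher miss" outcome is worth tuning the matcher for; a "coverage gap" calls for a new data source.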

The one finding that held up: Liquid Funding / Bear Stearns / Jeffrey Lipman

One seed produced a finding that survived external corroboration: Liquid Funding, Ltd., a Bermuda entity in the ICIJ Paradise Papers (Appleby subset).

The pipeline's output for this seed was:

That 2-hop hit is the kind of lead that has to be corroborated before it's worth publishing, because "shared name string in ICIJ" is not the same as "same person." Common surnames are dangerous in officer matching; ICIJ doesn't carry DOB or address for most officer records.

I corroborated it through four independent public sources:

  1. FINRA BrokerCheck (registration data published by FINRA, the US securities industry's regulator). Individual record CRD# 717915, Jeffrey Mark Lipman, registered with BEAR, STEARNS & CO. INC. (CRD# 79), New York, NY, from 10/1980 to 09/2008: twenty-eight years at Bear Stearns. This is regulator-issued data, not a secondary report.

  2. A second ICIJ record for the same person. LIPMAN JEFFREY M (icij:110014080) appears in the Paradise Papers Barbados corporate registry as Director of BEAR STEARNS CARIBBEAN ASSET HOLDINGS LTD. from 2008-07-10 onward. That's a different ICIJ node, in a different jurisdiction, in the same leak: the kind of cross-jurisdiction link that's hard to fake. Same individual, two appearances.

  3. The economic linkage between Bear Stearns and Liquid Funding is publicly reported. The National Memo's piece "Epstein's Really Big Short" reports that Bear Stearns held a 40% equity stake in Liquid Funding. That is consistent with the matcher's observation that a Bear Stearns employee (Lipman) sat on Liquid Funding's board: a 40% shareholder would be expected to put its own people on the board.

  4. OffshoreAlert has published 357 pages of Bermuda Registrar of Companies filings for Liquid Funding Ltd. (source page). The filings tag Jeffrey Epstein, Jeffrey Lipman, Paul Novelly, Marcus Klug, James Burritt, Roger Heintzelman, Liquid Funding Holdings, Bear Stearns, and Appleby (as registered agent). That confirms the broader corporate-registry record matches the ICIJ snapshot.
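The corroboration starts from treating `LIPMAN JEFFREY M` (ICIJ) and `Jeffrey Mark Lipman` (BrokerCheck) as compatible forms of one name. A minimal sketch of such a name-compatibility check follows; it is an illustrative greedy heuristic, not GoldenMatch's scorer. A compatible name is only a reason to go looking for corroboration, never evidence of identity on its own.

```python
import re
from typing import List


def tokens(name: str) -> List[str]:
    """Lowercase alphabetic tokens; punctuation and word order are ignored."""
    return re.findall(r"[a-z]+", name.lower())


def sort_key(name: str) -> str:
    """Token-sort key: 'LIPMAN JEFFREY M' and 'Jeffrey M Lipman' collapse together."""
    return " ".join(sorted(tokens(name)))


def initial_compatible(a: str, b: str) -> bool:
    """True if each token of the shorter name pairs with a distinct token of the longer one.

    A one-letter token pairs with any token starting with that letter, so 'm' pairs
    with 'mark'. Greedy, longest tokens first, so initials don't claim full-name matches.
    """
    ta, tb = tokens(a), tokens(b)
    if len(ta) > len(tb):
        ta, tb = tb, ta
    remaining = list(tb)
    for t in sorted(ta, key=len, reverse=True):
        for i, u in enumerate(remaining):
            if t == u or (len(t) == 1 and u.startswith(t)) or (len(u) == 1 and t.startswith(u)):
                del remaining[i]
                break
        else:
            return False  # no partner for this token
    return True


print(sort_key("LIPMAN JEFFREY M") == sort_key("Jeffrey M Lipman"))  # → True
print(initial_compatible("LIPMAN JEFFREY M", "Jeffrey Mark Lipman"))  # → True
print(initial_compatible("LIPMAN JEFFREY M", "Epstein - Jeffrey E"))  # → False
```

The greedy pairing can be fooled by unusual token orders; a production scorer would solve the pairing as a proper assignment problem.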

The reading: the matcher's 2-hop lead was correct. The two ICIJ Lipman records are the same Jeffrey M Lipman, a 28-year Bear Stearns Senior Vice President. Bear Stearns held 40% of Liquid Funding. The corporate network that ICIJ surfaces is a real, Bear Stearns-anchored Bermuda structure that Epstein chaired and directed.

None of this is new reporting — every primary source above is public and has been written about elsewhere. What's new is that the matcher, given only the seed name Liquid Funding, surfaced the Bear Stearns linkage from the public-leak data without being told to look for it. That's the test the pipeline passed for this one seed.
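The 2-hop expansion itself (seed entity to its officers, then to those officers' other entities) is a small graph walk. A minimal sketch over a list of (entity, officer) edges; the node IDs are made up, and the `person:` prefix test is a hypothetical stand-in for whatever the `--named-individuals-only` flag in `expand_2hop.py` actually filters on.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def two_hop(
    seed: str,
    edges: Iterable[Tuple[str, str]],
    keep_officer: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Map each kept officer of `seed` to the other entities that officer is linked to."""
    officers_of = defaultdict(set)  # entity -> officers
    entities_of = defaultdict(set)  # officer -> entities
    for entity, officer in edges:
        officers_of[entity].add(officer)
        entities_of[officer].add(entity)

    result = {}
    for officer in sorted(officers_of[seed]):  # hop 1: the seed's officers
        if not keep_officer(officer):
            continue  # drop nominee companies, bearer placeholders, etc.
        others = sorted(entities_of[officer] - {seed})  # hop 2: their other entities
        if others:
            result[officer] = others
    return result


edges = [
    ("entity:A", "person:smith john"),
    ("entity:A", "person:doe jane"),
    ("entity:A", "corp:nominee services"),
    ("entity:B", "person:smith john"),
    ("entity:C", "corp:nominee services"),
]
print(two_hop("entity:A", edges, lambda o: o.startswith("person:")))
# → {'person:smith john': ['entity:B']}
```

Each entry in the result is a shared officer string and nothing more, which is why a hit like the Lipman one needs outside corroboration before it counts.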

Where the matcher fell over: full-corpus person dedupe

I tried to run a full GoldenMatch dedupe on person_entities.parquet (796,944 rows with ICIJ alone; 1.95M after adding OpenSanctions). The current person config blocks on a name-prefix derivative that produces, among other things, a 72,070-row bearer t placeholder block. The all-pairs scoring step on that block needs a ~38 GB float64 score matrix (72,070² scores × 8 bytes) and OOMs on the 24 GB Railway service.
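The ~38 GB figure is just the size of a dense n × n float64 matrix, and the same arithmetic gives a block-size cap for a given memory budget. A minimal sketch; the 8 GiB budget (chosen to leave headroom) and every block other than `bearer t` are made up for illustration.

```python
import math
from typing import Dict, List

GIB = 1024 ** 3
BYTES_PER_SCORE = 8  # float64


def dense_matrix_bytes(n: int) -> int:
    """Bytes for an all-pairs n x n float64 score matrix (no upper-triangle saving)."""
    return n * n * BYTES_PER_SCORE


def max_block_size(budget_bytes: int) -> int:
    """Largest n whose dense n x n matrix fits in the budget."""
    return math.isqrt(budget_bytes // BYTES_PER_SCORE)


def oversized_blocks(block_sizes: Dict[str, int], budget_bytes: int) -> List[str]:
    """Block keys that would need sub-blocking before all-pairs scoring, largest first."""
    limit = max_block_size(budget_bytes)
    return sorted((k for k, n in block_sizes.items() if n > limit), key=lambda k: -block_sizes[k])


print(round(dense_matrix_bytes(72_070) / GIB, 1))  # → 38.7
print(max_block_size(8 * GIB))                      # → 32768
print(oversized_blocks({"bearer t": 72_070, "smith j": 1_200}, 8 * GIB))  # → ['bearer t']
```

Because memory grows with n², each half of a split block needs only a quarter of the original matrix. At 72,070 rows, the `bearer t` block needs more than the whole 24 GB service before scoring even starts.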

This is the same "block too big" failure that pushed the company pipeline toward list-match. The fix for the person side is one of:

I haven't shipped the fix yet. For the specific Epstein question this didn't matter — I substituted a direct query on the person table:

import polars as pl
df = pl.read_parquet("person_entities.parquet")
hits = df.filter(pl.col("normalized_name").str.contains("epstein"))
print(hits.height)  # → 29

Of those 29 rows, exactly one is Epstein - Jeffrey E. The other 28 are clearly different individuals (Alan Lee Epstein, Eli Epstein, Glenn H Epstein, etc.) or surname-only matches inside longer composite names. No alternate-spelling Jeffrey Epstein records. No J. Epstein, no Jeffrey M Epstein, no Jeffrey Mark Epstein, no Jeffrey Edward Epstein. ICIJ carries exactly one record for him.

That negative result matters: it means the absence of additional Epstein-network entities in the corpus isn't a name-normalization artefact. It's a corpus-coverage gap, the same one that hits the USVI seeds.
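The alternate-spelling check is cheap once the surname filter has cut the table down to a few dozen rows. A minimal sketch using a handful of the names mentioned above; the given-name variant pattern is an illustrative assumption.

```python
import re
from typing import List

# Given-name forms that would count as an alternate record for Jeffrey Epstein
# (illustrative; extend as needed).
GIVEN_NAME_VARIANTS = re.compile(r"\b(j|jeff|jeffrey|jeffery)\b")


def surname_hits(names: List[str], surname: str) -> List[str]:
    """Names containing the surname as a whole word."""
    pattern = re.compile(rf"\b{re.escape(surname.lower())}\b")
    return [n for n in names if pattern.search(n.lower())]


def variant_hits(names: List[str], surname: str) -> List[str]:
    """Surname hits that also contain one of the target given-name variants."""
    return [n for n in surname_hits(names, surname) if GIVEN_NAME_VARIANTS.search(n.lower())]


names = ["Epstein - Jeffrey E", "Alan Lee Epstein", "Eli Epstein", "Glenn H Epstein"]
print(variant_hits(names, "epstein"))  # → ['Epstein - Jeffrey E']
print(variant_hits(["J. Epstein", "Jeffrey M Epstein"], "epstein"))
# → ['J. Epstein', 'Jeffrey M Epstein']
```

A word-boundary surname match is stricter than the `str.contains("epstein")` query, which also catches composite names; for a negative check, the looser query is the safer starting point.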

What OpenSanctions added (and didn't)

Halfway through the investigation I added OpenSanctions to the ingest. Two practical changes:

The 28 seeds otherwise behaved the same with OpenSanctions added as without it. OpenSanctions widens the corpus for sanctions and PEP coverage; it doesn't meaningfully widen historical registry coverage for this question.

What this investigation does not claim

A few things this post explicitly does not claim, because they are the kinds of overreach this work invites:

What it does claim

Key takeaways

Reproducing this investigation

Per-seed disposition, the corroborated finding's full source list, and the dedupe-sanity scripts are all in the repo:

# 1. Run the standard pipeline (see post 1 in this series)
# 2. Build the person table
uv run python scripts/build_person_table.py

# 3. Query a specific person
uv run python scripts/investigate_person.py \
  --name "Jeffrey Epstein" --min-score 80

# 4. 2-hop expansion from a seed entity
uv run python scripts/expand_2hop.py \
  --entity-uid icij:82004676 \
  --label liquid_funding \
  --named-individuals-only

All output reports live under reports/investigations/.

What I'd do next

Three concrete next steps if I keep pushing this:

  1. OpenCorporates ingest with a USVI seed query. Won't close the gap entirely (their USVI coverage is partial), but it's the only public source that touches the registry at all.
  2. Person-side blocking refit. Either a tighter key or a progressive-blocking pass over the current bearer t-style mega-blocks.
  3. Centrality + community detection on the cluster sub-graph. I have the NetworkX graph but haven't run Louvain on the full corpus. Likely surfaces 1-2 additional candidate clusters worth a writeup.
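For the third step, NetworkX already ships both pieces. A minimal sketch, assuming NetworkX 2.8 or later (where `louvain_communities` was added) and using a toy graph of two dense clusters joined by one edge in place of the real cluster sub-graph.

```python
from itertools import combinations

import networkx as nx

# Toy stand-in for the cluster sub-graph: two 4-node cliques joined by one edge.
left = ["a1", "a2", "a3", "a4"]
right = ["b1", "b2", "b3", "b4"]
G = nx.Graph()
G.add_edges_from(combinations(left, 2))
G.add_edges_from(combinations(right, 2))
G.add_edge("a1", "b1")  # the single bridging edge

# Louvain community detection; a fixed seed keeps the partition reproducible.
communities = nx.community.louvain_communities(G, seed=42)
q = nx.community.modularity(G, communities)

# Degree centrality: the bridge endpoints have the most connections.
centrality = nx.degree_centrality(G)
top = max(centrality, key=centrality.get)

print(sorted(sorted(c) for c in communities))
print(round(q, 3), top)
```

On the real sub-graph, high-centrality nodes that sit between communities (like `a1` and `b1` here) are the ones worth a manual look.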

If you have hand-curated seed lists for other corporate-footprint questions in public-leak data, the pipeline is open-source and runs on a $5 Railway box. Try it.

Repo: benseverndev-oss/goldenmatch-shell-company-network. GoldenMatch: pip install goldenmatch. Previous posts: pipeline engineering · Phoenix Spree cluster walkthrough.