28 seeds, one corroborated lead: an Epstein-network investigation in public data
What an entity-resolution pipeline finds (and misses) when pointed at 28 publicly-sourced seeds from the Epstein corporate-network reporting.
Read this first. Every claim in this post is sourced to primary public records or to named secondary reporting. Every cluster the matcher produced is a hypothesis, not a fact. Presence in the ICIJ Offshore Leaks Database does not imply wrongdoing — many entities in that dataset are legitimate corporate structures. The one finding I describe as "corroborated" is corroborated by multiple independent public sources cited inline. The other leads remain hypothesis-grade and are noted as such.
This is the third post in a short series on building goldenmatch-shell-company-network. The first post covered the engineering: ingesting ICIJ + GLEIF + OpenSanctions + UK PSC into 4.1M unified company rows on Railway. The second walked one clean cluster from raw leak rows to a GLEIF-anchored finding.
This post is the messy counterpart. I took 28 publicly-named entities from sourced reporting about Jeffrey Epstein's corporate footprint and asked the pipeline a simple question: which of these does the public-leak corpus actually carry, and what does it carry about them?
The honest answer is "not much, but one finding holds up."
The 28 seeds
The seed list came from four public sources:
- The USVI Sex Offender Registry (and associated state filings) — entities Epstein registered in the US Virgin Islands.
- The 2020 NYDFS consent order against Deutsche Bank — entities named in the regulator's findings.
- The 2019 Senate Finance Committee correspondence about Epstein's tax filings.
- The 2023 JPMorgan settlement filings — entities named in the public docket.
These are all secondary sources; I didn't have access to the underlying primary records (the USVI Lieutenant Governor's corporate registry isn't published in bulk and isn't carried by any aggregator I had). Each seed is a name plus a jurisdiction plus a rough date range. For each seed I asked the pipeline to find any public-corpus entity that plausibly refers to the same legal entity.
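A seed in this sense is a very small record. Here is a minimal sketch of one, assuming hypothetical field names (`Seed`, `source`, `year_from`, `year_to` and the `overlaps` helper are illustrative, not the repo's schema). The example values are placeholders except the entity name and jurisdiction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seed:
    """One publicly named entity to look up. Field names are illustrative."""
    name: str
    jurisdiction: str                 # e.g. "BM" for Bermuda
    source: str                       # which public source named the entity
    year_from: Optional[int] = None   # rough date range; None when unknown
    year_to: Optional[int] = None

    def overlaps(self, start_year: int, end_year: int) -> bool:
        """True if a candidate's active years intersect the seed's rough range."""
        lo = self.year_from if self.year_from is not None else start_year
        hi = self.year_to if self.year_to is not None else end_year
        return lo <= end_year and start_year <= hi


# Placeholder source label and date range, for illustration only.
seed = Seed("Liquid Funding, Ltd.", "BM", "placeholder-source", 2001, 2007)
undated = Seed("Example Holdings LLC", "VI", "placeholder-source")
```

An undated seed overlaps every candidate range, so a missing date range widens the search rather than silently excluding matches.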
Full seed list and per-seed disposition: reports/investigations/ in the repo.
The structural gap: no USVI registry
Eighteen of the 28 seeds are USVI-registered entities. Of those eighteen, the pipeline returned zero in-jurisdiction matches. Not "low-confidence matches" — zero. The names don't appear in ICIJ, they don't appear in OpenSanctions, they don't appear in GLEIF.
That's not a matcher failure. That's a data-coverage failure, and it has a specific cause: the USVI Lieutenant Governor's Corporations and Trademarks Division registry is not in any of the public datasets the pipeline ingests. It's not in ICIJ (which only carries entities that appeared in a specific leak); it's not in GLEIF (USVI entities don't typically register for LEIs); it's not in OpenSanctions (which carries sanctions, PEPs, and re-exports of ICIJ, not raw registry data); it's not in the UK PSC register, for obvious reasons.
OpenCorporates probably has some of it. I don't have an OpenCorporates API key wired up yet, and their USVI coverage is incomplete in any case. This is a real, structural gap in public-source corporate transparency, not a defect of any one pipeline.
What this means in practice. If you want to map a corporate footprint that includes USVI-registered entities, no amount of cleverness on the matching side will substitute for not having the registry. The seed list told the pipeline where to look; the corpus didn't have what was being looked for.
The one finding that held up: Liquid Funding / Bear Stearns / Jeffrey Lipman
One seed produced a finding that survived external corroboration: Liquid Funding, Ltd., a Bermuda entity in the ICIJ Paradise Papers (Appleby subset).
The pipeline's output for this seed was:
- Direct ICIJ hit: icij:82004676, Liquid Funding, Ltd. in Bermuda. Seventeen officers listed across the entity's filing history.
- Among those officers: Epstein - Jeffrey E (icij:80063035), listed as director and chairman from 2001-11-09 to 2007-03-30.
- A 2-hop expansion flagged: Lipman - Jeffrey M (icij:80061377), co-director of Liquid Funding, also appearing as a director of Bear Stearns International Funding (Bermuda) Limited in the same corpus.
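For readers unfamiliar with the term, a 2-hop expansion walks from the seed entity to its officers, then from each officer to every other entity they sit on. The sketch below is a toy version over a plain edge list, not the code in `expand_2hop.py`. The function name and the `entity:bsif_bermuda` uid are made up for illustration; only the three `icij:` identifiers come from the output above.

```python
from collections import defaultdict

# Toy edge list of (officer_uid, entity_uid) pairs.
# "entity:bsif_bermuda" is a hypothetical placeholder uid standing in for
# Bear Stearns International Funding (Bermuda) Limited.
EDGES = [
    ("icij:80063035", "icij:82004676"),        # Epstein - Jeffrey E -> Liquid Funding
    ("icij:80061377", "icij:82004676"),        # Lipman - Jeffrey M -> Liquid Funding
    ("icij:80061377", "entity:bsif_bermuda"),  # Lipman - Jeffrey M -> second entity
]


def expand_two_hops(seed_entity, edges):
    """Return {other_entity: [officers shared with the seed entity]}."""
    officers_of = defaultdict(set)
    entities_of = defaultdict(set)
    for officer, entity in edges:
        officers_of[entity].add(officer)
        entities_of[officer].add(entity)

    found = defaultdict(list)
    for officer in sorted(officers_of[seed_entity]):   # hop 1: entity -> officers
        for entity in sorted(entities_of[officer]):    # hop 2: officer -> entities
            if entity != seed_entity:
                found[entity].append(officer)
    return dict(found)


leads = expand_two_hops("icij:82004676", EDGES)
```

On this toy graph the only lead is the second entity, reached through the Lipman node. Nothing in the traversal says the officer node is the right person; it only says the same node id appears on both boards.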
That 2-hop hit is the kind of lead that has to be corroborated before it's worth publishing, because "shared name string in ICIJ" is not the same as "same person." Common surnames are dangerous in officer matching; ICIJ doesn't carry DOB or address for most officer records.
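To make the "shared name string" point concrete, here is a small sketch of an order-insensitive name key. It is an assumption about how one might normalize officer names, not GoldenMatch's normalizer, and `name_key` is a hypothetical helper.

```python
import re


def name_key(raw: str) -> str:
    """Case-insensitive, order-insensitive token key for an officer name."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return " ".join(sorted(tokens))


a = name_key("Lipman - Jeffrey M")    # ICIJ "Surname - Given" style
b = name_key("LIPMAN JEFFREY M")      # upper-case registry style
c = name_key("Jeffrey Mark Lipman")   # spelled-out middle name
```

Here `a` and `b` collapse to the same key while `c` does not, so a key like this can both miss true matches and merge different people who happen to share a name. Equal keys only nominate the 2-hop hit as a candidate; they say nothing about identity.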
I corroborated it through four independent public sources:
- FINRA BrokerCheck (an authoritative US securities-industry regulatory source). Individual record CRD# 717915, Jeffrey Mark Lipman, registered with BEAR, STEARNS & CO. INC. (CRD# 79), New York, NY, from 10/1980 to 09/2008. Twenty-eight years at Bear Stearns. This is regulator-issued data, not a secondary report.
- A second ICIJ record for the same person. LIPMAN JEFFREY M (icij:110014080) appears in the Paradise Papers Barbados corporate registry as Director of BEAR STEARNS CARIBBEAN ASSET HOLDINGS LTD. from 2008-07-10 onward. That's a different ICIJ node, in a different jurisdiction, in the same leak: the kind of cross-registry link that's hard to fake. Same individual, two appearances.
- The economic linkage between Bear Stearns and Liquid Funding is publicly reported. The National Memo's piece "Epstein's Really Big Short" reports Bear Stearns held a 40% equity stake in Liquid Funding. That is consistent with the matcher's observation that Bear Stearns staff (Lipman) sat on Liquid Funding's board.
- OffshoreAlert has published 357 pages of Bermuda Registrar of Companies filings for Liquid Funding Ltd. (source page). The filings tag Jeffrey Epstein, Jeffrey Lipman, Paul Novelly, Marcus Klug, James Burritt, Roger Heintzelman, Liquid Funding Holdings, Bear Stearns, and Appleby (as registered agent). That confirms the broader corporate-registry record matches the ICIJ snapshot.
The reading: the matcher's 2-hop lead was correct. The two ICIJ Lipman records are the same Jeffrey M Lipman, a 28-year Bear Stearns Senior Vice President. Bear Stearns held 40% of Liquid Funding. The corporate network that ICIJ surfaces is a real Bear Stearns-anchored Bermuda structure that Epstein chaired and directed.
None of this is new reporting — every primary source above is public and has been written about elsewhere. What's new is that the matcher, given only the seed name Liquid Funding, surfaced the Bear Stearns linkage from the public-leak data without being told to look for it. That's the test the pipeline passed for this one seed.
Where the matcher fell over: full-corpus person dedupe
I tried to run a full GoldenMatch dedupe on person_entities.parquet (796,944 rows with ICIJ only, 1.95M after adding OpenSanctions). The current person config blocks on a name-prefix derivative that produces, among other things, a 72,070-row block keyed on the placeholder bearer t. The all-pairs scoring step on that block allocates ~38 GB for a float64 score matrix and OOMs on the 24 GB Railway service.
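The memory figure is easy to check by hand. The sketch below assumes the scorer materializes a dense square matrix with one 8-byte float per ordered pair; that is my reading of the failure, not something confirmed from GoldenMatch's internals. `score_matrix_bytes` is a hypothetical helper.

```python
def score_matrix_bytes(block_rows: int, bytes_per_score: int = 8) -> int:
    """Bytes needed for a dense block_rows x block_rows matrix of scores."""
    return block_rows * block_rows * bytes_per_score


rows = 72_070
dense_bytes = score_matrix_bytes(rows)   # float64 = 8 bytes per score
dense_gib = dense_bytes / 2**30
distinct_pairs = rows * (rows - 1) // 2  # unordered pairs, excluding self-pairs

print(f"dense float64 matrix: {dense_gib:.1f} GiB")
print(f"distinct pairs: {distinct_pairs:,}")
```

That works out to about 38.7 GiB for the dense matrix. Storing only the distinct pairs would roughly halve it, which is still around 19 GiB for the scores alone. Either way, a single block of that size does not fit comfortably in 24 GB.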
This is the same "block too big" failure that pushed the company pipeline toward list-match. The fix for the person side is one of:
- A tighter blocking key (current key is too permissive).
- Progressive blocking on top of the current key.
- Pre-stripping placeholder names the way filter_company_table.py does for companies.
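Two of those options compose naturally: drop placeholder names first, then lengthen the blocking key only for blocks that are still too large. The sketch below is a toy illustration under stated assumptions: the placeholder list, the size cap, and the prefix-lengthening rule are all hypothetical, and none of it is GoldenMatch's blocking API.

```python
from collections import defaultdict

PLACEHOLDERS = {"bearer", "the bearer"}  # hypothetical placeholder names
MAX_BLOCK = 3                            # tiny cap for the demo


def prefix_blocks(names, key_len):
    """Group names by their first key_len characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:key_len]].append(name)
    return groups


def progressive_blocks(names, key_len=3, max_len=12):
    """Strip placeholders, then split oversized blocks with longer prefixes."""
    kept = [n for n in names if n not in PLACEHOLDERS]
    out = {}
    stack = [(key_len, kept)]
    while stack:
        k, members = stack.pop()
        for key, group in prefix_blocks(members, k).items():
            if len(group) > MAX_BLOCK and k < max_len:
                stack.append((k + 1, group))  # retry this group with a longer key
            else:
                out[key] = group
    return out


names = ["bearer", "bearer", "smith john a", "smith alan", "smith eli",
         "smith glenn h", "jones mary"]
blocks = progressive_blocks(names)
```

On the toy input the four "smith" names start in one oversized block and end up in separate blocks once the prefix is long enough to tell them apart, while the placeholder rows never enter blocking at all. The `max_len` bound keeps genuinely identical names from splitting forever.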
I haven't shipped the fix yet. For the specific Epstein question this didn't matter — I substituted a direct query on the person table:
import polars as pl
# df: the person table (person_entities.parquet) loaded as a Polars DataFrame
df.filter(pl.col('normalized_name').str.contains('epstein'))
# → 29 rows
Of those 29 rows, exactly one is Epstein - Jeffrey E. The other 28 are clearly different individuals (Alan Lee Epstein, Eli Epstein, Glenn H Epstein, etc.) or surname-only matches inside longer composite names. No alternate-spelling Jeffrey Epstein records. No J. Epstein, no Jeffrey M Epstein, no Jeffrey Mark Epstein, no Jeffrey Edward Epstein. ICIJ carries exactly one record for him.
That negative result matters: it means the absence of additional Epstein-network entities in the corpus isn't a name-normalization artefact. It's a corpus-coverage gap, the same one that hits the USVI seeds.
What OpenSanctions added (and didn't)
Halfway through the investigation I added OpenSanctions to the ingest. Two practical outcomes:
- It did not add a Jeffrey Epstein person record. Surprising given his criminal history, but the OS default collection skews toward active sanctions, PEPs, and regulator-issued debarments — not historical convicted-criminal records. An "Epstein" name-prefix scan in OS returned 18 persons, none of them him.
- It added one anchor for Liquid Funding. opensanctions:icijol-82004676 is the OS re-export of the ICIJ Liquid Funding node, with a populated Bermuda registry company number (EC29378). Same entity, different anchor: a useful identifier the ICIJ-only pass didn't have.
The 28 seeds otherwise behaved the same with OS added as without it. OpenSanctions widens the corpus for sanctions and PEP coverage; it doesn't widen historical registry coverage in any meaningful way for this question.
What this investigation does not claim
A few things this post explicitly does not claim, because they are the kinds of overreach this work invites:
- It does not claim that any person or entity named here, including those in the corroborated Lipman / Bear Stearns / Liquid Funding finding, committed any wrongdoing. Liquid Funding itself was a real Bermuda structured-finance vehicle that went through normal Members' Voluntary Liquidation in 2015.
- It does not claim that the structural USVI gap is a sign of intentional concealment by anyone. It's a feature of how public corporate-registry data is published.
- It does not claim that the pipeline is complete. It's missing OpenCorporates ingest, it's missing the USVI registry, and the person-side dedupe needs configuration work.
- It does not claim novel reporting. Every primary fact in the corroborated finding is on the public record and has been reported elsewhere. The contribution here is methodological: that an entity-resolution pipeline can surface the same linkage from public-leak data given only the seed name.
What it does claim
- An entity-resolution pipeline pointed at the public-leak corpora can independently reproduce one piece of the publicly-reported Epstein corporate footprint (the Bear Stearns / Liquid Funding linkage via Jeffrey M Lipman), corroborated by FINRA, a second ICIJ node, OffshoreAlert, and named secondary reporting.
- The same pipeline returns no in-jurisdiction matches for the 18 USVI seeds, which is a real and structural public-data gap, not a matcher defect.
- Person-side dedupe at this corpus scale needs a more selective blocking key than the company side. The current config trips on placeholder-heavy blocks.
Key takeaways
- An entity-resolution pipeline is most useful as a lead generator, not as a fact engine. Every lead it surfaces should be corroborated against primary sources before it's treated as a fact.
- Public-data coverage is the dominant constraint on this kind of work. The USVI gap isn't fixable by better matching.
- A finding that holds up under four independent public sources (regulator-issued data + a second ICIJ node + secondary reporting + corporate-registry filings) is qualitatively different from a finding that's only the matcher's output. Treat that distinction carefully.
- Negative results matter. "ICIJ has exactly one Jeffrey Epstein record, no alternate spellings" is itself useful information about the limits of the public corpus.
Reproducing this investigation
Per-seed disposition, the corroborated finding's full source list, and the dedupe-sanity scripts are all in the repo:
# 1. Run the standard pipeline (see post 1 in this series)
# 2. Build the person table
uv run python scripts/build_person_table.py
# 3. Query a specific person
uv run python scripts/investigate_person.py \
    --name "Jeffrey Epstein" --min-score 80
# 4. 2-hop expansion from a seed entity
uv run python scripts/expand_2hop.py \
    --entity-uid icij:82004676 \
    --label liquid_funding \
    --named-individuals-only
All output reports live under reports/investigations/.
What I'd do next
Three concrete next steps if I keep pushing this:
- OpenCorporates ingest with a USVI seed query. Won't close the gap entirely (their USVI coverage is partial), but it's the only public source that touches the registry at all.
- Person-side blocking refit. Either a tighter key or a progressive-blocking pass over the current bearer t-style mega-blocks.
- Centrality + community detection on the cluster sub-graph. I have the NetworkX graph but haven't run Louvain on the full corpus. Likely surfaces 1-2 additional candidate clusters worth a writeup.
If you have hand-curated seed lists for other corporate-footprint questions in public-leak data, the pipeline is open-source and runs on a $5 Railway box. Try it.
Repo: benseverndev-oss/goldenmatch-shell-company-network. GoldenMatch: pip install goldenmatch. Previous posts: pipeline engineering · Phoenix Spree cluster walkthrough.