Compare two lists for overlap — without sharing either list
Use the free /cleanroom tool to find how much two lists overlap without exchanging them. Each side uploads its own file; only encrypted fingerprints are matched and raw files are never stored.
The clean room at /cleanroom lets two parties find out how much
two lists overlap without either side handing over its list. It's free, needs
no login, and you upload your file on the spot — no pre-ingestion or account setup.
Reach for it whenever the question is "how many records are on both lists?" but the lists can't be exchanged: audience/marketing overlap between partners, suppression-list checks, reconciling two systems' customer rolls, or any "do we have the same people?" comparison under privacy constraints.
Note: This is the free, public sibling of the enterprise flow in Link two datasets without sharing PII (which matches two sources you've already ingested). For the underlying theory — Bloom-filter / CLK encoding, what it does and doesn't protect — read Privacy-preserving record linkage.
What gets shared (and what doesn't)
When you upload a CSV, the server turns each row's match fields into an encrypted fingerprint (a Bloom-filter "cryptographic long-term key") and then discards your raw file immediately — it is never written to disk or a database. The matching runs over the fingerprints, so the only things stored for a room are:
- the encrypted fingerprints,
- 0-based row indices, and
- the column-to-field mapping (column names only, never values).
The result each side sees is the overlap count plus a list of its own matched row numbers — never the other party's rows, and never any raw values from either list.
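To make the fingerprint idea concrete, here is a minimal Python sketch of a Bloom-filter CLK. It is an illustration under stated assumptions, not the service's actual encoding: the filter size, the number of hash positions, the bigram tokenization, the shared `secret`, and the Dice comparison are all choices made for this example.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a value into overlapping two-character tokens, padded at both ends."""
    padded = f"_{value}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def clk_fingerprint(values: list[str], secret: bytes,
                    size: int = 1024, hashes: int = 10) -> int:
    """Encode one row's match fields into a single Bloom filter (stored as an int).

    Assumed parameters: a 1024-bit filter and 10 bit positions per token.
    """
    bits = 0
    for field_index, value in enumerate(values):
        # Same normalization on both sides: trim and lowercase.
        for gram in bigrams(value.strip().lower()):
            # Prefixing the field position makes field order significant.
            token = f"{field_index}:{gram}".encode("utf-8")
            digest = hmac.new(secret, token, hashlib.sha256).digest()
            h1 = int.from_bytes(digest[:16], "big")
            h2 = int.from_bytes(digest[16:], "big")
            # Double hashing: derive `hashes` bit positions from two values.
            for k in range(hashes):
                bits |= 1 << ((h1 + k * h2) % size)
    return bits


def dice(a: int, b: int) -> float:
    """Dice similarity of two filters: 1.0 means identical bit patterns."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * bin(a & b).count("1") / total if total else 0.0
```

In this sketch two rows with the same normalized values produce the same bit pattern, while the raw values cannot be read back directly from the bits. Because the field position is mixed into each token, swapping the field order produces a different fingerprint.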
Note: In this tier the server briefly computes the fingerprints for you, so the promise is "we never store your data," not "we never see it." A true in-browser clean room, where the encoding happens entirely on your device and the server only ever receives fingerprints, is on the roadmap (see What's next).
Walkthrough
1. Create a room (Party A)
Open /cleanroom and list the fields to match on, in order — for
example email, then last name. Both parties must agree on the same fields in
the same order for the fingerprints to line up, so pick fields both lists actually
have. Click Create room.
You'll land on the room page and your browser keeps a private creator token, so only you can act as Party A.
2. Share the link with Party B
Copy the room link shown on the page and send it to the other party (email, chat, however you like). Anyone with the link can join as Party B and, later, view the overlap, so send it only through a channel you would trust with information about the list itself.
3. Both parties upload + map columns
Each side picks its CSV (up to 50,000 rows). Because your column names probably
differ ("Email" vs "email_address"), you map your columns onto the room's
fields with the dropdowns. For each row, the mapped values are taken in the room's field order, and that ordered set of values is what gets fingerprinted.
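The effect of the mapping can be sketched in Python. This is only an illustration of the idea, not the tool's code: the function name `rows_in_field_order`, the field names, and the sample column headers are hypothetical.

```python
import csv
import io


def rows_in_field_order(csv_text: str, mapping: dict[str, str],
                        fields: list[str]) -> list[list[str]]:
    """Return each row's match values, ordered by the room's fields.

    `mapping` maps a room field name to this party's own column header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [f for f in fields if mapping.get(f) not in header]
    if missing:
        raise ValueError(f"no column mapped for fields: {missing}")
    return [[row[mapping[f]] for f in fields] for row in reader]


# Two parties with different headers and column order (hypothetical data).
room_fields = ["email", "last_name"]
party_a = "Email,Surname\nada@example.org,Lovelace\n"
party_b = "last,email_address,city\nLovelace,ada@example.org,London\n"

values_a = rows_in_field_order(
    party_a, {"email": "Email", "last_name": "Surname"}, room_fields)
values_b = rows_in_field_order(
    party_b, {"email": "email_address", "last_name": "last"}, room_fields)
```

Both parties end up with the same ordered values for the shared record even though their files are laid out differently, which is what lets the fingerprints line up.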
4. The match runs automatically
As soon as both sides have uploaded, the linkage runs. The page polls for status and switches to results when it's done — no button to babysit.
5. Read the results + download your matches
You'll see the overlap count and how many of your rows matched. Click Download your matched rows (CSV) to get a one-column file of your matched row numbers (0-based, like the row indices stored for the room), which you join back to your own copy of the list to pull the actual records on your side:
row
0
14
57
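Joining the download back to your own list takes only a few lines of Python. The sketch below assumes that row 0 is the first data row after your header line and that the downloaded file has a single `row` column as shown above. The function name and sample data are hypothetical.

```python
import csv
import io


def pull_matched_records(own_csv_text: str,
                         matches_csv_text: str) -> list[dict[str, str]]:
    """Return the records from your own list whose row numbers matched."""
    matched = {
        int(line["row"])
        for line in csv.DictReader(io.StringIO(matches_csv_text))
    }
    reader = csv.DictReader(io.StringIO(own_csv_text))
    # enumerate() starts at 0, so index 0 is the first data row (assumption).
    return [record for index, record in enumerate(reader) if index in matched]


own_list = "name,email\nAda,ada@example.org\nBob,bob@example.org\nCy,cy@example.org\n"
downloaded = "row\n0\n2\n"
records = pull_matched_records(own_list, downloaded)
```

With real files, read each one with `open(path, newline="")` and pass the contents in. If your file has no header line, or the tool counts rows differently than assumed here, adjust the index offset before relying on the result.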
Tuning the match
- Add a second field (e.g. last name alongside email) to catch real matches that a single field misses, or to disambiguate common values.
- Make sure both sides map semantically the same column to each field — mapping "work email" on one side and "personal email" on the other won't line up.
- Normalization (lowercasing, trimming) is handled for you identically on both sides, so casing and stray whitespace don't break matches.
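Before uploading, you can check locally how your match fields behave once they are lowercased and trimmed. The Python sketch below is a hypothetical pre-upload check, not part of the tool: it reports match keys that occur more than once in your own list, which is one way to see whether adding a second field would disambiguate common values.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Trim surrounding whitespace and lowercase, as described above."""
    return value.strip().lower()


def repeated_keys(rows: list[list[str]]) -> dict[tuple[str, ...], int]:
    """Return normalized match keys that appear more than once, with counts."""
    counts = Counter(tuple(normalize(v) for v in row) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows: a shared mailbox used by several people.
rows = [
    [" Info@Example.com ", "Smith"],
    ["info@example.com", "Jones"],
    ["INFO@example.com", "smith "],
]

email_only = repeated_keys([[row[0]] for row in rows])
email_and_name = repeated_keys(rows)
```

Here the email alone collapses all three rows into one key, while email plus last name leaves only the two Smith rows sharing a key.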
Limits
- Up to 50,000 rows per side.
- Free, no login.
- Rooms are ephemeral — they expire (24 hours by default) and the fingerprints and results are purged. Download your matches before then.
What's next
The clean room is rolling out in stages. Available now: the free server tier above. On the roadmap:
- A true in-browser clean-room tier where your raw data never leaves your device: your browser computes the fingerprints locally and uploads only those.
- Larger jobs and a paid per-run option for that stronger-privacy tier.
Related
- Privacy-preserving record linkage (concept)
- Link two datasets without sharing PII — the enterprise, pre-ingested flow
- PPRL Linkage API