2026-04-11
Product Catalog Dedup on a Real 1M-Row Dataset: F1 0.05 → 0.36 in Three Steps
Running the full Golden Suite (GoldenCheck → GoldenFlow → GoldenMatch) on the UCI Online Retail II catalog, with real duplicates rather than synthetic ones. Honest numbers, plus how fixing the eval, switching to Vertex AI embeddings, and tuning the threshold lifted F1 roughly 7× over a hopeless lexical baseline.
entity-resolution, goldencheck, goldenflow, goldenmatch, ecommerce, vertex-ai, python