2026-04-01
Deduplicating 401K Equipment Records with LLM Calibration
We ran GoldenMatch on 401,125 bulldozer auction records from Kaggle. Iterative LLM calibration learned the optimal match threshold from just 200 pairs (~$0.01). Hybrid approximate-nearest-neighbor (ANN) blocking recovered 949 records that string blocking missed.
entity-resolution, equipment-data, llm, goldenmatch, python, ann-blocking
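
To make the idea of learning a match threshold from a small labeled sample concrete, here is a minimal sketch in plain Python. It is not GoldenMatch's API or its iterative procedure: the function name `calibrate_threshold`, the sample scores, and the choice of F1 as the objective are all assumptions made for illustration. It simply sweeps the observed similarity scores and keeps the cutoff with the best F1 on the labeled pairs.

```python
from typing import List, Tuple


def calibrate_threshold(labeled: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (threshold, f1) that maximizes F1 on labeled pairs.

    Each item is (similarity_score, is_match). In the post's setting the
    labels would come from an LLM; here they are hard-coded (hypothetical).
    """
    total_pos = sum(1 for _, is_match in labeled if is_match)
    best_t, best_f1 = 1.0, 0.0
    # Every distinct observed score is a candidate cutoff.
    for t in sorted({score for score, _ in labeled}):
        tp = sum(1 for s, m in labeled if s >= t and m)
        fp = sum(1 for s, m in labeled if s >= t and not m)
        fn = total_pos - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Strict comparison keeps the lowest threshold among ties.
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical labeled sample: (similarity score, pair is a true duplicate).
pairs = [
    (0.95, True), (0.90, True), (0.82, True), (0.78, False),
    (0.70, True), (0.60, False), (0.40, False),
]
threshold, f1 = calibrate_threshold(pairs)
print(f"threshold={threshold:.2f} f1={f1:.3f}")
```

With a few hundred labeled pairs, a sweep like this costs almost nothing to run, so the labeling calls dominate the expense. An iterative variant would presumably sample further pairs near the current cutoff and repeat.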