← Glossary

Data profiling

Generating a statistical summary of a dataset — value distributions, null rates, cardinality, format patterns — before doing anything with it.

Data profiling is what you do before you trust a dataset. For each column: how many distinct values exist, what's the null rate, what's the type distribution, what's the value-frequency histogram, are there outliers?

Profiling outputs answer questions like:

- "Is this field unique enough to be a candidate identifier?"
- "Is this field clean enough to dedupe on, or do I need to standardize first?"
- "Is the schema what I think it is?"
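A minimal sketch of the per-column checks described above, in plain Python. The function name and output keys are illustrative, not any particular tool's API; it computes null rate, distinct-value count, and a value-frequency histogram for one column of a row-oriented dataset.

```python
from collections import Counter

def profile_column(rows, column):
    """Profile one column of a list-of-dicts dataset: null rate,
    distinct count, and value-frequency histogram (top values)."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat both None and empty string as nulls.
    nulls = sum(1 for v in values if v is None or v == "")
    non_null = [v for v in values if v is not None and v != ""]
    histogram = Counter(non_null)
    return {
        "total": total,
        "null_rate": nulls / total if total else 0.0,
        "distinct": len(histogram),
        "top_values": histogram.most_common(3),
    }

rows = [
    {"email": "a@x.com"},
    {"email": "b@x.com"},
    {"email": "a@x.com"},
    {"email": None},
]
print(profile_column(rows, "email"))
```

A high distinct count relative to `total` suggests a candidate identifier; a high `null_rate` or a skewed histogram suggests the field needs standardization before it can be used for deduplication.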

In Golden Suite, profiling runs automatically on every source via the GoldenCheck tool. The results feed into auto-config decisions (which fields look like good blocking signals, which need standardization, etc.) before the resolver runs.