Data quality

How fit a dataset is for its intended use — measured along completeness, validity, consistency, uniqueness, timeliness, and accuracy.

Data quality is the umbrella term for everything that can go wrong with data before it reaches a consumer such as an analytics dashboard or a billing system. It is commonly broken down into six dimensions:

  • Completeness — are required fields populated?
  • Validity — does each value conform to its expected format (valid email, valid phone)?
  • Consistency — does the same value appear the same way across rows / sources?
  • Uniqueness — are duplicates absent (this is where entity resolution, ER, comes in)?
  • Timeliness — is the value current relative to its real-world referent?
  • Accuracy — does the value actually reflect reality?

The first five can be tested from the data itself; accuracy requires comparison against an external source of truth, which makes it the hardest to measure. Golden Suite's GoldenCheck tool generates completeness, validity, and consistency reports automatically on every ingest.
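
As a rough illustration of how the first five dimensions can be scored from the data alone, here is a minimal Python sketch. It is not GoldenCheck's implementation or API. The sample records, field names, email pattern, allowed country codes, and the 365-day freshness window are all assumptions made up for the example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "country": "US", "updated": "2024-05-01"},
    {"id": 2, "email": "not-an-email", "country": "usa", "updated": "2024-04-28"},
    {"id": 3, "email": None, "country": "US", "updated": "2022-01-15"},
    {"id": 4, "email": "ana@example.com", "country": "US", "updated": "2024-05-01"},
]

# Deliberately simple pattern (an assumption); real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Assumed canonical spellings for the country field.
ALLOWED_COUNTRIES = {"US", "CA", "GB"}


def share(hits, total):
    """Return hits / total, treating an empty population as fully passing."""
    return hits / total if total else 1.0


def quality_report(records, as_of, max_age_days=365):
    """Score five data quality dimensions as fractions between 0 and 1."""
    n = len(records)
    present = [r["email"] for r in records if r["email"]]
    cutoff = as_of - timedelta(days=max_age_days)
    return {
        # Completeness: rows where the required email field is populated.
        "completeness": share(len(present), n),
        # Validity: populated emails that match the expected format.
        "validity": share(sum(bool(EMAIL_RE.match(e)) for e in present), len(present)),
        # Consistency: rows whose country uses the canonical spelling.
        "consistency": share(sum(r["country"] in ALLOWED_COUNTRIES for r in records), n),
        # Uniqueness: distinct emails over populated emails (exact match only;
        # entity resolution would also catch near-duplicates).
        "uniqueness": share(len(set(present)), len(present)),
        # Timeliness: rows updated within max_age_days of the as_of date.
        "timeliness": share(
            sum(datetime.strptime(r["updated"], "%Y-%m-%d") >= cutoff for r in records), n
        ),
    }


report = quality_report(RECORDS, as_of=datetime(2024, 6, 1))
print(report)
```

Accuracy is absent from the report on purpose: nothing in the records themselves can tell you whether ana@example.com is really that customer's address. Timeliness is also only approximated here, using the age of the last update as a proxy for whether the value is still current.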