Connecting S3 to Golden Suite
Set up the S3 connector for CSV / Parquet / JSON files in a bucket.
The S3 connector reads CSV, Parquet, or newline-JSON files from a bucket. Use it when your data plane writes scheduled exports to S3 (a common pattern with data warehouses, Kafka sinks, or third-party tools).
Prereqs
- An S3 bucket with read access
- IAM credentials (access key + secret) with `s3:GetObject` + `s3:ListBucket` on the target prefix
- Either:
  - One file per source — point at `s3://bucket/path/to/file.csv`
  - A prefix — Golden Suite reads all matching files under the prefix
Setup
- Create an IAM user for Golden Suite — read-only on the bucket:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
          "arn:aws:s3:::your-bucket",
          "arn:aws:s3:::your-bucket/exports/*"
        ]
      }
    ]
  }
  ```

- Generate access keys for the IAM user
- `/golden/sources` → Add source → S3
- Paste credentials + bucket + prefix or file path
- Pick file format — CSV / Parquet / JSON
- Test connection — Golden Suite hits `HEAD` on the path
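If you want to sanity-check the credentials and scoping before clicking Test connection, a minimal boto3 sketch exercises the same two permissions. The bucket, prefix, key, and credentials below are placeholders:

```python
import boto3

# Placeholder credentials and paths; substitute your own.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
    region_name="eu-west-1",
)

# s3:ListBucket: can we enumerate the prefix Golden Suite will read?
resp = s3.list_objects_v2(Bucket="your-bucket", Prefix="exports/customers/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["LastModified"])

# s3:GetObject: can we read object metadata (roughly what the HEAD test checks)?
s3.head_object(Bucket="your-bucket", Key="exports/customers/2026-05-10/customers.parquet")
```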
Supported formats
| Format | Notes |
|---|---|
| CSV | UTF-8 expected. Header row required. Same parsing as the upload connector. |
| Parquet | Best choice for large files. Columnar; schema is preserved. |
| JSONL (newline-delimited JSON) | One JSON object per line. Flat or nested — nested objects need to be flattened in your source pipeline first (see the sketch below this table). |
| JSON (single document) | Only for small files. The whole doc loads into memory. |
| gzip-compressed | `.csv.gz`, `.json.gz`, etc. are transparently decompressed. |
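The flattening requirement for nested JSONL can be handled in the source pipeline before export; one way is `pandas.json_normalize`. The field names below are illustrative:

```python
import json

import pandas as pd

# Example nested records as they might appear in a JSONL export (illustrative fields).
lines = [
    '{"id": 1, "name": "Acme", "address": {"city": "Berlin", "zip": "10115"}}',
    '{"id": 2, "name": "Globex", "address": {"city": "Lyon", "zip": "69001"}}',
]

records = [json.loads(line) for line in lines]
flat = pd.json_normalize(records, sep="_")  # nested address.city becomes address_city
print(flat.columns.tolist())                # ['id', 'name', 'address_city', 'address_zip']

# Write flat JSONL that the connector can ingest directly.
flat.to_json("customers_flat.jsonl", orient="records", lines=True)
```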
Cursor strategy
For ongoing ingest of a prefix that receives new files daily, the cursor is the file's modified time. Golden Suite tracks the latest-seen `LastModified` per source and only reads files newer than that on subsequent runs.
If you replace files in place (same path, new contents), the cursor still detects the `LastModified` change and re-reads the file. To force a re-read of unchanged files, use the "Reset cursor" button on the source detail page.
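The connector's internal code isn't shown here, but the cursor behaviour is roughly equivalent to this boto3 sketch (bucket and prefix are placeholders):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Latest LastModified seen on the previous run; Golden Suite persists this per source.
cursor = datetime(2026, 5, 11, tzinfo=timezone.utc)

# List everything under the prefix and keep only objects newer than the cursor.
paginator = s3.get_paginator("list_objects_v2")
new_objects = []
for page in paginator.paginate(Bucket="your-bucket", Prefix="exports/customers/"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] > cursor:
            new_objects.append((obj["Key"], obj["LastModified"]))

# ... read and ingest each new object here ...

# After a successful run, advance the cursor to the newest LastModified that was read.
if new_objects:
    cursor = max(last_modified for _, last_modified in new_objects)
```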
Sample bucket layout
A common pattern — daily exports under a date-partitioned prefix:
```
s3://your-bucket/exports/customers/
├── 2026-05-10/customers.parquet
├── 2026-05-11/customers.parquet
├── 2026-05-12/customers.parquet
└── ...
```
Configure the source with the prefix `exports/customers/`. The connector reads new daily files; combined with `goldenmatch.dedupe_df` it produces fresh golden records each day.
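A sketch of that daily flow, assuming `goldenmatch.dedupe_df` accepts a pandas DataFrame and returns the deduplicated golden records (check the goldenmatch reference for the actual signature), and assuming s3fs is installed so pandas can read `s3://` paths:

```python
from datetime import date

import goldenmatch  # assumed import path for the goldenmatch package
import pandas as pd

# Today's partition under the date-partitioned prefix (reading s3:// paths requires s3fs).
today = date.today().isoformat()
df = pd.read_parquet(f"s3://your-bucket/exports/customers/{today}/customers.parquet")

# Assumption: dedupe_df takes a DataFrame and returns deduplicated golden records.
golden = goldenmatch.dedupe_df(df)
print(f"{len(df)} input rows -> {len(golden)} golden records")
```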
Common gotchas
- IAM permission scoping. Grant `s3:GetObject` on the specific prefix, not the whole bucket, following the principle of least privilege.
- Endpoint URL for non-AWS S3. The connector supports a custom `endpoint_url` for S3-compatible services (Cloudflare R2, MinIO, Backblaze). Same code path (see the sketch after this list).
- Region. If your bucket is in `eu-west-1` and your IAM user defaults to `us-east-1`, sign requests with the right region — Golden Suite asks for the region during setup.
- Large files. Single CSV files over 1 GB will OOM the parser. Use Parquet (columnar streaming) or split into multiple files.
- Permissions are checked at the object level, not just at bucket listing. If the user can list the bucket but lacks `GetObject` on some files, the connector errors on the first read; the error message identifies the failing key.
- KMS-encrypted objects. Add `kms:Decrypt` to the IAM policy for the bucket's KMS key.
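For the non-AWS gotcha above, the same client configuration carries over. A sketch with a placeholder MinIO endpoint and credentials:

```python
import boto3

# Placeholder endpoint and credentials for an S3-compatible service (MinIO, R2, Backblaze).
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.internal.example.com",
    aws_access_key_id="minio-access-key",
    aws_secret_access_key="minio-secret-key",
    region_name="us-east-1",  # many S3-compatible services accept any region string
)

# The rest of the API is identical to AWS S3: list, HEAD, and GET work the same way.
print(s3.list_objects_v2(Bucket="exports", MaxKeys=3).get("Contents", []))
```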
Cost considerations
- S3 GET requests: $0.0004 per 1000. Even a daily ingest of 1000 files = $0.0004/day = trivial.
- S3 data transfer out: free within the same region; $0.09/GB cross-region. Pin Golden Suite's backend region to match the bucket (Enterprise tier supports region pinning).
Next steps
- /docs/guides/use-case/customer-360 — S3 is often the "data warehouse export" source
- /docs/runbooks/soc2-logging — same S3 patterns as our own SOC2 log retention setup