Connecting S3 to Golden Suite

Set up the S3 connector for CSV / Parquet / JSON files in a bucket.

The S3 connector reads CSV, Parquet, or newline-delimited JSON (JSONL) files from a bucket. Use it when your data plane writes scheduled exports to S3 (a common pattern with data warehouses, Kafka sinks, and third-party tools).

Prereqs

  • An S3 bucket with read access
  • IAM credentials (access-key + secret) with s3:GetObject + s3:ListBucket on the target prefix
  • Either:
    • One file per source — point at s3://bucket/path/to/file.csv
    • A prefix — Golden Suite reads all matching files under the prefix

Setup

  1. Create an IAM user for Golden Suite — read-only on the bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:ListBucket"],
          "Resource": [
            "arn:aws:s3:::your-bucket",
            "arn:aws:s3:::your-bucket/exports/*"
          ]
        }
      ]
    }
    
  2. Generate access keys for the IAM user
  3. /golden/sources → Add source → S3
  4. Paste credentials + bucket + prefix or file path
  5. Pick file format — CSV / Parquet / JSON
  6. Test connection — Golden Suite issues a HEAD request against the path (you can run the same check yourself; see below)
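
If the connection test fails, or you want to check the credentials yourself first, the boto3 snippet below exercises the same two permissions the connector needs. The bucket name, prefix, region, and keys are placeholders; adjust them to your setup.

  # Verify the Golden Suite IAM user can list the prefix and read an object.
  import boto3

  s3 = boto3.client(
      "s3",
      aws_access_key_id="AKIA...",   # the Golden Suite IAM user's access key
      aws_secret_access_key="...",
      region_name="eu-west-1",       # must match the bucket's region
  )

  # s3:ListBucket lets the connector enumerate files under the prefix.
  resp = s3.list_objects_v2(Bucket="your-bucket", Prefix="exports/", MaxKeys=5)
  keys = [obj["Key"] for obj in resp.get("Contents", [])]
  print("visible keys:", keys)

  # The HEAD check needs s3:GetObject on the object itself.
  if keys:
      head = s3.head_object(Bucket="your-bucket", Key=keys[0])
      print(keys[0], "is", head["ContentLength"], "bytes")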

Supported formats

  • CSV: UTF-8 expected. Header row required. Same parsing as the upload connector.
  • Parquet: best choice for large files. Columnar; schema is preserved.
  • JSONL (newline-delimited JSON): one JSON object per line. Objects must be flat; flatten nested structures in your source pipeline first.
  • JSON (single document): only for small files. The whole document loads into memory.
  • gzip-compressed: .csv.gz, .json.gz, etc. are transparently decompressed.
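
If you control the export side, a short pandas script can produce any of these formats. The sketch below is illustrative: file names are made up, and Parquet output assumes pyarrow (or fastparquet) is installed.

  import pandas as pd

  df = pd.DataFrame([
      {"id": 1, "name": "Ada",  "contact": {"email": "ada@example.com"}},
      {"id": 2, "name": "Alan", "contact": {"email": "alan@example.com"}},
  ])

  # JSONL rows must be flat: json_normalize turns contact.email into a column.
  flat = pd.json_normalize(df.to_dict(orient="records"))

  flat.to_parquet("customers.parquet")                           # Parquet
  flat.to_json("customers.jsonl", orient="records", lines=True)  # JSONL
  flat.to_csv("customers.csv.gz", index=False)                   # gzip CSV, inferred from .gz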

Cursor strategy

For ongoing ingest of a prefix that gets new files daily, the cursor is file modified-time. Golden Suite tracks the latest-seen LastModified per source and only reads files newer than that on subsequent runs.

If you replace files in place (same path, new contents), the cursor still detects the LastModified update and re-reads them. To force a re-read of files that have not changed, use the "Reset cursor" button on the source detail page.
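
As a rough sketch of this behavior (not Golden Suite's actual implementation), a modified-time cursor over a prefix looks like the following; the bucket, prefix, and stored cursor value are assumptions.

  from datetime import datetime, timezone
  import boto3

  s3 = boto3.client("s3")
  cursor = datetime(2026, 5, 11, tzinfo=timezone.utc)  # latest LastModified from the last run

  new_keys, newest = [], cursor
  paginator = s3.get_paginator("list_objects_v2")
  for page in paginator.paginate(Bucket="your-bucket", Prefix="exports/customers/"):
      for obj in page.get("Contents", []):
          # Files replaced in place get a fresh LastModified, so they qualify too.
          if obj["LastModified"] > cursor:
              new_keys.append(obj["Key"])
              newest = max(newest, obj["LastModified"])

  # After a successful run, persist `newest` as the next cursor. "Reset cursor"
  # is equivalent to winding this value back to the epoch.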

Sample bucket layout

A common pattern — daily exports under a date-partitioned prefix:

s3://your-bucket/exports/customers/
  ├── 2026-05-10/customers.parquet
  ├── 2026-05-11/customers.parquet
  ├── 2026-05-12/customers.parquet
  └── ...

Configure the source with prefix exports/customers/. The connector reads new daily files; combined with goldenmatch.dedupe_df it produces fresh golden records each day.
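
Putting it together, a daily refresh might look like the sketch below. It assumes pandas with s3fs installed for s3:// paths, and that goldenmatch is importable under that name with dedupe_df taking and returning a DataFrame; check the goldenmatch reference for the exact parameters.

  import pandas as pd
  import goldenmatch

  day = "2026-05-12"  # in practice, derived from the schedule / cursor
  df = pd.read_parquet(f"s3://your-bucket/exports/customers/{day}/customers.parquet")

  golden = goldenmatch.dedupe_df(df)
  print(f"{len(df)} rows in, {len(golden)} golden records out")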

Common gotchas

  • IAM permission scoping. Grant s3:GetObject on the specific prefix, not the whole bucket, following the principle of least privilege.
  • Endpoint URL for non-AWS S3. The connector supports custom endpoint_url for S3-compatible services (Cloudflare R2, MinIO, Backblaze). Same code path.
  • Region. If your bucket is in eu-west-1 and your IAM user defaults to us-east-1, sign requests with the right region — Golden Suite asks for region during setup.
  • Large files. A single CSV over 1 GB will OOM the parser. Use Parquet (columnar streaming) or split the data into multiple files (see the sketch after this list).
  • Permissions are checked per object, not just at list time. If the user can list the bucket but lacks GetObject on some files, the connector errors on the first read; the error message identifies the failing key.
  • KMS-encrypted objects. Add kms:Decrypt to the IAM policy for the bucket's KMS key.
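
For the large-file gotcha, one option is to split an oversized CSV into Parquet parts before upload, as in this sketch (chunk size and paths are examples):

  import pandas as pd

  # Stream the CSV in 500k-row chunks so the whole file is never held in memory.
  for i, chunk in enumerate(pd.read_csv("customers_big.csv", chunksize=500_000)):
      chunk.to_parquet(f"exports/customers/part-{i:04d}.parquet")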

Cost considerations

  • S3 GET requests: $0.0004 per 1,000. Even a daily ingest of 1,000 files costs about $0.0004/day, which is negligible.
  • S3 data transfer out: free within the same region; $0.09/GB cross-region. Pin Golden Suite's backend region to match the bucket (Enterprise tier supports region pinning).

Next steps