CSV upload — the simplest source

Drag and drop a CSV into Golden Suite — the fastest path to seeing the workbench in action.

CSV upload is the simplest source: drop a file onto the workbench, the auto-config infers a schema, and you're matching within 30 seconds. No credentials, no auth flow, no API rate limits.

When CSV upload is the right choice

  • First-time exploration — sample data to test matching quality
  • One-off cleanup jobs — dedupe a marketing list before a campaign
  • Small datasets — under ~500k rows; bigger and the connectors below scale better
  • Data not yet in a system — exported from a legacy app, scraped, hand-curated
  • Testing schema mapping — sample 100 rows from a new source before wiring the full connector
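Sampling a few rows before wiring the full connector is a one-liner with the stdlib. A minimal sketch (the function name and paths are illustrative, not part of Golden Suite):

```python
import csv

def sample_rows(src_path: str, dst_path: str, n: int = 100) -> int:
    """Copy the header plus the first n data rows of a CSV to a new file."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row passes through unchanged
        written = 0
        for row in reader:
            if written >= n:
                break
            writer.writerow(row)
            written += 1
    return written
```

Upload the resulting file, check the proposed mapping, then point the real connector at the full source.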

How it works

  1. /golden/sources → Add source → CSV upload (or just drag onto the dropzone)
  2. InferMap auto-detects the column types and proposes a mapping to the target schema
  3. Review the mapping in the autoconfig wizard
  4. Commit + dispatch — first golden records in seconds for small files

The CSV is uploaded directly to the backend, parsed with Polars, and stored as raw rows. Decoding is UTF-8 lossy, which handles stray Latin-1 characters in legacy exports without crashing.
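"UTF-8 lossy" behaves like Python's `errors="replace"` decoding: invalid byte sequences become U+FFFD instead of raising. A minimal illustration of the semantics (not the actual backend code):

```python
def decode_lossy(raw: bytes) -> str:
    # Invalid UTF-8 byte sequences become U+FFFD (�) instead of raising.
    return raw.decode("utf-8", errors="replace")

# A Latin-1 "café" from a legacy export: 0xE9 is not valid UTF-8 on its own,
# so the é survives only as a replacement character.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
decode_lossy(latin1_bytes)               # 'caf\ufffd'
```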

Sample CSV — what works

id,email,first_name,last_name,company,phone,created_at
1,sarah@acme.com,Sarah,Johnson,Acme Corp,+1-415-555-0100,2024-03-15
2,sarah.j@acmecorp.com,Sarah,Johnson-Smith,"Acme, Inc.",4155550100,2024-08-22
3,bob@example.com,Robert,Smith,Example LLC,+15555550200,2024-01-10

Things the auto-config will figure out:

  • Column types (email = email pattern, phone = phone-like, dates = ISO timestamps)
  • Which columns look like identifiers vs free-text
  • Which columns are good blocking-signal candidates (email domain, name prefix)
  • Which columns probably need standardization (phone has multiple formats above)

Things to watch for

Encoding

Latin-1 / Windows-1252 encoded files (common from legacy systems) parse via the UTF-8 lossy path — invalid bytes get replaced with the Unicode replacement character (�). If you see garbled characters in the preview, re-export the source CSV as UTF-8 before uploading.
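If you can't re-export from the source system, re-encoding the file yourself is straightforward. A sketch (function name is illustrative; adjust `src_encoding` to whatever the legacy system actually emits):

```python
def reencode_to_utf8(src: str, dst: str, src_encoding: str = "cp1252") -> None:
    """Re-save a legacy-encoded CSV as UTF-8 so no characters are lost to �."""
    with open(src, encoding=src_encoding) as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```

Note that this only works if you know (or can guess) the real source encoding — decoding cp1252 bytes as latin-1 or vice versa will silently mangle a handful of punctuation characters.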

Header row

The first row is treated as headers. If your CSV doesn't have headers, add them — col1, col2, col3 is fine.
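Prepending generic headers to a headerless file takes a few lines. A sketch (names are illustrative):

```python
import csv

def add_generic_headers(src: str, dst: str) -> list[str]:
    """Prepend col1..colN headers to a headerless CSV."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    headers = [f"col{i + 1}" for i in range(len(rows[0]))]
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return headers
```

You can rename the columns to something meaningful in the mapping step afterwards.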

Free-text columns

The InferMap auto-config has heuristics for guessing column purpose. Free-text fields ("Notes", "Comments", "Description") usually get marked as low-signal and excluded from matching. That's usually right. If you have a free-text field that does carry identity signal, mark it explicitly in the autoconfig wizard.

Quoting

Standard CSV quoting works ("field, with comma"). Embedded newlines in quoted fields also work. If your file has non-standard quoting (curly quotes, missing closing quotes), normalize before uploading.
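Curly quotes are the most common non-standard case and are easy to normalize before uploading. A minimal sketch:

```python
# Map the four common curly-quote code points to their straight equivalents.
CURLY = str.maketrans({
    "\u201c": '"',  # "
    "\u201d": '"',  # "
    "\u2018": "'",  # '
    "\u2019": "'",  # '
})

def normalize_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(CURLY)
```

Missing closing quotes can't be fixed mechanically — those rows need a look by hand.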

File size

The frontend dropzone caps uploads at 50 MB. Larger files: either split + upload in chunks, or use the Postgres/S3 connectors instead.
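Splitting by line keeps each part a valid CSV as long as the header is repeated. A sketch of the chunking approach (size is counted in characters, so treat the cap as approximate; it also assumes no embedded newlines inside quoted fields):

```python
def split_csv(src: str, max_bytes: int = 50 * 1024 * 1024) -> list[str]:
    """Split a CSV into parts that each stay under max_bytes, repeating the header."""
    with open(src, encoding="utf-8") as f:
        header = f.readline()
        chunks, buf, size = [], [], 0
        for line in f:
            if size + len(line) > max_bytes and buf:
                chunks.append(buf)       # current part is full; start a new one
                buf, size = [], 0
            buf.append(line)
            size += len(line)
        if buf:
            chunks.append(buf)
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = f"{src}.part{i}.csv"
        with open(path, "w", encoding="utf-8") as out:
            out.write(header)            # every part gets its own header row
            out.writelines(chunk)
        paths.append(path)
    return paths
```

Set `max_bytes` comfortably under the 50 MB cap to leave room for the repeated header and multi-byte characters.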

When NOT to use CSV upload

  • Recurring ingest — every CSV upload creates a separate source row. For weekly/daily re-ingest, use a connector that supports incremental cursors (Postgres, Salesforce, Stripe, S3).
  • Files over 50 MB — see size note above.
  • Sensitive data that must stay inside your network — the upload travels over HTTPS to bensevern.dev, but if your security policy requires keeping data inside your VPC, use a connector to your warehouse instead.

Next steps