SYS JS.DEV
BUILD F3CRES
DATE 2026.04.26
UTC 01:30 UTC
LOC NYC → STANFORD
STATUS OPEN TO ML/SYSTEMS ROLES

Redshift Warehouse — Pilot, Kill, Re-Launch

Stood up the company's first data warehouse. Ran a Redshift Serverless pilot directly with the CTO and our AWS TAM, killed it because its pricing was a poor fit for our batch-heavy workload, and launched a provisioned cluster instead. Sitemap generation went from 8 days to 4 hours.

Sitemap perf 8 DAYS → 4 HOURS · 48×
Surface FIRST COMPANY DW
Workload shape BATCH-HEAVY · NOT BURSTY
Decision KILLED SERVERLESS · LAUNCHED PROVISIONED

A judgment-and-stakeholder-management story as much as a perf story.

Problem

The company had no data warehouse. Analytics and core-data teams were both straining a transactional Postgres for queries it was never designed for, including SEO-load-bearing batch jobs that ran for over a week. We needed a real warehouse.

Process

I led the warehouse standup with the BI and Core Data teams and ran a Redshift Serverless pilot in direct collaboration with the CTO and our AWS TAM. The pilot ran cleanly and saw real usage, but our workload was heavy scheduled batch rather than bursty interactive analytics, and Serverless’s pricing model rewards exactly the opposite. I drove the decision to kill the Serverless pilot and launch provisioned instead, then shipped the production cluster with our DBA and a teammate.
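The pricing argument can be made concrete with a rough cost model. This is a minimal sketch, not the analysis we actually ran: the prices, RPU count, and node count are hypothetical placeholders, and it assumes a provisioned cluster that is never paused. The only structural facts it relies on are that Serverless bills for compute capacity (RPU-hours) while work is running, and a provisioned cluster bills per node-hour whether it is busy or idle.

```python
# Rough monthly cost comparison: Redshift Serverless vs. a provisioned cluster.
# Every number here is a hypothetical placeholder, not an AWS list price.
RPU_HOUR_PRICE = 0.40    # assumed $ per RPU-hour (Serverless)
NODE_HOUR_PRICE = 1.00   # assumed $ per node-hour (provisioned, on-demand)
HOURS_PER_MONTH = 730


def serverless_monthly_cost(rpus: int, busy_hours_per_day: float, days: int = 30) -> float:
    """Serverless charges for capacity only while queries are running."""
    return rpus * busy_hours_per_day * days * RPU_HOUR_PRICE


def provisioned_monthly_cost(nodes: int) -> float:
    """A provisioned cluster charges for every hour it is up, busy or idle."""
    return nodes * HOURS_PER_MONTH * NODE_HOUR_PRICE


def cheaper_option(rpus: int, nodes: int, busy_hours_per_day: float) -> str:
    """Return which model is cheaper for a given daily duty cycle."""
    serverless = serverless_monthly_cost(rpus, busy_hours_per_day)
    provisioned = provisioned_monthly_cost(nodes)
    return "serverless" if serverless < provisioned else "provisioned"


if __name__ == "__main__":
    # Bursty interactive use: a couple of busy hours per day.
    print(cheaper_option(rpus=32, nodes=4, busy_hours_per_day=2))
    # Heavy scheduled batch: the warehouse is busy most of the day.
    print(cheaper_option(rpus=32, nodes=4, busy_hours_per_day=20))
```

With these made-up inputs the break-even sits at roughly 7.6 busy hours per day. Below that, paying only for active time wins; above it, a flat per-node rate wins, which is the shape a batch-heavy workload falls into.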

After launch, I went deep on Redshift internals — distribution and sort keys, query planner behavior, vacuum/analyze patterns, merge-join geometry — and re-implemented one of the company’s most expensive jobs.
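As an illustration of what the key work looks like, here is a small sketch that generates Redshift DDL for two tables that share a distribution key and a sort key on their join column. Redshift can choose a merge join when both sides of a join are distributed and sorted on the join column, so matching keys are the usual starting point. The table and column names (`pages`, `page_links`, `page_id`) are hypothetical and are not the actual sitemap schema.

```python
def colocated_table_ddl(table: str, columns: dict[str, str], join_key: str) -> str:
    """Build CREATE TABLE DDL that distributes and sorts a table on join_key."""
    if join_key not in columns:
        raise ValueError(f"{join_key!r} is not a column of {table!r}")
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({join_key})\n"
        f"SORTKEY ({join_key});"
    )


def maintenance_sql(table: str) -> list[str]:
    """Statements that re-sort the table and refresh planner statistics."""
    return [f"VACUUM FULL {table};", f"ANALYZE {table};"]


if __name__ == "__main__":
    # Hypothetical schema: both tables keyed on page_id so matching rows land
    # on the same slice and are already in join order.
    print(colocated_table_ddl("pages", {"page_id": "BIGINT", "url": "VARCHAR(2048)"}, "page_id"))
    print(colocated_table_ddl("page_links", {"page_id": "BIGINT", "target_id": "BIGINT"}, "page_id"))
    for statement in maintenance_sql("pages"):
        print(statement)
```

Keeping the tables vacuumed and analyzed matters here because a large unsorted region can push the planner back to a hash join even when the keys line up.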

Outcome

Sitemap generation, which had been an ~8-day Postgres job, dropped to ~4 hours on Redshift, about a 48× speedup (192 hours / 4 hours). The sitemap pipeline feeds SEO, one of the company’s most important top-of-funnel acquisition channels, so the speedup was directly load-bearing for organic growth.

The pattern (Redshift compute → S3 → Lambda → queue → workers) became the template for downstream high-throughput data work, including the 5x5 partner integration and community-program matching.
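The fan-out half of that pattern (S3 → Lambda → queue → workers) can be sketched as follows. This is an illustrative sketch only, not the production code: it uses an in-memory `queue.Queue` as a stand-in for whatever managed queue sits behind the Lambda, and the `handler` and `worker` names and the message shape are assumptions. The event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, URL-encoded keys) follows the standard S3 event notification format.

```python
import json
import queue
from urllib.parse import unquote_plus

# In-memory stand-in for a managed queue service (assumption for this sketch).
work_queue: "queue.Queue[str]" = queue.Queue()


def handler(event: dict, context=None, enqueue=work_queue.put) -> int:
    """Lambda-style entry point: emit one queue message per S3 object created."""
    count = 0
    for record in event.get("Records", []):
        s3 = record["s3"]
        message = {
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        }
        enqueue(json.dumps(message))
        count += 1
    return count


def worker(process, source=work_queue) -> int:
    """Drain the queue, handing each decoded message to `process`."""
    done = 0
    while True:
        try:
            raw = source.get_nowait()
        except queue.Empty:
            return done
        process(json.loads(raw))
        done += 1


if __name__ == "__main__":
    # Hypothetical event for one file that a warehouse export wrote to S3.
    event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                 "object": {"key": "sitemaps/part+0001.csv"}}}]}
    handler(event)
    worker(print)
```

The design point is that the warehouse only has to write files: each exported object becomes an independent unit of work, so throughput scales with the number of workers rather than with a single long-running job.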

STACK · Redshift · S3 · Postgres · Python