Redshift Warehouse — Pilot, Kill, Re-Launch
Stood up the company's first data warehouse. Ran a Redshift Serverless pilot directly with the CTO and our AWS TAM, killed it for being a poor fit, launched provisioned instead. Sitemap generation went from 8 days to 4 hours.
A judgment-and-stakeholder-management story as much as a perf story.
Problem
The company had no data warehouse. Analytics and core-data teams were both straining a transactional Postgres database with queries it was never designed for, including batch jobs that SEO depended on and that ran for over a week. We needed a real warehouse.
Process
I led the warehouse standup with the BI and Core Data teams and ran a Redshift Serverless pilot in direct collaboration with the CTO and our AWS TAM. The pilot ran cleanly and our usage was meaningful, but the workload was heavy scheduled batch, not bursty interactive analytics, and Serverless’s usage-based pricing rewards exactly the opposite. I drove the decision to kill the Serverless pilot and launch provisioned instead, then shipped the production cluster with our DBA and a teammate.
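The pricing argument can be made concrete with a rough break-even sketch. Everything numeric here is a placeholder: the rates below are hypothetical, not AWS list prices and not figures from the pilot, and the model ignores storage, minimum charges, and reserved-instance discounts. It only illustrates why sustained batch work tends to favor a fixed hourly cluster while intermittent work favors paying per busy hour.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average month length in hours


@dataclass(frozen=True)
class Rates:
    """Hypothetical hourly rates; replace with real quotes before use."""
    serverless_per_busy_hour: float  # charged only while queries run
    provisioned_per_hour: float      # charged around the clock, busy or idle


def monthly_serverless(rates: Rates, busy_hours: float) -> float:
    """Usage-based cost: scales with hours of actual compute."""
    return rates.serverless_per_busy_hour * busy_hours


def monthly_provisioned(rates: Rates) -> float:
    """Fixed cost: the cluster bills every hour of the month."""
    return rates.provisioned_per_hour * HOURS_PER_MONTH


def break_even_busy_hours(rates: Rates) -> float:
    """Busy hours per month above which provisioned becomes cheaper."""
    return monthly_provisioned(rates) / rates.serverless_per_busy_hour


# Placeholder numbers for illustration only.
rates = Rates(serverless_per_busy_hour=12.0, provisioned_per_hour=4.0)
print(f"break-even: {break_even_busy_hours(rates):.0f} busy hours/month")
```

A workload that keeps compute busy for more hours per month than the break-even point is cheaper on a provisioned cluster under this simplified model; a bursty workload well below it is cheaper on usage-based pricing.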
After launch, I went deep on Redshift internals (distribution and sort keys, query planner behavior, vacuum/analyze patterns, merge-join geometry) and re-implemented one of the company’s most expensive jobs, sitemap generation.
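As an illustration of how distribution and sort keys interact with joins, here is a minimal sketch that generates Redshift DDL for two tables sharing the same distribution key and sort key on their join column. The table and column names (`pages`, `page_stats`, `page_id`) are hypothetical and not taken from the actual schema. Redshift can choose a merge join when both sides are distributed and sorted on the join column and the tables are mostly sorted, which is where regular VACUUM and ANALYZE come in.

```python
# Hypothetical join column shared by both tables.
JOIN_KEY = "page_id"


def co_located_table_ddl(table: str, extra_columns: str) -> str:
    """Build CREATE TABLE DDL that distributes and sorts on JOIN_KEY.

    Matching DISTKEY values put joining rows on the same slice, and a
    matching SORTKEY lets the planner consider a merge join.
    """
    return (
        f"CREATE TABLE {table} (\n"
        f"    {JOIN_KEY} BIGINT NOT NULL,\n"
        f"    {extra_columns}\n"
        f")\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({JOIN_KEY})\n"
        f"SORTKEY ({JOIN_KEY});"
    )


# Illustrative tables only; the real schema is not described in this write-up.
pages_ddl = co_located_table_ddl("pages", "url VARCHAR(2048)")
stats_ddl = co_located_table_ddl("page_stats", "last_modified TIMESTAMP")

print(pages_ddl)
print(stats_ddl)
```

Running `EXPLAIN` on the join is the usual way to check which join strategy the planner actually picked.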
Outcome
Sitemap generation for SEO, previously an ~8-day Postgres job, dropped to ~4 hours on Redshift, about a 48× speedup (192 hours down to 4). SEO is one of the company’s most important top-of-funnel acquisition channels, so the speedup directly supported organic growth.
The pattern (Redshift compute → S3 → Lambda → queue → workers) became the template for downstream high-throughput data work, including the 5x5 partner integration and community-program matching.
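The S3 → Lambda → queue hop of that pattern might look roughly like the sketch below. It assumes the Redshift step writes result files to S3 (for example with `UNLOAD`) and that an S3 object-created notification triggers the Lambda; the write-up does not say which queue or export mechanism was used, so the queue is abstracted behind a hypothetical `send` callable and the bucket and key names are made up.

```python
import json
from typing import Callable, Iterator
from urllib.parse import unquote_plus


def messages_from_s3_event(event: dict) -> Iterator[str]:
    """Yield one JSON message per object listed in an S3 notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        yield json.dumps({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })


def make_handler(send: Callable[[str], None]):
    """Build a Lambda-style handler that forwards each file to a queue.

    `send` is a stand-in for the real queue client call.
    """
    def handler(event: dict, context=None) -> dict:
        count = 0
        for body in messages_from_s3_event(event):
            send(body)
            count += 1
        return {"enqueued": count}
    return handler


# Local dry run with an in-memory list standing in for the queue.
sent: list[str] = []
handler = make_handler(sent.append)
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-exports"},
                "object": {"key": "exports/part-0000.json"}}}
    ]
}
result = handler(fake_event)
```

Injecting `send` keeps the handler testable without cloud credentials; workers then pull messages and process one exported file each, which is what lets throughput scale with worker count.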