Redshift Warehouse — Pilot, Kill, Re-Launch

Stood up the company's first data warehouse. Ran a Redshift Serverless pilot directly with the CTO and our AWS TAM, killed it as a poor fit for our batch-heavy workload, and launched a provisioned cluster instead. Sitemap generation went from 8 days to 4 hours.

Year 2023

Org RocketReach

Role Lead, with BI + Core Data + AWS team

Sitemap perf 8 DAYS → 4 HOURS · 48×

Surface FIRST COMPANY DW

Workload shape BATCH-HEAVY · NOT BURSTY

Decision KILLED SERVERLESS · LAUNCHED PROVISIONED

A judgment-and-stakeholder-management story as much as a perf story.

Problem

The company had no data warehouse. Analytics and core-data teams were both straining a transactional Postgres database with queries it was never designed for, including SEO-load-bearing batch jobs that ran for over a week. We needed a real warehouse.

Process

I led the warehouse standup with the BI and Core Data teams and ran a Redshift Serverless pilot in direct collaboration with the CTO and our AWS TAM. The pilot ran cleanly and our usage was meaningful, but the workload was heavy scheduled batch, not bursty interactive analytics. Serverless bills for compute only while queries are running, so its price model rewards spiky workloads with long idle stretches, which is exactly the opposite of ours. I drove the decision to kill the Serverless pilot and launch provisioned instead, then shipped the production cluster with our DBA and a teammate.
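The pricing argument can be made concrete with a back-of-the-envelope model. This is a minimal sketch, not the analysis we actually ran: every price, node count, and capacity figure below is a hypothetical placeholder, and the function names are made up for illustration. It only shows the shape of the trade-off: pay-per-use wins when the warehouse is mostly idle, and always-on wins when it is busy most of the day.

```python
def serverless_monthly_cost(busy_hours_per_day, capacity_units,
                            price_per_unit_hour, days=30):
    """Pay-per-use: billed only for the hours queries are actually running."""
    return busy_hours_per_day * capacity_units * price_per_unit_hour * days


def provisioned_monthly_cost(node_count, price_per_node_hour, days=30):
    """Always-on cluster: billed for all 24 hours, busy or idle."""
    return node_count * price_per_node_hour * 24 * days


def breakeven_busy_hours(capacity_units, price_per_unit_hour,
                         node_count, price_per_node_hour):
    """Busy hours per day at which the two models cost the same."""
    provisioned_per_day = node_count * price_per_node_hour * 24
    return provisioned_per_day / (capacity_units * price_per_unit_hour)


# Hypothetical numbers, chosen only to illustrate the comparison.
UNITS, UNIT_PRICE = 32, 0.40   # serverless capacity and price per unit-hour
NODES, NODE_PRICE = 4, 1.00    # provisioned node count and price per node-hour

batch_serverless = serverless_monthly_cost(20, UNITS, UNIT_PRICE)   # busy ~20 h/day
bursty_serverless = serverless_monthly_cost(2, UNITS, UNIT_PRICE)   # busy ~2 h/day
provisioned = provisioned_monthly_cost(NODES, NODE_PRICE)
breakeven = breakeven_busy_hours(UNITS, UNIT_PRICE, NODES, NODE_PRICE)
```

With these placeholder inputs, the batch-heavy profile costs more on the pay-per-use model than on the always-on cluster, while the bursty profile costs less. Real numbers depend on actual pricing, reserved-capacity discounts, and how the workload is sized.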

After launch, I went deep on Redshift internals — distribution and sort keys, query planner behavior, vacuum/analyze patterns, merge-join geometry — and re-implemented one of the company’s most expensive jobs.
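To illustrate why sort and distribution keys matter for joins: when both inputs are already ordered on the join column, they can be joined in a single forward pass with no hash table and no re-sorting. Redshift's planner can choose a merge join in roughly that situation. The sketch below is a generic textbook merge join in Python, not Redshift's implementation and not the rewritten job itself; the function name and the sample rows are illustrative only.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) tuples that are already sorted by key.

    Each input is scanned once, front to back. Runs of duplicate keys on
    both sides produce every matching pair.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small: advance left
        elif lk > rk:
            j += 1          # right key too small: advance right
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


# Illustrative rows only: (join_key, payload), pre-sorted by join_key.
profiles = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
urls = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(profiles, urls)
```

The practical takeaway is that the cost of the join is paid up front, in how the tables are laid out, and not at query time.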

Outcome

Sitemap generation, which had been an ~8-day Postgres job, dropped to ~4 hours on Redshift: roughly 192 hours down to 4, about a 48× speedup. This is the SEO sitemap pipeline, and SEO is one of the company's most important top-of-funnel acquisition channels, so the speedup was directly load-bearing for organic growth.

The pattern (Redshift compute → S3 → Lambda → queue → workers) became the template for downstream high-throughput data work, including the 5x5 partner integration and community-program matching.
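The S3 → Lambda → queue hop in that pattern can be sketched as a small handler that turns each object written to S3 into one queue message for the workers. This is an illustrative sketch under assumptions, not the production code: the message shape, the function names, and the injected `enqueue` callable are all hypothetical, and the queue client is deliberately left out so the example runs with the standard library alone. The only external detail relied on is the S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys).

```python
import json
from urllib.parse import unquote_plus


def s3_records_to_messages(event):
    """Build one JSON message body per object in an S3 event notification."""
    messages = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        messages.append(json.dumps({"bucket": bucket, "key": key}))
    return messages


def make_handler(enqueue):
    """Return a Lambda-style handler that pushes each message via `enqueue`.

    `enqueue` is a hypothetical callable taking one message body; in a real
    deployment it would wrap whatever queue client is in use.
    """
    def handler(event, context):
        messages = s3_records_to_messages(event)
        for body in messages:
            enqueue(body)
        return {"enqueued": len(messages)}
    return handler


# Usage with an in-memory list standing in for the queue.
fake_queue = []
handler = make_handler(fake_queue.append)
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "sitemaps/part+0001.csv"}}},
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "sitemaps/part%2B0002.csv"}}},
    ]
}
result = handler(sample_event, None)
```

Keeping the handler this thin pushes the heavy lifting to the warehouse upstream and the workers downstream, and the queue in between lets worker count scale independently of how fast files land in S3.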

STACK · Redshift · S3 · Postgres · Python