SYS JS.DEV
BUILD F3CRCP
DATE 2026.04.26
UTC 01:30
LOC NYC → STANFORD
STATUS OPEN TO ML/SYSTEMS ROLES

About

Backend & ML systems engineer. Founding-era engineer at RocketReach, among the longest-tenured people at the company, behind only the cofounders and a few founding engineers. Heading to Stanford in September.

I’m Johnny. I joined RocketReach in October 2020 as one of the first engineers, when the company was under ten people. Five-plus years later it’s around a hundred, and I've been the person who picks up the hardest, least well-scoped data-infrastructure problems as we've scaled. The systems I own run the company's core revenue logic, its email-verification stack at ~500M emails a month, the analytics warehouse, and the multi-provider LLM gateway behind every AI feature the product ships.

What I actually do

Most of the work sits at the intersection of data infrastructure and applied ML. The interesting problems in that space rarely look like the ML papers; they look like correctness, latency, cost, entity resolution, and migration paths through billion-row tables on a live system. I write Python and SQL for a living, on a stack of Django, FastAPI, and Postgres. I run large jobs on Redshift. I design and ship LLM/embedding pipelines into production, where the cost of being wrong is measurable in revenue.

My manager describes the pattern as “remodeling the house while we live in it”. That's the closest description of the actual day job: I take a system that's load-bearing and broken in ways that can't be fixed in a single PR, and I ship the rebuild without anyone noticing. The contact-data normalization project — 4B+ rows, solo, zero downtime, real-time dual-write at the Django field layer — is the canonical example.
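The dual-write idea can be sketched without the Django specifics. This is a minimal, framework-agnostic illustration (all names here, `Contact`, `email_raw`, `email_norm`, are hypothetical, not the actual RocketReach schema): a descriptor that mirrors every live write into both the legacy and the normalized column, so a backfill job can migrate old rows without racing fresh traffic.

```python
class DualWriteField:
    """Descriptor sketch of a dual-write migration field.

    Every assignment lands in both the legacy attribute and the
    normalized attribute; reads prefer the normalized value once
    it exists. Names are illustrative, not a real schema.
    """

    def __init__(self, legacy_attr, normalized_attr, normalize):
        self.legacy_attr = legacy_attr
        self.normalized_attr = normalized_attr
        self.normalize = normalize

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Read path: prefer the normalized column when populated,
        # fall back to the legacy column for unmigrated rows.
        return (obj.__dict__.get(self.normalized_attr)
                or obj.__dict__.get(self.legacy_attr))

    def __set__(self, obj, value):
        # Write path: dual-write so live traffic and the backfill
        # job never disagree about which column is authoritative.
        obj.__dict__[self.legacy_attr] = value
        obj.__dict__[self.normalized_attr] = self.normalize(value)


class Contact:
    email = DualWriteField("email_raw", "email_norm",
                           lambda v: v.strip().lower())


c = Contact()
c.email = "  Jane.Doe@Example.COM "
# c.email_raw keeps the original bytes; c.email_norm and c.email
# both yield the normalized form.
```

In a real Django migration the same shape lives in the model field's `__set__`/`pre_save` path rather than a plain descriptor, but the invariant is identical: no write ever touches only one column.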

Why I'm heading to AI labs

Not running away from anything. I'm still one of the most senior ICs at RocketReach, the problems are still real, and the team is the best working environment I've had. But the learning rate has compressed: most data-infra problems I see now are incremental improvements to systems I already built. The hardest problems in the world right now are in AI — training and inference infrastructure, evaluation pipelines for frontier models, agent systems, the messy substrate that makes models actually useful in production.

Some of that work is closer to what I already do than it looks: lookup success-rate systems are eval pipelines wearing a different hat, contact normalization is a data-quality-for-training analog, the LLM gateway is directly platform/tooling work. But the part I want to push deeper into is the training-and-serving substrate — the GPU orchestration, distributed training, and inference-serving craft — and AI labs are where that work lives.

Background

B.S. Computer Science & Economics, Johns Hopkins. While there I co-founded and led the Quantitative Finance Society, a graduate-tier long-short systematic-trading group with $30K AUM and 20+ student researchers (8 PhDs); we reached the WorldQuant Challenge semi-finals. I also held a $10K Woodrow Wilson research fellowship on the implied–realized volatility gap, was a partner at JHU's affiliated VC, and worked at RocketReach part-time through junior and senior year, then full-time during my final semester. Bellevue, then San Francisco, then New York, now back to the Bay.

Beyond software

Long-running interests in metabolic disease mechanisms, anti-aging research, ancient and Stoic philosophy, and AI safety. I read books with derivations in them. I have opinions about fructose. I run, lift, and salsa-dance (badly, improving).

Get in touch

Best paths: LinkedIn and GitHub. Open to conversations about ML/systems hybrid roles in the Bay Area starting fall — ML platform, training/inference infra, eval tooling, applied AI infrastructure. Full background on the CV page.