Curriculum Vitae
Backend & ML systems engineer. Founding engineer at RocketReach. Five-plus-year tenure across analytics infrastructure, contact-data systems, applied AI, and the multi-provider LLM gateway that backs every AI feature at the company.
Experience
Founding Engineer — RocketReach
OCT 2020 — PRESENT · REMOTE (BELLEVUE → SF → NYC)
One of the first engineers, hired during Series A. Five-plus-year tenure spanning four architectural eras: analytics infrastructure → core data platform → contact-data systems at scale → applied AI / LLM infra. Reverse-chronological below; further detail and case-study links on Work.
§ RocketReach — 2026
- Designed and shipped the AI-sampled rules model for the community-program matching pipeline. 30× throughput, 92% cost reduction, 2.5× more usable contact data extracted vs. the prior production ML model. Most-valued data contribution to current exit conversations.
- Built an internal phone-data-normalization package that re-canonicalizes 250M phones across 3B+ data points into a scalable waterfall ingestion system. PM characterized it as the single largest data improvement of the past year.
§ RocketReach — 2025
Email verification + domain-success prediction
- Rewrote in-lookup email verification to run in parallel with explicit candidate prioritization.
- Built a full AI-driven domain-success tracking system, doubling prediction success.
- Drove lookup success rate from ~55% (start of 2024) to ~90% (end of 2025) — direct credit/revenue multiplier.
Contact-data normalization
- Solo-shipped migration of 4B+ contact records from nested JSON on the profile model into normalized profile_email and profile_phone tables.
- Real-time sync via low-level Django field-type overrides + custom managers; in-memory dedup; persistence on profile.save().
- Launched with zero downtime, zero performance regression. Unblocks GDPR removals and monolith decomposition.
5x5 data-partner integration
- Built a highly efficient Airflow DAG almost entirely on Redshift merge joins; the ETL processes 150M record updates per month in 4 hours.
- Pattern (Redshift compute → S3 → Lambda → queue → workers) adopted as the company-wide template for high-throughput data work.
- Privacy-aware pre-filtering by hashed contact comparison in-query — no PII persisted.
AI data-enrichment service
- Built the scalable backend AI enrichment service that opened a new upsellable product class (API enrichments).
- Originated as a hackathon prototype that the company subsequently adopted as its multi-quarter GenAI roadmap.
Privacy-removals system
- Built the GDPR-compliant profile-removal system, made possible by the earlier contact-data normalization (contact-to-profile reverse lookup via hashed contact data).
§ RocketReach — 2024
RocketVerify — custom async SMTP
- Designed and led a scalable email-verification service implementing a custom async SMTP protocol; scaled to ~500M emails/month.
- Heavy Playwright integration for the long tail of providers that resist standard SMTP probing.
Lookup parallelization & offline verification
- Reimplemented lookups as a multi-threaded pipeline; reduced P90 latency from ~12s to ~3s.
- Built offline verification on top of RocketVerify with ML-based candidate prioritization; cached the most expensive part of lookups ~90% of the time.
Multi-provider LLM gateway (with Cam)
- Co-built the in-house service that abstracts over Anthropic, OpenAI, xAI, and Gemini behind a single interface, with native websearch.
- Backbone for every AI feature at the company: industry classification, AI enrichment, domain-success prediction, AI-sampled rules extraction, privacy tooling.
- Owns routing, failover, rate limits, retries, cost accounting, secret rotation, and unified observability.
Industry classification — LLM labeling
- Redesigned industry categorization using vectorized semantic industries + LLM-based labeling; lifted fill rate from ~10% to ~99%.
Orphan-profile trimming
- Built heuristic + ML scoring methods to trim orphan profiles from the index; improved search latency and reduced Elasticsearch index size.
§ RocketReach — 2023
Redshift warehouse
- Stood up the company's first data warehouse. Ran a Redshift Serverless pilot directly with the CTO and our AWS rep; killed the pilot for being a poor fit for our workload shape, launched provisioned instead.
- Re-implemented sitemap generation on Redshift with merge-join geometry; 8 days → 4 hours (~48× speedup). Directly load-bearing for SEO, the primary top-of-funnel acquisition channel.
Microservices: SpaCy + Profile-Photo
- Co-designed scalable FastAPI microservices. Authored most of the Terraform.
- Solo-led the Profile-Photo microservice; debugged and resolved memory-leak issues in the facial-recognition stack via Python memory profiler.
Brightdata phone enrichment
- Designed scalable system for Endato data ingestion + matching; produced 32M+ new and boosted phones; substantial Terraform; dashboard for ingestion progress.
Emergency SendGrid replacement
- Replaced SendGrid with a message-queue model (SES + SNS + SQS) processing email activity in real time.
PeoplePro email enrichment
- Overhauled the PeoplePro pipeline; added 14M+ new emails (10M more in flight). Source accounts for ~30% of phones sold.
Academic — entity resolution package
- Built a Python package implementing entity resolution as BERT embeddings + multilayer perceptron. Advised by a JHU CS professor.
- Read dozens of papers on deep-learning ER; benchmarked embedding families and similarity geometries.
§ RocketReach — 2022
Analytics + data quality
- Overhauled finance/revenue ETLs into a real-time customer-events model across all payment processors (Recurly, Stripe, Adyen, Braintree); reduced ETL errors to ~0%.
- Created the data-engineering interview assessment; led 3-month onboarding of three new engineers.
- Migrated all analytics code out of the rr monolith; built custom query wrappers, job scheduling, Slack alerting, and dev tooling.
- Overhauled enterprise attribution; first accurate enterprise revenue, cash, and deal-flow reporting.
Core data
- Sourced, QA'd, and integrated 10+ datasets for UK and French people data — added 2M+ emails and phones.
- Built DataPerson / DataPersonManager abstractions, dramatically cutting time to integrate a new dataset; substantial query-optimization and indexing work.
- Designed a precursor to slowly-changing dimensions via snapshot + diff.
§ RocketReach — 2020 / 2021
Analytics infrastructure (zero → one)
- Built the company's initial OLAP database / analytics warehouse. Authored the first full suite of product reporting.
- Built attribution infrastructure linking individual transactions and subscriptions across Braintree, Stripe, Recurly, and Adyen back to users; first cash-to-user attribution at the company.
- Optimized + scaled ETLs to handle a significant volume increase, achieving a ~50× ETL performance improvement.
Data-quality + data-partner work
- Sourced hundreds of contact-data providers; helped negotiate and close new provider deals.
- Drove integration of the PeoplePro 2021 dataset (~16% of all phones in the database).
Earlier roles & undergrad
Lead Junior Analyst — Resolve Growth Partners
Sep 2019 — Oct 2020
Part-time during school; full-time May–Oct 2020. Selected as 1 of 2 summer analysts on a 4-person investment team managing a $125M growth-equity fund focused on B2B SaaS.
- Sourced and qualified 2,000+ investment leads; managed a 15,000-company CRM.
- Built investor decks for life-sciences and field-service-management deals.
- Performed retention modeling for 25+ software companies.
- Increased outreach response rate by ~400% on target prospects.
Founder & President — Johns Hopkins Quantitative Finance Society
Jun 2019 — Dec 2021
Co-founded a graduate quant-finance research society as a long-short systematic-trading group with $30K AUM. Faculty advisor: Prof. John Miller (JHU Applied Math & Statistics). 20+ student researchers (8 PhD).
- Built a strategy-agnostic backtesting engine — local equity-data persistence, portfolio tracking, performance evaluation.
- Implemented and evaluated pairs-trading and momentum strategies; produced cointegration heatmaps and a recursive fractal generator for non-normal return modeling.
- Regular technical talks to 100+ JHU students. WorldQuant Challenge Semi-Finalist.
Woodrow Wilson Research Fellow — Johns Hopkins University
Feb 2019 — 2020
Advisor: Prof. Jonathan H. Wright. $10K research fellowship.
- Investigated the implied–realized volatility gap in equity options markets.
Partner / Associate — A-Level Capital
2019 — 2020
Selected as 1 of 7 from ~100 applicants for the JHU-affiliated VC firm (~$530K inaugural fund).
- Sourced 40+ companies, held calls with 20 founders, closed 2 deals.
- Sourced a wearables company that subsequently raised ~70× its initial seed.
Data Analyst — Massachusetts Land Company
Jun 2019 — Sep 2019
- Built scrapers for statewide MLS housing data across 576 Massachusetts ZIP codes.
- Conducted broker and developer interviews to inform price-prediction analytics.
Data Science Intern — Empower Schools
Jul 2017 — Sep 2017
- Analyzed accountability and performance data for 150,000+ students across the Lawrence and Springfield, MA school districts.
- Built an SVM classifier predicting student college outcomes with ~80% accuracy.
- Presented findings directly to the Empower Schools CEO/founder and executive team.
Data Science & Bioinformatics Intern — Curoverse Inc. (acq. by Veritas Genetics)
Summer 2017
- Built an SVM classifier predicting eye color with ~95% accuracy.
- Presented at the Harvard I2B2 TranSMART Symposium. DOI: 10.5281/zenodo.1045265.
Executive Delegate (Internship) — U.S. Department of Education — Massachusetts State Student Advisory Council
May 2017 — Jun 2018
- Represented the Greater Boston area on the Massachusetts State Student Advisory Council.
- Led a project identifying districts with outdated wellness policies — 24% of MA districts non-compliant; collaborated with principals and superintendents on revisions.
- Co-developed a campaign on the student mental-health crisis in Boston schools.
Education
Johns Hopkins University — B.S. Computer Science and Economics. Graduated 2023 after finishing remaining credits while working full-time at RocketReach.
Selected publications & honors
- SVM eye-color classifier (95% acc.) — presented at the Harvard I2B2 TranSMART Symposium, 2017. DOI: 10.5281/zenodo.1045265.
- Woodrow Wilson Research Fellowship ($10K), Johns Hopkins, 2019. Advisor: Prof. Jonathan H. Wright.
- WorldQuant Challenge — Semi-Finalist (JHU Quantitative Finance Society team).
Technical fluency
| Domain | Expert | Strong | Working |
|---|---|---|---|
| Languages | Python · SQL | — | Bash · TypeScript |
| Backend | Django · PostgreSQL | FastAPI · Pydantic · Aurora · Elasticsearch · Redis | pgvector · Playwright · spaCy |
| Data infra | Redshift · query optimization · indexing · ETL perf · ER pipelines | Airflow · replication · real-time sync · SCDs | — |
| ML / AI | Entity resolution (applied + research) | BERT / transformer embeddings · MLP · Anthropic / OpenAI / xAI / Gemini APIs · prompt eng · LLM labeling | RAG · pgvector · SVM |
| AWS | — | ECS · Lambda · S3 · SES · SNS · SQS · Aurora · IAM · CloudWatch | EventBridge · Athena · ALB/NLB |
| Infra / DevOps | Git | Terraform · GH Actions · CircleCI · Jenkins | Docker (via ECS) |
| Observability | — | Datadog · Sentry · CloudWatch · custom Slack alerting | — |
"Expert" = shipped at scale, can defend every design choice. "Strong" = production-debugged. "Working" = competent, haven't pushed limits.
Aggregate scale
- Services operating on hundreds of billions of data points across career.
- Production tables at billions-of-rows / terabytes scale.
- 5-cluster Aurora topology stood up and maintained in early-devops era.
- ~500M emails/month custom async SMTP throughput.
- 4B+ row contact-data migration shipped solo with zero downtime.