SYS JS.DEV
BUILD F3CREK
DATE 2026.04.26
UTC 01:30 UTC
LOC NYC → STANFORD
STATUS OPEN TO ML/SYSTEMS ROLES

Contact-Data Normalization

Solo migration of 4B+ contact records from nested JSON on a monolith model into normalized relational tables, with real-time sync, zero downtime, and zero performance regression.

Scale 4B+ ROWS
Downtime ZERO
Mode REAL-TIME DUAL-WRITE
Unblocks GDPR · MONOLITH DECOMP

The single biggest architectural-debt bet I’ve executed at RocketReach.

Problem

Contact data (emails and phones) was stored as nested JSON on the profile model. That decision was made very early in the company’s life and made sense at the time. By 2025 it was the largest single design flaw in the codebase: it blocked indexed reverse lookups (which profiles does a given contact belong to?), made GDPR-style privacy removals nearly impossible, and stood between us and decomposing the monolith.
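To make the reverse-lookup point concrete, here is a minimal sketch of what a normalized table allows. It uses an in-memory SQLite database as a stand-in for PostgreSQL; the table name profile_email comes from the design below, while the column names, the index name, and both helper functions are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile_email (
        profile_id INTEGER NOT NULL,
        email      TEXT    NOT NULL,
        UNIQUE (profile_id, email)
    );
    -- The index is what makes "which profiles own this email?" cheap.
    CREATE INDEX profile_email_by_email ON profile_email (email);
""")
conn.executemany(
    "INSERT INTO profile_email (profile_id, email) VALUES (?, ?)",
    [(1, "ada@example.com"), (2, "ada@example.com"), (2, "grace@example.com")],
)


def profiles_for_email(email):
    """Reverse lookup: every profile id that carries this email."""
    rows = conn.execute(
        "SELECT profile_id FROM profile_email WHERE email = ? ORDER BY profile_id",
        (email,),
    )
    return [row[0] for row in rows]


def remove_email_everywhere(email):
    """Privacy-style removal: delete the email from all profiles, return the row count."""
    cur = conn.execute("DELETE FROM profile_email WHERE email = ?", (email,))
    conn.commit()
    return cur.rowcount
```

With contacts buried in per-profile JSON, neither operation can be served from a plain index on the contact value, which is roughly the limitation described above.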

The constraints were sharp: a live, heavily loaded system, thousands of writers in the codebase touching the JSON contact field, billions of existing rows to backfill, and zero tolerance for downtime or performance regression.

Design

I rejected the obvious options:

The chosen design was a real-time sync at the Django field layer. Low-level field-type overrides ensure that every write to the nested JSON also produces a corresponding write to the new normalized profile_email and profile_phone tables. Writes are deduplicated in memory and persisted on profile.save(), and custom managers handle the mapping. Backfill ran alongside the live sync, with batched throughput control and consistency checks against the JSON source of truth.
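The shape of that sync can be sketched without Django. The following is a framework-free illustration, not the production code: a plain Python descriptor stands in for the field-type override, two sets stand in for the profile_email and profile_phone tables, and the class names, the JSON keys ("emails", "phones"), and the lowercase/strip normalization are all assumptions.

```python
# Hypothetical in-memory stand-ins for the normalized tables.
PROFILE_EMAIL = set()  # rows of (profile_id, email)
PROFILE_PHONE = set()  # rows of (profile_id, phone)


def normalize_email(value):
    # Assumed normalization rule, chosen only for the example.
    return value.strip().lower()


class SyncedContactField:
    """Descriptor: each assignment to the JSON blob also stages normalized rows."""

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.slot, {})

    def __set__(self, obj, value):
        obj.__dict__[self.slot] = value
        # In-memory dedup: sets collapse repeated values before anything is persisted.
        obj._pending_emails = {normalize_email(e) for e in value.get("emails", [])}
        obj._pending_phones = {p.strip() for p in value.get("phones", [])}


class Profile:
    contacts = SyncedContactField()

    def __init__(self, profile_id, contacts=None):
        self.id = profile_id
        self.contacts = contacts or {}

    def save(self):
        # Persist on save: swap this profile's normalized rows for the staged ones.
        for table, pending in (
            (PROFILE_EMAIL, self._pending_emails),
            (PROFILE_PHONE, self._pending_phones),
        ):
            stale = {row for row in table if row[0] == self.id}
            table.difference_update(stale)
            table.update((self.id, value) for value in pending)


def is_consistent(profile):
    """Consistency check: normalized email rows must match the JSON source of truth."""
    expected = {(profile.id, normalize_email(e)) for e in profile.contacts.get("emails", [])}
    actual = {row for row in PROFILE_EMAIL if row[0] == profile.id}
    return expected == actual


profile = Profile(1, {"emails": ["Ada@Example.com", "ada@example.com "], "phones": ["+1 555 0100"]})
profile.save()
```

Because callers keep assigning to the same attribute, existing writers do not need to change, which is the appeal of hooking the field layer. One limitation of this sketch: mutating the dict in place (for example, appending to the "emails" list) bypasses the descriptor, so a real implementation would need to cover that path as well.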

Outcome

The new tables now hold 4B+ records combined. Migration shipped solo, with no site disruption and no performance regression. The downstream effects are bigger than the migration itself:

STACK · Python · Django · PostgreSQL