SYS JS.DEV
BUILD F3CREI
DATE 2026.04.26
UTC 01:30
LOC NYC → STANFORD
STATUS OPEN TO ML/SYSTEMS ROLES

AI-Sampled Rules Model — Community Program

Replaced the production ML model on the company's most valuable first-party data pipeline with a rules-based model whose rules an LLM extracts by iteratively sampling labeled data: 30× throughput, 92% cost reduction, 2.5× more usable contact data.

Throughput 30× VS PRIOR ML
Cost −92%
Usable contact data 2.5× EXTRACTED
Stakeholder weight MOST-VALUED DATA SOURCE FOR EXIT

A research-flavored result: most engineers reach for an ML model even when an LLM-sampled rules model is faster, cheaper, more debuggable, and more accurate.

Problem

The community program is the highest-leverage first-party data source the company has — the most-valued contribution by current acquirers and investors, per ongoing exit conversations. The existing ML matching pipeline was the bottleneck on its impact: throughput-bound, expensive to operate, and missing contact data we knew was extractable.

A bigger or better-tuned ML model was the obvious move. It was the wrong move.

Design

The system uses an LLM to iteratively sample labeled data, propose rules, evaluate the rules against held-out cases, and refine until the rules generalize. The rules are then compiled into a fast deterministic matcher. The LLM is in the offline loop only; production inference is purely the rules.
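The loop above can be sketched in a few lines. This is a minimal illustration, not the production system: the labeled data, the rule shapes, and the names (`propose_rules`, `refine`) are all hypothetical, and the LLM call is stubbed out with fixed candidate rules.

```python
import random

# Hypothetical labeled examples: (record_a, record_b, is_match).
LABELED = [
    ({"email": "a@x.com", "name": "Ann Lee"}, {"email": "a@x.com", "name": "A. Lee"}, True),
    ({"email": "b@y.com", "name": "Bo Chen"}, {"email": "c@z.com", "name": "Bo Chen"}, True),
    ({"email": "d@w.com", "name": "Dana Fox"}, {"email": "e@v.com", "name": "Eli Roy"}, False),
]

def propose_rules(sample):
    # In the real system an LLM reads the sampled pairs and proposes
    # candidate rules as predicates; a fixed stub stands in here.
    return [
        ("same_email", lambda a, b: a["email"] == b["email"]),
        ("same_name", lambda a, b: a["name"] == b["name"]),
    ]

def evaluate(rules, holdout):
    # A pair matches if any rule fires; score accuracy against labels.
    correct = sum(
        any(pred(a, b) for _, pred in rules) == label
        for a, b, label in holdout
    )
    return correct / len(holdout)

def refine(labeled, rounds=3, seed=0):
    # Iteratively sample, propose, evaluate, and keep only rule sets
    # that improve held-out accuracy.
    rng = random.Random(seed)
    best_rules, best_acc = [], 0.0
    for _ in range(rounds):
        sample = rng.sample(labeled, k=min(2, len(labeled)))
        candidates = best_rules + propose_rules(sample)
        acc = evaluate(candidates, labeled)  # a held-out split in production
        if acc > best_acc:
            best_rules, best_acc = candidates, acc
    return best_rules, best_acc

rules, acc = refine(LABELED)
print([name for name, _ in rules], acc)  # ['same_email', 'same_name'] 1.0
```

The key property is that the expensive model never leaves this offline loop; what it emits is a plain list of named predicates.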

This shape — use a large model to write a small model — turns out to be remarkably effective on dirty real-world matching: it forces the LLM to articulate the decision boundary, surfaces edge cases the team wouldn’t have thought to label, and produces a system whose decisions can be audited and amended without retraining.
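The audit-and-amend property falls out of the compiled form. A sketch, with hypothetical rules: each production decision is traced back to the named rule that made it, and changing behavior means editing a predicate, not retraining.

```python
# Hypothetical compiled rule set: ordered (name, predicate) pairs.
RULES = [
    ("same_email", lambda a, b: a["email"].lower() == b["email"].lower()),
    ("same_phone_last7", lambda a, b: a["phone"][-7:] == b["phone"][-7:]),
]

def match(a, b, rules=RULES):
    # Deterministic matcher: returns the decision plus the name of the
    # rule that fired, so every decision carries its own audit trail.
    for name, pred in rules:
        if pred(a, b):
            return True, name
    return False, None

decision, why = match(
    {"email": "Ann@X.com", "phone": "212-555-0101"},
    {"email": "ann@x.com", "phone": "646-555-9999"},
)
print(decision, why)  # True same_email
```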

I redesigned the matching service end-to-end alongside the model and built a new scalable matching layer to run it.

Outcome

A clean, surprising win for the rules-via-LLM pattern over the model-on-model arms race.

STACK · Python · LLMs · AWS