AI-Sampled Rules Model — Community Program
Replaced the production ML model on the most-valuable first-party data pipeline with a rules-based model whose rules are extracted by an LLM iteratively sampling labeled data. 30× throughput, 92% cost reduction, 2.5× more usable contact data.
A research-flavored result: most engineers would reach for an ML model here, but an LLM-extracted rules model proved faster, cheaper, more debuggable, and more accurate.
Problem
The community program is the highest-leverage first-party data source the company has, and the contribution most valued by current acquirers and investors in ongoing exit conversations. The existing ML matching pipeline was the bottleneck on its impact: throughput-bound, expensive to operate, and missing contact data we knew was extractable.
A bigger or better-tuned ML model was the obvious move. It was the wrong move.
Design
The system uses an LLM to iteratively sample labeled data, propose rules, evaluate the rules against held-out cases, and refine until the rules generalize. The rules are then compiled into a fast deterministic matcher. The LLM is in the offline loop only; production inference is purely the rules.
This shape — use a large model to write a small model — turns out to be remarkably effective on dirty real-world matching: it forces the LLM to articulate the decision boundary, surfaces edge cases the team wouldn’t have thought to label, and produces a system whose decisions can be audited and amended without retraining.
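The offline loop described above can be sketched as follows. This is a minimal illustration, not the production system: `llm_propose_rules` is a hypothetical stand-in for the real LLM call (here a stub returning two placeholder rules), and the rule shapes, sampling strategy, and accuracy target are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# A labeled case: two records plus a ground-truth match label,
# e.g. {"a": {...}, "b": {...}, "match": True}.
Record = dict

@dataclass
class Rule:
    name: str
    predicate: Callable[[dict, dict], bool]

def llm_propose_rules(sample: list[Record]) -> list[Rule]:
    """Stub for the LLM step. In the real system this prompts a large model
    with sampled labeled pairs and parses candidate rules from its reply;
    the two rules below are illustrative placeholders only."""
    return [
        Rule("exact_email",
             lambda a, b: bool(a.get("email")) and a["email"] == b.get("email")),
        Rule("name_and_zip",
             lambda a, b: bool(a.get("name")) and a["name"] == b.get("name")
                          and bool(a.get("zip")) and a["zip"] == b.get("zip")),
    ]

def matches(rules: list[Rule], a: dict, b: dict) -> bool:
    """The compiled matcher: a pure, deterministic OR over the rules.
    This is all that runs at inference time; no LLM is involved."""
    return any(r.predicate(a, b) for r in rules)

def evaluate(rules: list[Rule], held_out: list[Record]) -> float:
    """Accuracy of a candidate rule set on held-out labeled cases."""
    correct = sum(matches(rules, c["a"], c["b"]) == c["match"] for c in held_out)
    return correct / len(held_out)

def extract_rules(labeled: list[Record], rounds: int = 3,
                  sample_size: int = 50, target: float = 0.95):
    """Offline loop: sample labeled data, propose rules, evaluate against
    held-out cases, and keep refining while accuracy improves."""
    cut = max(1, len(labeled) // 5)
    held_out, pool = labeled[:cut], labeled[cut:]
    best_rules, best_acc = [], 0.0
    for _ in range(rounds):
        sample = pool[:sample_size]  # production would resample around errors
        candidate = best_rules + llm_propose_rules(sample)
        acc = evaluate(candidate, held_out)
        if acc > best_acc:
            best_rules, best_acc = candidate, acc
        if best_acc >= target:
            break
    return best_rules, best_acc
```

The split matters: `extract_rules` is the expensive offline loop where the LLM runs, while `matches` is the cheap deterministic function that ships, which is what makes the throughput and cost numbers possible.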
I redesigned the matching service end-to-end alongside the model and built a new scalable matching layer to run it.
Outcome
- Throughput up 30× vs. the prior ML model.
- Cost down 92%.
- 2.5× more usable contact data extracted from the same input.
- This data source has roughly doubled in match rate and ingestion volume, becoming the most impactful contribution for the company’s exit narrative.
A clean, surprising win for the rules-via-LLM pattern over the model-on-model arms race.