SYS JS.DEV
BUILD F3CREN
DATE 2026.04.26
UTC 01:30
LOC NYC → STANFORD
STATUS OPEN TO ML/SYSTEMS ROLES

Multi-Provider LLM Gateway

Co-designed and built the in-house service that abstracts over Anthropic, OpenAI, xAI, and Gemini behind a single interface, with native websearch. Backbone for every AI feature at the company.

Providers ANTHROPIC · OPENAI · XAI · GEMINI
Surface ALL AI FEATURES
Concerns ROUTING · FAILOVER · COST · OBSERVABILITY

The piece of work most directly shaped like what a platform team at an AI lab does.

Problem

By 2024 we were calling LLMs from a growing number of services — industry classification, contact enrichment, domain-success prediction, AI-driven sampling for rule extraction, privacy-removals tooling. Every service was reaching directly into a vendor SDK. That meant duplicate retry logic, inconsistent observability, no shared cost accounting, no failover when a provider degraded, and a hard ceiling on how fast we could swap models when pricing or capability shifted underneath us.

We needed a single substrate that all AI features could route through.

Design

A single in-house service exposing a uniform message interface across providers. Calls specify capability requirements (model class, tool use, websearch, JSON mode, streaming) rather than vendor-specific knobs. The router picks a provider based on a mix of capability match, current latency, cost ceiling, and observed availability. Provider-specific quirks — Anthropic prompt caching, OpenAI structured outputs, Gemini multimodal, divergent tool-use specs — are exposed where useful and smoothed over where they aren’t.
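The capability-match routing above can be sketched roughly as follows. This is a minimal illustration, not the in-house API: the field names, the 0.95 availability floor, and the cost/latency weighting are all assumptions standing in for the real policy.

```python
from dataclasses import dataclass

@dataclass
class CapabilityRequest:
    """Caller-side request: capability requirements, not vendor knobs."""
    model_class: str = "general"          # e.g. "fast", "general", "frontier"
    needs_tools: bool = False
    needs_websearch: bool = False
    needs_json_mode: bool = False
    streaming: bool = False
    cost_ceiling_usd_per_mtok: float = float("inf")

@dataclass
class ProviderState:
    """Router-side view of one provider's current standing."""
    name: str
    capabilities: set
    p50_latency_ms: float
    cost_usd_per_mtok: float
    availability: float                   # rolling success rate, 0..1

def route(req: CapabilityRequest, providers: list) -> ProviderState:
    """Pick a provider that satisfies every required capability, is
    sufficiently available, and fits the cost ceiling; then score the
    survivors on a blend of cost and latency (weights illustrative)."""
    required = {cap for cap, needed in [
        ("tools", req.needs_tools),
        ("websearch", req.needs_websearch),
        ("json_mode", req.needs_json_mode),
        ("streaming", req.streaming),
    ] if needed}
    eligible = [
        p for p in providers
        if required <= p.capabilities
        and p.availability >= 0.95
        and p.cost_usd_per_mtok <= req.cost_ceiling_usd_per_mtok
    ]
    if not eligible:
        raise RuntimeError("no provider satisfies the request")
    return min(eligible, key=lambda p: p.cost_usd_per_mtok + 0.01 * p.p50_latency_ms)
```

The point of the shape: callers never name a vendor, so repricing or a capability shift becomes a change to the router's scoring, not to every calling service.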

Cross-cutting concerns live in the gateway: per-provider and per-caller rate limits, retry with jittered backoff, automatic failover to a second provider on a class of errors, token and cost accounting attributed to the calling service, structured logs and Datadog metrics on every request, and unified secret rotation for vendor keys.
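The retry-with-jittered-backoff and failover behavior can be sketched as below. A toy version under stated assumptions: the error taxonomy, attempt counts, and full-jitter delay schedule are illustrative, and each provider is reduced to a single callable.

```python
import random
import time

class RetryableError(Exception):
    """Transient failures worth retrying: rate limits, 5xx, timeouts."""

def call_with_failover(request, providers, max_attempts=3, base_delay=0.5,
                       sleep=time.sleep):
    """Try the primary provider with jittered exponential backoff on
    retryable errors, then fail over to the next provider in order.
    `providers` is an ordered list of callables; `sleep` is injectable
    so the backoff is testable."""
    last_error = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(request)
            except RetryableError as e:
                last_error = e
                # Full jitter: sleep a random amount in [0, base * 2^attempt).
                sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise RuntimeError("all providers exhausted") from last_error
```

Non-retryable errors (auth failures, malformed requests) deliberately propagate immediately rather than burning attempts or failing over.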

Websearch is first-class — both native vendor websearch (Anthropic, OpenAI) and an in-house fetch-and-inject path for cases where neither suits, with citation surfacing back to the caller.
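The fetch-and-inject path amounts to: search, fetch the top results, inject them into the prompt as numbered sources, and hand the citation map back to the caller. A rough sketch, where `search` and `fetch` are hypothetical stand-ins for the real search and page-fetch clients:

```python
def build_search_context(query, search, fetch, top_k=3, max_chars=2000):
    """Return (context_block, citations): a numbered-sources block to
    prepend to the prompt, plus structured citations the caller can
    surface alongside the model's answer."""
    citations = []
    chunks = []
    for i, result in enumerate(search(query)[:top_k], start=1):
        body = fetch(result["url"])[:max_chars]       # truncate per source
        citations.append({"id": i, "url": result["url"], "title": result["title"]})
        chunks.append(f"[{i}] {result['title']}\n{body}")
    context = "Sources:\n\n" + "\n\n".join(chunks)
    return context, citations
```

Because the citations carry stable ids matching the `[n]` markers in the injected text, the caller can map bracketed references in the completion back to URLs, mirroring what the native vendor websearch paths return.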

Outcome

Every AI feature at the company ships through this service. Adding a new provider is a per-vendor adapter, not a per-service refactor. When a provider degrades, traffic shifts in seconds rather than after an on-call page. Cost optimization becomes a routing decision rather than a code change.
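The "per-vendor adapter, not a per-service refactor" boundary looks roughly like this: each vendor implements one small interface against the uniform message format, and the gateway registers it. Class, method, and registry names here are illustrative, not the actual in-house code.

```python
from abc import ABC, abstractmethod

class ProviderAdapter(ABC):
    """One adapter per vendor: translate the uniform message format to
    the vendor SDK call and normalize the response back."""
    name: str

    @abstractmethod
    def complete(self, messages: list, **options) -> dict:
        ...

REGISTRY: dict = {}

def register(adapter: ProviderAdapter) -> None:
    """Adding a vendor = implementing the adapter and registering it."""
    REGISTRY[adapter.name] = adapter

class EchoAdapter(ProviderAdapter):
    """Toy adapter standing in for a real vendor SDK client."""
    name = "echo"

    def complete(self, messages, **options):
        return {"role": "assistant", "content": messages[-1]["content"]}

register(EchoAdapter())
```

Callers only ever see the normalized `complete` response, which is what lets routing, failover, and cost accounting sit above this boundary untouched when a vendor is added.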

The gateway is the clearest line of sight from what I’ve built to what an AI-lab platform team does — building against the actual messy surface of current-gen LLM providers, not just calling one of them.

STACK · Python · FastAPI · AWS · Datadog · Pydantic