Voice agents (phone bots, in-app voice) live or die by latency. Each turn must respond in <1s to feel natural. Throughput (tokens/second) matters because users expect streaming.
Use a fast Flash-tier model (Gemini 2.5 Flash, GPT-5 mini, Claude Haiku) — quality difference vs frontier is small for short turn-taking. The math: 100K calls × 6 turns/call = 600K LLM calls.
Common pitfalls
Picking on raw quality benchmarks (slow models break the conversation feel)
Drag the slider to split traffic between Gemini 2.5 Flash (primary) and GPT-5 mini (fallback). See how your monthly bill moves — without writing a line of gateway code.
Primary: Gemini 2.5 FlashFallback: GPT-5 mini
70% Gemini30% GPT-5
Blended monthly cost$419at the usage assumed above
Vs GPT-5−77%$1.8K → $419
Vs all-primary−6%$444 → $419
Phase 2 turns this routing into a real OpenAI-compatible endpoint — one key, one bill, automatic failover. Drop your email to be notified at launch.
Stored in your browser only until our email backend lands. No tracking, one click to remove.
Use this routing via API
Phase 2 preview · gateway not live yet
PHASE 2 PREVIEW · gateway not live yetThis endpoint does not exist yet. The gateway is in Phase 2 — what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.