Compare vision-capable LLMs: OCR, visual Q&A, document parsing. Real per-image cost across providers.
Your usage
Default assumptions
Monthly requests: 50,000
Avg input tokens: 500
Avg output tokens: 200
When to use this scenario
Visual Q&A, OCR, document parsing: anything that takes an image as input. This scenario assumes a multimodal LLM, not a dedicated OCR model. Cost has two parts: the image input fee and the text output.
Gemini 2.5 Pro and GPT-5 lead on visual understanding benchmarks. Output tokens dominate cost when summarizing images; the image fee dominates when classifying many images with short answers.
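The cost split described above can be sketched as a small calculation. This is a minimal illustration, not any provider's billing formula: the `VisionPricing` fields, the flat per-image fee, and every price in `EXAMPLE` are hypothetical placeholders (many providers bill images as tokens instead), so substitute published rates before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionPricing:
    """Hypothetical price sheet; replace with a provider's published rates."""
    image_fee_usd: float        # assumed flat fee per image
    input_usd_per_mtok: float   # price per million text input tokens
    output_usd_per_mtok: float  # price per million output tokens


def monthly_cost(pricing: VisionPricing, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Monthly bill, assuming one image per request."""
    per_request = (
        pricing.image_fee_usd
        + input_tokens * pricing.input_usd_per_mtok / 1_000_000
        + output_tokens * pricing.output_usd_per_mtok / 1_000_000
    )
    return requests * per_request


# Placeholder prices, NOT real quotes for any model.
EXAMPLE = VisionPricing(image_fee_usd=0.001,
                        input_usd_per_mtok=1.0,
                        output_usd_per_mtok=10.0)

# Default assumptions from this page: 50,000 requests, 500 in, 200 out.
summarize = monthly_cost(EXAMPLE, 50_000, 500, 200)
# Classification-style workload: same images, very short answers.
classify = monthly_cost(EXAMPLE, 50_000, 500, 5)
print(f"summarize: ${summarize:.2f}  classify: ${classify:.2f}")
```

With these placeholder prices, output tokens are the largest term in the summarizing case, and the image fee is the largest term once answers shrink to a few tokens.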
Common pitfalls
Sending high-detail images when low-detail would do (10× price for marginal accuracy)
Pre-OCR'ing with Tesseract then sending text — usually worse than letting the model see the image
Drag the slider to split traffic between Gemini 2.5 Pro (primary) and GPT-5 (fallback). See how your monthly bill moves — without writing a line of gateway code.
Primary: Gemini 2.5 Pro
Fallback: GPT-5
Split: 70% Gemini, 30% GPT-5
Blended monthly cost: $131 at the usage assumed above
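A blended figure like this is presumably a traffic-weighted average of what each model would cost if it handled all requests. The sketch below shows that arithmetic; the function name `blended_monthly_cost` and the per-model dollar amounts are hypothetical and are not the inputs behind the figure shown on this page.

```python
def blended_monthly_cost(primary_cost: float, fallback_cost: float,
                         primary_share: float) -> float:
    """Weighted average of two monthly bills.

    primary_cost / fallback_cost: what each model would cost if it
    handled 100% of traffic. primary_share: fraction of traffic (0..1)
    routed to the primary model.
    """
    if not 0.0 <= primary_share <= 1.0:
        raise ValueError("primary_share must be between 0 and 1")
    return primary_share * primary_cost + (1.0 - primary_share) * fallback_cost


# Hypothetical full-traffic bills, for illustration only.
blended = blended_monthly_cost(primary_cost=100.0, fallback_cost=200.0,
                               primary_share=0.70)
print(f"blended: ${blended:.2f}")
```

Moving the slider corresponds to changing `primary_share`; the blended bill moves linearly between the two full-traffic costs.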
Phase 2 turns this routing into a real OpenAI-compatible endpoint — one key, one bill, automatic failover. Drop your email to be notified at launch.
Stored in your browser only until our email backend lands. No tracking, one click to remove.
Use this routing via API
Phase 2 preview · gateway not live yet
This endpoint does not exist yet. The gateway is in Phase 2, and what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.