AIpricly

Customer support

Compare GPT-5, Claude, DeepSeek and others for customer support chatbots. Real monthly cost for 1M conversations.


Your usage

Default assumptions
Monthly requests: 1,000,000
Avg input tokens: 600
Avg output tokens: 180
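Those three numbers are all you need for a back-of-envelope bill. A minimal sketch — the per-million-token rates are illustrative assumptions (check the live calculator for current prices), chosen near the Gemini 2.5 Flash tier so the result lines up with the $630/mo figure in the routing table below:

```python
def monthly_cost(requests, avg_in, avg_out, in_rate_per_m, out_rate_per_m):
    """Monthly bill from request volume, per-request token averages,
    and per-million-token rates."""
    input_tokens = requests * avg_in      # 600M tokens at the defaults
    output_tokens = requests * avg_out    # 180M tokens at the defaults
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

# Default assumptions: 1M requests/mo, 600 input / 180 output tokens each.
# Rates are assumed: $0.30/M input, $2.50/M output.
cost = monthly_cost(1_000_000, 600, 180, in_rate_per_m=0.30, out_rate_per_m=2.50)
print(f"${cost:,.0f}/mo")  # $630/mo
```

Output tokens dominate the bill even at a 600:180 input:output ratio, because output rates are typically several times the input rate — which is why the scenario text focuses on output pricing.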

When to use this scenario

Customer support chatbots have a peculiar shape that breaks the usual "pick the smartest model" instinct. Each turn is short (300–800 input tokens, 100–250 output). Volume is enormous — a mid-market SaaS easily routes a million conversations through chat per month, sometimes ten million. Latency matters because users sit watching a spinner. And quality differences shrink: a strong base model and a frontier model produce nearly identical answers to "Where's my order?" or "How do I reset my password?"

Put differently — customer support is where price/performance asymmetry is the largest of any common LLM scenario. A chatbot at 1M conversations per month on a frontier-tier output rate (compare the live calculator below) easily spends low four figures on output alone. Move to a cheap-tier model in the same per-million range as Gemini Flash (<ModelPrice id="google/gemini-2-5-flash" field="output" />) or DeepSeek V3.5 (<ModelPrice id="deepseek/deepseek-v3-5" field="output" />) and the same workload costs a fraction of that. Twelve months in, you've saved enough to fund another engineer.

Why the recommended chain looks like this

Primary: Gemini 2.5 Flash. Cheap, fast first-token, strong on factual recall and structured-output adherence. Its weak spot — long-form reasoning chains — is the one thing customer support workflows never need.

Fallback: DeepSeek V3.5. Steps in when the primary errors or hits a regional outage. Roughly the same per-token cost; quality lands one tier higher on edge-case phrasings. The fallback should be cheap enough that the savings the primary unlocks aren't erased by every failure event.

Baseline: GPT-5. Listed only to surface what the "expensive default" would have cost. The monthly-cost panel shows the delta — typically 6-12× over the recommended chain.
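The primary/fallback behavior described above can be sketched in a few lines. `call_model` is a stand-in for a real OpenAI-compatible client call, not an actual SDK — the model names and the simulated outage are illustrative:

```python
def call_model(model, messages):
    """Stub for an OpenAI-compatible chat call; swap in a real client."""
    if model == "primary-down":
        raise RuntimeError("simulated regional outage")
    return f"reply from {model}"

def call_with_fallback(messages, primary="gemini-2.5-flash", fallback="deepseek-v3.5"):
    try:
        return call_model(primary, messages)
    except Exception:
        # Primary errored or timed out: retry once on the cheap fallback
        # rather than escalating to an expensive frontier model.
        return call_model(fallback, messages)

print(call_with_fallback([], primary="primary-down"))  # reply from deepseek-v3.5
```

Because the fallback sits in the same price tier as the primary, a bad day for the primary provider moves quality slightly, not the bill.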

Common pitfalls

  • Choosing the strongest model for every reply. A lightweight intent classifier in front of the LLM can route 85% of turns to a smaller model with no perceived quality drop.
  • Ignoring P95 latency in favor of P50. Customer support is real-time; tail latency drives user frustration more than median. Filter the comparison table by P95.
  • Underestimating multilingual switching cost. Some models price per language tier or charge extra for non-English context windows; check defaultUsage.languages in the cost calculator.
  • Skipping the moderation layer. A 5¢/M moderation call is cheap insurance against a single viral screenshot of a chatbot being rude.
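The first pitfall's fix — a classifier in front of the LLM — can start as simply as this sketch. The keyword match is a toy stand-in for a small trained intent classifier, and the intent list and model names are assumptions:

```python
# Intents that a cheap model answers as well as a frontier model.
EASY_INTENTS = ("order status", "password reset", "shipping", "refund status")

def pick_model(user_message: str) -> str:
    """Route easy intents to the cheap tier, everything else to the smart tier."""
    text = user_message.lower()
    if any(intent in text for intent in EASY_INTENTS):
        return "cheap-model"   # e.g. the Gemini 2.5 Flash tier
    return "smart-model"       # reserve the expensive tier for hard turns

print(pick_model("Where is my order status?"))                 # cheap-model
print(pick_model("My invoice totals look wrong this quarter"))  # smart-model
```

Even this crude router captures most of the savings; the marginal gain from a proper classifier is mostly in avoiding false "easy" routes on multi-issue messages.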

Quality bar — what to verify before shipping

Build a 50-conversation gold set covering: refund requests, password resets, multi-turn troubleshooting, abusive users, and at least one non-English exchange. Score each chain on (a) factual correctness, (b) refusal pattern when asked for things outside scope, (c) tone consistency. The recommended chain typically scores within 2-3 points of the GPT-5 baseline on each axis — if your gold set shows a wider gap, your conversations may have more reasoning load than the average and you should bias toward a smarter primary.
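The scoring loop for that gold set is small once the grading is delegated. `score_axis` is a placeholder for a human grader or judge-model call — everything here except the three axes named above is an assumption:

```python
from statistics import mean

AXES = ("correctness", "refusal", "tone")  # the three axes from the text

def score_chain(gold_set, score_axis):
    """Average each axis (0-100) over the gold set for one routing chain."""
    return {axis: mean(score_axis(conv, axis) for conv in gold_set) for axis in AXES}

# Toy run: a 50-conversation set where every turn scores 80 on each axis.
gold_set = [{"id": i} for i in range(50)]
print(score_chain(gold_set, lambda conv, axis: 80))
```

Run the same loop once per chain (recommended vs. baseline) and compare per-axis means; a gap wider than the 2-3 points mentioned above is the signal to upgrade the primary.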

What this scenario does not cover

Voice support (use the voice-assistant scenario), in-app guided troubleshooting that interacts with the product UI (more agentic — use code-generation or data-extraction patterns), and high-stakes financial-services support where regulatory expectations dominate (use legal-contract-analysis for the refusal-pattern emphasis).

Recommended routing

Sorted by best value for your usage
PRIMARY — Gemini 2.5 Flash
Google · quality 78 · 320 tok/s
Monthly cost: $630 · Vs baseline: 75% cheaper · P50 latency: 0.3s

FALLBACK — DeepSeek V3.5
DeepSeek · quality 81 · 95 tok/s
Monthly cost: $134 · Vs baseline: 95% cheaper · P50 latency: 1.5s

Llama 4 Scout
Meta · quality 75 · 380 tok/s
Monthly cost: $228 · Vs baseline: 91% cheaper · P50 latency: 0.2s

Baseline = GPT-5 at the same usage = $2.5K/mo.

Routing simulator

Phase 2 preview

Drag the slider to split traffic between Gemini 2.5 Flash (primary) and DeepSeek V3.5 (fallback). See how your monthly bill moves — without writing a line of gateway code.

Primary: Gemini 2.5 Flash · Fallback: DeepSeek V3.5
Split: 70% Gemini / 30% DeepSeek
Blended monthly cost: $481 (at the usage assumed above)
Vs GPT-5: 81% cheaper ($2.5K → $481)
Vs all-primary: 24% cheaper ($630 → $481)
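The simulator's arithmetic is just a traffic-weighted average of the full-volume costs from the routing table. A sketch using the numbers shown above:

```python
def blended_cost(primary_cost, fallback_cost, primary_share):
    """Traffic-weighted monthly bill for a two-model split."""
    return primary_share * primary_cost + (1 - primary_share) * fallback_cost

# 70% Gemini 2.5 Flash ($630 full-volume) / 30% DeepSeek V3.5 ($134 full-volume)
bill = blended_cost(630, 134, 0.70)
print(round(bill))                     # 481

savings_vs_baseline = 1 - bill / 2500  # GPT-5 baseline at the same usage
print(f"{savings_vs_baseline:.0%}")    # 81%
```

Note the counterintuitive lever: because the fallback here is cheaper at full volume, shifting traffic toward it lowers the bill — the split is bounded by quality and latency tolerance, not cost.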

Phase 2 turns this routing into a real OpenAI-compatible endpoint — one key, one bill, automatic failover. Drop your email to be notified at launch.


Use this routing via API

Phase 2 preview · gateway not live yet

This endpoint does not exist yet. The gateway is in Phase 2 — what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.
Preview the planned API call
$ curl https://api.aipricly.com/v1/chat/completions \
  -H "Authorization: Bearer $AIPC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "customer-support",
    "messages": [{"role": "user", "content": "..."}]
  }'
