AIpricly

Semantic search

Compare text-embedding-3-small, Voyage-3, Cohere Embed v4 for knowledge base search. From $1/mo for 5M tokens. MTEB scores and latency compared.

Your usage

Default assumptions
Monthly requests: 5,000,000
Avg input tokens: 0
Avg output tokens: 0

When to use this scenario

Semantic search powers knowledge base retrieval, internal wiki search, documentation assistants, and RAG (retrieval-augmented generation) pipelines. Unlike keyword search (BM25), embedding-based retrieval finds semantically similar documents even when the query and document share no keywords — critical for paraphrased questions, technical synonyms, and cross-language retrieval.
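At query time, embedding-based retrieval reduces to nearest-neighbor search over vectors. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings (models like text-embedding-3-small return 1536 dimensions; the document IDs and values here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_vec, v), doc_id)
              for doc_id, v in doc_vecs.items()]
    return sorted(scored, reverse=True)[:k]

# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq":    [0.1, 0.8, 0.2],
    "api-quickstart": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. "how do I change my login credentials"
print(top_k(query, docs, k=1))
```

Note that the query never has to share a keyword with the winning document; proximity in the embedding space is what ranks it first.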

OpenAI text-embedding-3-small is the most cost-efficient embedding model with strong MTEB benchmark scores for its price tier. At $0.02/million tokens, embedding 5M tokens/month costs $0.10. The "large" model at $0.13/million produces higher-dimensional embeddings (3072 vs 1536) with better retrieval quality — the right choice when your corpus is large and precision matters more than cost.
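The cost arithmetic is linear in tokens. A quick check of the figures above, using the per-million-token rates as quoted:

```python
def monthly_cost(tokens_per_month, rate_per_million):
    """Embedding cost: (tokens / 1M) * rate per million tokens."""
    return tokens_per_month / 1_000_000 * rate_per_million

small = monthly_cost(5_000_000, 0.02)  # text-embedding-3-small at $0.02/M
large = monthly_cost(5_000_000, 0.13)  # text-embedding-3-large at $0.13/M
print(f"small: ${small:.2f}/mo, large: ${large:.2f}/mo")
# small: $0.10/mo, large: $0.65/mo
```

Even the "large" model stays under a dollar a month at this volume; the cost gap only matters once token counts reach the hundreds of millions.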

Voyage-3 outperforms text-embedding-3-small on domain-specific corpora (legal, medical, code) despite similar pricing, making it a better fit as a fallback for specialized knowledge bases than for general-purpose enterprise wikis.

Common pitfalls

  • Embedding documents without chunking strategy — a 10-page document embedded as a single vector loses paragraph-level precision; chunk at 256–512 tokens with 10–20% overlap for RAG retrieval
  • Ignoring embedding dimension mismatch when switching providers — if you swap from a 1536-dimension to a 768-dimension model, you must re-embed your entire corpus; plan model changes deliberately
  • Not updating embeddings when source documents change — stale embeddings return outdated content confidently, which is worse than returning no results
  • Using the same embedding model for queries and documents without verifying it was trained for asymmetric retrieval — some models encode short queries and long documents into incompatible semantic spaces
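The chunking advice in the first pitfall (256–512 tokens with 10–20% overlap) can be sketched as a sliding window over a token list. This toy version uses whitespace splitting in place of the model's real tokenizer, which is an assumption for illustration only:

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.15):
    """Split a token list into fixed-size chunks with fractional overlap."""
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Whitespace "tokens" stand in for a real tokenizer here.
tokens = ("lorem ipsum " * 600).split()  # 1200 tokens
chunks = chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.15)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk shares its leading tokens with the tail of the previous chunk, so a sentence that straddles a boundary is still fully contained in at least one chunk.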

Recommended routing

Sorted by best value for your usage
PRIMARY
text-embedding-3-small (OpenAI · quality 80)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

FALLBACK
Voyage 3 (Voyage · quality 88)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

Voyage 3 Large (Voyage · quality 91)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

Baseline = text-embedding-3-large at the same usage = $0.00/mo.

Routing simulator

Phase 2 preview

Drag the slider to split traffic between text-embedding-3-small (primary) and Voyage 3 (fallback). See how your monthly bill moves — without writing a line of gateway code.

Primary: text-embedding-3-small · Fallback: Voyage 3
70% text-embedding-3-small / 30% Voyage 3
Blended monthly cost: $0.00 (at the usage assumed above)
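The slider's math is a token-weighted average of the two models' rates. A sketch of the blend at a 70/30 split; the $0.02/M rate for text-embedding-3-small is quoted above, while the Voyage 3 rate here is an assumed placeholder, not a quoted price:

```python
def blended_cost(tokens, split_primary, rate_primary, rate_fallback):
    """Monthly cost when traffic is split between two models.

    Rates are $ per million tokens; split_primary is the fraction
    of tokens routed to the primary model.
    """
    millions = tokens / 1_000_000
    return millions * (split_primary * rate_primary
                       + (1 - split_primary) * rate_fallback)

# 70/30 split at 5M tokens/mo; 0.02 is the quoted small rate,
# 0.06 is an assumed placeholder rate for Voyage 3.
cost = blended_cost(5_000_000, 0.70, 0.02, 0.06)
print(f"${cost:.3f}/mo")
```

Because the blend is linear, moving the slider moves the bill proportionally between the two models' all-primary costs.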

Phase 2 turns this routing into a real OpenAI-compatible endpoint — one key, one bill, automatic failover. Drop your email to be notified at launch.

Stored in your browser only until our email backend lands. No tracking, one click to remove.

Use this routing via API

PHASE 2 PREVIEW · gateway not live yet

This endpoint does not exist yet. The gateway is in Phase 2 — what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.
Preview the planned API call
$ curl https://api.aipricly.com/v1/embeddings \
  -H "Authorization: Bearer $AIPC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "semantic-search",
    "input": "..."
  }'

Related scenarios