AIpricly

Semantic search

Compare text-embedding-3-small, Voyage-3, Cohere Embed v4 for knowledge base search. From $1/mo for 5M tokens. MTEB scores and latency compared.

Your usage

Default assumptions
Monthly requests: 5,000,000
Avg input tokens: 0
Avg output tokens: 0

When to use this scenario

Semantic search powers knowledge base retrieval, internal wiki search, documentation assistants, and RAG (retrieval-augmented generation) pipelines. Unlike keyword search (BM25), embedding-based retrieval finds semantically similar documents even when the query and document share no keywords — critical for paraphrased questions, technical synonyms, and cross-language retrieval.
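At query time, embedding-based retrieval reduces to nearest-neighbor search over vectors. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings (models like text-embedding-3-small return 1536 dimensions; the document IDs and values here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_vec, v), doc_id)
              for doc_id, v in doc_vecs.items()]
    return sorted(scored, reverse=True)[:k]

# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq":    [0.1, 0.8, 0.2],
    "api-quickstart": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. "how do I change my login credentials"
print(top_k(query, docs, k=1))
```

Note that the query never has to share a keyword with the winning document; proximity in the embedding space is what ranks it first.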

OpenAI text-embedding-3-small is the most cost-efficient embedding model with strong MTEB benchmark scores for its price tier. At $0.02/million tokens, embedding 5M tokens/month costs $0.10. The "large" model at $0.13/million produces higher-dimensional embeddings (3072 vs 1536) with better retrieval quality — the right choice when your corpus is large and precision matters more than cost.
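The cost arithmetic is linear in tokens. A quick check of the figures above, using the per-million-token rates as quoted:

```python
def monthly_cost(tokens_per_month, rate_per_million):
    """Embedding cost: (tokens / 1M) * rate per million tokens."""
    return tokens_per_month / 1_000_000 * rate_per_million

small = monthly_cost(5_000_000, 0.02)  # text-embedding-3-small at $0.02/M
large = monthly_cost(5_000_000, 0.13)  # text-embedding-3-large at $0.13/M
print(f"small: ${small:.2f}/mo, large: ${large:.2f}/mo")
# small: $0.10/mo, large: $0.65/mo
```

Even the "large" model stays under a dollar a month at this volume; the cost gap only matters once token counts reach the hundreds of millions.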

Voyage-3 outperforms text-embedding-3-small on domain-specific corpora (legal, medical, code) despite similar pricing, making it a better fit as a fallback for specialized knowledge bases than for general-purpose enterprise wikis.

Common pitfalls

  • Embedding documents without chunking strategy — a 10-page document embedded as a single vector loses paragraph-level precision; chunk at 256–512 tokens with 10–20% overlap for RAG retrieval
  • Ignoring embedding dimension mismatch when switching providers — if you swap from a 1536-dimension to a 768-dimension model, you must re-embed your entire corpus; plan model changes deliberately
  • Not updating embeddings when source documents change — stale embeddings return outdated content confidently, which is worse than returning no results
  • Using the same embedding model for queries and documents without verifying it was trained for asymmetric retrieval — some models encode short queries and long documents into incompatible semantic spaces
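The chunking advice in the first pitfall (256–512 tokens with 10–20% overlap) can be sketched as a sliding window over a token list. This toy version uses whitespace splitting in place of the model's real tokenizer, which is an assumption for illustration only:

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.15):
    """Split a token list into fixed-size chunks with fractional overlap."""
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Whitespace "tokens" stand in for a real tokenizer here.
tokens = ("lorem ipsum " * 600).split()  # 1200 tokens
chunks = chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.15)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk shares its leading tokens with the tail of the previous chunk, so a sentence that straddles a boundary is still fully contained in at least one chunk.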

Recommended routing

Sorted by best value for your usage
PRIMARY
text-embedding-3-small (OpenAI · quality 80)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

FALLBACK
Voyage 3 (Voyage · quality 88)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

Voyage 3 Large (Voyage · quality 91)
Monthly cost: $0.00 · Vs baseline: 0% · P50 latency: 0.1s

Baseline = text-embedding-3-large at the same usage = $0.00/mo.

Routing simulator

Phase 2 preview

Drag the slider to split traffic between text-embedding-3-small (primary) and Voyage 3 (fallback). See how your monthly bill moves — without writing a line of gateway code.

Primary: text-embedding-3-small · Fallback: Voyage 3
70% text-embedding-3-small / 30% Voyage 3
Blended monthly cost: $0.00 (at the usage assumed above)
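The slider's math is a token-weighted average of the two models' rates. A sketch of the blend at a 70/30 split; the $0.02/M rate for text-embedding-3-small is quoted above, while the Voyage 3 rate here is an assumed placeholder, not a quoted price:

```python
def blended_cost(tokens, split_primary, rate_primary, rate_fallback):
    """Monthly cost when traffic is split between two models.

    Rates are $ per million tokens; split_primary is the fraction
    of tokens routed to the primary model.
    """
    millions = tokens / 1_000_000
    return millions * (split_primary * rate_primary
                       + (1 - split_primary) * rate_fallback)

# 70/30 split at 5M tokens/mo; 0.02 is the quoted small rate,
# 0.06 is an assumed placeholder rate for Voyage 3.
cost = blended_cost(5_000_000, 0.70, 0.02, 0.06)
print(f"${cost:.3f}/mo")
```

Because the blend is linear, moving the slider moves the bill proportionally between the two models' all-primary costs.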

Phase 2 turns this routing into a real OpenAI-compatible endpoint — one key, one bill, automatic failover. Drop your email to be notified at launch.

Stored in your browser only until our email backend lands. No tracking, one click to remove.

Use this routing via API

PHASE 2 PREVIEW · gateway not live yet

This endpoint does not exist yet. The gateway is in Phase 2 — what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.
Preview the planned API call
$ curl https://api.aipricly.com/v1/embeddings \
  -H "Authorization: Bearer $AIPC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "semantic-search",
    "input": "..."
  }'

Related scenarios