When to use this scenario
Semantic search powers knowledge base retrieval, internal wiki search, documentation assistants, and RAG (retrieval-augmented generation) pipelines. Unlike keyword search (BM25), embedding-based retrieval finds semantically similar documents even when the query and document share no keywords — critical for paraphrased questions, technical synonyms, and cross-language retrieval.
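The core operation behind embedding-based retrieval is nearest-neighbor search by cosine similarity over document vectors. A minimal sketch, using tiny hand-made vectors in place of real model output (text-embedding-3-small would return 1536-dimensional vectors; `retrieve` and the toy data are illustrative, not a library API):

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], doc_vecs: list[list[float]], top_k: int = 3) -> list[int]:
    """Return indices of the top_k documents most similar to the query."""
    scores = [cosine_sim(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:top_k]

# Toy 3-d vectors standing in for real embeddings, chosen by hand so that
# docs 0 and 1 point the same way and doc 2 is orthogonal (unrelated).
docs = [
    [1.0, 0.0, 0.0],   # doc 0
    [0.9, 0.1, 0.0],   # doc 1: close to doc 0
    [0.0, 0.0, 1.0],   # doc 2: unrelated topic
]
query = [1.0, 0.05, 0.0]
print(retrieve(query, docs, top_k=2))  # docs 0 and 1 rank above doc 2
```

At production scale this brute-force scan is replaced by an approximate nearest-neighbor index (e.g. HNSW), but the similarity computation is the same.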
OpenAI's text-embedding-3-small is among the most cost-efficient embedding models, with strong MTEB benchmark scores for its price tier. At $0.02 per million tokens, embedding 5M tokens/month costs $0.10. The "large" model, at $0.13 per million, produces higher-dimensional embeddings (3072 vs. 1536) with better retrieval quality; it is the right choice when your corpus is large and precision matters more than cost.
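The cost figures above are just volume divided by a million, times the per-million rate. A quick sketch (rates hardcoded from the prices quoted here; check the provider's current price list before budgeting):

```python
# USD per million tokens, as quoted in the text above -- verify against
# the provider's live pricing page before relying on these numbers.
PRICE_PER_MTOK = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def monthly_cost(tokens: int, model: str) -> float:
    """Embedding cost in USD for a given monthly token volume."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]

print(monthly_cost(5_000_000, "text-embedding-3-small"))  # $0.10/month
print(monthly_cost(5_000_000, "text-embedding-3-large"))  # $0.65/month
```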
Voyage-3 outperforms text-embedding-3-small on domain-specific corpora (legal, medical, code) despite similar pricing, making it a stronger fallback for specialized knowledge bases than for general-purpose enterprise wikis.
Common pitfalls
- Embedding documents without a chunking strategy — a 10-page document embedded as a single vector loses paragraph-level precision; chunk at 256–512 tokens with 10–20% overlap for RAG retrieval
- Ignoring embedding dimension mismatch when switching providers — if you swap from a 1536-dimensional to a 768-dimensional model, you must re-embed your entire corpus; plan model changes deliberately
- Not updating embeddings when source documents change — stale embeddings return outdated content confidently, which is worse than returning no results
- Using the same embedding model for queries and documents without verifying it was trained for asymmetric retrieval — some models encode short queries and long documents into incompatible semantic spaces