Recommendation system

Compare text-embedding-3-small, Voyage-3-Large, and Cohere Embed v4 for content and product recommendations. From $1/mo for 5M tokens.

Your usage

Default assumptions

Monthly requests: 5,000,000

Avg input tokens: 0

Avg output tokens: 0

When to use this scenario

Embedding-based recommendation computes similarity between a user's interaction history and a catalog of items (articles, products, courses, videos) to surface relevant next-best items. It complements collaborative filtering by handling the cold-start problem: new items with no interaction history can be recommended immediately based on their embedding similarity to items the user has engaged with.
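The similarity step described above can be sketched in a few lines. This is a minimal illustration, not a production design: the item names and 3-dimensional vectors are made up, the user profile is a plain average of history embeddings (one of several possible choices), and a brute-force scan stands in for an ANN index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def user_profile(history_vectors):
    """Average the embeddings of the items the user engaged with."""
    n = len(history_vectors)
    return [sum(col) / n for col in zip(*history_vectors)]

def recommend(history_ids, catalog, k=2):
    """Rank unseen catalog items by similarity to the user's profile."""
    profile = user_profile([catalog[i] for i in history_ids])
    candidates = [
        (item_id, cosine(profile, vec))
        for item_id, vec in catalog.items()
        if item_id not in history_ids
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in candidates[:k]]

# Hypothetical toy vectors standing in for real embedding output.
catalog = {
    "running-shoes": [0.9, 0.1, 0.0],
    "trail-shoes": [0.8, 0.2, 0.1],
    "cookbook": [0.0, 0.1, 0.9],
    "new-sneakers": [0.85, 0.15, 0.05],  # new item with no interactions yet
}

print(recommend(["running-shoes"], catalog))
# ['new-sneakers', 'trail-shoes']
```

The new item ranks first even though nobody has interacted with it, which is the cold-start behavior the paragraph describes.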

OpenAI text-embedding-3-small encodes both catalog items and user history documents into the same vector space cheaply. A catalog of 500K product descriptions at 200 tokens each consumes 100M tokens, a one-time cost of about $2 at $0.02 per 1M tokens. The cost of incrementally embedding new catalog items is negligible. The runtime query (embed user history + ANN search) costs fractions of a cent per session.
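The arithmetic behind the catalog figure is easy to reproduce. The sketch below assumes a price of $0.02 per 1M tokens, which is what the $2 figure for 100M tokens implies; the 2,000-token user history in the second call is an illustrative assumption, not a number from this page.

```python
def embedding_cost_usd(num_items, tokens_per_item, price_per_million_tokens):
    """Return (total tokens, cost in USD) for embedding a batch of items."""
    total_tokens = num_items * tokens_per_item
    return total_tokens, total_tokens / 1_000_000 * price_per_million_tokens

# Assumed price, consistent with "$2 for 100M tokens" above.
PRICE_PER_MILLION = 0.02

# One-time catalog embedding: 500K descriptions at 200 tokens each.
tokens, cost = embedding_cost_usd(500_000, 200, PRICE_PER_MILLION)
print(tokens, round(cost, 2))
# 100000000 2.0

# Per-session query: one user history document of an assumed 2,000 tokens.
_, session_cost = embedding_cost_usd(1, 2_000, PRICE_PER_MILLION)
print(f"{session_cost:.5f}")
# 0.00004
```

Swap in your own catalog size, average description length, and the current list price to get a figure for your workload.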

Voyage-3-Large is preferred for catalogs with long, complex item descriptions (technical documentation, academic papers, detailed product specs), where it captures more semantic nuance. Its default 1024-dimensional vectors are smaller than text-embedding-3-small's 1536, so the advantage comes from the model rather than from vector size. Cohere Embed v4 offers Matryoshka representations that let you truncate vectors to fewer dimensions, trading retrieval quality for storage cost, which is useful for very large catalogs under memory constraints.
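To make the Matryoshka trade-off concrete, here is a rough sketch of what truncation does and how it changes raw storage. It assumes float32 storage and ignores index overhead; the catalog size and the 1536 and 256 dimension counts are illustrative, so check the model's documentation for the output dimensions it supports.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-renormalize the result."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

def index_size_gb(num_items, dims, bytes_per_value=4):
    """Raw vector storage in GB (float32 by default), excluding index overhead."""
    return num_items * dims * bytes_per_value / 1e9

# Toy 4-dimensional vector standing in for a real Matryoshka embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
print(len(short))
# 2

# Hypothetical 10M-item catalog at two vector sizes.
print(index_size_gb(10_000_000, 1536), index_size_gb(10_000_000, 256))
```

Renormalizing after truncation keeps cosine and dot-product scores comparable across items. How much retrieval quality is lost at a given size depends on the model and the catalog, so it is worth measuring on your own relevance data.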

Common pitfalls

Embedding all item fields (title + description + reviews + specs) into a single vector without testing which field combination maximizes retrieval relevance for your specific task
Not filtering by availability before returning recommendations: keeping out-of-stock products and deprecated content off the slate requires real-time metadata filtering in the ANN query, not just embedding similarity
Treating recommendation as a pure offline problem: user preferences shift, and in fast-moving domains (news, fashion, software) catalog embeddings older than 6 months degrade recommendation quality measurably
Ignoring diversity constraints: an ANN search returns the N most similar items, which are often near-duplicates of each other; apply maximal marginal relevance or category diversification to produce a useful recommendation slate
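Two of the pitfalls above, availability filtering and near-duplicate results, can be illustrated together. The sketch below is a greedy maximal marginal relevance (MMR) pass over a small in-memory candidate list. The item names, vectors, and the `lam` weight are assumptions for illustration; in production the availability filter would normally be pushed into the vector index query rather than applied in application code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query, items, k, lam=0.5):
    """Greedy MMR over (id, vector, in_stock) tuples; skips unavailable items.

    lam=1.0 ranks by relevance only; lower values penalize similarity
    to items that were already selected.
    """
    pool = {item_id: vec for item_id, vec, in_stock in items if in_stock}
    selected = []
    while pool and len(selected) < k:
        def score(item_id):
            relevance = cosine(query, pool[item_id])
            redundancy = max(
                (cosine(pool[item_id], vec) for _, vec in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append((best, pool.pop(best)))
    return [item_id for item_id, _ in selected]

# Hypothetical candidates: two near-duplicate shoes, one sold out, one sock.
items = [
    ("red-shoe-a", [1.0, 0.0], True),
    ("red-shoe-b", [0.99, 0.05], True),
    ("sold-out-shoe", [1.0, 0.01], False),
    ("sock", [0.6, 0.8], True),
]
query = [1.0, 0.2]

print(mmr(query, items, k=2, lam=1.0))  # relevance only
# ['red-shoe-b', 'red-shoe-a']
print(mmr(query, items, k=2, lam=0.5))  # with diversity penalty
# ['red-shoe-b', 'sock']
```

With relevance alone the slate holds two nearly identical shoes. With the diversity penalty the second slot goes to a different item, and the sold-out item never appears in either case.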

Recommended routing

Sorted by best value for your usage

PRIMARY

text-embedding-3-small

OpenAI · quality 80

Monthly cost: $0.00

Vs baseline: 0%

P50 latency: 0.1s

FALLBACK

Voyage 3 Large

Voyage · quality 91

Monthly cost: $0.00

Vs baseline: 0%

P50 latency: 0.1s

embed-v4

Cohere · quality 90

Monthly cost: $0.00

Vs baseline: 0%

P50 latency: 0.1s

Baseline = embed-v4 at the same usage = $0.00/mo.


Use this routing via API

Phase 2 preview · gateway not live yet

This endpoint does not exist yet. The gateway is in Phase 2; what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.

Preview the planned API call

$ curl https://api.aipricly.com/v1/chat/completions \
  -H "Authorization: Bearer $AIPC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "recommendation-system",
    "messages": [{"role": "user", "content": "..."}]
  }'
