AIpricly

Recommendation system

Compare text-embedding-3-small, Voyage-3-Large, Cohere Embed v4 for content and product recommendations. From $1/mo for 5M tokens.

Your usage

Default assumptions
Monthly requests: 5,000,000
Avg input tokens: 0
Avg output tokens: 0

When to use this scenario

Embedding-based recommendation computes similarity between a user's interaction history and a catalog of items (articles, products, courses, videos) to surface relevant next-best items. It complements collaborative filtering by handling the cold-start problem: new items with no interaction history can be recommended immediately based on their embedding similarity to items the user has engaged with.
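A minimal sketch of the scoring described above. It assumes the user's profile is the mean of the embeddings of items they engaged with. That is one common choice, not something this page prescribes. The toy 2-D vectors and item IDs are hypothetical stand-ins for real embedding-API output, and the brute-force scan stands in for an ANN index. Note that `new_item`, which has no interaction history, still ranks first because of its embedding alone.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def user_profile(history_vecs: list[list[float]]) -> list[float]:
    """Mean-pool the user's history embeddings (an assumed profile strategy)."""
    dim = len(history_vecs[0])
    mean = [sum(vec[i] for vec in history_vecs) / len(history_vecs) for i in range(dim)]
    return normalize(mean)

def recommend(history_ids: list[str], catalog: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank unseen catalog items by cosine similarity to the user profile."""
    profile = user_profile([normalize(catalog[i]) for i in history_ids])
    seen = set(history_ids)
    scored = [
        (sum(p * q for p, q in zip(profile, normalize(vec))), item_id)
        for item_id, vec in catalog.items()
        if item_id not in seen  # don't re-recommend what the user already engaged with
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Hypothetical toy catalog; "new_item" has no interactions yet (cold start).
catalog = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "new_item": [0.95, 0.05],
    "c": [0.0, 1.0],
}
print(recommend(["a"], catalog, k=2))  # ['new_item', 'b']
```

Mean pooling is the simplest profile. Recency-weighted averages or one vector per interest cluster are common refinements when a single mean blurs distinct interests.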

OpenAI text-embedding-3-small cheaply encodes both catalog items and user-history documents into the same vector space. A catalog of 500K product descriptions embedded at 200 tokens each consumes 100M tokens, a one-time cost of about $2 (that is, $0.02 per 1M tokens). Embedding new catalog items incrementally costs a negligible amount. At runtime, each query (embed the user's history, then run an approximate nearest-neighbor (ANN) search) costs a fraction of a cent per session.
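The catalog arithmetic above can be checked with a one-line formula. The $0.02 per 1M tokens rate is the one implied by "$2 for 100M tokens"; check current provider pricing before relying on it. The 2,000-token user history is a hypothetical figure chosen to illustrate the per-session cost.

```python
def embedding_cost_usd(num_items: int, tokens_per_item: int, usd_per_million_tokens: float) -> float:
    """Cost to embed `num_items` texts, billed per million input tokens."""
    total_tokens = num_items * tokens_per_item
    return total_tokens / 1_000_000 * usd_per_million_tokens

# One-time catalog embedding: 500K descriptions x 200 tokens = 100M tokens.
print(f"${embedding_cost_usd(500_000, 200, 0.02):.2f}")  # $2.00

# Hypothetical per-session query: embedding a 2,000-token user history.
print(f"${embedding_cost_usd(1, 2_000, 0.02):.5f}")  # $0.00004
```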

Voyage-3-Large is preferred for catalogs with long, complex item descriptions (technical documentation, academic papers, detailed product specs), where its higher retrieval quality captures more semantic nuance. The gain comes from the model rather than from vector size: its default 1024-dimensional output is actually smaller than text-embedding-3-small's default of 1536 dimensions. Cohere Embed v4 offers Matryoshka representations, which let you use shorter vectors at inference time and so trade retrieval quality for storage cost. This is useful for very large catalogs under memory constraints.
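A minimal sketch of the Matryoshka trade-off, assuming the vectors come from a model trained Matryoshka-style so that a leading prefix is itself a usable, lower-fidelity embedding. This is generic truncation plus re-normalization, not any provider's API. The 500K-item catalog and the 1536-to-256 reduction are hypothetical illustration figures.

```python
import math

def truncate_matryoshka(vec: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` components and re-normalize to unit length.

    Only meaningful for Matryoshka-trained models, where the prefix of the
    vector is itself a valid (lower-fidelity) embedding.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix

def storage_bytes(num_items: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw float32 vector storage, ignoring ANN index overhead."""
    return num_items * dims * bytes_per_value

# Hypothetical 500K-item catalog: full 1536-dim vectors vs a 256-dim prefix.
full = storage_bytes(500_000, 1536)   # 3,072,000,000 bytes
small = storage_bytes(500_000, 256)   # 512,000,000 bytes
print(full / small)  # 6.0
```

Re-normalizing after truncation keeps dot-product scores comparable to cosine similarity. Recall at the shorter length should be measured on your own queries before committing to it.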

Common pitfalls

  • Embedding all item fields (title + description + reviews + specs) into a single vector without testing which field combination maximizes retrieval relevance for your specific task
  • Not filtering by availability before returning recommendations: avoiding out-of-stock products or deprecated content requires real-time metadata filtering in the ANN query, not just embedding similarity
  • Treating recommendation as a purely offline problem: user preferences shift, and in fast-moving domains (news, fashion, software), catalog embeddings older than 6 months measurably degrade recommendation quality
  • Ignoring diversity constraints: an ANN search returns the N most similar items, which are often near-duplicates of each other; apply maximal marginal relevance or category diversification to produce a useful recommendation slate
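The availability and diversity pitfalls can be handled together. The following minimal sketch first drops unavailable items and then builds the slate with maximal marginal relevance (MMR). At each step, MMR picks the item that maximizes `lam * sim(query, item) - (1 - lam) * max sim(item, already_picked)`. The `in_stock` dictionary is a hypothetical stand-in for real-time metadata that, in production, belongs in the ANN query itself. The item names, vectors, and the `lam=0.7` default are illustrative assumptions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_slate(query, candidates, in_stock, k=3, lam=0.7):
    """Pick k items by Maximal Marginal Relevance after an availability filter.

    candidates: item_id -> embedding (e.g. the ANN search results)
    in_stock:   item_id -> bool (stand-in for real-time metadata)
    lam:        1.0 ranks by relevance only; lower values reward diversity
    """
    # Filter first so unavailable items can never reach the slate.
    vecs = {i: v for i, v in candidates.items() if in_stock.get(i, False)}
    relevance = {i: cosine(query, v) for i, v in vecs.items()}
    remaining, selected = set(vecs), []
    while remaining and len(selected) < k:
        # Reward relevance to the query, penalize similarity to items already picked.
        scores = {
            i: lam * relevance[i]
            - (1 - lam) * max((cosine(vecs[i], vecs[s]) for s in selected), default=0.0)
            for i in remaining
        }
        best = max(sorted(remaining), key=scores.__getitem__)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 1.0]
candidates = {
    "trail_shoe": [1.0, 0.2],
    "trail_shoe_v2": [1.0, 0.25],  # near-duplicate of trail_shoe
    "rain_jacket": [0.2, 1.0],
    "sold_out_item": [1.0, 1.0],   # most similar, but unavailable
}
in_stock = {"trail_shoe": True, "trail_shoe_v2": True, "rain_jacket": True, "sold_out_item": False}
print(mmr_slate(query, candidates, in_stock, k=2))  # ['trail_shoe_v2', 'rain_jacket']
```

Ranked by raw similarity alone, the top two would be `sold_out_item` and `trail_shoe_v2`. The sketch instead returns one available shoe and the jacket.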

Recommended routing

Sorted by best value for your usage
PRIMARY: text-embedding-3-small (OpenAI) · quality 80 · monthly cost $0.00 · vs baseline 0% · P50 latency 0.1s
FALLBACK: Voyage 3 Large (Voyage) · quality 91 · monthly cost $0.00 · vs baseline 0% · P50 latency 0.1s
embed-v4 (Cohere) · quality 90 · monthly cost $0.00 · vs baseline 0% · P50 latency 0.1s

Baseline = embed-v4 at the same usage = $0.00/mo.

Routing simulator

Phase 2 preview

The simulator splits traffic between text-embedding-3-small (primary) and Voyage 3 Large (fallback) and shows how your monthly bill moves, without writing a line of gateway code.

Primary: text-embedding-3-small · Fallback: Voyage 3 Large
Split: 70% text-embedding-3-small / 30% Voyage 3 Large
Blended monthly cost: $0.00 at the usage assumed above
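The blended figure is a traffic-weighted average of the two models' monthly costs. The sketch below assumes per-model cost is simply total input tokens times a per-million-token rate. With the page's default of 0 average input tokens, every cost and therefore the blend come out to $0.00. The 20-tokens-per-request usage and the $0.20 per 1M fallback rate are hypothetical placeholders, not quoted prices.

```python
def monthly_cost(requests: int, avg_input_tokens: float, usd_per_million_tokens: float) -> float:
    """Monthly embedding bill: total input tokens times the per-million-token rate."""
    return requests * avg_input_tokens / 1_000_000 * usd_per_million_tokens

def blended_monthly_cost(primary_cost: float, fallback_cost: float, primary_share: float) -> float:
    """Expected bill when `primary_share` of traffic goes to the primary model."""
    if not 0.0 <= primary_share <= 1.0:
        raise ValueError("primary_share must be between 0 and 1")
    return primary_share * primary_cost + (1.0 - primary_share) * fallback_cost

# The page's defaults (5,000,000 requests, 0 avg input tokens) give $0.00 for any rate.
print(monthly_cost(5_000_000, 0, 0.02))  # 0.0

# Hypothetical: 20 tokens per request; $0.02/1M primary, placeholder $0.20/1M fallback.
primary = monthly_cost(5_000_000, 20, 0.02)
fallback = monthly_cost(5_000_000, 20, 0.20)
print(f"${blended_monthly_cost(primary, fallback, 0.70):.2f}")  # $7.40
```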

Phase 2 turns this routing into a real OpenAI-compatible endpoint: one key, one bill, automatic failover.


Use this routing via API

Phase 2 preview · gateway not live yet
This endpoint does not exist yet. The gateway is in Phase 2; what you see below is a design preview of the planned interface, not a live API. We will email subscribers when it launches.
Preview the planned API call
$ curl https://api.aipricly.com/v1/chat/completions \
  -H "Authorization: Bearer $AIPC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "recommendation-system",
    "messages": [{"role": "user", "content": "..."}]
  }'
