AI Routing
Automatically route every AI call to the best model using your own API keys. Reduce spend. Improve quality. Stay vendor-neutral.
How it works
- Classify — We detect your prompt type (code, structured_extraction, summarization, creative, long_context_reasoning, general_qa) with a hybrid classifier: fast rules for obvious cases, embedding similarity otherwise. Every response includes `task_type` and `classification_confidence` (0–1). Use `force_model` to override if you disagree.
- Score — Models are scored by strategy (cost, quality, latency). Quality uses a per-model, per-task-type reliability matrix (deterministic, no ML).
- Select — The best model within your cost cap is chosen. For `lowest_cost`, the literal cheapest model wins; other strategies use a weighted score across cost, quality, and latency.
- Execute — We call the provider with your stored API key and return the result (or stream it).
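The score-and-select steps above can be sketched as follows. The model table, per-strategy weights, and min-max normalization here are illustrative assumptions, not our actual reliability matrix or scoring formula:

```typescript
// Sketch of score-and-select. Model data, weights, and normalization
// are hypothetical; only the strategy names and the cost-cap filter
// come from the doc.
interface ModelInfo {
  id: string;
  costPer1kTokens: number; // USD (assumed unit)
  quality: number;         // 0-1, from a per-model, per-task reliability matrix
  latencyMs: number;       // typical latency
}

type Strategy = "lowest_cost" | "balanced" | "max_reliability" | "fastest";

// Hypothetical per-strategy weights for (cost, quality, latency).
const WEIGHTS: Record<Exclude<Strategy, "lowest_cost">, [number, number, number]> = {
  balanced: [0.34, 0.33, 0.33],
  max_reliability: [0.05, 0.9, 0.05],
  fastest: [0.1, 0.2, 0.7],
};

function selectModel(models: ModelInfo[], strategy: Strategy, maxCost?: number): ModelInfo {
  // Enforce the cost cap first.
  const eligible = models.filter(m => maxCost === undefined || m.costPer1kTokens <= maxCost);
  if (eligible.length === 0) throw new Error("no model within cost cap");

  // lowest_cost picks the literal cheapest model.
  if (strategy === "lowest_cost") {
    return eligible.reduce((a, b) => (b.costPer1kTokens < a.costPer1kTokens ? b : a));
  }

  // Other strategies use a weighted score; cheaper and faster score higher.
  const maxC = Math.max(...eligible.map(m => m.costPer1kTokens));
  const maxL = Math.max(...eligible.map(m => m.latencyMs));
  const [wc, wq, wl] = WEIGHTS[strategy];
  const score = (m: ModelInfo) =>
    wc * (1 - m.costPer1kTokens / maxC) + wq * m.quality + wl * (1 - m.latencyMs / maxL);
  return eligible.reduce((a, b) => (score(b) > score(a) ? b : a));
}
```

Because the scoring is deterministic, the same prompt type and strategy always map to the same ranking, which is what makes the `alternatives` list in responses stable and auditable.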
Strategies
Choose how we prioritize models:
- `lowest_cost` — Prioritize the cheapest models (DeepSeek, Groq, GPT-4.1-mini)
- `balanced` — Balance cost, quality, and latency
- `max_reliability` — Prioritize quality over cost
- `fastest` — Prioritize low-latency models (Groq, Gemini Flash)
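A routed request picks one of these strategies in its body. The field names below are the ones this doc describes for `/api/route`; the prompt text and cost value are placeholders:

```typescript
// Hypothetical typed request body for POST /api/route, built from the
// fields described in this doc.
type Strategy = "lowest_cost" | "balanced" | "max_reliability" | "fastest";

interface RouteRequest {
  prompt: string;
  strategy: Strategy;
  max_cost?: number;    // optional per-request cost cap
  force_model?: string; // "provider:modelId" or "provider"
}

const body: RouteRequest = {
  prompt: "Summarize this changelog in three bullets.",
  strategy: "balanced",
  max_cost: 0.01,
};

// Send with your JWT (sketch):
// await fetch("/api/route", {
//   method: "POST",
//   headers: { Authorization: `Bearer ${jwt}`, "Content-Type": "application/json" },
//   body: JSON.stringify(body),
// });
```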
Plans and request limits
Your plan sets the monthly routed-request cap and the number of providers you can connect. Limits are enforced: requests over the cap return 429 with X-RateLimit-* headers. Caps reset at the end of each month (UTC).
- Starter — 50,000 requests/month; up to 3 connected providers (OpenAI, Anthropic, Google).
- Growth — 200,000 requests/month; unlimited providers; Control Center (request logs, visibility).
- Scale — 750,000 requests/month; higher limits; see Pricing for details.
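A client can watch the rate-limit headers to back off before hitting the cap. The specific header names below (`X-RateLimit-Remaining`, `X-RateLimit-Reset`) and the unix-seconds reset format are assumptions; the doc only guarantees the `X-RateLimit-*` prefix and a 429 status over cap:

```typescript
// Sketch of client-side cap handling under assumed header names.
interface RateLimitInfo {
  remaining: number | null; // requests left this month, if reported
  resetAt: Date | null;     // assumed unix-seconds timestamp
}

function readRateLimit(headers: Map<string, string>): RateLimitInfo {
  const remaining = headers.get("x-ratelimit-remaining") ?? null;
  const reset = headers.get("x-ratelimit-reset") ?? null;
  return {
    remaining: remaining === null ? null : Number(remaining),
    resetAt: reset === null ? null : new Date(Number(reset) * 1000),
  };
}
```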
Control Center
Growth and Scale plans include the Control Center: a dashboard with request logs (model, provider, strategy, task_type, latency, cost), model usage, provider exposure, and strategy enforcement visibility. Use it to audit spend and tune routing.
Provider keys
Add your API keys at /account/keys. Keys are encrypted at rest and never exposed. We support OpenAI, Anthropic, Google (Gemini), Groq, and DeepSeek.
Force model override
Bypass routing and force a specific model: pass `force_model: "provider:modelId"` or `force_model: "provider"`. We still normalize and log the request.
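The two accepted formats can be told apart by the presence of a colon. This helper is a sketch of that normalization, not our internal parser:

```typescript
// Hypothetical normalizer for force_model values; the two accepted
// formats ("provider:modelId" and "provider") are from this doc.
function parseForceModel(value: string): { provider: string; modelId?: string } {
  const idx = value.indexOf(":");
  if (idx === -1) return { provider: value }; // provider-only form
  return { provider: value.slice(0, idx), modelId: value.slice(idx + 1) };
}
```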
Failover
If the chosen model fails before streaming starts, we automatically retry with the second-ranked model. The response includes fallback_used: true when this happens.
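In client terms, the failover behaves roughly like the sketch below. The function names and shapes are hypothetical; only the retry-before-streaming rule and the `fallback_used` flag come from the doc:

```typescript
// Illustrative failover: try the top-ranked model, and if it fails
// before any output is produced, retry once with the runner-up.
type CallModel = (modelId: string) => Promise<string>;

async function routeWithFailover(
  ranked: string[], // models ordered best-first by the router
  call: CallModel,
): Promise<{ result: string; model_used: string; fallback_used: boolean }> {
  try {
    return { result: await call(ranked[0]), model_used: ranked[0], fallback_used: false };
  } catch {
    // The second-ranked model gets one retry; its failure propagates.
    return { result: await call(ranked[1]), model_used: ranked[1], fallback_used: true };
  }
}
```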
API endpoints
All authenticated endpoints require Authorization: Bearer YOUR_JWT (Supabase session). Full details in the Routing API Reference.
- POST /api/route — Non-streaming. Send `prompt`, `strategy`, and optional `max_cost` and `force_model`. Get back `result`, `model_used`, `task_type`, `classification_confidence`, `alternatives`, etc.
- POST /api/recommend — Same body as `/api/route`. Returns the recommended model, cost, `task_type`, `classification_confidence`, and alternatives only; no LLM execution.
- POST /api/route/stream — Same body. Returns Server-Sent Events: `meta` (model, task_type, classification_confidence, cost), then `delta` chunks, then `done`.
- POST /api/route/demo — No auth. Public demo: our keys, cheap models only, 5 requests/IP/day, 800 input / 500 output token limits. Optional `current_model` for a savings comparison.
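The stream endpoint's event sequence (`meta`, then `delta` chunks, then `done`) can be consumed with a small SSE parser. This is a minimal sketch that assumes each message carries an `event:` line and a JSON `data:` line; the payload fields are placeholders:

```typescript
// Minimal SSE parser for the /api/route/stream event sequence.
// Assumes one event: line and one data: line of JSON per message.
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // SSE messages are separated by a blank line.
  for (const block of raw.split("\n\n")) {
    let event = "message";
    let data = "";
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data += line.slice(5).trim();
    }
    if (data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}
```

In practice you would feed this from a streamed response body chunk by chunk, appending `delta` text as it arrives and stopping on `done`.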