
Routing API Reference

Intelligent AI routing. We classify your prompt, score models by strategy, and execute using your stored API keys. Requires connected provider keys at /account/keys.

Using LangChain or the OpenAI SDK? StepBlend supports an OpenAI-compatible endpoint: POST /api/v1/chat/completions. Set base_url and use your StepBlend JWT as api_key. See OpenAI-compatible API for full docs and examples.

POST /api/route

Non-streaming. Returns full response in JSON.

POST https://stepblend.com/api/route

Headers

  • Authorization: Bearer YOUR_JWT — Supabase session token
  • Content-Type: application/json

Request body

  • prompt (required) Your prompt text
  • strategy (optional) lowest_cost | balanced | max_reliability | fastest. Default: balanced
  • max_cost (optional) Max USD per request (e.g. 0.01)
  • force_model (optional) Override: provider:modelId or provider

Example

curl -X POST https://stepblend.com/api/route \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize this article...", "strategy": "balanced", "max_cost": 0.01}'
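The same call can be made from Python's standard library. A minimal sketch — the helper name build_route_request is ours, not part of any SDK, and nothing is sent until you open the request:

```python
import json
import urllib.request

STEPBLEND_URL = "https://stepblend.com/api/route"

def build_route_request(jwt, prompt, strategy="balanced", max_cost=None):
    """Build (but do not send) the POST request for /api/route."""
    payload = {"prompt": prompt, "strategy": strategy}
    if max_cost is not None:
        payload["max_cost"] = max_cost
    return urllib.request.Request(
        STEPBLEND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a valid JWT):
# with urllib.request.urlopen(build_route_request("YOUR_JWT", "Summarize...")) as resp:
#     data = json.load(resp)
#     print(data["model_used"], data.get("actual_cost"))
```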

Response (200)

The model is chosen using pre-run cost and latency estimates. After the run, actual_cost is the cost for this request (real input + output tokens). Alternatives include estimatedCost (pre-run) and cost_for_this_request (same input + actual output tokens) for apples-to-apples comparison.

  • estimated_cost — Pre-run estimate (used for model selection)
  • actual_cost — Cost for this request after completion (optional)
  • cost_note — Explains that selection is based on pre-run estimates
  • task_type — Classification: code | structured_extraction | summarization | creative | long_context_reasoning | general_qa
  • classification_confidence — 0–1; rule-based = 1, embedding-based = cosine similarity to category
  • metrics.latency_ms — Total response time (ms)
  • metrics.ttft_ms — Time to first token (ms), when available

{
  "result": "string",
  "model_used": "gpt-4.1-mini",
  "provider": "openai",
  "estimated_cost": 0.00042,
  "actual_cost": 0.00038,
  "cost_note": "Model selected using pre-run cost and latency estimates...",
  "fallback_used": false,
  "reasoning": "Task: summarization. Selected openai...",
  "task_type": "summarization",
  "classification_confidence": 0.92,
  "alternatives": [...],
  "metrics": { "latency_ms": 1200, "ttft_ms": 420 }
}
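One way to use these fields: compare what the selected model actually cost against the cheapest alternative's cost_for_this_request, since both are priced on the same input and output tokens. A minimal sketch — the realized_savings helper is ours, and the sample costs are invented for illustration:

```python
def realized_savings(response):
    """Cheapest alternative's cost_for_this_request minus the selected model's
    actual_cost (positive means routing saved money on this request)."""
    alt_costs = [a["cost_for_this_request"]
                 for a in response.get("alternatives", [])
                 if "cost_for_this_request" in a]
    if not alt_costs or response.get("actual_cost") is None:
        return 0.0
    return min(alt_costs) - response["actual_cost"]

# Hypothetical response fragment (costs invented for illustration):
sample = {
    "actual_cost": 0.00038,
    "alternatives": [
        {"provider": "google", "modelId": "gemini-2.5-flash", "cost_for_this_request": 0.00044},
        {"provider": "deepseek", "modelId": "deepseek-chat", "cost_for_this_request": 0.00051},
    ],
}
# realized_savings(sample) is roughly 0.00006: the chosen model beat the best alternative.
```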

POST /api/recommend

Recommendation only — no LLM execution. Returns which model would be chosen, cost, task_type, classification_confidence, and alternatives. Use this to preview routing before calling /api/route.

POST https://stepblend.com/api/recommend

Headers

  • Authorization: Bearer YOUR_JWT — Supabase session token
  • Content-Type: application/json

Request body

Same as /api/route: prompt (required), strategy, max_cost, force_model.

Response (200)

{
  "model_used": "gemini-2.5-flash",
  "provider": "google",
  "estimated_cost": 0.00012,
  "reasoning": "Task: general_qa. Selected google...",
  "task_type": "general_qa",
  "classification_confidence": 0.85,
  "alternatives": [
    { "provider": "deepseek", "modelId": "deepseek-chat", "estimatedCost": 0.00014, "score": 0.68, "reliability_delta": -0.01 }
  ]
}
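A typical pattern is to gate execution on the preview: accept the recommendation when its estimated_cost fits your budget, otherwise force the cheapest affordable alternative via force_model in provider:modelId form. A sketch with hypothetical numbers — pick_model is our helper, not part of the API:

```python
def pick_model(rec, max_usd):
    """Gate on a /api/recommend result: None means accept the router's choice;
    otherwise return a force_model string for the cheapest affordable alternative."""
    if rec["estimated_cost"] <= max_usd:
        return None
    affordable = [a for a in rec.get("alternatives", [])
                  if a["estimatedCost"] <= max_usd]
    if not affordable:
        raise ValueError("no model fits the budget")
    best = min(affordable, key=lambda a: a["estimatedCost"])
    return f'{best["provider"]}:{best["modelId"]}'

# Hypothetical recommendation (numbers invented for illustration):
rec = {
    "estimated_cost": 0.0009,
    "alternatives": [
        {"provider": "deepseek", "modelId": "deepseek-chat", "estimatedCost": 0.00014},
    ],
}
```

Here pick_model(rec, 0.001) returns None (accept the recommendation), while pick_model(rec, 0.0005) returns "deepseek:deepseek-chat" to pass as force_model on the follow-up /api/route call.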

POST /api/route/stream

Streaming response. Same request as /api/route. Returns Server-Sent Events: meta (model, cost, task_type, classification_confidence), then delta (content chunks), then done. Use for low-latency UX.

POST https://stepblend.com/api/route/stream

Headers

  • Authorization: Bearer YOUR_JWT
  • Content-Type: application/json

SSE events

  • event: meta — model_used, provider, estimated_cost, task_type, classification_confidence
  • event: delta — content (string)
  • event: done — optional ttft_ms, latency_ms (full stream duration), actual_cost (cost for this request after completion). All measured from the real provider stream, not the SSE wrapper.
  • event: error — message
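Consuming the stream amounts to standard SSE parsing. A minimal sketch assuming each event arrives as event:/data: lines with a JSON payload, separated by blank lines (the exact framing may differ; check a live response):

```python
import json

def parse_sse(raw):
    """Parse an SSE body into (event_name, payload) pairs.
    Assumes 'event: <name>' followed by 'data: <json>' lines per event."""
    events = []
    for block in raw.strip().split("\n\n"):
        name, data = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        if name:
            events.append((name, json.loads("\n".join(data)) if data else None))
    return events

# Hypothetical stream body shaped like the events listed above:
stream = (
    'event: meta\ndata: {"model_used": "gpt-4.1-mini", "provider": "openai"}\n\n'
    'event: delta\ndata: {"content": "Hello"}\n\n'
    'event: done\ndata: {"latency_ms": 1200}\n\n'
)
```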

POST /api/route/demo

Public demo — no auth. Uses our demo keys; limited to cheap models (e.g. gpt-4.1-mini, gemini-2.5-flash, deepseek-chat, llama-3.3-70b). Rate limit: 5 requests per IP per day. Max 800 input tokens, 500 output tokens. Optional current_model for savings comparison.

POST https://stepblend.com/api/route/demo

Headers

No Authorization. Content-Type: application/json.

Request body

  • prompt (required)
  • strategy (optional) — default: balanced
  • current_model (optional) — e.g. GPT-4, Claude 3.5 Sonnet; used to compute savings_vs_current

Response (200)

Same shape as /api/route plus: demo_mode: true, savings_vs_current (if current_model was sent), rate_limit (remaining, limit). 429 when rate limit exceeded.

Error responses

All endpoints return JSON on error. Each response includes an error field (string); many also include an optional message with details.

400 Bad Request

  • Invalid JSON — Request body is not valid JSON.
  • prompt is required — Missing or empty prompt.
  • Invalid strategy — strategy must be one of: lowest_cost, balanced, max_reliability, fastest.
  • Demo only: Prompt too long — Demo allows up to 800 input tokens; response includes message with limit details.

401 Unauthorized

Applies to /api/route, /api/recommend, /api/route/stream, and /api/v1/chat/completions (the demo endpoint does not require auth).

  • Missing Authorization header — Send Authorization: Bearer YOUR_JWT.
  • Invalid or expired token — JWT is invalid or expired; re-authenticate.

429 Too Many Requests

  • /api/route, /api/route/stream, /api/v1/chat/completions — Monthly routing limit reached. Body: error, message, current, limit. Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix timestamp). Resets at end of month (UTC).
  • /api/route/demo — Demo rate limit (5 requests per IP per day). Body: error, message. Same rate-limit headers.
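Since X-RateLimit-Reset is a Unix timestamp, a 429 handler can compute exactly how long to back off before retrying. A small sketch — seconds_until_reset is our helper, not part of the API:

```python
def seconds_until_reset(headers, now):
    """Given 429 response headers, return seconds to wait before retrying.
    X-RateLimit-Reset is a Unix timestamp, per the docs above."""
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return max(0.0, reset - now)

# In practice, pass time.time() as `now` and sleep for the result.
# seconds_until_reset({"X-RateLimit-Reset": "1700000060"}, now=1700000000.0) → 60.0
```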

500 Internal Server Error

Server or routing failure. Body: error (e.g. "Routing failed", "Configuration error", "Demo execution failed"). Retry or contact support.

Getting your JWT

Sign in first, then use the token widget on this page to fetch and copy your JWT.