OpenAI-compatible API
Use StepBlend with the same request and response shape as OpenAI Chat Completions. Works with LangChain, the OpenAI SDK, LlamaIndex, Vercel AI SDK, and any client that supports a custom base_url. Your API keys stay in StepBlend; you send your StepBlend JWT as api_key.
Endpoint
POST https://stepblend.com/api/v1/chat/completions

Base URL for clients: https://stepblend.com/api/v1 (no trailing slash). The client will append /chat/completions.
Authentication
Send your StepBlend JWT in the Authorization header: Bearer YOUR_JWT. This is the same token used for the custom Routing API. You also need connected provider keys at /account/keys.
Get your API token
Sign in to StepBlend, then fetch and copy your JWT.
Request body
Same as OpenAI Chat Completions: model, messages, stream, max_tokens, temperature, tools, tool_choice, etc.
The model field
- Strategy: lowest-cost, balanced, fastest, max-reliability — StepBlend picks the best model for that strategy (and optional cost cap).
- Specific model: gpt-4o, openai:gpt-4o, claude-3-5-sonnet-20241022, anthropic:claude-3-5-sonnet-20241022, etc. — routes to that model only (force model).
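For concreteness, the two forms can be compared as plain request bodies (a sketch; the messages are placeholders):

```python
# Strategy routing: StepBlend chooses the best model for this strategy.
strategy_request = {
    "model": "lowest-cost",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Forced model: the provider-prefixed form pins the request to one model.
forced_request = {
    "model": "anthropic:claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```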
Optional StepBlend params
Add these to the root JSON body (OpenAI clients ignore unknown fields):
- stepblend_max_cost — Max USD per request (e.g. 0.01). Only models under this cap are considered.
- stepblend_strategy — Override strategy: lowest_cost | balanced | max_reliability | fastest.
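As a sketch, these params sit at the root of the JSON body next to the standard OpenAI fields. (With the OpenAI Python SDK you would pass them via its extra_body keyword instead of building the body by hand.)

```python
import json

# StepBlend-specific keys ride alongside the standard OpenAI fields;
# OpenAI-compatible clients pass unknown root keys through untouched.
body = {
    "model": "balanced",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stepblend_max_cost": 0.01,           # only models expected to cost <= $0.01
    "stepblend_strategy": "lowest_cost",  # overrides the strategy for this call
}
payload = json.dumps(body)
```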
Tool calling (function calling)
When you send tools (and optionally tool_choice), you must set model to a specific model, not a strategy. Supported providers: OpenAI, Groq, DeepSeek (all use OpenAI-shaped tool API). Example: model: "gpt-4o" or model: "deepseek-chat". The request is sent to that provider with your stored key; response usage is from the provider.
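A minimal sketch of a tool-calling request body, following the OpenAI function-calling shape described above. Note that model names a specific model, not a strategy; the get_weather tool is a made-up example, not part of StepBlend.

```python
# Tool calls require a concrete model; a strategy value would not work here.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool name
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
```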
Response
Same shape as OpenAI: id, choices[0].message.content, usage (prompt_tokens, completion_tokens, total_tokens), and for tool calls choices[0].message.tool_calls. We add an extra root field:
- stepblend — routed_model, provider, strategy_used, estimated_cost_usd, actual_cost_usd (when available).
Streaming: Set stream: true. Response is Server-Sent Events: data: {...} chunks (OpenAI format), then data: [DONE].
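A minimal sketch of parsing the stream on the client side. The sample lines below imitate the OpenAI chunk format; a real client would read them off the HTTP response instead of a list.

```python
import json

# Each SSE event is a "data: " line carrying one OpenAI-format chunk,
# and the stream ends with the literal sentinel "data: [DONE]".
sample_events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in sample_events:
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    # Deltas carry incremental content; some chunks may omit it.
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # Hello!
```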
Examples
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
api_key="YOUR_STEPBLEND_JWT",
base_url="https://stepblend.com/api/v1"
)
response = client.chat.completions.create(
model="lowest-cost",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

LangChain (ChatOpenAI)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
openai_api_key="YOUR_STEPBLEND_JWT",
openai_api_base="https://stepblend.com/api/v1",
model="balanced",
temperature=0.7
)
response = llm.invoke("Hello!")

curl
curl -X POST https://stepblend.com/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{"model": "balanced", "messages": [{"role": "user", "content": "Hello!"}]}'

Rate limits and errors
Same as the rest of the routing API: monthly request limit per plan (Free 1k, Starter 50k, etc.). When exceeded, you get 429 with rate_limit_error. 401 for missing or invalid JWT. See Routing API Reference for full error details.
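A hedged sketch of client-side handling for the statuses above; the helper name and the 5xx policy are illustrative, not part of StepBlend.

```python
def should_retry(status_code):
    """Illustrative retry policy for StepBlend error statuses."""
    if status_code == 429:  # rate_limit_error: plan limit hit, back off
        return True
    if status_code == 401:  # missing or invalid JWT: fix the token, don't retry
        return False
    return 500 <= status_code < 600  # assume transient server errors are retryable
```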