OpenAI-compatible API
Use StepBlend with the same request and response shape as OpenAI Chat Completions. Works with LangChain, the OpenAI SDK, LlamaIndex, Vercel AI SDK, and any client that supports a custom base_url. Your API keys stay in StepBlend; you send your StepBlend JWT as api_key.
Endpoint
POST https://stepblend.com/api/v1/chat/completions

Base URL for clients: https://stepblend.com/api/v1 (no trailing slash). The client will append /chat/completions.
Authentication
Send your StepBlend JWT in the Authorization header: Bearer YOUR_JWT. This is the same token used for the custom Routing API. You also need connected provider keys at /account/keys.
Get your API token
Sign in to StepBlend, then fetch and copy your JWT.
Request body
Same as OpenAI Chat Completions: model, messages, stream, max_tokens, temperature, tools, tool_choice, etc.
The model field
- Strategy: lowest-cost, balanced, fastest, max-reliability — StepBlend picks the best model for that strategy (and optional cost cap).
- Specific model: gpt-4o, openai:gpt-4o, claude-3-5-sonnet-20241022, anthropic:claude-3-5-sonnet-20241022, etc. — routes to that model only (force model).
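For concreteness, the two forms can be compared as plain request bodies (a sketch; the messages are placeholders):

```python
# Strategy routing: StepBlend chooses the best model for this strategy.
strategy_request = {
    "model": "lowest-cost",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Forced model: the provider-prefixed form pins the request to one model.
forced_request = {
    "model": "anthropic:claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```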
Optional StepBlend params
Add these to the root JSON body (OpenAI clients ignore unknown fields):
- stepblend_max_cost — Max USD per request (e.g. 0.01). Only models under this cap are considered.
- stepblend_strategy — Override strategy: lowest_cost | balanced | max_reliability | fastest.
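As a sketch, these params sit at the root of the JSON body next to the standard OpenAI fields. (With the OpenAI Python SDK you would pass them via its extra_body keyword instead of building the body by hand.)

```python
import json

# StepBlend-specific keys ride alongside the standard OpenAI fields;
# OpenAI-compatible clients pass unknown root keys through untouched.
body = {
    "model": "balanced",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stepblend_max_cost": 0.01,           # only models expected to cost <= $0.01
    "stepblend_strategy": "lowest_cost",  # overrides the strategy for this call
}
payload = json.dumps(body)
```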
Tool calling (function calling)
When you send tools (and optionally tool_choice), you must set model to a specific model, not a strategy. Supported providers: OpenAI, Groq, DeepSeek (all use OpenAI-shaped tool API). Example: model: "gpt-4o" or model: "deepseek-chat". The request is sent to that provider with your stored key; response usage is from the provider.
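A minimal sketch of a tool-calling request body, following the OpenAI function-calling shape described above. Note that model names a specific model, not a strategy; the get_weather tool is a made-up example, not part of StepBlend.

```python
# Tool calls require a concrete model; a strategy value would not work here.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool name
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
```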
Response
Same shape as OpenAI: id, choices[0].message.content, usage (prompt_tokens, completion_tokens, total_tokens), and for tool calls choices[0].message.tool_calls. We add an extra root field:
- stepblend — routed_model, provider, strategy_used, estimated_cost_usd, actual_cost_usd (when available).
Streaming: Set stream: true. Response is Server-Sent Events: data: {...} chunks (OpenAI format), then data: [DONE].
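A minimal sketch of parsing the stream on the client side. The sample lines below imitate the OpenAI chunk format; a real client would read them off the HTTP response instead of a list.

```python
import json

# Each SSE event is a "data: " line carrying one OpenAI-format chunk,
# and the stream ends with the literal sentinel "data: [DONE]".
sample_events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in sample_events:
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    # Deltas carry incremental content; some chunks may omit it.
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # Hello!
```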
Examples
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
api_key="YOUR_STEPBLEND_JWT",
base_url="https://stepblend.com/api/v1"
)
response = client.chat.completions.create(
model="lowest-cost",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

LangChain (ChatOpenAI)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
openai_api_key="YOUR_STEPBLEND_JWT",
openai_api_base="https://stepblend.com/api/v1",
model="balanced",
temperature=0.7
)
response = llm.invoke("Hello!")

curl
curl -X POST https://stepblend.com/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{"model": "balanced", "messages": [{"role": "user", "content": "Hello!"}]}'

Rate limits and errors
Same as the rest of the routing API: monthly request limit per plan (Free 1k, Starter 50k, etc.). When exceeded, you get 429 with rate_limit_error. 401 for missing or invalid JWT. See Routing API Reference for full error details.
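A hedged sketch of client-side handling for the statuses above; the helper name and the 5xx policy are illustrative, not part of StepBlend.

```python
def should_retry(status_code):
    """Illustrative retry policy for StepBlend error statuses."""
    if status_code == 429:  # rate_limit_error: plan limit hit, back off
        return True
    if status_code == 401:  # missing or invalid JWT: fix the token, don't retry
        return False
    return 500 <= status_code < 600  # assume transient server errors are retryable
```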