Smart LLM routing and cost control. OpenAI-compatible, your keys.
StepBlend sits between your app and AI providers, routing each request to the best model based on your strategy—lowest cost, fastest, or most reliable.
Your API keys. Your control. No token markup. Use it like OpenAI: point base_url at your StepBlend endpoint and pass your JWT as the api_key.
Try it live
Models supported in the savings demo: GPT-4, GPT-4 Turbo, GPT-4.1, GPT-4.1 Mini…
Max 800 tokens (~3200 chars). Demo limited to cheaper models only.
Sign up to use your own API keys and get unlimited access
Try Optimizer
Instant integration — OpenAI compatible
Use your existing OpenAI or LangChain code. No new wrappers — just change base URL and key.
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
base_url="https://stepblend.com/api/v1",
api_key="YOUR_STEPBLEND_JWT"
)
r = client.chat.completions.create(
model="lowest-cost",
messages=[{"role": "user", "content": "Hi"}]
)
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
openai_api_base="https://stepblend.com/api/v1",
openai_api_key="YOUR_JWT",
model="balanced"
)
llm.invoke("Hello!")
curl
curl -X POST https://stepblend.com/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{"model":"balanced","messages":[{"role":"user","content":"Hi"}]}'
The problem we solve
Most teams hardcode one model, guess at costs, and have no visibility into their provider exposure.
Without a routing layer, you end up:
- Hardcoding one model
- Guessing at costs
- Ignoring vendor exposure
- Lacking spend visibility
- Running without a governance layer
StepBlend adds a control layer: route intelligently, cap costs, track everything. Your keys, your infrastructure.
What you get
A routing layer that handles model selection, cost control, and observability.
- Real-time routing across providers
- Deterministic strategy enforcement
- Max cost caps per request
- Automatic failover
- Vendor exposure tracking
- Spend visibility dashboard
- Seamless drop-in for OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK
Plus a dashboard to see what's happening across all your AI calls.
How it works
Replace direct model calls with one endpoint.
1. Send requests to /api/route
2. StepBlend classifies the request and scores candidate models
3. Applies your strategy and cost constraints
4. Executes the call with your API keys
5. Logs the result and surfaces metrics in the dashboard
We never touch your API keys or resell tokens. Everything runs through your infrastructure.
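From the client's side, the steps above reduce to a single authenticated POST. A minimal stdlib-only sketch, assuming the OpenAI-compatible endpoint and fields shown in the API examples on this page; the helper names build_route_request and route are illustrative, not part of StepBlend's SDK:

```python
import json
import urllib.request

STEPBLEND_URL = "https://stepblend.com/api/v1/chat/completions"

def build_route_request(strategy, messages, max_cost=None):
    # The strategy name ("lowest-cost", "balanced", ...) goes in the
    # standard "model" field; stepblend_max_cost caps per-request spend.
    body = {"model": strategy, "messages": messages}
    if max_cost is not None:
        body["stepblend_max_cost"] = max_cost
    return body

def route(jwt, strategy, messages, max_cost=None):
    # One authenticated POST: StepBlend scores candidate models, applies
    # the strategy and cost cap, and executes with your own provider keys.
    body = build_route_request(strategy, messages, max_cost)
    req = urllib.request.Request(
        STEPBLEND_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a StepBlend account and JWT):
# reply = route("YOUR_JWT", "lowest-cost",
#               [{"role": "user", "content": "Hi"}], max_cost=0.01)
# print(reply["model"])  # the concrete model the router selected
```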
Simple API, powerful routing
Use the OpenAI-compatible endpoint for drop-in compatibility, or the native API for advanced control. Both use the same routing engine, cost caps, logs, and dashboard.
Request
POST /api/v1/chat/completions
Content-Type: application/json
Authorization: Bearer <your_jwt>
{
"model": "lowest-cost",
"messages": [{"role": "user", "content": "Summarize this..."}],
"stepblend_max_cost": 0.01
}
Response
{
"id": "...",
"choices": [{ "message": { "content": "Your summary..." } }],
"usage": { "prompt_tokens": 10, "completion_tokens": 20 },
"model": "gpt-4.1-mini"
}
See what's happening
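Handling that response in Python is straightforward. A short sketch using exactly the fields from the example above; the per-token rates here are placeholder values for illustration, not StepBlend's or any provider's published pricing:

```python
import json

# The response shape from the example above.
raw = """{
  "id": "...",
  "choices": [{ "message": { "content": "Your summary..." } }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 20 },
  "model": "gpt-4.1-mini"
}"""

resp = json.loads(raw)

# Which concrete model did the router pick for this "lowest-cost" request?
selected_model = resp["model"]
answer = resp["choices"][0]["message"]["content"]

# Estimate spend from token usage (placeholder per-token rates).
RATES = {"gpt-4.1-mini": {"prompt": 0.4e-6, "completion": 1.6e-6}}
usage = resp["usage"]
rate = RATES[selected_model]
estimated_cost = (usage["prompt_tokens"] * rate["prompt"]
                  + usage["completion_tokens"] * rate["completion"])
```

Comparing estimated_cost against the stepblend_max_cost you sent is one way to sanity-check the cap client-side.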
Track spend, model usage, and provider exposure across all your requests.
- Total AI spend
- Premium model usage %
- Provider exposure
- Fallback rate
- Latency metrics
Available for Growth and Scale plans.
Try it yourself
Test routing with your own prompts. See which model gets selected and why.
Open Optimizer →
Trusted by developers and teams
7+ AI models routed
4 routing strategies
0 token resale
“Cut our AI spend by 40% without changing a line of code. The routing picks cheaper models when quality isn't critical.”
“One endpoint instead of managing five different provider SDKs. Our keys stay with us, which was a hard requirement.”
“We see cost estimates before each call and can cap spend per request. Makes budgeting predictable.”
Ready to add control to your AI calls?
Start routing intelligently today. Free tier includes 1,000 requests per month.