Smart LLM routing and cost control. OpenAI-compatible, your keys.
StepBlend sits between your app and AI providers, routing each request to the best model based on your strategy—lowest cost, fastest, or most reliable.
Your API keys. Your control. No token markup. Use it like OpenAI: point base_url at your StepBlend endpoint and pass your JWT as the api_key.
Try it live
Models supported in the savings demo: GPT-4, GPT-4 Turbo, GPT-4.1, GPT-4.1 Mini…
Max 800 tokens (~3200 chars). Demo limited to cheaper models only.
Sign up to use your own API keys and get unlimited access
Try Optimizer
Instant integration — OpenAI compatible
Use your existing OpenAI or LangChain code. No new wrappers — just change base URL and key.
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
base_url="https://stepblend.com/api/v1",
api_key="YOUR_STEPBLEND_JWT"
)
r = client.chat.completions.create(
model="lowest-cost",
messages=[{"role": "user", "content": "Hi"}]
)
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
openai_api_base="https://stepblend.com/api/v1",
openai_api_key="YOUR_JWT",
model="balanced"
)
llm.invoke("Hello!")
curl
curl -X POST https://stepblend.com/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{"model":"balanced","messages":[{"role":"user","content":"Hi"}]}'
The problem we solve
Most teams hardcode one model, guess at costs, and have no visibility into their provider exposure.
Without a routing layer, you end up:
- Hardcoding one model
- Guessing at costs
- Ignoring vendor exposure
- Lacking spend visibility
- Running without a governance layer
StepBlend adds a control layer: route intelligently, cap costs, track everything. Your keys, your infrastructure.
What you get
A routing layer that handles model selection, cost control, and observability.
- Real-time routing across providers
- Deterministic strategy enforcement
- Max cost caps per request
- Automatic failover
- Vendor exposure tracking
- Spend visibility dashboard
- Seamless drop-in for OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK
Plus a dashboard to see what's happening across all your AI calls.
How it works
Replace direct model calls with one endpoint.
1. Send requests to /api/route
2. StepBlend classifies the request and scores candidate models
3. Applies your strategy and cost constraints
4. Executes the call with your API keys
5. Logs the result and surfaces metrics in the dashboard
We never touch your API keys or resell tokens. Everything runs through your infrastructure.
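From the client's side, the steps above reduce to a single authenticated POST. A minimal stdlib-only sketch, assuming the OpenAI-compatible endpoint and fields shown in the API examples on this page; the helper names build_route_request and route are illustrative, not part of StepBlend's SDK:

```python
import json
import urllib.request

STEPBLEND_URL = "https://stepblend.com/api/v1/chat/completions"

def build_route_request(strategy, messages, max_cost=None):
    # The strategy name ("lowest-cost", "balanced", ...) goes in the
    # standard "model" field; stepblend_max_cost caps per-request spend.
    body = {"model": strategy, "messages": messages}
    if max_cost is not None:
        body["stepblend_max_cost"] = max_cost
    return body

def route(jwt, strategy, messages, max_cost=None):
    # One authenticated POST: StepBlend scores candidate models, applies
    # the strategy and cost cap, and executes with your own provider keys.
    body = build_route_request(strategy, messages, max_cost)
    req = urllib.request.Request(
        STEPBLEND_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a StepBlend account and JWT):
# reply = route("YOUR_JWT", "lowest-cost",
#               [{"role": "user", "content": "Hi"}], max_cost=0.01)
# print(reply["model"])  # the concrete model the router selected
```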
Simple API, powerful routing
Use the OpenAI-compatible endpoint for drop-in compatibility, or the native API for advanced control. Both use the same routing engine, cost caps, logs, and dashboard.
Request
POST /api/v1/chat/completions
Content-Type: application/json
Authorization: Bearer <your_jwt>
{
"model": "lowest-cost",
"messages": [{"role": "user", "content": "Summarize this..."}],
"stepblend_max_cost": 0.01
}
Response
{
"id": "...",
"choices": [{ "message": { "content": "Your summary..." } }],
"usage": { "prompt_tokens": 10, "completion_tokens": 20 },
"model": "gpt-4.1-mini"
}
See what's happening
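Handling that response in Python is straightforward. A short sketch using exactly the fields from the example above; the per-token rates here are placeholder values for illustration, not StepBlend's or any provider's published pricing:

```python
import json

# The response shape from the example above.
raw = """{
  "id": "...",
  "choices": [{ "message": { "content": "Your summary..." } }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 20 },
  "model": "gpt-4.1-mini"
}"""

resp = json.loads(raw)

# Which concrete model did the router pick for this "lowest-cost" request?
selected_model = resp["model"]
answer = resp["choices"][0]["message"]["content"]

# Estimate spend from token usage (placeholder per-token rates).
RATES = {"gpt-4.1-mini": {"prompt": 0.4e-6, "completion": 1.6e-6}}
usage = resp["usage"]
rate = RATES[selected_model]
estimated_cost = (usage["prompt_tokens"] * rate["prompt"]
                  + usage["completion_tokens"] * rate["completion"])
```

Comparing estimated_cost against the stepblend_max_cost you sent is one way to sanity-check the cap client-side.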
Track spend, model usage, and provider exposure across all your requests.
- Total AI spend
- Premium model usage %
- Provider exposure
- Fallback rate
- Latency metrics
Available for Growth and Scale plans.
Try it yourself
Test routing with your own prompts. See which model gets selected and why.
Open Optimizer →
Trusted by developers and teams
7+ AI models routed
4 routing strategies
0 token resale
“Cut our AI spend by 40% without changing a line of code. The routing picks cheaper models when quality isn't critical.”
“One endpoint instead of managing five different provider SDKs. Our keys stay with us, which was a hard requirement.”
“We see cost estimates before each call and can cap spend per request. Makes budgeting predictable.”
Ready to add control to your AI calls?
Start routing intelligently today. Free tier includes 1,000 requests per month.