Multi-Tenant LLM Routing: Model Override and Per-Tenant Control
When you serve multiple tenants or customers from one app, you often need per-tenant LLM behavior: one customer wants only Claude; another should get the cheapest model; a third gets “best quality” with a cost cap. A routing layer can do all of this through one endpoint, driven by per-request parameters such as force_model and max_cost.
The use case
- Tenant A (enterprise): “Always use Claude for our account.” Compliance or preference.
- Tenant B (startup): “Use the lowest-cost model that’s good enough.” Cost-sensitive.
- Tenant C (default): “Use balanced strategy with a $0.02 cap.” Safe default.
Without a router, you’d branch in code: if tenant A then call Anthropic; if B then pick the cheapest provider; if C then… That gets messy. With a router, you send the tenant’s preferences with each request (e.g. force_model, strategy, max_cost). Same endpoint, different behavior.
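As a minimal sketch of the per-request approach (the field names force_model, strategy, and max_cost are illustrative here, not a specific router’s API), the tenant branching collapses into building one request body per tenant:

```python
# Hypothetical per-tenant preferences; field names (force_model, strategy,
# max_cost) are assumptions about the router's request body, for illustration.
TENANT_PREFS = {
    "tenant-a": {"force_model": "claude-sonnet"},   # enterprise: Claude only
    "tenant-b": {"strategy": "lowest_cost"},        # startup: cheapest model
}
DEFAULT_PREFS = {"strategy": "balanced", "max_cost": 0.02}  # safe default

def build_request(tenant_id: str, prompt: str) -> dict:
    """Merge the tenant's routing preferences into a single request body."""
    prefs = TENANT_PREFS.get(tenant_id, DEFAULT_PREFS)
    return {"prompt": prompt, **prefs}
```

Every tenant hits the same endpoint; only the merged preferences differ.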
Force model (model override)
Force model means: “Ignore strategy for this request and call this exact model.” Use it when:
- A tenant has a contract or preference for one provider (e.g. “We only use Claude”).
- You’re debugging or reproducing an issue with a specific model.
- You’re A/B testing and want to send a fixed share of traffic to one model.
The router still uses your API keys, logs the request, and returns actual cost. You get deterministic model choice without separate code paths per provider. See the routing API for how to pass force_model (or equivalent) per request.
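For the A/B-testing case, a deterministic hash of the request ID can send a fixed share of traffic to one model via the override, with the rest left to the strategy. A sketch (again, force_model is an illustrative field name):

```python
import hashlib

def ab_force_model(request_id: str, treatment_model: str, share: float) -> dict:
    """Return a force_model override for a fixed share of traffic.

    Hashes the request ID into one of 100 buckets, so the same request
    always lands in the same arm. Returns {} to let the strategy decide.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    if bucket < share * 100:
        return {"force_model": treatment_model}
    return {}
```

Merging the returned dict into the request body gives a stable, reproducible split without separate code paths per provider.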
Per-tenant strategy and cost cap
You can also vary strategy and max cost by tenant:
- Enterprise: strategy: "max_reliability", max_cost: 0.10.
- SMB: strategy: "balanced", max_cost: 0.02.
- Free tier: strategy: "lowest_cost", max_cost: 0.005.
Store the tenant’s preferences in your DB or config; when you call the router, pass those values in the request body. One integration, many tenant profiles. That’s how you get LLM cost control and flexibility without separate pipelines per tenant.
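The tier profiles above can live in a plain lookup table. A sketch, with the values taken from the tiers listed and the parameter names (strategy, max_cost) assumed for illustration:

```python
# Tier profiles matching the tiers above; in practice these would come
# from your DB or config rather than a hardcoded dict.
TIER_PROFILES = {
    "enterprise": {"strategy": "max_reliability", "max_cost": 0.10},
    "smb":        {"strategy": "balanced",        "max_cost": 0.02},
    "free":       {"strategy": "lowest_cost",     "max_cost": 0.005},
}

def routing_params(tenant: dict) -> dict:
    """Return the routing fields for a tenant's tier (SMB as the fallback)."""
    return dict(TIER_PROFILES[tenant.get("tier", "smb")])
```

Changing a tenant’s behavior is then a config update, not a code change.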
One endpoint, full visibility
All requests go through the same routing API. Logs and a Control Center show which tenant made the call (if you tag requests), which model ran, and what it cost, so you can audit per-tenant spend and enforce caps or quotas in one place.
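If each log entry carries a tenant tag and the actual cost returned by the router, per-tenant spend is a simple aggregation. A sketch assuming log entries shaped like {"tenant": ..., "cost": ...}:

```python
from collections import defaultdict

def spend_by_tenant(log_entries: list[dict]) -> dict[str, float]:
    """Sum actual cost per tenant from tagged router log entries."""
    totals: dict[str, float] = defaultdict(float)
    for entry in log_entries:
        totals[entry["tenant"]] += entry["cost"]
    return dict(totals)
```

Comparing these totals against each tenant’s cap or quota is then one check per billing period.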
StepBlend supports force model, per-request strategy, and cost caps—so you can implement multi-tenant and model-override use cases without maintaining multiple provider integrations. Try the Optimizer → or check pricing and the docs.
Ready to add control to your AI calls?
Route through one endpoint. Set cost caps, pick strategies, and see spend—your API keys, no token resale.
Try the Optimizer →