One OpenAI endpoint. Many inference backends.
Route across vLLM, SGLang, and TensorRT-LLM with health checks, failover, and observability. A drop-in replacement for your OpenAI base URL.
Inference gets messy fast.
Scaling inference across multiple backends introduces complexity that slows teams down.
What Tensormux does
One gateway for routing, reliability, and observability across your inference fleet.
How it works
Three steps to unified inference routing.
Deploy Tensormux in front of your backends
Run Tensormux as a Docker container or binary. Point it at your inference backends via a simple YAML config.
Point your app to Tensormux
Change the OpenAI SDK base_url to your Tensormux host. No code changes beyond the URL.
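For example, with the official OpenAI Python SDK the only change is the base_url. This is a minimal sketch: the host, port, /v1 path, API key, and model name below are placeholders for your own deployment, not fixed Tensormux values.

# Sketch: point the stock OpenAI client at the gateway instead of api.openai.com.
# "http://tensormux:8080/v1" and "llama-3.1-8b" are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://tensormux:8080/v1",   # your Tensormux host
    api_key="not-needed-for-local-gateway",  # pass whatever your backends expect
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello from behind the gateway"}],
)
print(response.choices[0].message.content)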
Tensormux routes and exposes metrics
Requests are routed based on your chosen policy. Prometheus metrics, health status, and audit logs are available instantly.
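A quick way to sanity-check the gateway from Python is to hit the metrics and health endpoints. The paths below (/metrics, /healthz) are illustrative assumptions, not confirmed Tensormux routes; substitute whatever your deployment exposes.

# Illustrative sketch only: endpoint paths are assumptions, not documented routes.
import requests

BASE = "http://tensormux:8080"  # placeholder gateway address

# Prometheus metrics are plain text; print the first few lines.
metrics = requests.get(f"{BASE}/metrics", timeout=5)
print("\n".join(metrics.text.splitlines()[:5]))

# Health endpoint: expect per-backend health in the response body.
health = requests.get(f"{BASE}/healthz", timeout=5)
print(health.status_code, health.text)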
Integration
See how Tensormux simplifies your inference stack.
Without Tensormux
Bespoke routing logic, manual failover, fragmented metrics.
With Tensormux
One declarative config: routing policy, health checks, and unified metrics in a single file.
gateway:
  host: 0.0.0.0
  port: 8080
  strategy: least_inflight
backends:
  - name: vllm-fast
    url: http://vllm-fast:8000
    engine: vllm
    model: llama-3.1-8b
    weight: 80
    tags: ["fast", "gpu-a10"]
  - name: sglang-cheap
    url: http://sglang-cheap:8000
    engine: sglang
    model: llama-3.1-8b
    weight: 20
    tags: ["cheap", "gpu-t4"]
health_check:
  interval_s: 5
  timeout_s: 2
  fail_threshold: 2
  success_threshold: 1
logging:
  level: info
  file: tensormux.jsonl
OSS vs Paid
The open-source gateway covers production routing. A managed console is planned for teams that need more.
Open Source
Available now
- Routing policies (least inflight, EWMA, weighted round-robin)
- Health checking and automatic failover
- OpenAI-compatible API passthrough
- SSE streaming support (see the streaming sketch after this list)
- Prometheus metrics endpoint
- Status and health endpoints
- YAML-based configuration
- Audit logging (JSONL)
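As a sketch of the SSE passthrough mentioned above, the standard OpenAI streaming client should work unchanged against the gateway. Host, port, /v1 path, and model name are placeholders, not confirmed Tensormux defaults.

# Sketch: standard OpenAI SDK streaming pointed at the gateway.
# Host, port, and model name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://tensormux:8080/v1", api_key="placeholder")

stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Stream a short haiku"}],
    stream=True,  # tokens arrive as server-sent events relayed by the gateway
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()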
Managed Console
Preview
- Multi-tenant dashboards
- SLO monitoring and alerting
- Policy management UI
- Audit trails with search
- RBAC and team access controls
- Config rollout workflows
Ready to simplify inference routing?
Deploy Tensormux in minutes. One config file, one endpoint, full control.