Playground
Simulation
Fast + Slow (cost tiers)
Multi-region (us-east + eu-west)
Mixed engines (vLLM + TRT)
Simulation. No real backends. No data leaves your browser.
tensormux.yaml
gateway:
  host: 0.0.0.0
  port: 8080
  strategy: least_inflight
backends:
  - name: vllm-fast
    url: http://vllm-fast:8000
    engine: vllm
    model: llama-3.1-8b
    weight: 80
    tags:
      - fast
      - gpu-a10
    base_latency_ms: 50
    jitter_ms: 15
    capacity: 32
  - name: sglang-cheap
    url: http://sglang-cheap:8000
    engine: sglang
    model: llama-3.1-8b
    weight: 20
    tags:
      - cheap
      - gpu-t4
    base_latency_ms: 120
    jitter_ms: 30
    capacity: 16
health_check:
  interval_s: 5
  timeout_s: 2
  fail_threshold: 2
  success_threshold: 1
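The health_check values in the config (fail_threshold: 2, success_threshold: 1) suggest a consecutive-count rule for marking a backend down and up again. The sketch below is one plausible reading, not the gateway's confirmed behavior: the HealthState class and its record method are hypothetical names, and the assumption is that thresholds count consecutive probe results.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Hypothetical per-backend health tracker (assumed semantics)."""
    fail_threshold: int = 2      # consecutive failures before marking unhealthy
    success_threshold: int = 1   # consecutive successes before marking healthy
    healthy: bool = True
    fails: int = 0
    successes: int = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the resulting health status."""
        if ok:
            self.fails = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.successes = 0
            self.fails += 1
            if self.healthy and self.fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Under this reading, a backend survives one failed probe, is taken out of rotation after the second consecutive failure, and returns after a single successful probe.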
Strategy:
Least Inflight
EWMA Latency
Weighted RR
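The three strategy names can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the Backend class, the function names, the smoothing factor alpha, and the choice of smooth weighted round robin for "Weighted RR" are not taken from the playground itself.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Backend:
    """Hypothetical runtime view of one configured backend."""
    name: str
    weight: int
    inflight: int = 0                 # requests currently in progress
    ewma_ms: Optional[float] = None   # smoothed latency; None until first sample


def pick_least_inflight(backends: List[Backend]) -> Backend:
    # Fewest in-progress requests wins; ties go to the earlier backend.
    return min(backends, key=lambda b: b.inflight)


def update_ewma(backend: Backend, sample_ms: float, alpha: float = 0.3) -> None:
    # alpha is an assumed smoothing factor; the first sample seeds the average.
    if backend.ewma_ms is None:
        backend.ewma_ms = sample_ms
    else:
        backend.ewma_ms = alpha * sample_ms + (1 - alpha) * backend.ewma_ms


def pick_ewma_latency(backends: List[Backend]) -> Backend:
    # Unsampled backends are treated as 0 ms so they get tried first.
    return min(backends, key=lambda b: b.ewma_ms if b.ewma_ms is not None else 0.0)


def weighted_rr(backends: List[Backend]) -> Iterator[Backend]:
    # Smooth weighted round robin: each turn, add every weight to a running
    # score, pick the highest score, then subtract the total from the winner.
    current = {b.name: 0 for b in backends}
    total = sum(b.weight for b in backends)
    while True:
        for b in backends:
            current[b.name] += b.weight
        chosen = max(backends, key=lambda b: current[b.name])
        current[chosen.name] -= total
        yield chosen
```

With the weights from the config (80 and 20), this round-robin variant sends four of every five requests to vllm-fast while interleaving the picks rather than sending them in long runs.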
Send request
Burst 10
Auto
stream=false
Distribution
Recent Requests
Explain Decision
Backends