Documentation
Get started with Tensormux in minutes.
Install
Pull the Docker image or build from source.
```shell
docker pull ghcr.io/krxgu/tensormux:latest
```

Or build from source:

```shell
git clone https://github.com/KrxGu/Tensormux.git
cd Tensormux
go build -o tensormux ./cmd/tensormux
```

Quickstart
Create a config file, start the gateway, and point your OpenAI SDK at it.
1. Create tensormux.yaml

```yaml
gateway:
  host: 0.0.0.0
  port: 8080
  strategy: least_inflight

backends:
  - name: vllm-fast
    url: http://vllm-fast:8000
    engine: vllm
    model: llama-3.1-8b
    weight: 80
  - name: sglang-cheap
    url: http://sglang-cheap:8000
    engine: sglang
    model: llama-3.1-8b
    weight: 20

health_check:
  interval_s: 5
  timeout_s: 2
  fail_threshold: 2
  success_threshold: 1
```

2. Start the gateway
```yaml
services:
  tensormux:
    image: ghcr.io/krxgu/tensormux:latest
    ports:
      - "8080:8080"
    volumes:
      - ./tensormux.yaml:/app/tensormux.yaml:ro
    command: ["--config", "/app/tensormux.yaml"]
```

3. Point your OpenAI SDK
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? "not-used-for-oss-backends",
  baseURL: "http://YOUR_TENSORMUX_HOST:8080/v1",
});
```

Configuration overview
- `gateway.strategy`: Routing strategy for distributing requests across backends.
- `backends[].name`: Unique name for the backend. Used in logs and metrics.
- `backends[].url`: Base URL of the inference backend (e.g., `http://vllm:8000`).
- `backends[].engine`: Inference engine type. Used for tagging only.
- `backends[].weight`: Weight for weighted round-robin routing. Higher values receive more traffic.
- `health_check.interval_s`: Seconds between health check probes for each backend.
- `health_check.timeout_s`: Timeout in seconds for each health check probe.
- `health_check.fail_threshold`: Number of consecutive failures before marking a backend unhealthy.
- `health_check.success_threshold`: Number of consecutive successes before marking an unhealthy backend healthy again.
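To make the two routing options concrete, here is a minimal sketch of how `least_inflight` and weighted selection could work over the fields shown in the reference. This is illustration only: the type and function names are invented for this example, and Tensormux's actual (Go) implementation may differ.

```typescript
// Illustrative routing sketch; not Tensormux's actual implementation.
interface Backend {
  name: string;
  weight: number;    // used by weighted selection
  inflight: number;  // in-progress request count, used by least_inflight
  healthy: boolean;  // maintained by health checks
}

// least_inflight: pick the healthy backend with the fewest in-progress requests.
function pickLeastInflight(backends: Backend[]): Backend | undefined {
  return backends
    .filter((b) => b.healthy)
    .reduce<Backend | undefined>(
      (best, b) => (best === undefined || b.inflight < best.inflight ? b : best),
      undefined,
    );
}

// weighted: pick healthy backends in proportion to their weights, so the
// 80/20 split from the quickstart sends roughly 80% of traffic to vllm-fast.
function pickWeighted(backends: Backend[], rand = Math.random()): Backend | undefined {
  const healthy = backends.filter((b) => b.healthy);
  const total = healthy.reduce((sum, b) => sum + b.weight, 0);
  if (total === 0) return undefined;
  let r = rand * total;
  for (const b of healthy) {
    r -= b.weight;
    if (r < 0) return b;
  }
  return healthy[healthy.length - 1];
}
```

Note that both selectors skip unhealthy backends entirely, which is why the health-check thresholds below matter for routing behavior.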
Full reference documentation is available in the GitHub repository.
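The two health-check thresholds together form a simple hysteresis: consecutive failures demote a backend, and consecutive successes promote it back. A minimal sketch of that logic (illustrative only; the class and field names are invented here, not taken from the Tensormux codebase):

```typescript
// Hysteresis over consecutive probe results, mirroring
// health_check.fail_threshold and health_check.success_threshold.
// Illustration only; not Tensormux's actual implementation.
class HealthTracker {
  private consecutiveFails = 0;
  private consecutiveSuccesses = 0;
  healthy = true;

  constructor(
    private failThreshold: number,
    private successThreshold: number,
  ) {}

  // Record one probe result and update the health state.
  record(probeOk: boolean): void {
    if (probeOk) {
      this.consecutiveSuccesses += 1;
      this.consecutiveFails = 0;
      if (!this.healthy && this.consecutiveSuccesses >= this.successThreshold) {
        this.healthy = true;
      }
    } else {
      this.consecutiveFails += 1;
      this.consecutiveSuccesses = 0;
      if (this.healthy && this.consecutiveFails >= this.failThreshold) {
        this.healthy = false;
      }
    }
  }
}
```

With the quickstart values (`fail_threshold: 2`, `success_threshold: 1`), a backend is taken out of rotation after two failed probes in a row and restored after a single successful one.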