Get Tensormux running
Clone the repository and run with Docker Compose or install from source.
git clone https://github.com/KrxGu/Tensormux.gitcd Tensormuxdocker compose up --buildgit clone https://github.com/KrxGu/Tensormux.gitcd Tensormuxpip install -e .Three steps to route inference
Create a config file, start the gateway, and point your OpenAI SDK at it.
Create config.yaml
gateway: host: 0.0.0.0 port: 8080 strategy: least_inflight backends: - name: vllm-fast url: http://vllm-fast:8000 engine: vllm model: llama-3.1-8b weight: 80 health_endpoint: /v1/models tags: ["fast", "gpu-a10"] - name: sglang-cheap url: http://sglang-cheap:8000 engine: sglang model: llama-3.1-8b weight: 20 health_endpoint: /v1/models tags: ["cheap", "gpu-t4"] health: interval_s: 5 timeout_s: 2 fail_threshold: 2 success_threshold: 1 logging: level: info jsonl_path: tensormux.jsonlStart the gateway
services: tensormux: build: . ports: - "8080:8080" environment: - TENSORMUX_CONFIG=/app/config.yaml volumes: - ./config.yaml:/app/config.yaml:roPoint your OpenAI SDK
import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY ?? "not-used-for-oss-backends", baseURL: "http://YOUR_TENSORMUX_HOST:8080/v1",});Configuration overview
All fields supported by tensormux.yaml.
gateway.strategyRouting strategy for distributing requests across backends.
backends[].nameUnique name for the backend. Used in logs and metrics.
backends[].urlBase URL of the inference backend (e.g., http://vllm:8000).
backends[].engineInference engine type. Used for tagging only.
backends[].weightWeight for weighted round-robin routing. Higher values get more traffic.
backends[].health_endpointHTTP path used for health checks. Defaults to /v1/models.
backends[].tagsList of string tags for labeling and filtering (e.g., region, GPU tier).
health.interval_sSeconds between health check probes per backend.
health.fail_thresholdConsecutive failures before marking a backend unhealthy.
health.success_thresholdConsecutive successes before restoring a backend to healthy.
logging.levelLog verbosity level.
logging.jsonl_pathFile path for JSONL audit logs. Records every routed request with backend, latency, and status.
Full reference documentation
Source code, contributing guide, and full API docs are in the GitHub repository.