Get Tensormux running
Clone the repository and run with Docker Compose or install from source.
git clone https://github.com/KrxGu/Tensormux.git
cd Tensormux
docker compose up --build

Or install from source:

git clone https://github.com/KrxGu/Tensormux.git
cd Tensormux
pip install -e .

Three steps to route inference
Create a config file, start the gateway, and point your OpenAI SDK at it.
Create config.yaml
gateway:
  host: 0.0.0.0
  port: 8080
  strategy: least_inflight

backends:
  - name: vllm-fast
    url: http://vllm-fast:8000
    engine: vllm
    model: llama-3.1-8b
    weight: 80
    health_endpoint: /v1/models
    tags: ["fast", "gpu-a10"]
  - name: sglang-cheap
    url: http://sglang-cheap:8000
    engine: sglang
    model: llama-3.1-8b
    weight: 20
    health_endpoint: /v1/models
    tags: ["cheap", "gpu-t4"]

health:
  interval_s: 5
  timeout_s: 2
  fail_threshold: 2
  success_threshold: 1

logging:
  level: info
  jsonl_path: tensormux.jsonl

Start the gateway
services:
  tensormux:
    build: .
    ports:
      - "8080:8080"
    environment:
      - TENSORMUX_CONFIG=/app/config.yaml
    volumes:
      - ./config.yaml:/app/config.yaml:ro

Point your OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? "not-used-for-oss-backends",
  baseURL: "http://YOUR_TENSORMUX_HOST:8080/v1",
});

// Requests now route through Tensormux to the model defined in config.yaml.
const completion = await client.chat.completions.create({
  model: "llama-3.1-8b",
  messages: [{ role: "user", content: "Hello!" }],
});

Configuration overview
All fields supported by the Tensormux configuration file.
gateway.strategy: Routing strategy for distributing requests across backends.
backends[].name: Unique name for the backend. Used in logs and metrics.
backends[].url: Base URL of the inference backend (e.g., http://vllm:8000).
backends[].engine: Inference engine type. Used for tagging only.
backends[].weight: Weight for weighted round-robin routing. Higher values receive more traffic.
backends[].health_endpoint: HTTP path used for health checks. Defaults to /v1/models.
backends[].tags: List of string tags for labeling and filtering (e.g., region, GPU tier).
health.interval_s: Seconds between health-check probes per backend.
health.fail_threshold: Consecutive failures before marking a backend unhealthy.
health.success_threshold: Consecutive successes before restoring a backend to healthy.
logging.level: Log verbosity level.
logging.jsonl_path: File path for JSONL audit logs. Records every routed request with backend, latency, and status.
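The two routing ideas above (least_inflight and weighted round-robin) can be sketched in a few lines. This is an illustrative sketch, not Tensormux's actual internals; the `Backend` shape and function names are assumptions:

```typescript
interface Backend {
  name: string;
  weight: number;   // backends[].weight
  inflight: number; // requests currently being served
  healthy: boolean;
}

// least_inflight: among healthy backends, pick the one serving the fewest requests.
function pickLeastInflight(backends: Backend[]): Backend | undefined {
  const healthy = backends.filter((b) => b.healthy);
  return healthy.reduce<Backend | undefined>(
    (best, b) => (best === undefined || b.inflight < best.inflight ? b : best),
    undefined,
  );
}

// Weighted choice: an 80/20 split like the config above sends roughly 80% of
// traffic to vllm-fast and 20% to sglang-cheap.
function pickWeighted(backends: Backend[], rand = Math.random()): Backend | undefined {
  const healthy = backends.filter((b) => b.healthy);
  const total = healthy.reduce((sum, b) => sum + b.weight, 0);
  let r = rand * total;
  for (const b of healthy) {
    r -= b.weight;
    if (r <= 0) return b;
  }
  return healthy[healthy.length - 1];
}
```

Unhealthy backends are filtered out in both cases, which is why the health thresholds above matter: a backend marked unhealthy simply drops out of the candidate set until it recovers.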
Full reference documentation
Source code, contributing guide, and full API docs are in the GitHub repository.
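The JSONL audit log described in the configuration overview records one JSON object per routed request, so it is easy to post-process. A minimal sketch for counting requests per backend, assuming each record carries a `backend` field (the exact field names are an assumption; check your own log output):

```typescript
// Summarize a Tensormux JSONL audit log: requests per backend.
// The "backend" field name is assumed for illustration.
function requestsPerBackend(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.trim().split("\n")) {
    if (line === "") continue; // skip blank lines
    const entry = JSON.parse(line) as { backend: string };
    counts.set(entry.backend, (counts.get(entry.backend) ?? 0) + 1);
  }
  return counts;
}

// Usage: feed in the file configured at logging.jsonl_path, e.g.
//   import * as fs from "node:fs";
//   console.log(requestsPerBackend(fs.readFileSync("tensormux.jsonl", "utf8")));
```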