# Documentation

Get started with Tensormux in minutes.
## Install

Clone the repository and run with Docker Compose, or install from source.
**Docker Compose:**

```bash
git clone https://github.com/KrxGu/Tensormux.git
cd Tensormux
docker compose up --build
```

**From source:**

```bash
git clone https://github.com/KrxGu/Tensormux.git
cd Tensormux
pip install -e .
```

## Quickstart
Create a config file, start the gateway, and point your OpenAI SDK at it.
### 1. Create `config.yaml`

```yaml
gateway:
  host: 0.0.0.0
  port: 8080
  strategy: least_inflight

backends:
  - name: vllm-fast
    url: http://vllm-fast:8000
    engine: vllm
    model: llama-3.1-8b
    weight: 80
    health_endpoint: /v1/models
    tags: ["fast", "gpu-a10"]
  - name: sglang-cheap
    url: http://sglang-cheap:8000
    engine: sglang
    model: llama-3.1-8b
    weight: 20
    health_endpoint: /v1/models
    tags: ["cheap", "gpu-t4"]

health:
  interval_s: 5
  timeout_s: 2
  fail_threshold: 2
  success_threshold: 1

logging:
  level: info
  jsonl_path: tensormux.jsonl
```

### 2. Start the gateway
```yaml
services:
  tensormux:
    build: .
    ports:
      - "8080:8080"
    environment:
      - TENSORMUX_CONFIG=/app/config.yaml
    volumes:
      - ./config.yaml:/app/config.yaml:ro
```

### 3. Point your OpenAI SDK
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? "not-used-for-oss-backends",
  baseURL: "http://YOUR_TENSORMUX_HOST:8080/v1",
});
```

## Configuration overview
- `gateway.strategy`: Routing strategy for distributing requests across backends.
- `backends[].name`: Unique name for the backend. Used in logs and metrics.
- `backends[].url`: Base URL of the inference backend (e.g., `http://vllm:8000`).
- `backends[].engine`: Inference engine type. Used for tagging only.
- `backends[].weight`: Weight for weighted round-robin routing. Higher values receive more traffic.
- `backends[].health_endpoint`: HTTP path used for health checks on this backend. Defaults to `/v1/models`.
- `backends[].tags`: List of string tags for labeling and filtering backends (e.g., region, GPU tier).
- `health.interval_s`: Seconds between health-check probes for each backend.
- `health.timeout_s`: Seconds before a health-check probe times out.
- `health.fail_threshold`: Number of consecutive failures before marking a backend unhealthy.
- `health.success_threshold`: Number of consecutive successes before marking an unhealthy backend healthy again.
- `logging.level`: Log verbosity level.
- `logging.jsonl_path`: File path for JSONL audit logs. Logs all routed requests with backend, latency, and status.
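The `least_inflight` strategy used in the quickstart config can be pictured as follows. This is a hedged sketch, not Tensormux's actual implementation; the `Backend` shape and the `pickLeastInflight` helper are illustrative assumptions:

```typescript
// Sketch of a least_inflight strategy (assumption: Tensormux's
// internals may differ). Each backend tracks how many requests are
// currently in flight; the router picks the healthy backend with
// the fewest, so slow or saturated backends naturally receive less
// new traffic.
interface Backend {
  name: string;
  inflight: number;
  healthy: boolean;
}

function pickLeastInflight(backends: Backend[]): Backend | null {
  const healthy = backends.filter((b) => b.healthy);
  if (healthy.length === 0) return null; // nothing routable
  return healthy.reduce((best, b) => (b.inflight < best.inflight ? b : best));
}

const pick = pickLeastInflight([
  { name: "vllm-fast", inflight: 3, healthy: true },
  { name: "sglang-cheap", inflight: 1, healthy: true },
]);
console.log(pick?.name); // "sglang-cheap"
```

Unlike static weighting, this adapts to live load: a backend that responds slowly accumulates in-flight requests and is deprioritized automatically.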
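The `weight` field shapes traffic proportionally: with weights 80 and 20 as in the quickstart config, roughly four in five requests land on the heavier backend. A minimal sketch of weight-proportional selection (illustrative only; `pickWeighted` is not part of Tensormux's API):

```typescript
// Weight-proportional selection: r is a uniform draw in [0, 1).
// With weights 80/20, any r below 0.8 lands on the first backend.
interface Weighted {
  name: string;
  weight: number;
}

function pickWeighted(backends: Weighted[], r: number): string {
  const total = backends.reduce((sum, b) => sum + b.weight, 0);
  let x = r * total;
  for (const b of backends) {
    x -= b.weight;
    if (x < 0) return b.name;
  }
  return backends[backends.length - 1].name; // guard against r ~ 1.0
}

const pool = [
  { name: "vllm-fast", weight: 80 },
  { name: "sglang-cheap", weight: 20 },
];
console.log(pickWeighted(pool, 0.5)); // "vllm-fast"   (0.5 * 100 = 50 < 80)
console.log(pickWeighted(pool, 0.9)); // "sglang-cheap" (90 falls in the 80..100 band)
```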
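The `fail_threshold` / `success_threshold` pair describes a small state machine: a backend flips to unhealthy after N consecutive failed probes and back to healthy after M consecutive successes. A sketch of that behavior (an assumption about the semantics; the `HealthTracker` class is illustrative, not Tensormux code):

```typescript
// Health-check hysteresis: consecutive-failure and consecutive-success
// counters gate the healthy/unhealthy transitions, so a single flaky
// probe does not flap the backend in and out of rotation.
class HealthTracker {
  private fails = 0;
  private successes = 0;
  healthy = true;

  constructor(
    private failThreshold: number,
    private successThreshold: number,
  ) {}

  record(ok: boolean): void {
    if (ok) {
      this.successes++;
      this.fails = 0; // any success resets the failure streak
      if (!this.healthy && this.successes >= this.successThreshold) {
        this.healthy = true;
      }
    } else {
      this.fails++;
      this.successes = 0; // any failure resets the success streak
      if (this.healthy && this.fails >= this.failThreshold) {
        this.healthy = false;
      }
    }
  }
}

// With fail_threshold: 2 and success_threshold: 1, as in the config above:
const t = new HealthTracker(2, 1);
t.record(false); // 1 failure: still healthy
t.record(false); // 2 consecutive failures: marked unhealthy
t.record(true);  // 1 success: healthy again
console.log(t.healthy); // true
```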
Full reference documentation is available in the GitHub repository.