infermux

July 3, 2026

Inference router. Part of the MIST stack.

Language: Go. License: MIT.

Install

go get github.com/greynewell/infermux

Provider interface

type Provider interface {
    Name() string
    Models() []string
    Infer(ctx context.Context, req protocol.InferRequest) (protocol.InferResponse, error)
}
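To make the interface concrete, here is a minimal sketch of a provider that echoes the last message back. The `ChatMessage`, `InferRequest`, and `InferResponse` types below are local stand-ins for the `protocol` package, and their fields are assumptions for illustration. `echoProvider` and `echoOnce` are hypothetical names, not part of infermux.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-ins for the protocol package types. The field sets are
// assumptions for illustration, not the real definitions.
type ChatMessage struct {
	Role    string
	Content string
}

type InferRequest struct {
	Model    string
	Messages []ChatMessage
}

type InferResponse struct {
	Content  string
	Provider string
}

// Provider mirrors the interface shown above, using the stand-in types.
type Provider interface {
	Name() string
	Models() []string
	Infer(ctx context.Context, req InferRequest) (InferResponse, error)
}

// echoProvider is a hypothetical provider that returns the last message.
type echoProvider struct{}

func (echoProvider) Name() string     { return "echo" }
func (echoProvider) Models() []string { return []string{"echo-v1"} }

func (p echoProvider) Infer(ctx context.Context, req InferRequest) (InferResponse, error) {
	// Honor cancellation and deadlines before doing any work.
	if err := ctx.Err(); err != nil {
		return InferResponse{}, err
	}
	if len(req.Messages) == 0 {
		return InferResponse{}, errors.New("echo: no messages")
	}
	last := req.Messages[len(req.Messages)-1]
	return InferResponse{Content: last.Content, Provider: p.Name()}, nil
}

// echoOnce sends a single user prompt through the provider.
func echoOnce(prompt string) (string, error) {
	var p Provider = echoProvider{}
	resp, err := p.Infer(context.Background(), InferRequest{
		Model:    "echo-v1",
		Messages: []ChatMessage{{Role: "user", Content: prompt}},
	})
	return resp.Content, err
}

func main() {
	out, err := echoOnce("Hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

A real provider would replace the body of `Infer` with a call to its upstream API and keep passing `ctx` along so that caller deadlines are respected.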

Route

reg := infermux.NewRegistry()
reg.Register(myOpenAIProvider)
reg.Register(myAnthropicProvider)

reporter := tokentrace.NewReporter("infermux", "http://localhost:8700")
router := infermux.NewRouter(reg, reporter)

resp, err := router.Infer(ctx, protocol.InferRequest{
    Model:    "claude-sonnet-4-5-20250929",
    Messages: []protocol.ChatMessage{{Role: "user", Content: "Hello"}},
})

The router tracks tokens and cost per request and reports spans to TokenTrace.
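For intuition about the routing step, the sketch below indexes providers by the models they advertise and resolves a model name to a provider. This is an assumption about how such a lookup could work, not infermux's actual `Registry`. The `registry`, `staticProvider`, and provider names used here are hypothetical.

```go
package main

import "fmt"

// provider is a trimmed stand-in for the Provider interface (Infer omitted).
type provider interface {
	Name() string
	Models() []string
}

// staticProvider is a hypothetical provider with a fixed model list.
type staticProvider struct {
	name   string
	models []string
}

func (p staticProvider) Name() string     { return p.name }
func (p staticProvider) Models() []string { return p.models }

// registry is a hypothetical model index, not infermux's Registry.
type registry struct {
	byModel map[string]provider
}

func newRegistry() *registry {
	return &registry{byModel: make(map[string]provider)}
}

// register indexes p under every model it advertises.
// A later registration for the same model replaces the earlier one.
func (r *registry) register(p provider) {
	for _, m := range p.Models() {
		r.byModel[m] = p
	}
}

// resolve returns the provider name for model, or false if none is registered.
func (r *registry) resolve(model string) (string, bool) {
	p, ok := r.byModel[model]
	if !ok {
		return "", false
	}
	return p.Name(), true
}

func main() {
	reg := newRegistry()
	reg.register(staticProvider{name: "anthropic", models: []string{"claude-sonnet-4-5-20250929"}})
	reg.register(staticProvider{name: "echo", models: []string{"echo-v1"}})

	name, ok := reg.resolve("claude-sonnet-4-5-20250929")
	fmt.Println(name, ok)

	_, ok = reg.resolve("unknown-model")
	fmt.Println(ok)
}
```

A failed lookup in a design like this corresponds to the "no provider for the requested model" case.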

HTTP API

handler := infermux.NewHandler(router, reg)
http.HandleFunc("POST /mist", handler.Ingest)
http.HandleFunc("POST /infer", handler.InferDirect)
http.HandleFunc("GET /providers", handler.Providers)

gRPC API

InferMux serves the same router over gRPC (infermux.v1.InferMuxService), defined in proto/infermux/v1/infermux.proto.

infermux serve-grpc --addr :8601 --tokentrace http://localhost:8700

The server ships with the standard gRPC health service, server reflection (works with grpcurl out of the box), keepalive enforcement, panic recovery, structured per-RPC logging, and graceful drain on SIGINT/SIGTERM.

Go client:

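import "github.com/greynewell/infermux/grpcclient"

c, err := grpcclient.New("localhost:8601")
if err != nil {
    return err // could not create the client
}
defer c.Close()

res, err := c.Prompt(ctx, "echo-v1", "Hello world")
if err != nil {
    return err // see the error contract below for status codes
}
// res.Content, res.Provider, res.TokensIn, res.TokensOut, res.CostUSD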

The client retries UNAVAILABLE (transient provider failure) for up to 3 attempts with exponential backoff, configured through the gRPC service config. It never retries NOT_FOUND or INVALID_ARGUMENT. Caller deadlines propagate through the server into provider calls.
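As an illustration of the retry behavior described above, here is a sketch of what a gRPC service config with a retry policy can look like. The service name, the attempt count, and the retryable code come from this README. The backoff values are assumptions, and the JSON the client actually ships may differ. With grpc-go, a string like this is typically passed to `grpc.WithDefaultServiceConfig` when dialing. The sketch only parses it, so it runs with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// retryServiceConfig is a sketch of a gRPC service config with a retry policy.
// maxAttempts and the retryable code follow the README; the backoff values
// are assumptions chosen for illustration.
const retryServiceConfig = `{
  "methodConfig": [{
    "name": [{"service": "infermux.v1.InferMuxService"}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

// The structs below decode only the fields this sketch inspects.
type retryPolicy struct {
	MaxAttempts          int      `json:"maxAttempts"`
	RetryableStatusCodes []string `json:"retryableStatusCodes"`
}

type methodConfig struct {
	RetryPolicy retryPolicy `json:"retryPolicy"`
}

type serviceConfig struct {
	MethodConfig []methodConfig `json:"methodConfig"`
}

// parsePolicy decodes the config and returns the first method's retry policy.
func parsePolicy() (retryPolicy, error) {
	var sc serviceConfig
	if err := json.Unmarshal([]byte(retryServiceConfig), &sc); err != nil {
		return retryPolicy{}, err
	}
	if len(sc.MethodConfig) == 0 {
		return retryPolicy{}, fmt.Errorf("no methodConfig entries")
	}
	return sc.MethodConfig[0].RetryPolicy, nil
}

func main() {
	p, err := parsePolicy()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.MaxAttempts, p.RetryableStatusCodes)
}
```

Because only UNAVAILABLE appears in `retryableStatusCodes`, codes such as NOT_FOUND and INVALID_ARGUMENT are returned to the caller on the first attempt.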

Error contract:

| Condition | gRPC code |
|---|---|
| Empty messages, bad role, temperature out of range | INVALID_ARGUMENT |
| No provider for the requested model | NOT_FOUND |
| Resolved provider failed upstream (retryable) | UNAVAILABLE |
| Caller deadline elapsed | DEADLINE_EXCEEDED |
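The contract above can be sketched as a single classification function. The sentinel errors and the `codeFor` helper below are hypothetical. infermux's server presumably uses the real gRPC `codes` and `status` packages, which are left out here so the example needs only the standard library. Mapping unclassified errors to UNKNOWN is also an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the server's failure cases.
var (
	errInvalidRequest = errors.New("invalid request")
	errNoProvider     = errors.New("no provider for model")
	errUpstream       = errors.New("provider failed upstream")
)

// codeFor returns the gRPC status code name for err, following the table above.
// Unclassified errors map to UNKNOWN, which is an assumption of this sketch.
func codeFor(err error) string {
	switch {
	case err == nil:
		return "OK"
	case errors.Is(err, context.DeadlineExceeded):
		return "DEADLINE_EXCEEDED"
	case errors.Is(err, errInvalidRequest):
		return "INVALID_ARGUMENT"
	case errors.Is(err, errNoProvider):
		return "NOT_FOUND"
	case errors.Is(err, errUpstream):
		return "UNAVAILABLE"
	default:
		return "UNKNOWN"
	}
}

func main() {
	// Wrapping with %w keeps the sentinel visible to errors.Is.
	wrapped := fmt.Errorf("model %q: %w", "gpt-x", errNoProvider)
	fmt.Println(codeFor(wrapped))
	fmt.Println(codeFor(context.DeadlineExceeded))
	fmt.Println(codeFor(errUpstream))
}
```

Using `errors.Is` rather than direct comparison lets providers wrap errors with extra context without changing the code the caller sees.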

Integration tests cover the full wire path (real TCP, real server, real client), including retry behavior, deadline propagation, error mapping, and health checks:

go test ./integration/ -race -v

Regenerate protobuf stubs:

protoc --proto_path=proto \
  --go_out=gen --go_opt=paths=source_relative \
  --go-grpc_out=gen --go-grpc_opt=paths=source_relative \
  proto/infermux/v1/infermux.proto

CLI

infermux serve --addr :8600 --tokentrace http://localhost:8700
infermux serve-grpc --addr :8601 --tokentrace http://localhost:8700
infermux infer --model echo-v1 --prompt "Hello world"