glide-mq

April 2, 2026 · View on GitHub

High-performance message queue for Node.js with first-class AI orchestration. Built on Valkey/Redis Streams with a Rust NAPI core.

Completes and fetches the next job in a single server-side function call (1 RTT per job), hash-tags every key for zero-config clustering, and ships AI-native primitives for cost tracking, token streaming, human-in-the-loop suspend/resume, model failover, TPM rate limiting, budget caps, rolling usage summaries, and vector search.

npm install glide-mq

General Usage

import { Queue, Worker } from 'glide-mq';

const connection = { addresses: [{ host: 'localhost', port: 6379 }] };
const queue = new Queue('tasks', { connection });

await queue.add('send-email', { to: 'user@example.com', subject: 'Welcome' });

const worker = new Worker(
  'tasks',
  async (job) => {
    await sendEmail(job.data.to, job.data.subject);
    return { sent: true };
  },
  { connection, concurrency: 10 },
);

AI Usage

import { Queue, Worker } from 'glide-mq';

const queue = new Queue('ai', { connection });

await queue.add(
  'inference',
  { prompt: 'Explain message queues' },
  {
    fallbacks: [{ model: 'gpt-5.4-nano', provider: 'openai' }],
    lockDuration: 120000,
  },
);

const worker = new Worker(
  'ai',
  async (job) => {
    const result = await callLLM(job.data.prompt);
    await job.reportUsage({
      model: 'gpt-5.4',
      tokens: { input: 50, output: 200 },
      costs: { total: 0.003 },
    });
    await job.stream({ type: 'token', content: result });
    return result;
  },
  { connection, tokenLimiter: { maxTokens: 100000, duration: 60000 } },
);

When to use glide-mq

Background jobs and task processing - email, image processing, data pipelines, webhooks, any async work.
Scheduled and recurring work - cron jobs, interval tasks, bounded schedulers.
Distributed workflows - parent-child trees, DAGs, fan-in/fan-out, step jobs, dynamic children.
High-throughput queues over real networks - 1 RTT per job via Valkey Server Functions, up to 38% faster than alternatives.
LLM pipelines and model orchestration - cost tracking, token streaming, model failover, budget caps without external middleware.
Valkey/Redis clusters - hash-tagged keys out of the box with zero configuration.

How it's different

Aspect	glide-mq
Network per job	1 RTT - complete + fetch next in a single FCALL
Client	Rust NAPI bindings via valkey-glide - no JS protocol parsing
Server logic	Persistent Valkey Function library (FUNCTION LOAD + FCALL) - no per-call EVAL
Cluster	Hash-tagged keys (`glide:{queueName}:*`) route to the same slot automatically
AI-native	Cost tracking, token streaming, suspend/resume, fallback chains, TPM limits, budget caps, usage summaries
Vector search	KNN similarity queries over job data via Valkey Search

AI-native primitives

Seven primitives for LLM and agent workflows, built into the core API.

Cost tracking - job.reportUsage() records model, tokens, cost, latency per job. queue.getFlowUsage() aggregates across flows and queue.getUsageSummary() rolls usage up across queues.
Token streaming - job.stream(chunk) pushes LLM output tokens in real time. queue.readStream(jobId) consumes them with optional long-polling.
Suspend/resume - job.suspend() pauses mid-processor for human approval or webhook callback. queue.signal(jobId, name, data) resumes with external input.
Fallback chains - ordered fallbacks array on job options. On failure, the next retry reads job.currentFallback for the alternate model/provider.
TPM rate limiting - tokenLimiter on worker options enforces tokens-per-minute caps. Combine with RPM limiter for dual-axis rate control.
Budget caps - FlowProducer.add(flow, { budget }) sets maxTotalTokens and maxTotalCost across all jobs in a flow. Jobs fail or pause when exceeded.
Per-job lock duration - override lockDuration per job for adaptive stall detection. Short for classifiers, long for multi-minute LLM calls.

See Usage - AI-native primitives for full examples.

Features

1 RTT per job - complete current + fetch next in a single server-side function call
Cluster-native - hash-tagged keys, zero cluster configuration
Workflows - FlowProducer trees, DAGs with fan-in, chain/group/chord, step jobs, dynamic children
Scheduling - 5-field cron with timezone, fixed intervals, bounded schedulers
Retries - exponential, fixed, or custom backoff with dead-letter queues
Rate limiting - per-group sliding window, token bucket, global queue-wide limits
Broadcast - fan-out pub/sub with NATS-style subject filtering and independent subscriber retries
Batch processing - process multiple jobs at once for bulk I/O
Request-reply - queue.addAndWait() for synchronous RPC patterns
Deduplication - simple, throttle, and debounce modes
Compression - transparent gzip at the queue level
Serverless - lightweight Producer and ServerlessPool for Lambda/Edge
OpenTelemetry - automatic span emission with bring-your-own tracer
In-memory testing - TestQueue and TestWorker with zero Valkey dependency
Cross-language - HTTP proxy with request-reply, SSE streams, schedulers, flow create/read/tree/delete, usage summaries, and broadcast fan-out for non-Node.js services

Performance

Benchmarked on AWS ElastiCache Valkey 8.2 (r7g.large) with TLS, EC2 client in the same region.

Concurrency	glide-mq	Leading Alternative	Delta
c=5	10,754 j/s	9,866 j/s	+9%
c=10	18,218 j/s	13,541 j/s	+35%
c=15	19,583 j/s	14,162 j/s	+38%
c=20	19,408 j/s	16,085 j/s	+21%

The advantage comes from completing and fetching the next job in a single FCALL. The savings compound over real network latency - exactly the conditions in every production deployment. At high concurrency both libraries converge toward the Valkey single-thread ceiling.

Reproduce with npm run bench or npx tsx benchmarks/elasticache-head-to-head.ts against your own infrastructure.

Examples

All examples live in glidemq-examples. Pick a directory, npm install, npx tsx <file>.ts.

AI Orchestration	Core Patterns	Framework Plugins
Usage & Costs	Basics	Hono
Budget & TPM	Workflows	Fastify
Streaming	Advanced	Hapi
Agent Loops	DAG Flows	NestJS
Search & Vectors	Scheduling	Express
SDK Integrations	Batch Processing	Next.js

When NOT to use glide-mq

You need a log-based event streaming platform. glide-mq is a job/task queue, not a partitioned event log. It does not provide Kafka-style topic partitions, consumer offset management, or event replay.
You need browser support. The Rust NAPI client requires a server-side runtime (Node.js 20+, Bun, or Deno with NAPI support).
You need exactly-once semantics. glide-mq provides at-least-once delivery. Duplicate processing is rare but possible - design processors to be idempotent.
You need to run without Valkey or Redis. Production use requires Valkey 7.0+ or Redis 7.0+. For dev/testing, TestQueue/TestWorker run fully in-memory.

Documentation

Guide	Topics
Usage	Queue, Worker, Producer, request-reply, SSE, proxy, flow HTTP API, usage summaries
Workflows	FlowProducer, DAG, chain/group/chord, dynamic children
Advanced	Schedulers, rate limiting, dedup, compression, retries, DLQ
Broadcast	Pub/sub fan-out, subject filtering
Observability	OpenTelemetry, metrics, job logs, dashboard
Serverless	Producer, ServerlessPool, Lambda/Edge, HTTP proxy
Testing	In-memory TestQueue and TestWorker
Wire Protocol	Cross-language FCALL specs, Python/Go examples
Step Jobs	Step-job workflows with moveToDelayed
Durability	Durability guarantees, persistence, delivery semantics
Architecture	Internal architecture and design reference
Migration	API mapping guide for migrating from other queues

Ecosystem

Package	Description
@glidemq/speedkey	Valkey GLIDE client with native NAPI bindings
@glidemq/dashboard	Web UI for metrics, schedulers, job mutations
@glidemq/hono	Hono middleware
@glidemq/fastify	Fastify plugin
@glidemq/nestjs	NestJS module
@glidemq/hapi	Hapi plugin
glidemq.dev	Full documentation site

Contributing

Bug reports, feature requests, and pull requests are welcome.

License

Apache-2.0