glide-mq

April 2, 2026 ยท View on GitHub

npm version license CI

High-performance message queue for Node.js with first-class AI orchestration. Built on Valkey/Redis Streams with a Rust NAPI core.

Completes and fetches the next job in a single server-side function call (1 RTT per job), hash-tags every key for zero-config clustering, and ships AI-native primitives for cost tracking, token streaming, human-in-the-loop suspend/resume, model failover, TPM rate limiting, budget caps, rolling usage summaries, and vector search.

npm install glide-mq

General Usage

import { Queue, Worker } from 'glide-mq';

const connection = { addresses: [{ host: 'localhost', port: 6379 }] };
const queue = new Queue('tasks', { connection });

await queue.add('send-email', { to: 'user@example.com', subject: 'Welcome' });

const worker = new Worker(
  'tasks',
  async (job) => {
    await sendEmail(job.data.to, job.data.subject);
    return { sent: true };
  },
  { connection, concurrency: 10 },
);

AI Usage

import { Queue, Worker } from 'glide-mq';

const queue = new Queue('ai', { connection });

await queue.add(
  'inference',
  { prompt: 'Explain message queues' },
  {
    fallbacks: [{ model: 'gpt-5.4-nano', provider: 'openai' }],
    lockDuration: 120000,
  },
);

const worker = new Worker(
  'ai',
  async (job) => {
    const result = await callLLM(job.data.prompt);
    await job.reportUsage({
      model: 'gpt-5.4',
      tokens: { input: 50, output: 200 },
      costs: { total: 0.003 },
    });
    await job.stream({ type: 'token', content: result });
    return result;
  },
  { connection, tokenLimiter: { maxTokens: 100000, duration: 60000 } },
);

When to use glide-mq

  • Background jobs and task processing - email, image processing, data pipelines, webhooks, any async work.
  • Scheduled and recurring work - cron jobs, interval tasks, bounded schedulers.
  • Distributed workflows - parent-child trees, DAGs, fan-in/fan-out, step jobs, dynamic children.
  • High-throughput queues over real networks - 1 RTT per job via Valkey Server Functions, up to 38% faster than alternatives.
  • LLM pipelines and model orchestration - cost tracking, token streaming, model failover, budget caps without external middleware.
  • Valkey/Redis clusters - hash-tagged keys out of the box with zero configuration.

How it's different

Aspectglide-mq
Network per job1 RTT - complete + fetch next in a single FCALL
ClientRust NAPI bindings via valkey-glide - no JS protocol parsing
Server logicPersistent Valkey Function library (FUNCTION LOAD + FCALL) - no per-call EVAL
ClusterHash-tagged keys (glide:{queueName}:*) route to the same slot automatically
AI-nativeCost tracking, token streaming, suspend/resume, fallback chains, TPM limits, budget caps, usage summaries
Vector searchKNN similarity queries over job data via Valkey Search

AI-native primitives

Seven primitives for LLM and agent workflows, built into the core API.

  • Cost tracking - job.reportUsage() records model, tokens, cost, latency per job. queue.getFlowUsage() aggregates across flows and queue.getUsageSummary() rolls usage up across queues.
  • Token streaming - job.stream(chunk) pushes LLM output tokens in real time. queue.readStream(jobId) consumes them with optional long-polling.
  • Suspend/resume - job.suspend() pauses mid-processor for human approval or webhook callback. queue.signal(jobId, name, data) resumes with external input.
  • Fallback chains - ordered fallbacks array on job options. On failure, the next retry reads job.currentFallback for the alternate model/provider.
  • TPM rate limiting - tokenLimiter on worker options enforces tokens-per-minute caps. Combine with RPM limiter for dual-axis rate control.
  • Budget caps - FlowProducer.add(flow, { budget }) sets maxTotalTokens and maxTotalCost across all jobs in a flow. Jobs fail or pause when exceeded.
  • Per-job lock duration - override lockDuration per job for adaptive stall detection. Short for classifiers, long for multi-minute LLM calls.

See Usage - AI-native primitives for full examples.

Features

  • 1 RTT per job - complete current + fetch next in a single server-side function call
  • Cluster-native - hash-tagged keys, zero cluster configuration
  • Workflows - FlowProducer trees, DAGs with fan-in, chain/group/chord, step jobs, dynamic children
  • Scheduling - 5-field cron with timezone, fixed intervals, bounded schedulers
  • Retries - exponential, fixed, or custom backoff with dead-letter queues
  • Rate limiting - per-group sliding window, token bucket, global queue-wide limits
  • Broadcast - fan-out pub/sub with NATS-style subject filtering and independent subscriber retries
  • Batch processing - process multiple jobs at once for bulk I/O
  • Request-reply - queue.addAndWait() for synchronous RPC patterns
  • Deduplication - simple, throttle, and debounce modes
  • Compression - transparent gzip at the queue level
  • Serverless - lightweight Producer and ServerlessPool for Lambda/Edge
  • OpenTelemetry - automatic span emission with bring-your-own tracer
  • In-memory testing - TestQueue and TestWorker with zero Valkey dependency
  • Cross-language - HTTP proxy with request-reply, SSE streams, schedulers, flow create/read/tree/delete, usage summaries, and broadcast fan-out for non-Node.js services

Performance

Benchmarked on AWS ElastiCache Valkey 8.2 (r7g.large) with TLS, EC2 client in the same region.

Concurrencyglide-mqLeading AlternativeDelta
c=510,754 j/s9,866 j/s+9%
c=1018,218 j/s13,541 j/s+35%
c=1519,583 j/s14,162 j/s+38%
c=2019,408 j/s16,085 j/s+21%

The advantage comes from completing and fetching the next job in a single FCALL. The savings compound over real network latency - exactly the conditions in every production deployment. At high concurrency both libraries converge toward the Valkey single-thread ceiling.

Reproduce with npm run bench or npx tsx benchmarks/elasticache-head-to-head.ts against your own infrastructure.

Examples

All examples live in glidemq-examples. Pick a directory, npm install, npx tsx <file>.ts.

AI OrchestrationCore PatternsFramework Plugins
Usage & CostsBasicsHono
Budget & TPMWorkflowsFastify
StreamingAdvancedHapi
Agent LoopsDAG FlowsNestJS
Search & VectorsSchedulingExpress
SDK IntegrationsBatch ProcessingNext.js

When NOT to use glide-mq

  • You need a log-based event streaming platform. glide-mq is a job/task queue, not a partitioned event log. It does not provide Kafka-style topic partitions, consumer offset management, or event replay.
  • You need browser support. The Rust NAPI client requires a server-side runtime (Node.js 20+, Bun, or Deno with NAPI support).
  • You need exactly-once semantics. glide-mq provides at-least-once delivery. Duplicate processing is rare but possible - design processors to be idempotent.
  • You need to run without Valkey or Redis. Production use requires Valkey 7.0+ or Redis 7.0+. For dev/testing, TestQueue/TestWorker run fully in-memory.

Documentation

GuideTopics
UsageQueue, Worker, Producer, request-reply, SSE, proxy, flow HTTP API, usage summaries
WorkflowsFlowProducer, DAG, chain/group/chord, dynamic children
AdvancedSchedulers, rate limiting, dedup, compression, retries, DLQ
BroadcastPub/sub fan-out, subject filtering
ObservabilityOpenTelemetry, metrics, job logs, dashboard
ServerlessProducer, ServerlessPool, Lambda/Edge, HTTP proxy
TestingIn-memory TestQueue and TestWorker
Wire ProtocolCross-language FCALL specs, Python/Go examples
Step JobsStep-job workflows with moveToDelayed
DurabilityDurability guarantees, persistence, delivery semantics
ArchitectureInternal architecture and design reference
MigrationAPI mapping guide for migrating from other queues

Ecosystem

PackageDescription
@glidemq/speedkeyValkey GLIDE client with native NAPI bindings
@glidemq/dashboardWeb UI for metrics, schedulers, job mutations
@glidemq/honoHono middleware
@glidemq/fastifyFastify plugin
@glidemq/nestjsNestJS module
@glidemq/hapiHapi plugin
glidemq.devFull documentation site

Contributing

Bug reports, feature requests, and pull requests are welcome.

License

Apache-2.0