Replicate Provider Guide

May 17, 2026

One auth token, five modalities — LLMs + image + video + avatar + music under a single REPLICATE_API_TOKEN


Overview

Replicate is a universal hosted-model gateway. NeuroLink wraps it as a multi-modal provider so a single token gets you:

| Modality | How | Default model |
| --- | --- | --- |
| LLM | provider: "replicate" (chat / streaming) | meta/meta-llama-3.1-70b-instruct |
| Image gen | provider: "replicate" with a model id matching IMAGE_GENERATION_MODELS | black-forest-labs/flux-1.1-pro |
| Video | output: { mode: "video", video: { provider: "replicate" } } | atonamy/wan-alpha |
| Avatar | output: { mode: "avatar", avatar: { provider: "replicate" } } | lucataco/musetalk |
| Music | output: { mode: "music", music: { provider: "replicate" } } | meta/musicgen |

Architectural detail: see docs/provider-integration/22-adding-multimodal-provider.md — Replicate is the canonical worked example.

Key Facts

  • Protocol: Async prediction lifecycle — POST /v1/predictions → poll until succeeded → fetch output. NeuroLink uses Prefer: wait=60 so short jobs complete in the initial POST and skip polling entirely.
  • Default base URL: https://api.replicate.com
  • Auth: Authorization: Token $REPLICATE_API_TOKEN
  • Pricing: Per compute-second (not per-token) — NeuroLink reports a symbolic per-token rate so cost dashboards stay populated, but real billing is via Replicate's invoice
  • Streaming: Synthetic single-chunk stream from the predict result (true SSE streaming planned for a follow-up)
  • Tool calling: Not supported — Replicate predictions are stateless
  • Reasoning trace: Model-dependent (e.g., DeepSeek R1 on Replicate exposes its reasoning trace in the output array)
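
Language models on Replicate typically return their output as an array of text fragments. As a sketch of how a reasoning trace could be separated from the final answer, the hypothetical splitReasoning helper below joins the fragments and extracts a <think>...</think> block. It assumes the model follows DeepSeek R1's tag convention; it is not part of NeuroLink, and other models may format their traces differently.

```typescript
// Hypothetical helper, not part of NeuroLink's API.
// Assumes the reasoning trace is wrapped in <think>...</think> tags,
// the convention DeepSeek R1 uses; other models may differ.
interface SplitOutput {
  reasoning: string | null;
  answer: string;
}

function splitReasoning(output: string[] | string): SplitOutput {
  // Replicate LLMs usually return an array of text fragments; join them first.
  const text = Array.isArray(output) ? output.join("") : output;
  const match = text.match(/<think>([\s\S]*?)<\/think>/);
  if (!match || match.index === undefined) {
    // No trace present: the whole output is the answer.
    return { reasoning: null, answer: text.trim() };
  }
  // Remove the <think> block and keep everything around it as the answer.
  const answer = text.slice(0, match.index) + text.slice(match.index + match[0].length);
  return { reasoning: match[1].trim(), answer: answer.trim() };
}

const parts = ["<think>", "2 + 2 is basic ", "arithmetic.", "</think>", "\n\n", "The answer is 4."];
const { reasoning, answer } = splitReasoning(parts);
console.log(reasoning); // 2 + 2 is basic arithmetic.
console.log(answer); // The answer is 4.
```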

Quick Start

1. Get an API Token

Sign up at https://replicate.com/ and create an API token at https://replicate.com/account/api-tokens.

2. Configure Environment

# Required
REPLICATE_API_TOKEN=r8_...

# Optional: override the default LLM model
REPLICATE_MODEL=meta/meta-llama-3.1-70b-instruct

# Optional: override the base URL
# REPLICATE_BASE_URL=https://api.replicate.com

3. Generate Your First Response

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

const result = await ai.generate({
  provider: "replicate",
  input: { text: "Explain how a transformer's attention mechanism works." },
});

console.log(result.content);

SDK Usage by Modality

LLM (chat / streaming)

const result = await ai.generate({
  provider: "replicate",
  model: "meta/meta-llama-3.1-405b-instruct",
  input: { text: "Write Python that calculates compound interest." },
});

Streaming:

const stream = await ai.stream({
  provider: "replicate",
  model: "meta/meta-llama-3.1-70b-instruct",
  input: { text: "Tell me a story" },
});
for await (const chunk of stream.stream) {
  if ("content" in chunk) process.stdout.write(chunk.content);
}

Image Generation

import { writeFileSync } from "node:fs";

const result = await ai.generate({
  provider: "replicate",
  model: "black-forest-labs/flux-1.1-pro",
  input: { text: "A serene mountain lake at sunrise, photorealistic" },
});

// The image arrives base64-encoded in result.imageOutput; decode it and save to disk.
writeFileSync("./output.png", Buffer.from(result.imageOutput.base64, "base64"));

Supported image models on Replicate (pass via model:):

  • black-forest-labs/flux-1.1-pro (default)
  • black-forest-labs/flux-schnell
  • stability-ai/stable-diffusion-3.5-large
  • stability-ai/stable-diffusion-3.5-large-turbo
  • playgroundai/playground-v2.5-1024px-aesthetic
  • ideogram-ai/ideogram-v3
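
The Overview table says image generation is selected when the model id matches IMAGE_GENERATION_MODELS. A minimal sketch of that routing rule, assuming exact matching on the owner/name part of the id: IMAGE_MODELS_SKETCH and routeReplicateModel are hypothetical names, and the set simply mirrors the list above rather than NeuroLink's actual constant.

```typescript
// Hypothetical sketch: IMAGE_MODELS_SKETCH mirrors the documented list and
// stands in for NeuroLink's IMAGE_GENERATION_MODELS; the real constant and
// matching rule may differ.
const IMAGE_MODELS_SKETCH: ReadonlySet<string> = new Set([
  "black-forest-labs/flux-1.1-pro",
  "black-forest-labs/flux-schnell",
  "stability-ai/stable-diffusion-3.5-large",
  "stability-ai/stable-diffusion-3.5-large-turbo",
  "playgroundai/playground-v2.5-1024px-aesthetic",
  "ideogram-ai/ideogram-v3",
]);

type ReplicateRoute = "image" | "llm";

function routeReplicateModel(model: string): ReplicateRoute {
  // Ignore an optional ":version" suffix so pinned versions route the same way.
  const baseId = model.split(":")[0];
  return IMAGE_MODELS_SKETCH.has(baseId) ? "image" : "llm";
}

console.log(routeReplicateModel("black-forest-labs/flux-schnell")); // image
console.log(routeReplicateModel("meta/meta-llama-3.1-70b-instruct")); // llm
```

Anything not in the set falls through to the LLM path, which is why a mistyped image model id would be sent to Replicate as a chat request rather than rejected up front.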

Video Generation

import { readFileSync, writeFileSync } from "node:fs";

const sourceImage = readFileSync("./input.jpg");

const result = await ai.generate({
  input: { text: "smooth zoom out", images: [sourceImage] },
  output: {
    mode: "video",
    video: {
      provider: "replicate",
      model: "atonamy/wan-alpha",
      length: 4,
      aspectRatio: "16:9",
    },
  },
});

// ESM module (top-level await), so use the fs import rather than require().
writeFileSync("./output.mp4", result.video.data);

Avatar (MuseTalk)

import { readFileSync, writeFileSync } from "node:fs";

const portrait = readFileSync("./portrait.jpg");
const audio = readFileSync("./narration.mp3");

const result = await ai.generate({
  output: {
    mode: "avatar",
    avatar: {
      provider: "replicate", // or "musetalk" alias
      image: portrait,
      audio,
    },
  },
});

writeFileSync("./avatar.mp4", result.avatar.buffer);

Music Generation (MusicGen)

import { writeFileSync } from "node:fs";

const result = await ai.generate({
  output: {
    mode: "music",
    music: {
      provider: "replicate", // or "musicgen" alias
      prompt: "Lo-fi hip-hop beat with vinyl crackle",
      duration: 8,
      tempo: 80,
    },
  },
});

writeFileSync("./track.mp3", result.music.buffer);

CLI Usage

# LLM
pnpm run cli generate "Hello" --provider replicate

# Image gen
pnpm run cli generate "A red panda" --provider replicate \
  --model black-forest-labs/flux-1.1-pro --imageOutput ./panda.png

# Video gen
pnpm run cli generate "smooth pan" --image ./input.jpg \
  --outputMode video --videoProvider replicate \
  --videoOutput ./out.mp4

# Avatar
pnpm run cli generate --outputMode avatar \
  --avatarProvider replicate \
  --avatarImage ./portrait.jpg \
  --avatarAudio ./narration.mp3 \
  --avatarOutput ./avatar.mp4

# Music
pnpm run cli generate "Lo-fi beat" \
  --outputMode music --musicProvider replicate \
  --musicTempo 80 --musicDuration 8 --musicOutput ./track.mp3

Configuration Reference

| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| REPLICATE_API_TOKEN | Yes | | Replicate API token (r8_...) |
| REPLICATE_MODEL | No | meta/meta-llama-3.1-70b-instruct | Default LLM model |
| REPLICATE_BASE_URL | No | https://api.replicate.com | Base URL |
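
A minimal sketch of how these three variables could be resolved with their documented defaults. The resolveReplicateConfig name, the whitespace trimming, and the trailing-slash normalization are assumptions for illustration, not NeuroLink's actual loader.

```typescript
// Hypothetical helper: NeuroLink's real configuration loader may differ.
interface ReplicateConfig {
  apiToken: string;
  model: string;
  baseUrl: string;
}

// Defaults taken from the configuration table.
const DEFAULT_MODEL = "meta/meta-llama-3.1-70b-instruct";
const DEFAULT_BASE_URL = "https://api.replicate.com";

function resolveReplicateConfig(env: Record<string, string | undefined>): ReplicateConfig {
  const apiToken = env.REPLICATE_API_TOKEN?.trim();
  if (!apiToken) {
    // Fail fast instead of sending an unauthenticated request.
    throw new Error("REPLICATE_API_TOKEN is required");
  }
  return {
    apiToken,
    model: env.REPLICATE_MODEL?.trim() || DEFAULT_MODEL,
    // Strip trailing slashes so `${baseUrl}/v1/predictions` stays well-formed.
    baseUrl: (env.REPLICATE_BASE_URL?.trim() || DEFAULT_BASE_URL).replace(/\/+$/, ""),
  };
}
```

In Node you would call it as resolveReplicateConfig(process.env); a missing token then surfaces before any request is sent.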

Feature Support Matrix

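| Feature | LLM | Image | Video | Avatar | Music |
| --- | --- | --- | --- | --- | --- |
| Streaming | Synthetic (single chunk) | N/A | N/A | N/A | N/A |
| Tool calling | No | N/A | N/A | N/A | N/A |
| Structured output | Limited | N/A | N/A | N/A | N/A |
| Vision input | Model-dependent | Yes (img2img) | Yes (start frame) | Yes | No |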

Cost Notes

Replicate bills by compute seconds, not by tokens. NeuroLink reports a symbolic per-token rate so cost-attribution dashboards have non-zero values, but the authoritative billing is from Replicate's own pricing dashboard.
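
Because the real charge follows compute time, a rough reconciliation for models billed by the second is the prediction's run time multiplied by the hardware's per-second price. The sketch below assumes the completed prediction object carries metrics.predict_time in seconds, which Replicate's prediction responses include for finished predictions. estimateComputeCost and the rate constant are hypothetical placeholders; look up the actual rate for your hardware on Replicate's pricing page.

```typescript
// Hypothetical helper for reconciling costs against Replicate's billing.
// Assumes metrics.predict_time is the prediction's compute time in seconds.
interface PredictionMetrics {
  metrics?: { predict_time?: number };
}

function estimateComputeCost(prediction: PredictionMetrics, usdPerSecond: number): number {
  const seconds = prediction.metrics?.predict_time ?? 0;
  // Round to 6 decimal places to avoid floating-point noise in reports.
  return Math.round(seconds * usdPerSecond * 1e6) / 1e6;
}

const HYPOTHETICAL_USD_PER_SECOND = 0.001; // placeholder, not a real Replicate price
console.log(estimateComputeCost({ metrics: { predict_time: 12.5 } }, HYPOTHETICAL_USD_PER_SECOND)); // 0.0125
```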


Troubleshooting

"Invalid Replicate API token"

echo $REPLICATE_API_TOKEN
export REPLICATE_API_TOKEN=r8_...

Create or rotate tokens at https://replicate.com/account/api-tokens.

"Replicate model 'X' not found"

Use the owner/name or owner/name:version format. Browse the catalog at https://replicate.com/explore.
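
A minimal sketch of checking a model reference locally before calling the API, so typos fail fast with a clear message. parseModelRef is a hypothetical helper; the allowed characters are an assumption rather than Replicate's published grammar, and the 64-character hex check reflects the form of the version ids Replicate shows on model pages.

```typescript
// Hypothetical validator for "owner/name" and "owner/name:version" references.
// The character classes are assumptions, not Replicate's official grammar.
interface ModelRef {
  owner: string;
  name: string;
  version?: string;
}

const MODEL_REF = /^([a-z0-9][a-z0-9_.-]*)\/([a-z0-9][a-z0-9_.-]*)(?::([a-f0-9]{64}))?$/i;

function parseModelRef(ref: string): ModelRef {
  const m = MODEL_REF.exec(ref.trim());
  if (!m) {
    throw new Error(`Expected "owner/name" or "owner/name:version", got "${ref}"`);
  }
  // Group 3 is undefined when no ":version" suffix was given.
  return m[3] ? { owner: m[1], name: m[2], version: m[3] } : { owner: m[1], name: m[2] };
}

console.log(parseModelRef("meta/meta-llama-3.1-70b-instruct").name); // meta-llama-3.1-70b-instruct
```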

Cold-start delays

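First-call latency on rarely used models can spike because the inference container has to boot (a cold start). Subsequent calls reuse the warm container. NeuroLink caps polling at 5 minutes by default; if you regularly hit that cap, raise the polling timeout in the lifecycle helper. Note that Prefer: wait=60 only controls how long the initial POST blocks (Replicate accepts at most 60 seconds), and REPLICATE_BASE_URL only changes the endpoint, so neither one extends the polling cap.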
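
The create-then-poll lifecycle from Key Facts, including a polling cap like the 5-minute default, can be sketched as below. This is an illustration under assumptions, not NeuroLink's lifecycle helper: runPrediction, the 1-second poll interval, and the injected fetchFn (which lets you pass Node 18+'s global fetch or a test double) are hypothetical, and the sketch uses the version-pinned POST /v1/predictions endpoint. The status values and the urls.get polling URL follow Replicate's HTTP API.

```typescript
// Hypothetical sketch of the Replicate prediction lifecycle with a polling cap.
type Status = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface Prediction {
  status: Status;
  output?: unknown;
  error?: string | null;
  urls: { get: string }; // URL to poll for the latest prediction state
}

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchFn = (
  url: string,
  init: { method?: string; headers: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runPrediction(
  fetchFn: FetchFn,
  token: string,
  body: { version: string; input: Record<string, unknown> },
  opts: { baseUrl?: string; pollMs?: number; timeoutMs?: number } = {},
): Promise<unknown> {
  const { baseUrl = "https://api.replicate.com", pollMs = 1000, timeoutMs = 5 * 60_000 } = opts;
  const auth = { Authorization: `Token ${token}` };

  async function readJson(res: HttpResponse): Promise<Prediction> {
    if (!res.ok) throw new Error(`Replicate HTTP ${res.status}`);
    return (await res.json()) as Prediction;
  }

  // Prefer: wait=60 asks Replicate to hold the POST open for up to 60 s,
  // so short jobs come back already "succeeded" and the loop below is skipped.
  let prediction = await readJson(
    await fetchFn(`${baseUrl}/v1/predictions`, {
      method: "POST",
      headers: { ...auth, "Content-Type": "application/json", Prefer: "wait=60" },
      body: JSON.stringify(body),
    }),
  );

  // Poll until the prediction leaves the in-progress states or the cap is hit.
  const deadline = Date.now() + timeoutMs;
  while (prediction.status === "starting" || prediction.status === "processing") {
    if (Date.now() >= deadline) throw new Error(`Polling exceeded ${timeoutMs} ms`);
    await sleep(pollMs);
    prediction = await readJson(await fetchFn(prediction.urls.get, { headers: auth }));
  }

  if (prediction.status !== "succeeded") {
    throw new Error(`Prediction ${prediction.status}: ${prediction.error ?? "unknown error"}`);
  }
  return prediction.output;
}
```

With Prefer: wait=60, a job that finishes within a minute returns "succeeded" from the POST and the while loop never runs; only longer jobs, such as cold starts, fall through to polling, which is where the cap applies.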

Streaming feels chunky

The current implementation runs the prediction to completion and then emits the whole result as a single chunk. True SSE streaming is planned; for now, use OpenAI, xAI, or Groq for low-latency token streaming.
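
To see why this feels chunky, here is a minimal sketch of a synthetic stream: the prediction runs to completion and the full text is then yielded as one chunk, so a for await loop like the one in the streaming example receives everything at once. syntheticStream and collect are hypothetical names, not NeuroLink internals.

```typescript
// Hypothetical sketch of a synthetic single-chunk stream.
type StreamChunk = { content: string };

async function* syntheticStream(getFullText: () => Promise<string>): AsyncGenerator<StreamChunk> {
  const text = await getFullText(); // waits until the whole prediction has finished
  yield { content: text }; // the entire response arrives as a single chunk
}

// Consume the stream the same way a caller would with a real token stream.
async function collect(stream: AsyncIterable<StreamChunk>): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) chunks.push(chunk.content);
  return chunks;
}
```

Time to first chunk therefore equals the total prediction time, which is the latency a true SSE stream would avoid.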

Output is a URL, not base64

NeuroLink downloads the URL and converts it to base64 to keep the imageOutput contract uniform. If you see a raw URL in the result, the download failed; check network access and Replicate's CDN status.
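
A minimal sketch of the URL-to-base64 step. outputToBase64 is a hypothetical helper; it assumes the output is either a URL string or an array whose first element is a URL, and it takes an injected fetchBytes function so the download can be a real HTTP call or a stub.

```typescript
// Hypothetical helper; the output shape (string or array of URLs) is an assumption.
type FetchBytes = (url: string) => Promise<Uint8Array>;

async function outputToBase64(output: unknown, fetchBytes: FetchBytes): Promise<string> {
  // Image models often return a single URL or an array of URLs; take the first.
  const url = Array.isArray(output) ? output[0] : output;
  if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
    throw new Error("Replicate output did not contain a downloadable URL");
  }
  const bytes = await fetchBytes(url);
  return Buffer.from(bytes).toString("base64");
}
```

With Node 18+ you could pass async (url) => new Uint8Array(await (await fetch(url)).arrayBuffer()) as fetchBytes; if that download throws, the caller is left holding the raw URL, which matches the symptom described above.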




Need Help? Open a GitHub Discussion or issue.