scope-overworld

April 19, 2026

Scope plugin providing pipelines for Overworld world models.

Features

Waypoint 1.5 — Generate worlds at 720p / up to 60 fps using the Waypoint-1.5-1B model (Apache-2.0).
Waypoint 1.5 (360p) — Lighter-weight variant for laptop-class NVIDIA GPUs via Waypoint-1.5-1B-360P.

Platform note — NVIDIA only. This plugin runs on CUDA-capable NVIDIA GPUs (Linux and Windows). The underlying world_engine inference library has no Metal / MPS support today, so macOS / Apple Silicon is not supported via this Scope plugin. Mac users who want to try Waypoint-1.5 should use Overworld's native Biome desktop app, which has its own Mac build independent of this plugin.

Hardware guidance

Variant	Target hardware	Approx. FPS
Waypoint 1.5 (720p)	RTX 5090	56 fps unquantized, 72 fps with `fp8w8a8`
Waypoint 1.5 (720p)	RTX 3090	~30 fps with `intw8a8`
Waypoint 1.5 (360p)	Laptop-class NVIDIA GPUs (RTX 30xx mobile and up)	Real-time up to 60 fps

Quantization options exposed in the UI (load-time setting):

intw8a8 — INT8 weights/activations, requires NVIDIA Ampere+ (30xx)
fp8w8a8 — FP8, requires Ada Lovelace / Hopper+
nvfp4 — NVFP4, requires Blackwell and the flashinfer kernel library. flashinfer ships Linux-only wheels, so this tier is only available on Linux today; Windows users should pick intw8a8 or fp8w8a8.
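
The requirements above can be read as a small capability check. The sketch below is a hypothetical helper, not part of scope-overworld or world_engine: `supported_quantizations` and its parameters are invented for illustration, and the compute-capability thresholds (8.0 for Ampere, 8.9 for Ada Lovelace, 10.0 for Blackwell) are assumptions taken from NVIDIA's published architecture numbering rather than anything the plugin documents.

```python
import sys

# Assumed minimum CUDA compute capability per quantization option.
# These thresholds follow NVIDIA's architecture numbering, not the plugin's code.
MIN_CAPABILITY = {
    "intw8a8": (8, 0),   # Ampere and newer
    "fp8w8a8": (8, 9),   # Ada Lovelace, Hopper and newer
    "nvfp4": (10, 0),    # Blackwell and newer
}


def supported_quantizations(capability, platform=sys.platform):
    """Return the quantization options a GPU could use, in the order listed above.

    capability: (major, minor) tuple, e.g. from torch.cuda.get_device_capability().
    platform:   a sys.platform-style string such as "linux" or "win32".
    """
    options = []
    for name, minimum in MIN_CAPABILITY.items():
        if tuple(capability) < minimum:
            continue
        # nvfp4 depends on flashinfer, which ships Linux-only wheels.
        if name == "nvfp4" and not platform.startswith("linux"):
            continue
        options.append(name)
    return options


if __name__ == "__main__":
    print(supported_quantizations((8, 6), "linux"))   # RTX 3090 class (Ampere)
    print(supported_quantizations((12, 0), "win32"))  # RTX 5090 class on Windows
```

A check like this only narrows the choices; the unquantized path remains available on any supported GPU, and the actual option is still picked in the UI at load time.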

HuggingFace

Model weights may require HuggingFace authentication. See the HuggingFace guide for setup instructions.

Install

Follow the Scope plugins guide to install this plugin using the URL:

https://github.com/daydreamlive/scope-overworld.git

Upgrade

Follow the Scope plugins guide to upgrade this plugin to the latest version.

Architecture

Both the waypoint and waypoint_360p pipelines use world_engine for inference. The model is an autoregressive Diffusion Transformer with a bundled Tiny Hunyuan Autoencoder (taehv1_5) providing 4× temporal and 8× spatial compression. On first load, a JIT warmup pass runs to trigger compilation.
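
To make the compression factors concrete, the sketch below computes the latent grid that a short clip maps to. It assumes that 720p means 1280×720 and 360p means 640×360, and that compression is plain integer division along each axis; `latent_shape` is an illustrative helper, not a world_engine function.

```python
TEMPORAL_FACTOR = 4  # 4x temporal compression stated above
SPATIAL_FACTOR = 8   # 8x spatial compression stated above


def latent_shape(frames, height, width):
    """Return (latent_frames, latent_height, latent_width) for a pixel-space clip."""
    if frames % TEMPORAL_FACTOR or height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("dimensions must be multiples of the compression factors")
    return (
        frames // TEMPORAL_FACTOR,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


if __name__ == "__main__":
    print(latent_shape(4, 720, 1280))  # assumed 720p frame size
    print(latent_shape(4, 360, 640))   # assumed 360p frame size
```

Under these assumptions, four 720p frames collapse to a single 90×160 latent frame, and four 360p frames to a single 45×80 one.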

Waypoint-1.5 is controller-driven (keyboard + mouse) with optional starter-image conditioning; it has no text-prompt input. Each inference step: controller input is processed → world_engine generates the next 4-frame chunk at the target resolution → Scope's pipeline processor splits the chunk into per-frame packets for the output stream.
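
The per-step flow can be sketched as a generator loop. Everything here is a stand-in: `ControllerInput`, `fake_generate_chunk`, and `run_steps` are hypothetical names, and the fake generator returns labelled placeholder strings instead of images. The real world_engine and Scope APIs are not shown in this README and may look quite different.

```python
from dataclasses import dataclass, field

CHUNK_FRAMES = 4  # frames generated per inference step, as described above


@dataclass
class ControllerInput:
    """Hypothetical container for one step of keyboard + mouse state."""
    keys: frozenset = field(default_factory=frozenset)
    mouse_dx: float = 0.0
    mouse_dy: float = 0.0


def fake_generate_chunk(step, controls):
    """Stand-in for the model call: returns CHUNK_FRAMES placeholder frames."""
    pressed = ",".join(sorted(controls.keys))
    return [f"step{step}-frame{i}[{pressed}]" for i in range(CHUNK_FRAMES)]


def run_steps(inputs, generate_chunk=fake_generate_chunk):
    """Yield one packet per frame as (global_frame_index, frame)."""
    frame_index = 0
    for step, controls in enumerate(inputs):
        chunk = generate_chunk(step, controls)  # next 4-frame chunk
        for frame in chunk:                     # split into per-frame packets
            yield frame_index, frame
            frame_index += 1


if __name__ == "__main__":
    inputs = [ControllerInput(keys=frozenset({"w"})), ControllerInput()]
    for index, frame in run_steps(inputs):
        print(index, frame)
```

Passing the generator function as a parameter keeps the chunk-splitting logic testable without a GPU; a real model call could be swapped in for `fake_generate_chunk` as long as it returns a sequence of frames.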

Context window: 512 frames (~8.5 seconds at 60 fps).
