scope-overworld
April 19, 2026
Scope plugin providing pipelines for Overworld world models.
Features
- Waypoint 1.5 — Generate worlds at 720p / up to 60 fps using the Waypoint-1.5-1B model (Apache-2.0).
- Waypoint 1.5 (360p) — Lighter-weight variant for laptop-class NVIDIA GPUs via Waypoint-1.5-1B-360P.
Platform note: NVIDIA only. This plugin runs on CUDA-capable NVIDIA GPUs (Linux and Windows). The underlying world_engine inference library has no Metal / MPS support today, so macOS / Apple Silicon is not supported via this Scope plugin. Mac users who want to try Waypoint-1.5 should use Overworld's native Biome desktop app, which has its own Mac build independent of this plugin.
Hardware guidance
| Variant | Target hardware | Approx. FPS |
|---|---|---|
| Waypoint 1.5 (720p) | RTX 5090 | 56 fps unquantized, 72 fps with fp8w8a8 |
| Waypoint 1.5 (720p) | RTX 3090 | ~30 fps with intw8a8 |
| Waypoint 1.5 (360p) | Laptop-class NVIDIA GPUs (RTX 30xx mobile and up) | Real-time up to 60 fps |
Quantization options exposed in the UI (load-time setting):
- intw8a8: INT8 weights/activations, requires NVIDIA Ampere+ (30xx)
- fp8w8a8: FP8, requires Ada Lovelace / Hopper+
- nvfp4: NVFP4, requires Blackwell and the flashinfer kernel library. flashinfer ships Linux-only wheels, so this tier is only available on Linux today; Windows users should pick intw8a8 or fp8w8a8.
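The hardware requirements above can be expressed as a small lookup. The following is a minimal sketch, not part of the plugin: the function name `supported_quantizations` is hypothetical, and the mapping from GPU generation to CUDA compute capability (Ampere 8.x, Ada Lovelace 8.9, Hopper 9.0, Blackwell 10.0 and up) is an assumption layered on top of the README's wording. In practice the capability tuple could come from `torch.cuda.get_device_capability()` and the OS name from `platform.system()`.

```python
from typing import List, Tuple


def supported_quantizations(capability: Tuple[int, int], system: str) -> List[str]:
    """Return the quantization tiers a GPU could plausibly use (hypothetical helper).

    capability: CUDA compute capability as (major, minor).
    system: OS name as returned by platform.system(), e.g. "Linux" or "Windows".
    """
    options = []
    if capability >= (8, 0):
        # Ampere and newer (RTX 30xx is 8.6)
        options.append("intw8a8")
    if capability >= (8, 9):
        # Ada Lovelace (8.9), Hopper (9.0) and newer
        options.append("fp8w8a8")
    if capability >= (10, 0) and system == "Linux":
        # Blackwell, and flashinfer wheels are Linux-only per the text above
        options.append("nvfp4")
    return options


print(supported_quantizations((8, 6), "Windows"))  # ['intw8a8']
print(supported_quantizations((12, 0), "Linux"))   # ['intw8a8', 'fp8w8a8', 'nvfp4']
```

This only reports which tiers are eligible; which one is fastest on a given card is a separate question that the hardware table addresses.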
HuggingFace
Model weights may require HuggingFace authentication. See the HuggingFace guide for setup instructions.
Install
Follow the Scope plugins guide to install this plugin using the URL:
https://github.com/daydreamlive/scope-overworld.git
Upgrade
Follow the Scope plugins guide to upgrade this plugin to the latest version.
Architecture
Both waypoint and waypoint_360p pipelines use world_engine for inference. The model is an autoregressive Diffusion Transformer with a bundled Tiny Hunyuan Autoencoder (taehv1_5) providing 4× temporal and 8× spatial compression. On first load, a JIT warmup pass runs to trigger compilation.
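To make the compression factors concrete, here is a rough sketch of the latent grid size they imply. It assumes 720p means 1280×720 and 360p means 640×360, and that the autoencoder divides dimensions exactly; the real taehv1_5 may pad or treat the first frame differently, and the helper `latent_shape` is hypothetical.

```python
from typing import Tuple

TEMPORAL_FACTOR = 4  # from the text: 4x temporal compression
SPATIAL_FACTOR = 8   # from the text: 8x spatial compression


def latent_shape(frames: int, height: int, width: int) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a pixel-space clip."""
    if frames % TEMPORAL_FACTOR or height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("dimensions must be divisible by the compression factors")
    return (
        frames // TEMPORAL_FACTOR,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


print(latent_shape(4, 720, 1280))  # (1, 90, 160)
print(latent_shape(4, 360, 640))   # (1, 45, 80)
```

Under these assumptions, one latent time step corresponds to four output frames.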
Waypoint-1.5 is controller-driven (keyboard + mouse) with optional starter-image conditioning; it has no text-prompt input. Each inference step: controller input is processed → world_engine generates the next 4-frame chunk at the target resolution → Scope's pipeline processor splits the chunk into per-frame packets for the output stream.
Context window: 512 frames (~8.5 seconds at 60 fps).
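The chunk-to-frame step described above can be sketched as follows. This is an illustration only: `FramePacket` and `split_chunk` are hypothetical names, not Scope or world_engine APIs, and the frame payloads are stand-in strings. It also assumes the context window is counted in output frames rather than latent steps.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence

FRAMES_PER_CHUNK = 4  # from the text: each step yields a 4-frame chunk
CONTEXT_FRAMES = 512  # from the text: context window size


@dataclass
class FramePacket:
    index: int     # global frame index in the output stream
    pts: float     # presentation time in seconds
    frame: object  # frame payload (placeholder for image data)


def split_chunk(chunk: Sequence[object], chunk_index: int, fps: float) -> Iterator[FramePacket]:
    """Yield one packet per frame of a generated chunk, in display order."""
    for offset, frame in enumerate(chunk):
        index = chunk_index * FRAMES_PER_CHUNK + offset
        yield FramePacket(index=index, pts=index / fps, frame=frame)


packets = list(split_chunk(["f0", "f1", "f2", "f3"], chunk_index=2, fps=60.0))
print([p.index for p in packets])          # [8, 9, 10, 11]
print(CONTEXT_FRAMES // FRAMES_PER_CHUNK)  # 128 chunks fit in the context window
```

Deriving timestamps from the global frame index, rather than accumulating a per-frame delta, avoids floating-point drift over long sessions.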