handterm
April 20, 2026
A Wayland-native terminal emulator focused on reaching the theoretical limits of performance and resource efficiency.
Rust workspace split across 6 packages. Single-process host architecture for low-overhead multi-window scaling.

Why handterm
Every terminal emulator is "fast enough." Handterm asks a different question: how close to the hardware limits can a terminal get?
Each layer of the pipeline is independently benchmarked against its theoretical floor. The parser is measured against memcpy. Cell writes are timed in nanoseconds. Startup is measured in microseconds. The goal is not to be marginally faster, but to understand where the ceilings are and sit as close to them as the hardware allows.
Handterm has both a CPU renderer (softbuffer) and a GPU renderer (wgpu). The best low-memory path today is the single-process host architecture: one long-lived process owns many windows and shares the heavy renderer/runtime state across them. The long-term target remains <1 MB RSS per additional window; the shared-GPU host currently measures roughly 1-2 MB per additional window.
Benchmarks
All measurements below refer to the current host-based handterm architecture, not the older one-process-per-window standalone path.
Test system: Intel Core Ultra 7 256V, 15 GB RAM, Arch Linux, niri Wayland compositor, 2560x1600 @ 120 Hz.
What the numbers mean now
Handterm currently has three relevant local architectures:
- CPU host — one process, many CPU-rendered windows
- GPU host — one process, many GPU-rendered windows
- daemon mode — server plus separate thin clients (still implemented, but soft-deprecated for normal local use and no longer the best local low-RAM path)
Current measured memory
Host memory scaling
| Setup | First window | Each additional |
|---|---|---|
| handterm CPU host | ~37 MB | ~20 MB |
| handterm GPU host | ~61 MB | ~1-2 MB |
| handterm daemon server-only | ~4.4 MB | N/A |
The current best local low-RAM path is the shared-GPU host. It pays a large fixed first-window cost once, then scales cheaply for additional windows.
Measured shared-GPU host scaling on this machine/session:
| Windows | handterm GPU host RSS |
|---|---|
| 1 | ~61.3 MB |
| 2 | ~63.1 MB |
| 3 | ~64.0 MB |
| 4 | ~65.0 MB |
| 5 | ~67.2 MB |
| 6 | ~68.1 MB |
That puts the incremental cost in roughly the 1-2 MB/window range after the first window.
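As a quick check of that slope, the sketch below recomputes the per-window increments from the table. The sample values are copied from the table above; the function names are illustrative and not part of handterm.

```rust
/// RSS samples in MB for 1..=6 windows, copied from the table above.
pub const GPU_HOST_RSS_MB: [f64; 6] = [61.3, 63.1, 64.0, 65.0, 67.2, 68.1];

/// Per-window increments: RSS(n + 1) - RSS(n) for each adjacent pair.
pub fn increments(samples: &[f64]) -> Vec<f64> {
    samples.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Average cost of each window after the first, or None with fewer than two samples.
pub fn mean_increment(samples: &[f64]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let total = samples[samples.len() - 1] - samples[0];
    Some(total / (samples.len() - 1) as f64)
}
```

For the six samples above, the individual increments fall between about 0.9 and 2.2 MB, and the mean works out to about 1.36 MB per additional window.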
Why CPU host is still larger per added window
The CPU host path improved substantially, but the dominant remaining per-window cost is the Wayland/softbuffer SHM presentation buffers. In other words, the CPU host is no longer mainly limited by terminal state — it is limited by presentation-buffer memory.
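To get a feel for the scale of presentation-buffer memory, here is a back-of-the-envelope sketch. It assumes a 4-byte-per-pixel Xrgb8888 buffer and a window covering the full 2560x1600 test display; the actual window size used for the measurements and the number of buffers softbuffer keeps per window are not stated here, so treat the result as an illustration rather than an accounting of the ~20 MB figure.

```rust
/// Bytes needed for one Xrgb8888 buffer (4 bytes per pixel).
/// Assumption: no row padding, so stride = width * 4.
pub fn shm_buffer_bytes(width_px: u64, height_px: u64) -> u64 {
    width_px * height_px * 4
}

/// Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes).
pub fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

Under those assumptions a single fullscreen buffer is 16,384,000 bytes, or 15.625 MiB, which is the same order of magnitude as the measured per-window cost.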
Current measured startup behavior
CPU host
- added-window startup from an existing host: ~16 ms
GPU host
The shared-GPU host is the best path for memory, but new-window startup is still the main remaining bottleneck. Recent measured open-window timings on the shared-GPU host are in the tens of milliseconds, with the first extra window sometimes noticeably slower than later ones.
Internal timing instrumentation shows the remaining cost is mostly in window / surface creation, not PTY spawn.
Binary and install size
| Terminal | Binary | Install total | Language |
|---|---|---|---|
| foot | 477 KB | ~1 MB | C |
| handterm | 3.4 MB | 3.4 MB | Rust |
| alacritty | 8.9 MB | ~9 MB | Rust |
| kitty | 88 KB* | ~18 MB | C + Python |
| ghostty | 26 MB | ~29 MB | Zig |
*kitty's binary is a Python launcher; the real code lives in /usr/lib/kitty/ (18 MB of .so files and Python).
Thread count
| Terminal | Threads |
|---|---|
| handterm | 2 |
| kitty | 7 |
| foot | 9 |
| alacritty | 10 |
| ghostty | 25 |
Shared library dependencies
Number of unique .so files mapped into the process.
| Terminal | Shared libs |
|---|---|
| foot | 22 |
| handterm | 24 |
| alacritty | 52 |
| kitty | 85 |
| ghostty | 163 |
Virtual memory (VSZ)
Total virtual address space mapped (not all resident).
| Terminal | VSZ |
|---|---|
| handterm | 119 MB |
| kitty | 463 MB |
| alacritty | 726 MB |
| foot | 1,529 MB |
| ghostty | 2,000 MB |
foot's high VSZ is from mmap'd font files and Wayland protocol buffers; most is not resident. GPU terminals reserve large virtual ranges for driver allocations.
Multi-window efficiency
| Setup | First window | Each additional |
|---|---|---|
| foot standalone | 24 MB | +24 MB |
| foot --server + footclient | 25 MB (server) + 1.6 MB | +1.6 MB |
| handterm CPU host | ~37 MB | ~20 MB |
| handterm GPU host | ~61 MB | ~1-2 MB |
| handterm daemon mode | ~4.4 MB server-only | current clients still too heavy |
Handterm still has a daemon/thin-client implementation, but it is now soft-deprecated for normal local use. After profiling both approaches, the most promising low-RAM local architecture is the shared-GPU single-process host. In current live measurements, the first GPU host window pays the full GPU/runtime cost once, and each additional window adds only about 1-2 MB RSS.
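The fixed-cost-plus-slope trade-off in the table can be made concrete with a small sketch. It models each setup as (first window MB, each additional MB) using the table's figures; taking 1.5 MB as the midpoint of the 1-2 MB range is an assumption, and the function names are illustrative.

```rust
/// Total RSS in MB for `n` windows when the first window costs `first`
/// and every further window costs `extra`.
pub fn total_rss_mb(first: f64, extra: f64, n: u32) -> f64 {
    if n == 0 {
        0.0
    } else {
        first + extra * (n - 1) as f64
    }
}

/// Smallest window count (up to `max_n`) at which setup `a` uses less
/// memory than setup `b`. Each setup is (first_window_mb, each_additional_mb).
pub fn break_even(a: (f64, f64), b: (f64, f64), max_n: u32) -> Option<u32> {
    (1..=max_n).find(|&n| total_rss_mb(a.0, a.1, n) < total_rss_mb(b.0, b.1, n))
}
```

With these numbers, the GPU host (61, 1.5) drops below foot standalone (24, 24) at 3 windows, and the CPU host (37, 20) does so at 5 windows.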
Feature comparison
| Feature | handterm | foot | alacritty | kitty | ghostty |
|---|---|---|---|---|---|
| GPU rendering | ✅ | - | ✅ | ✅ | ✅ |
| CPU rendering | ✅ | ✅ | - | - | - |
| True color (24-bit) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Ligatures | ✅ | ✅ | - | ✅ | ✅ |
| Sixel graphics | - | ✅ | - | - | ✅ |
| Kitty image protocol | partial* | - | - | ✅ | ✅ |
| Daemon mode | ✅ | ✅ | - | - | - |
| Single-process multi-window host | ✅ | - | - | ✅ | ✅ |
| Tabs | - | - | - | ✅ | ✅ |
| Splits/panes | - | - | - | ✅ | ✅ |
| Bracketed paste | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mouse reporting | ✅ | ✅ | ✅ | ✅ | ✅ |
| OSC 52 clipboard | ✅ | ✅ | ✅ | ✅ | ✅ |
| Kitty keyboard protocol | partial* | ✅ | - | ✅ | ✅ |
| IPC / remote control | ✅ | - | - | ✅ | - |
| X11 support | - | - | ✅ | ✅ | ✅ |
| macOS support | - | - | ✅ | ✅ | ✅ |
| Font shaping engine | rustybuzz | harfbuzz | built-in | harfbuzz | harfbuzz |
| Config format | TOML | INI | TOML | conf | custom |
| Scrollback (default) | 10,000 | 10,000 | 10,000 | 2,000 | 10,000 |
* Remaining known Kitty image gaps: handterm now supports the core raw RGB/RGBA/PNG upload-place-delete path, including inline o=z compressed payloads, but not the full Kitty graphics protocol surface such as non-inline transports and richer placement/operation parameters.
* Remaining known Kitty keyboard gaps are limited to keys not currently exposed through winit, such as MEDIA_REVERSE, ISO_LEVEL5_SHIFT, and right-side Meta/Hyper distinction.
Pipeline throughput
From handterm bench. Internal processing speed, not rendering.
| Stage | ASCII | SGR color | Mixed |
|---|---|---|---|
| Theoretical floor (memcpy) | 5,944 MB/s | - | - |
| Theoretical floor (byte scan) | 2,088 MB/s | - | - |
| Parser (state machine) | 279 MB/s | 362 MB/s | 339 MB/s |
| Grid write (parser + cells) | 363 MB/s | 328 MB/s | 259 MB/s |
| Full pipeline | 330 MB/s | 174 MB/s | 209 MB/s |
At 120x72 (HiDPI fullscreen), the pipeline can repaint the entire screen 2,507 times per second. A 120 Hz display needs at most 120 full repaints per second.
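The headroom implied by the repaint figure can be worked out directly. This sketch only does the arithmetic on the numbers quoted above; the helper names are illustrative.

```rust
/// How many times faster than the display refresh the pipeline can repaint.
pub fn headroom(repaints_per_sec: f64, refresh_hz: f64) -> f64 {
    repaints_per_sec / refresh_hz
}

/// Time spent on one full-screen repaint, in microseconds.
pub fn micros_per_repaint(repaints_per_sec: f64) -> f64 {
    1_000_000.0 / repaints_per_sec
}

/// Time available per displayed frame, in microseconds.
pub fn frame_budget_micros(refresh_hz: f64) -> f64 {
    1_000_000.0 / refresh_hz
}
```

At 2,507 repaints per second, a full 120x72 repaint (8,640 cells) takes roughly 399 µs of internal processing against a frame budget of about 8,333 µs at 120 Hz, which is roughly 21x headroom before any rendering cost.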
Codebase size
Current workspace snapshot from this repository:
| Terminal | Lines of code | Language | Packaging / dependencies |
|---|---|---|---|
| handterm | ~24,200 | Rust | 6 local workspace crates, 341 resolved Cargo packages |
| alacritty | ~34,000 | Rust | ~100+ crates |
| foot | ~55,000 | C | system libs only |
| kitty | ~116,000 | C + Python | system libs + Python stdlib |
| ghostty | ~230,000 | Zig | vendored deps |
How to reproduce
# Startup time (requires niri compositor)
before=$(niri msg windows | grep -c "Window ID")
start=$(date +%s%N)
<terminal> &
pid=$!
while [ $(niri msg windows | grep -c "Window ID") -eq $before ]; do sleep 0.002; done
end=$(date +%s%N)
echo "$(( (end - start) / 1000000 )) ms"
kill $pid
# Synthetic frontend input dedupe (GPU by default; uses socket + isolated runtime dir)
./scripts/test_input_dedupe.sh
# Memory
<terminal> &
pid=$!
sleep 1
grep VmRSS /proc/$pid/status
# Pipeline throughput
handterm bench
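For scripted comparisons, the memory step can also be done programmatically. The sketch below parses the same `VmRSS` line that the `grep` command reads from `/proc/<pid>/status`; it is Linux-only, and the function names are illustrative rather than part of handterm.

```rust
/// Extract the `VmRSS` value in kB from the text of /proc/<pid>/status.
pub fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

/// Read the resident set size of a running process in kB (Linux only).
/// Returns None if the process is gone or the field is missing.
pub fn read_vm_rss_kb(pid: u32) -> Option<u64> {
    let text = std::fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    parse_vm_rss_kb(&text)
}
```

Sampling `read_vm_rss_kb` after each new window opens gives the same kind of series as the scaling tables above.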
Features
Terminal emulation
- VT100/VT220 parser: CSI, SGR, OSC, DCS, ESC sequences
- True color (24-bit RGB), 256 color palette, bold, dim, italic, inverse, strikethrough
- Underline styles: single, double, curly, dotted, dashed (with custom colors)
- DECAWM auto-wrap with pending wrap semantics
- Scroll regions, insert/delete lines and characters
- Alt screen, cursor save/restore, cursor styles (block, bar, underline)
- DEC special graphics (line drawing characters)
- Device attributes (DA1/DA2), device status reports
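"Pending wrap" in the DECAWM item refers to the usual VT behavior where writing into the last column does not move the cursor immediately; the wrap happens only when the next printable character arrives. A minimal sketch of that rule follows. The `Cursor` type is hypothetical, ignores scrolling, and is not handterm's actual terminal state.

```rust
/// Minimal cursor model illustrating DECAWM pending-wrap semantics.
/// Assumption: `cols` is at least 1 and the screen never scrolls.
pub struct Cursor {
    pub col: usize,
    pub row: usize,
    pub pending_wrap: bool,
    cols: usize,
}

impl Cursor {
    pub fn new(cols: usize) -> Self {
        Cursor { col: 0, row: 0, pending_wrap: false, cols }
    }

    /// Place one printable character and return the (row, col) it lands on.
    pub fn put_char(&mut self) -> (usize, usize) {
        // A deferred wrap is resolved only when another character is printed.
        if self.pending_wrap {
            self.col = 0;
            self.row += 1;
            self.pending_wrap = false;
        }
        let at = (self.row, self.col);
        if self.col + 1 == self.cols {
            // Last column: stay put and remember that a wrap is owed.
            self.pending_wrap = true;
        } else {
            self.col += 1;
        }
        at
    }

    /// Carriage return cancels the pending wrap without advancing the row.
    pub fn carriage_return(&mut self) {
        self.col = 0;
        self.pending_wrap = false;
    }
}
```

The practical effect is that a line exactly as wide as the terminal followed by a carriage return does not produce a spurious blank line.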
Rendering
- GPU renderer via wgpu with instanced cell rendering and WGSL shaders (default when built)
- CPU renderer via softbuffer (`--backend cpu`, but Wayland presentation remains opaque)
- Two-pass rendering (backgrounds then glyphs) for correct powerline/nerd font display
- Damage tracking with bitset dirty map
- Ligature support via rustybuzz text shaping
- DPI-aware rendering (HiDPI)
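The bitset dirty map mentioned in the list can be illustrated with a small sketch. It assumes one bit per row; the granularity handterm actually tracks (rows, cells, or regions) is not stated here, and `DirtyRows` is a hypothetical name.

```rust
/// One bit per terminal row, packed into 64-bit words.
pub struct DirtyRows {
    words: Vec<u64>,
    rows: usize,
}

impl DirtyRows {
    pub fn new(rows: usize) -> Self {
        DirtyRows { words: vec![0; (rows + 63) / 64], rows }
    }

    /// Mark a row as needing repaint; out-of-range rows are ignored.
    pub fn mark(&mut self, row: usize) {
        if row < self.rows {
            self.words[row / 64] |= 1u64 << (row % 64);
        }
    }

    pub fn is_dirty(&self, row: usize) -> bool {
        row < self.rows && (self.words[row / 64] >> (row % 64)) & 1 == 1
    }

    /// Return the dirty rows in ascending order and clear the map.
    pub fn drain(&mut self) -> Vec<usize> {
        let mut out = Vec::new();
        for (i, word) in self.words.iter_mut().enumerate() {
            let mut bits = *word;
            while bits != 0 {
                out.push(i * 64 + bits.trailing_zeros() as usize);
                bits &= bits - 1; // clear the lowest set bit
            }
            *word = 0;
        }
        out
    }
}
```

A 72-row screen fits in two words, so checking whether anything changed costs a couple of integer comparisons per frame.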
Input and interaction
- Full keyboard input with Ctrl, Shift, function keys
- Mouse reporting: X10, Normal, Button, Any-event, SGR encoding
- Bracketed paste mode
- Focus events
- Text selection with mouse drag, copy-on-select via wl-copy
- 10,000 line scrollback with ring buffer
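A bounded scrollback like the one in the last item can be sketched with the standard library's `VecDeque`, which is itself a ring buffer. This stores lines as `String` for simplicity; handterm's real cell storage lives in `grid.rs` and is likely laid out differently, so the `Scrollback` type here is purely illustrative.

```rust
use std::collections::VecDeque;

/// Fixed-capacity scrollback: the oldest line is dropped once full.
pub struct Scrollback {
    lines: VecDeque<String>,
    capacity: usize,
}

impl Scrollback {
    pub fn new(capacity: usize) -> Self {
        Scrollback { lines: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append a line that scrolled off the top of the screen.
    pub fn push(&mut self, line: String) {
        if self.capacity == 0 {
            return;
        }
        if self.lines.len() == self.capacity {
            self.lines.pop_front(); // evict the oldest line
        }
        self.lines.push_back(line);
    }

    pub fn len(&self) -> usize {
        self.lines.len()
    }

    /// Look up a line counting back from the newest (0 = most recent).
    pub fn line_from_bottom(&self, n: usize) -> Option<&str> {
        let idx = self.lines.len().checked_sub(n + 1)?;
        self.lines.get(idx).map(String::as_str)
    }
}
```

Constructing it with `Scrollback::new(10_000)` mirrors the default `lines = 10000` setting.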
Unicode
- On-demand FreeType glyph rasterization
- Wide character support (CJK, emoji)
- Fontconfig font discovery with caching
Shell integration
- Kitty keyboard protocol set/query/push/pop + expanded CSI-u key encoding, with remaining known gaps limited to keys not exposed through current `winit` input APIs
- XTVERSION response
- OSC 10/11 color queries
- OSC 52 clipboard
- OSC 0/2 window title
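For the OSC 10/11 color queries, a terminal replies with the current foreground or background color. The sketch below formats such a reply in the xterm-style `rgb:rrrr/gggg/bbbb` form with 16-bit channels. Whether handterm scales channels exactly this way, and whether it terminates with ST or BEL, are assumptions; the function name is illustrative.

```rust
/// Format a reply to an OSC 10 (foreground) or OSC 11 (background) query.
/// Assumption: the reply is terminated with ST (ESC \) rather than BEL.
pub fn osc_color_reply(code: u8, rgb: [u8; 3]) -> String {
    // Widen each 8-bit channel to 16 bits (0xff -> 0xffff) by multiplying by 257.
    let [r, g, b] = rgb.map(|c| u16::from(c) * 257);
    format!("\x1b]{code};rgb:{r:04x}/{g:04x}/{b:04x}\x1b\\")
}
```

Applications such as editors use this reply to decide between light and dark color schemes.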
IPC / host control
- Unix socket remote control (`handterm @ <command>`)
- Host window creation via `handterm open-window`
- Core commands: get-text, send-text, send-key, send-key-event, send-ime-commit, get-cursor, get-size, set-title, close, ls
- Host-specific commands: open-window, focus-window, list-windows
- Synthetic `send-key-event` accepts an optional `physical_key` field for keypad keys, MENU/context-menu, and side-specific modifier variants that `winit` exposes
Examples:
# Context menu / MENU key via host control
handterm @ send-key-event '{"key":"menu","physical_key":"context_menu"}'
# Keypad center (KP_BEGIN / numpad 5 in navigation mode)
handterm @ send-key-event '{"key":"clear","physical_key":"numpad5"}'
# Right shift press
handterm @ send-key-event '{"key":"shift","physical_key":"shift_right","kind":"press","shift":true}'
Install
Requires Wayland, FreeType, and Fontconfig.
# From source
cargo install --path .
# Or build directly
cargo build --release
./target/release/handterm
This default build includes both CPU and GPU frontends. Plain local handterm now follows the host path by default: when a compatible host is already running, repeated launches reuse that host and open another window in the same process instead of spawning a full new renderer process.
Build with GPU rendering only
cargo build --release --features gpu --no-default-features
Build with CPU rendering only
cargo build --release --features cpu --no-default-features
Configuration
Config file: ~/.config/handterm/config.toml
Generate defaults:
handterm init-config
Example:
[style]
font_family = "JetBrainsMono Nerd Font Light"
font_size = 11.0
background = "#000000"
foreground = "#cdd6f4"
cursor = "#f5e0dc"
background_opacity = 0.9
[window]
columns = 80
rows = 24
[scrollback]
lines = 10000
smooth = false
smooth_speed = 3.0
scrollbar = true
[performance]
repaint_delay_ms = 5
sync_to_monitor = true
background_opacity is implemented by the GPU backend. If you force --backend cpu, Handterm will stay opaque on Wayland because softbuffer presents Xrgb8888 there.
scrollback.smooth = true enables experimental GPU-side fractional scrollback rendering. It keeps the normal terminal/grid model, but the GPU renderer draws one extra row and applies a pixel Y offset so touchpad/pixel wheel input can reveal partial lines. Smooth mode now also adds inertial carry, so quick wheel/trackpad gestures continue gliding instead of stopping immediately.
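The "one extra row plus a pixel Y offset" idea can be sketched as splitting a pixel scroll position into whole rows and a sub-row remainder. This is an illustration of the general technique under the assumption of a uniform cell height, not handterm's renderer code; both function names are hypothetical.

```rust
/// Split a scroll offset in pixels into whole rows and a leftover pixel offset.
/// Non-positive inputs are treated as "not scrolled".
pub fn split_scroll(offset_px: f32, cell_height_px: f32) -> (usize, f32) {
    if cell_height_px <= 0.0 || offset_px <= 0.0 {
        return (0, 0.0);
    }
    let rows = (offset_px / cell_height_px).floor();
    let frac_px = offset_px - rows * cell_height_px;
    (rows as usize, frac_px)
}

/// Rows the renderer must draw: one extra when a partial line is visible.
pub fn rows_to_draw(visible_rows: usize, frac_px: f32) -> usize {
    if frac_px > 0.0 {
        visible_rows + 1
    } else {
        visible_rows
    }
}
```

With a 20 px cell height, a 50 px offset becomes 2 whole rows plus a 10 px shift, so a 24-row window draws 25 rows for that frame.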
scrollback.smooth_speed controls how far each smooth-scroll gesture travels. The default is 3.0, which is intentionally more aggressive than the initial 1:1 prototype and pairs with the momentum model to carry farther on flicks.
scrollback.scrollbar controls a thin right-edge overlay scrollbar. It is enabled by default and does not consume a terminal column.
Smooth scrollback currently applies only to the standalone/shared-GPU host path. CPU rendering still uses whole-row scrollback presentation, and remote thin clients do not yet have a protocol for server-owned smooth scrollback surfaces.
Architecture
Single-process host
|
+-- window #1
| PTY -> parser -> terminal -> grid -> renderer -> Wayland surface
|
+-- window #2
| PTY -> parser -> terminal -> grid -> renderer -> Wayland surface
|
+-- window #N
  PTY -> parser -> terminal -> grid -> renderer -> Wayland surface
Shared across windows in the host:
- event loop
- IPC socket / control plane
- CPU or GPU renderer runtime foundation
- glyph/font resources per DPI
- on the GPU path: shared `wgpu` instance / adapter / device / queue / pipeline cache
Each layer is independently benchmarkable. The parser can be tested without a grid. The grid can be tested without a renderer. handterm bench measures every boundary.
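To illustrate what "the parser can be tested without a grid" looks like in practice, here is a deliberately tiny state-machine parser that hands events to a caller-supplied sink instead of writing cells. It handles only printable ASCII and simple CSI sequences, and drops a trailing empty parameter; the types are hypothetical and far smaller than the real `parser.rs`.

```rust
#[derive(Debug, PartialEq)]
pub enum Event {
    Print(u8),
    Csi { params: Vec<u16>, action: u8 },
}

#[derive(Clone, Copy)]
enum State {
    Ground,
    Escape,
    Csi,
}

pub struct Parser {
    state: State,
    params: Vec<u16>,
    current: Option<u16>,
}

impl Parser {
    pub fn new() -> Self {
        Parser { state: State::Ground, params: Vec::new(), current: None }
    }

    /// Feed bytes and pass each decoded event to `sink`.
    /// State is kept between calls, so sequences may span chunks.
    pub fn advance(&mut self, bytes: &[u8], mut sink: impl FnMut(Event)) {
        for &b in bytes {
            match (self.state, b) {
                (State::Ground, 0x1b) => self.state = State::Escape,
                (State::Ground, 0x20..=0x7e) => sink(Event::Print(b)),
                (State::Ground, _) => {} // C0 controls and UTF-8 are out of scope here
                (State::Escape, b'[') => {
                    self.params.clear();
                    self.current = None;
                    self.state = State::Csi;
                }
                (State::Escape, _) => self.state = State::Ground,
                (State::Csi, b'0'..=b'9') => {
                    let digit = u16::from(b - b'0');
                    let value = self.current.unwrap_or(0);
                    self.current = Some(value.saturating_mul(10).saturating_add(digit));
                }
                (State::Csi, b';') => {
                    // An empty parameter before ';' counts as 0.
                    self.params.push(self.current.take().unwrap_or(0));
                }
                (State::Csi, 0x40..=0x7e) => {
                    if let Some(v) = self.current.take() {
                        self.params.push(v);
                    }
                    let params = std::mem::take(&mut self.params);
                    sink(Event::Csi { params, action: b });
                    self.state = State::Ground;
                }
                (State::Csi, _) => self.state = State::Ground, // unsupported byte: abort
            }
        }
    }
}
```

Because the sink is just a closure, a benchmark can count events or discard them, while the grid layer can be driven separately from a prerecorded event list.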
Source layout
handterm-common/
  src/
    grid.rs            Cell storage / scrollback
    parser.rs          VT parser
    protocol.rs        daemon protocol types
    terminal.rs        terminal state machine
src/
  app.rs               CPU single-process host runtime
  gpu_app.rs           GPU single-process host runtime
  gpu_runtime.rs       shared/per-window GPU runtime pieces
  render.rs            CPU renderer
  frontend.rs          shared scheduling/input helpers
  pty.rs               PTY spawn and I/O
  ipc.rs               host IPC server
  daemon.rs            daemon/server runtime
  remote_app.rs        CPU thin client
  remote_gpu_app.rs    GPU thin client
Development
cargo check --workspace
cargo test --workspace
cargo run -- bench
cargo run -- print-config
Roadmap
See OPTIMIZATION.md for the full performance roadmap.
Architecture status: the default and recommended path is the single-process host architecture. Daemon/thin-client mode remains available for compatibility and experimentation, but is soft-deprecated.
| Phase | Goal | Status |
|---|---|---|
| CPU rendering | Functional terminal with softbuffer | ✅ |
| GPU rendering | wgpu backend with instanced shaders | ✅ |
| Single-process CPU host | Shared-process multi-window CPU runtime | ✅ |
| Shared-GPU host | Low-overhead multi-window GPU runtime | ✅ |
| Server/client mode | Daemon architecture like foot --server | ✅ implemented, soft-deprecated |
| Workspace split | Thin client/server/common Cargo packages | ✅ foundation implemented |
| Startup/per-window polish | Push toward theoretical window-overhead floor | in progress |
Current best path: shared-GPU host at roughly 1-2 MB per extra window after the first window. The remaining work is shaving startup cost and pushing the incremental slope down further.
License
MIT