wxtrain

March 22, 2026

A Hermes Agent plugin for building ML-ready weather training datasets. Powered by wxtrain, an all-Rust end-to-end pipeline.

What It Does

Ask Hermes to build training datasets from operational weather models:

  • "Plan a severe weather dataset for a Swin transformer" → 25-channel training spec with export format, loss function, and model recipe
  • "Fetch HRRR CAPE data" → downloads only the matching GRIB messages via byte-range .idx subsetting (~500 KB instead of 125 MB)
  • "Build training arrays from this GRIB file" → NPY arrays + preview PNGs + manifests
  • "What's the theta-e at 30C, 20C dewpoint, 850mb?" → instant thermodynamic calculation
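
To make the last example concrete, here is a minimal Python sketch of the three quantities wxt_calc is listed as supporting (theta, theta_e, RH). It uses the Bolton (1980) approximations, which is an assumption made for illustration: the README does not say which formulas wxtrain implements, and the function names below are hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Bolton (1980) approximation over liquid water, temperature in Celsius."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))

def relative_humidity(temp_c, dewpoint_c):
    """RH as a fraction: vapor pressure at the dewpoint over saturation at T."""
    return saturation_vapor_pressure_hpa(dewpoint_c) / saturation_vapor_pressure_hpa(temp_c)

def potential_temperature_k(temp_c, pressure_hpa):
    """Dry potential temperature (Poisson's equation), reference 1000 hPa."""
    return (temp_c + 273.15) * (1000.0 / pressure_hpa) ** 0.2857

def equivalent_potential_temperature_k(temp_c, dewpoint_c, pressure_hpa):
    """Bolton (1980) eq. 43, using his eq. 15 for the LCL temperature."""
    t_k = temp_c + 273.15
    td_k = dewpoint_c + 273.15
    e = saturation_vapor_pressure_hpa(dewpoint_c)       # vapor pressure, hPa
    r = 1000.0 * 0.622 * e / (pressure_hpa - e)          # mixing ratio, g/kg
    t_lcl = 1.0 / (1.0 / (td_k - 56.0) + math.log(t_k / td_k) / 800.0) + 56.0
    theta_moist = t_k * (1000.0 / pressure_hpa) ** (0.2854 * (1.0 - 0.28e-3 * r))
    return theta_moist * math.exp((3.376 / t_lcl - 0.00254) * r * (1.0 + 0.81e-3 * r))

# The question from the list above: 30 C, 20 C dewpoint, 850 mb.
theta = potential_temperature_k(30.0, 850.0)
theta_e = equivalent_potential_temperature_k(30.0, 20.0, 850.0)
rh = relative_humidity(30.0, 20.0)
print(f"theta={theta:.1f} K  theta_e={theta_e:.1f} K  RH={rh:.0%}")
```

Results from other formulations (for example, the one MetPy uses) can differ by a fraction of a kelvin, so treat this as a sanity check rather than a reference implementation.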

8 Tools

| Tool | Description |
| --- | --- |
| wxt_models | List supported weather models and sources |
| wxt_fetch | Download GRIB fields via byte-range .idx |
| wxt_scan | List all messages in a GRIB file |
| wxt_decode | Decode a GRIB message: stats, grid dimensions, variable info |
| wxt_calc | Thermodynamic calculations (theta, theta_e, RH) |
| wxt_render | Render a GRIB field as PNG |
| wxt_plan | Plan a training dataset for an ML architecture |
| wxt_build | Build training arrays from GRIB files |
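
As an illustration of what listing messages in a GRIB file involves, the following Python sketch walks a byte buffer and locates GRIB2 messages by their 16-byte indicator section ("GRIB", two reserved bytes, discipline, edition, 8-byte big-endian total length) and the closing "7777" marker. It is not wxt_scan's implementation, it skips GRIB1, and `scan_grib2_messages` and `fake_message` are hypothetical names; the test data is synthetic.

```python
import struct

def scan_grib2_messages(buf):
    """Return (offset, length, discipline) for each GRIB2 message found in buf."""
    messages = []
    pos = 0
    while True:
        start = buf.find(b"GRIB", pos)
        if start < 0 or start + 16 > len(buf):
            break
        discipline = buf[start + 6]
        edition = buf[start + 7]
        if edition != 2:
            pos = start + 4          # not a GRIB2 indicator, keep searching
            continue
        (length,) = struct.unpack(">Q", buf[start + 8:start + 16])
        end = start + length
        # A valid message is at least 20 bytes and ends with the "7777" section.
        if length < 20 or end > len(buf) or buf[end - 4:end] != b"7777":
            pos = start + 4
            continue
        messages.append((start, length, discipline))
        pos = end
    return messages

def fake_message(discipline, payload):
    """Synthetic message with a valid indicator and end section (test data only)."""
    length = 16 + len(payload) + 4
    header = b"GRIB" + b"\x00\x00" + bytes([discipline, 2]) + struct.pack(">Q", length)
    return header + payload + b"7777"

data = fake_message(0, b"\x01" * 10) + fake_message(2, b"\x02" * 30)
found = scan_grib2_messages(data)
print(found)  # [(0, 30, 0), (30, 50, 2)]
```

A real scanner would go on to read the product definition section of each message to report the variable and level, which is the part wxt_decode is described as covering.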

Pipeline

wxt_plan (architecture + task → channel spec + export format)
  ↓
wxt_fetch (NOAA/ECMWF → byte-range GRIB download)
  ↓
wxt_build (decode → compute derived fields → export NPY/Parquet/WebDataset)
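
The byte-range step can be sketched as follows. NOAA publishes a wgrib2-style .idx inventory next to each GRIB2 file, with one line per message in the form `number:byte_offset:date:variable:level:forecast:`. A message's byte range runs from its own offset to one byte before the next message's offset. The inventory text and offsets below are made up for illustration, and the helper names are hypothetical rather than taken from wx-fetch.

```python
SAMPLE_IDX = """\
1:0:d=2024051500:REFC:entire atmosphere:1 hour fcst:
2:375000:d=2024051500:CAPE:surface:1 hour fcst:
3:890000:d=2024051500:CIN:surface:1 hour fcst:
4:1200000:d=2024051500:TMP:2 m above ground:1 hour fcst:
"""

def parse_idx(text):
    """Parse inventory lines into dicts, keeping file order."""
    entries = []
    for line in text.strip().splitlines():
        _number, offset, _date, variable, level, forecast = line.split(":")[:6]
        entries.append({"offset": int(offset), "variable": variable,
                        "level": level, "forecast": forecast})
    return entries

def byte_ranges(entries, variable, level=None):
    """Return (start, end) pairs for matching messages; end is inclusive.

    The last message has no successor, so its end is None (open-ended range).
    """
    ranges = []
    for i, entry in enumerate(entries):
        if entry["variable"] != variable:
            continue
        if level is not None and entry["level"] != level:
            continue
        end = entries[i + 1]["offset"] - 1 if i + 1 < len(entries) else None
        ranges.append((entry["offset"], end))
    return ranges

def range_header(start, end):
    """Format the value of an HTTP Range header for one message."""
    return f"bytes={start}-{'' if end is None else end}"

entries = parse_idx(SAMPLE_IDX)
for start, end in byte_ranges(entries, "CAPE", "surface"):
    print(range_header(start, end))  # bytes=375000-889999
```

Sending that header with a GET request for the GRIB file returns only the selected message, which is where the size savings over a full-file download come from.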

Architecture-Aware Planning

wxt_plan knows how to prepare data for different ML architectures:

| Architecture | Channels | Format | Loss |
| --- | --- | --- | --- |
| Swin Transformer | 25 (surface + pressure + severe) | WebDataset (96 shards) | smooth_l1 |
| Diffusion | 13 (surface + pressure) | WebDataset | noise_prediction_mse |
| Classical ML | 22 (surface + severe + tabular) | Parquet | MSE/BCE |
| Graph Network | 13 (surface + pressure) | WebDataset | smooth_l1 |
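
The channel counts in this table are consistent with summing the field lists given under Feature Profiles. The Python sketch below checks that arithmetic. It assumes that "surface", "pressure", "severe", and "tabular" refer to surface_core, pressure_core, severe_diagnostics, and tabular_stats, and that channel_min/mean/max expands to three fields; the dictionary keys for architectures are hypothetical labels, not wxt_plan identifiers.

```python
PROFILES = {
    "surface_core": ["t2m", "d2m", "u10", "v10", "mslp"],
    "pressure_core": ["z500", "t850", "u850", "v850",
                      "vort500", "div500", "theta850", "tadv850"],
    "severe_diagnostics": ["sbcape", "sbcin", "mlcape", "mlcin", "mucape", "mucin",
                           "srh01", "srh03", "shear06", "stp", "scp", "pwat"],
    # Assumption: channel_min/mean/max in the README expands to three fields.
    "tabular_stats": ["channel_min", "channel_mean", "channel_max",
                      "valid_hour_sin", "valid_hour_cos"],
}

# Hypothetical architecture labels mapped to the profiles the table names.
ARCHITECTURES = {
    "swin_transformer": ["surface_core", "pressure_core", "severe_diagnostics"],
    "diffusion": ["surface_core", "pressure_core"],
    "classical_ml": ["surface_core", "severe_diagnostics", "tabular_stats"],
    "graph_network": ["surface_core", "pressure_core"],
}

def channels_for(architecture):
    """Concatenate the field lists of every profile an architecture uses."""
    fields = []
    for profile in ARCHITECTURES[architecture]:
        fields.extend(PROFILES[profile])
    return fields

for name in ARCHITECTURES:
    print(name, len(channels_for(name)))
# swin_transformer 25, diffusion 13, classical_ml 22, graph_network 13
```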

Feature Profiles

| Profile | Fields |
| --- | --- |
| surface_core | t2m, d2m, u10, v10, mslp |
| pressure_core | z500, t850, u850, v850, vort500, div500, theta850, tadv850 |
| severe_diagnostics | sbcape, sbcin, mlcape, mlcin, mucape, mucin, srh01, srh03, shear06, stp, scp, pwat |
| radar_core | reflectivity, velocity, spectrum_width |
| thermodynamic_profiles | theta_e, wet_bulb, lcl_height, lfc_height, dcape |
| tabular_stats | channel_min/mean/max, valid_hour_sin/cos |
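
The tabular_stats names suggest per-channel summary statistics plus a cyclical encoding of the valid hour. The Python sketch below shows one common way to compute such features. It is an assumption about what the fields mean, not a description of how wx-train derives them, and the function names are hypothetical.

```python
import math

def valid_hour_encoding(hour_utc):
    """Map an hour of day onto the unit circle so 23 UTC and 00 UTC are neighbors."""
    angle = 2.0 * math.pi * (hour_utc % 24) / 24.0
    return math.sin(angle), math.cos(angle)

def channel_stats(values):
    """Reduce one gridded channel (flattened to a list) to min, mean, and max."""
    return {
        "channel_min": min(values),
        "channel_mean": sum(values) / len(values),
        "channel_max": max(values),
    }

def encoding_distance(hour_a, hour_b):
    """Euclidean distance between two hours in (sin, cos) space."""
    sin_a, cos_a = valid_hour_encoding(hour_a)
    sin_b, cos_b = valid_hour_encoding(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

stats = channel_stats([250.0, 1200.0, 3100.0, 0.0])   # e.g. a tiny CAPE patch
print(stats)
print(encoding_distance(23, 0) < encoding_distance(12, 0))  # True
```

Encoding the hour as a sine/cosine pair avoids the artificial jump between 23 and 0 that a raw integer feature would give a tree or linear model.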

Supported Models

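| Model | Source | Resolution | Auth |
| --- | --- | --- | --- |
| HRRR | NOAA | 3 km CONUS | None |
| GFS | NOAA | 0.25° global | None |
| NAM | NOAA | 12 km CONUS | None |
| RAP | NOAA | 13 km CONUS | None |
| ECMWF IFS | Open Data | 0.25° global | None |
| ERA5 | CDS API | 0.25° reanalysis | CDS key |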

Stack

100% Rust core. No Python, no C, no eccodes, no Fortran.

wxtrain binary (22,488 lines of Rust)
├── wx-fetch   — download planning, byte-range fetch, CDS auth
├── wx-grib    — native GRIB1/2 decode (JPEG2000, CCSDS/AEC)
├── wx-calc    — 100+ met calculations (MetPy parity verified)
├── wx-radar   — NEXRAD ingest
├── wx-render  — PNG rendering
├── wx-train   — dataset planning & assembly
├── wx-export  — NPY, Parquet, WebDataset, Zarr
└── wx-types   — shared domain model

Setup

# 1. Build wxtrain
git clone https://github.com/FahrenheitResearch/wxtrain
cd wxtrain && cargo build --release

# 2. Copy plugin to Hermes
cp -r wxtrain-plugin ~/.hermes/plugins/wxtrain

# 3. (Optional) set binary path
export WXTRAIN_PATH=/path/to/wxtrain/target/release/wxtrain

Output Formats

| Format | Use Case |
| --- | --- |
| NPY | Quick prototyping, single arrays |
| Parquet | Tabular ML (XGBoost, LightGBM) |
| WebDataset | Distributed training (PyTorch) |
| Zarr | Cloud-native, chunked arrays |
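
For readers unfamiliar with WebDataset: a shard is an ordinary tar archive in which files sharing a basename (everything before the first dot) form one training sample. The Python sketch below writes and reads such a shard with the standard library only. The sample keys, extensions, and payloads are hypothetical, and the layout wx-export actually produces is not documented here.

```python
import io
import tarfile

def write_shard(samples):
    """Pack (key, {extension: bytes}) pairs into an in-memory tar shard."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, files in samples:
            for ext, payload in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def read_shard(shard_bytes):
    """Group tar members back into samples by the part before the first dot."""
    samples = {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples

# Placeholder payloads; a real shard would hold array bytes and a manifest.
shard = write_shard([
    ("sample000", {"npy": b"array-bytes-0", "json": b'{"valid_hour": 18}'}),
    ("sample001", {"npy": b"array-bytes-1", "json": b'{"valid_hour": 19}'}),
])
restored = read_shard(shard)
print(sorted(restored))  # ['sample000', 'sample001']
```

Because each shard is read sequentially, many shards can be streamed in parallel across data-loader workers, which is why the format suits distributed training.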

Companion Plugin

This plugin pairs with the Hermes Weather Plugin — use the weather plugin to explore and visualize data, then use wxtrain to build training datasets from the same models.

Credits

  • wxtrain engine: Built with Codex
  • Meteorological calculations: Verified against MetPy test suites
  • Plugin platform: Hermes Agent by Nous Research