skaters
June 23, 2026 · View on GitHub
One univariate time-series model to rule them all? — Nah I think we need two, and they are both here. Their names are Laplace and Doob and you can watch them here. Only use the latter if your martingality prior is very strong.
Fast, dependency-free, online univariate distributional forecasting in Python and JavaScript (identical to 1e-6, browser-ready via Pyodide). Across thousands of economic series Laplace beats the heavier, slower competition — AutoARIMA, AutoETS, GARCH-t, conformal, even zero-shot foundation models — on held-out log-likelihood, and ties on CRPS, conformal's home turf and a goal-post that won't grow your wealth. (Why likelihood is the metric that matters.)
Install
pip install skaters
Quick start
from skaters import laplace
f = laplace(k=3)
state = None
for y in observations:
dists, state = f(y, state)
dists[0].mean # point forecast
dists[0].std # uncertainty
dists[0].quantile(0.975) # 95th percentile
dists[0].logpdf(y) # log-likelihood
dists[0].cdf(y) # CDF at y
Every skater returns list[Dist] — a weighted Gaussian mixture for each horizon . Point forecasts, uncertainty, density evaluation, and quantiles are all aspects of the same object.
The named forecasters
skaters exposes a general forecaster plus two committed specialists —
everything else is a building block (transforms, leaves, ensembles) you can
compose. ("skater" is the concept — any (y, state) -> ([Dist], state)
function, borrowed from the old timemachines package)
from skaters import laplace, doob, mean_revert
f = laplace(k=1) # general purpose — the default
f = doob(k=1) # committed martingale + volatility clock (feed levels)
f = mean_revert(k=10) # committed mean reversion — multi-step (spreads, vol, rates)
doob and mean_revert are opposite priors: doob commits to a martingale (no
reversion), mean_revert to reversion toward a running mean. Pick the one whose
prior matches your series; laplace if you're unsure.
laplace — the general forecaster
A likelihood-weighted Bayesian ensemble over a large candidate population (EMA,
differencing, drift, Holt, AR, fractional differencing, seasonal, a Yeo-Johnson
coordinate grid, a fast/slow two-systems block, and — at multi-step horizons
(k>1) — an Ornstein–Uhlenbeck mean-reversion group; see mean_revert). The
one-step pool is unchanged. Three things are on by default, each a free or
near-free win:
- model first, conform last — the trunk is weighted by likelihood (honest
modelling); the terminal leaf is fit by CRPS (
objective="crps"). On a 2,500-series FRED study this matches a CRPS specialist on CRPS and lifts likelihood on real data. Switch back withobjective="likelihood". - lattice projection (
sticky=True) — near-Dirac atoms on the exact values a series revisits. Free on continuous data (it vanishes), a large win on grid/repeating series (policy rates, posted prices). - coordinate learning — a Yeo-Johnson λ-grid lets the ensemble learn whether the series is simple in a log/multiplicative, sqrt, or linear coordinate.
f = laplace(k=1) # CRPS leaf + lattice, both on
f = laplace(k=1, objective="likelihood") # pure-likelihood leaf
f = laplace(k=1, sticky=False) # no lattice projection
Price/return series (garch_leaf). The default terminal leaf tracks its scale
with an EWMA (RiskMetrics/IGARCH — no variance mean-reversion). For series with
volatility clustering and reversion (equity/fx/commodity returns), swap in a
GARCH(1,1)-t terminal leaf:
from skaters import laplace, garch_leaf
f = laplace(k=1, leaf=garch_leaf) # GARCH(1,1) conditional variance + Student-t tails
On the price population it recovers ~60% of the held-out log-likelihood gap to a
fitted GARCH-t (a finer (α,β) grid with a free ω) and is neutral-to-positive
elsewhere (see benchmarks/garch_leaf_threeway.py).
doob — the martingale specialist
A committed, driftless martingale with a learned volatility clock: a Bayesian average over martingale predictives that differ only in their volatility model (constant, GARCH, slowly-varying, heavy-tailed). Because every candidate shares the same mean, plain averaging blends the clocks without washing out kurtosis. By Dambis–Dubins–Schwarz a continuous martingale is a time-changed Brownian motion — "BM on a stochastic clock".
Feed it the level series (prices, indices, rates), not pre-differenced
changes. When the martingale prior holds it edges laplace by committing the
mean and spending its capacity on the clock; on genuinely mean-reverting series
(e.g. the VIX) the prior is wrong and it gives ground — a deliberately sharp tool.
mean_revert — the mean-reversion specialist
The counterpart to doob: where doob commits to a martingale, mean_revert
commits to Ornstein–Uhlenbeck reversion toward a running mean and learns only
how fast, averaging candidates over a grid of reversion speeds (online,
likelihood-weighted). It's built for exactly the regime where doob gives ground.
f = mean_revert(k=10) # signed spreads (pairs trading) — linear space
f = mean_revert(k=10, coordinate=0.5) # positive series (vol, rates) — sqrt / CIR space
f = mean_revert(k=10, coordinate=0.0) # positive series — log / geometric reversion
The reversion edge over a random walk grows with the horizon ($1-\phi^h$), so
this is primarily a multi-step tool — feed it k>1. (At one step, mean
reversion is nearly indistinguishable from predicting the running level, which the
general pool already does.) The mechanism is ou_transform, a reusable
mean-reversion transform; the math is the OU-on-a-coordinate reading of CIR — see
papers/tweedie-note.md and the validation in
benchmarks/cir_ablation.py.
Architecture
Everything is transforms all the way down, with a distributional leaf at the bottom:
The leaf estimates from residuals via Welford's algorithm. The prediction in the original space is obtained by inverting the transform chain:
Every node returns list[Dist]. There is no separate "point forecast" vs "uncertainty" — both are aspects of the same .
The key insight
Every "model" is really a transform. An EMA doesn't "predict" — it subtracts a running level , leaving simpler residuals . The prediction comes from inverting the transform chain applied to the leaf's distributional estimate.
The Dist type
A weighted mixture of Gaussians . Pure Python (math.erf, math.exp).
from skaters import Dist
d = Dist.gaussian(5.0, 2.0)
d.mean # 5.0
d.std # 2.0
d.pdf(5.0) # density at x
d.cdf(3.0) # P(X <= 3)
d.logpdf(5.0) # log-likelihood
d.quantile(0.975) # inverse CDF
# Exact mixture combination (for ensembles)
mix = Dist.combine([d1, d2, d3], weights=[0.5, 0.3, 0.2])
# Propagate through transform inverses
d.shift(10.0) # translate: mu -> mu + 10
d.scale(2.0) # scale: mu -> 2*mu, sigma -> 2*sigma
d.affine(2.0, 3.0) # x -> 2x + 3
# Bound component growth
d.prune(max_components=10)
Transforms
Online bijective maps. Each has a forward (scalar in, scalar out) and an inverse_k that propagates objects back through the inverse.
| Transform | Forward | Inverse | Use case |
|---|---|---|---|
ema_transform() | Remove level | ||
difference() | Cumsum with growing as | Random walk | |
drift() | Random walk + drift | ||
holt_linear() | Level + trend (Holt 1957) | ||
ar() | AR reconstruction with variance propagation | Autoregression (online RLS) | |
grouped_ar() | Same, grouped coefficients | Same | Long-lag AR with params |
fractional_difference() | Long memory | ||
standardize() | Remove scale | ||
garch() | Volatility clustering | ||
seasonal_difference() | Shift by lagged value | Periodicity | |
power_transform() | Delta method | Tail compression |
Conjugation
Transforms compose via conjugation. Given a transform and a skater :
The pipe | notation reads left-to-right (outermost transform first):
from skaters import conjugate, ema, difference, standardize
# diff removes trend, EMA predicts the differenced series
f = conjugate(ema(alpha=0.1, k=3), difference(), k=3)
# Chain: standardize, then difference, then EMA
f = conjugate(
conjugate(ema(alpha=0.1, k=3), difference(), k=3),
standardize(),
k=3,
)
# canonical name: std|diff|ema_t|leaf
Ensembles
Precision-weighted (MSE)
Weights by where .
from skaters import precision_weighted_ensemble, ema
f = precision_weighted_ensemble([
ema(alpha=0.05, k=1),
ema(alpha=0.2, k=1),
], k=1)
Bayesian (log-likelihood, XGBoost-inspired regularization)
Each model accumulates a log-weight updated at every observation:
where is the learning rate (shrinkage), is the complexity penalty, and is the model's depth. Predictions are combined via with softmax weights.
from skaters import bayesian_ensemble, ema
f = bayesian_ensemble(
[ema(alpha=0.05, k=1), ema(alpha=0.2, k=1)],
k=1,
learning_rate=0.5, # eta: prevents over-concentrating
complexity_penalty=0.02, # lambda: penalizes deeper chains
depths=[1, 1],
)
Adaptive search (beam search over transform grammar)
Grows the candidate population online: expand top performers with new transforms, replay recent history to warm-start, prune losers.
from skaters import search
f = search(
k=1,
expand_interval=100, # expand top performers every 100 obs
max_depth=3, # maximum transform chain depth
replay_buffer=500, # warm-start new candidates on recent history
max_pool=30, # cap active candidates
)
Heavy tails: the scale-mixture leaf
Everything here is judged by predictive log-likelihood. A plain Gaussian leaf gets the location and scale right but the shape wrong on heavy-tailed residuals (returns, macro data), and — crucially — Bayesian model averaging preserves the mean and variance but washes the kurtosis out, so adding heavy leaves to the candidate pool doesn't help.
The fix is the scale-mixture leaf: a fixed dictionary of zero-mean Gaussians
N(0, aᵢ·σ) with weights learned online (a Student-t is a Gaussian scale
mixture, so this approximates it). It's a plain Dist; the weights are the
"discrepancy from N(0,1)" — all on a=1 is Gaussian, mass on larger a is fat
tails. It matches the Gaussian leaf on Gaussian data and beats it as tails fatten.
from skaters import scale_mixture_leaf, terminal_leaf_ensemble, leaf
Because mixing washes out shape, the named policies use a terminal-leaf
ensemble: the candidates are combined for the mean, then one terminal
scale-mixture leaf models the combined residual — so the leaf's shape reaches the
output undiluted. On Student-t₃ this takes laplace from a logpdf of ≈ −2.07
(Gaussian-collapsed) to ≈ −1.93, with no cost on Gaussian data.
Dist.crps(y) (closed-form CRPS) is also available as a proper score for
benchmarking.
Spec system
Serialize and rebuild any pipeline:
from skaters import (
build, spec_name, to_json, from_json,
ema_spec, conjugate_spec, ensemble_spec, diff_spec,
)
spec = ensemble_spec(
conjugate_spec(ema_spec(0.1, k=1), diff_spec()),
ema_spec(0.3, k=1),
k=1,
)
spec_name(spec) # "ensemble(diff|ema(0.1),ema(0.3))"
j = to_json(spec) # JSON string
f = build(from_json(j)) # live skater
Writing a custom transform
Any pair where forward is scalar and inverse_k maps list[Dist]:
def my_transform():
def forward(y, state):
if state is None:
return 0.0, {"anchor": y}
transformed = y - state["anchor"]
return transformed, {"anchor": y}
def inverse_k(dists, state):
return [d.shift(state["anchor"]) for d in dists]
return forward, inverse_k
JavaScript & the browser
The whole library is also a zero-dependency JavaScript port (docs/js/skaters/) — every
transform, ensemble, and named policy. It is verified against the Python reference by a parity
suite that checks 76,000+ values to 1e-6 (parity/, run in the test suite via
tests/test_js_parity.py).
<script type="module">
import { laplace } from "https://skaters.microprediction.org/js/skaters/index.mjs";
const f = laplace(1);
let state = null;
for (const y of observations) {
const [dists, st] = f(y, state); state = st;
dists[0].mean; // point forecast
dists[0].quantile(0.975); // 97.5th percentile
}
</script>
Interactive demos (forecasting playground in native JS, and the real Python package running in Pyodide) live at skaters.microprediction.org/demos.
Design
- Online only — per observation, no batch recomputation
- Distributional — every prediction is a , not a point estimate
- Composable — transforms chain, ensembles nest, everything returns
- Pure Python — zero dependencies, only
math.erfandmath.exp - Pyodide compatible — works in the browser via WebAssembly
Theoretical context
The online recursions here are score-driven updates with a Bayesian reading.
The EMA level update and the GARCH
variance update — the heart of
doob's volatility clock — are both inverse-Fisher-scaled conditional-score
corrections. Via Tweedie's formula, Hansen & Tong (2026,
arXiv:2605.15902) show these are the exact
Bayesian posterior-mean corrections under a conjugate prior with local precision
discounting (with the smoothing factor identified as , the
Gaussian-location case recovering the Kalman filter), and tractable local
approximations otherwise. So doob is averaging a family of (approximate)
Bayesian volatility filters rather than ad-hoc heuristics. See also Creal,
Koopman & Lucas (2013) and Harvey (2013) for the score-driven / GAS framework.
The same identity is the backbone of modern denoising / score-based diffusion
models: the posterior mean of a clean signal given a noisy observation is
"observation score of the marginal density," which is what
lets a diffusion denoiser be read as a score estimator (Efron 2011; Vincent 2011;
Song & Ermon 2019). Each forecast step here is the time-series analogue —
denoising the next observation toward the latent level or variance. A short essay
on this — Kalman, empirical Bayes, and diffusion as one identity — is in
papers/tweedie-note.md.
Lineage
This package distills ideas from timemachines, which provided a common skater interface for dozens of time series packages. This is a from-scratch rewrite focused on speed, distributional predictions, and browser compatibility.