stochastic-rs

June 6, 2026 Ā· View on GitHub

Build Workflow Crates.io License codecov

stochastic-rs

A high-performance Rust library for stochastic process simulation, quantitative finance, statistics, copulas, distributions, and neural-network volatility surrogates. Generic over f32 / f64, with SIMD acceleration on CPU and CUDA / Metal / Accelerate / cubecl backends where they pay off, and first-class Python bindings via PyO3.

Documentation

šŸ“– stochastic.rust-dd.com — full docs site (Fumadocs + Next.js, deployed on Vercel).

Highlights:

  • 120+ stochastic processes — diffusion, jump, fractional / rough, short-rate, HJM, LMM, fBM, Hawkes, LĆ©vy. Generic-precision ProcessExt<T> impl, SIMD on CPU, optional CUDA / Metal for FGN / fBM.
  • Pricing & calibration — closed-form (BSM, Bachelier, Black76, Bjerksund-Stensland, …), Fourier (Heston / Bates / Merton-jump / Kou / VG / CGMY / HKDE / double-Heston), Monte Carlo (basket, rainbow, cliquet, autocallable, spread), finite difference, Bermudan LSM, Heston SLV. Heston / SABR / SVJ / LĆ©vy / rough Bergomi / double-Heston / Hull-White swaption-grid calibrators.
  • Statistics & risk — Hurst (Fukasawa), MLE for 1-D diffusions with 6 transition densities, ADF / KPSS / Phillips-Perron, realised variance with BNHLS bandwidth, HMM, changepoint, particle filter, UKF. VaR / CVaR / drawdown, Sharpe / Sortino / IR / Calmar.
  • Fixed income & credit — yield-curve bootstrapping, Nelson-Siegel / Svensson, multi-curve, IRS / inflation swaps, Vasicek / CIR / Hull-White / G2++ short-rate engines, Merton structural model, reduced-form survival curves, CDS pricing, JLT migration matrices.
  • Microstructure — Almgren-Chriss, Kyle (1985), Bouchaud propagator, full price-time priority order book.
  • Distributions & copulas — 19 SIMD distributions with closed-form pdf / cdf / cf / moments. Clayton / Frank / Gumbel / Independence bivariate; Gaussian / vine multivariate.
  • Python bindings — 210 entries (198 PyO3 classes + 12 functions) spanning every sub-crate except AI surrogates. Numpy-in / numpy-out.

Installation

Rust

[dependencies]
stochastic-rs = "2.0.0"
use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::gbm::Gbm;
use stochastic_rs::quant::pricing::heston::HestonPricer;

For per-sub-crate (lean) builds, OpenBLAS / CUDA / Metal / cubecl / Accelerate feature flags, native CPU optimisation, and SIMD details, see the installation guide on the docs site.

Python

pip install stochastic-rs

Source build (requires the Rust toolchain):

pip install maturin
maturin develop --release --manifest-path stochastic-rs-py/Cargo.toml

Linux (x86_64 / aarch64) and macOS (arm64 / x86_64) wheels ship with the openblas feature on. The Windows wheel omits the 15 BLAS-backed classes; everything else (ā‰ˆ195 classes / 12 functions) works identically. See the Python bindings page for the parity table and the source-build path with vcpkg.

Quickstart

use stochastic_rs::prelude::*;
use stochastic_rs::simd_rng::Unseeded;
use stochastic_rs::stochastic::diffusion::ou::Ou;
use stochastic_rs::quant::pricing::heston::HestonPricer;

fn main() {
    // Mean-reverting Ornstein-Uhlenbeck path: Ou::new(theta, mu, sigma, n, x0, t, seed)
    let ou = Ou::<f64>::new(2.0, 0.0, 1.0, 1_000, Some(0.0), Some(1.0), Unseeded);
    let path = ou.sample();
    println!("OU path points: {}", path.len());

    // Heston (1993) European option, closed form. HestonPricer::new args:
    // s, v0, k, r, q, rho, kappa, theta, sigma, lambda, tau, eval, expiration
    let pricer = HestonPricer::new(
        100.0, 0.04, 100.0, 0.03, Some(0.0),
        -0.5, 2.0, 0.04, 0.3, Some(0.0),
        Some(1.0), None, None,
    );
    let (call, put) = pricer.calculate_call_put();
    println!("call={call:.4}, put={put:.4}");
}
import stochastic_rs as srs

# Mean-reverting OU path
p = srs.Ou(theta=2.0, mu=0.0, sigma=1.0, n=1000, x0=0.0, t=1.0)
path = p.sample()                       # numpy.ndarray, shape (1000,)

# Heston European call
pricer = srs.HestonPricer(
    s0=100, k=100, tau=1.0, r=0.03, q=0.0,
    v0=0.04, kappa=2.0, theta=0.04, sigma=0.3, rho=-0.5,
)
print("call =", pricer.price("call"))
g = pricer.greeks("call")
print(f"delta={g.delta:.4f}, vega={g.vega:.4f}")

More end-to-end recipes (Heston calibration, fBM Hurst estimation, vol-surface from quotes, Python interop) live in the tutorials section.

Benchmarks

FGN — CPU vs CUDA native (f32, H = 0.7)

cargo bench --features cuda-native --bench fgn_cuda_native

Single path:

nCPU sampleCUDA .on(Device::CudaNative).sample()Speedup
1,0248.1 µs46 µs0.18Ɨ
4,09635 µs84 µs0.42Ɨ
16,384147 µs110 µs1.3Ɨ
65,536850 µs227 µs3.7Ɨ

Batch:

n, mCPU sample_parCUDA .on(Device::CudaNative).sample_parSpeedup
4,096, 32147 µs117 µs1.3Ɨ
4,096, 5121.78 ms2.37 ms0.75Ɨ
65,536, 12812.6 ms10.5 ms1.2Ɨ
65,536, 1 k102 ms93 ms1.1Ɨ

CUDA wins for large n (≄ 16 k); CPU rayon dominates for medium n because of the GPU launch / transfer overhead.

Distribution sampling — Normal vs upstream rand_distr

Single-thread fill_slice, median of 7 runs (cargo bench --bench dist_multicore). Comparison column:

  • rand_distr + SimdRng — rand_distr::Normal consuming our SimdRng (same uniform stream, only the Normal algorithm differs).
  • rand_distr + rand::rng() — the out-of-box upstream pipeline.
nSimdNormal (µs)rand_distr + SimdRng (µs)speeduprand_distr + rand::rng() (µs)speedup
40.0080.0131.73Ɨ0.0324.22Ɨ
80.0140.0261.78Ɨ0.0654.52Ɨ
160.0290.0511.79Ɨ0.1284.47Ɨ
640.1090.2081.90Ɨ0.5084.64Ɨ
2560.4320.8401.94Ɨ2.0294.70Ɨ
4 0966.97513.1761.89Ɨ32.3824.64Ɨ
65 536113.458212.4061.87Ɨ520.2194.59Ɨ

Single-sample speedup vs prior release

Criterion dist.sample(rng) loop, vs the wide 1.3.0 baseline (cargo bench --bench distributions -- --baseline before):

distributionf32 / largef64 / largef64 / small
Uniform/simdāˆ’57% (ā‰ˆ 2.3Ɨ)āˆ’77% (ā‰ˆ 4.4Ɨ)āˆ’58% (ā‰ˆ 2.4Ɨ)
Normal/simdāˆ’51% (ā‰ˆ 2.0Ɨ)āˆ’75% (ā‰ˆ 4.0Ɨ)āˆ’63% (ā‰ˆ 2.7Ɨ)
Exp/simd N=64āˆ’3% (n.s.)āˆ’73% (ā‰ˆ 3.7Ɨ)—
LogNormal/simdāˆ’71% (ā‰ˆ 3.4Ɨ)āˆ’70% (ā‰ˆ 3.4Ɨ)āˆ’66% (ā‰ˆ 2.9Ɨ)

Driven by SIMD u64→f64 / u32→f32 magic-number conversion in SimdRng (direct-write fill_uniform_f64 / fill_uniform_f32 APIs that skip the [f64; 8] return-by-value round-trip), fused Exp(Ī») scaling inside fill_exp_scaled, and an 8-at-a-time main loop in fill_ziggurat so copy_from_slice inlines to stp stores instead of a memcpy call.

Opt-in: dual-stream RNG (dual-stream-rng feature)

[dependencies]
stochastic-rs = { version = "2.1", features = ["dual-stream-rng"] }

Unlocks SimdRngDual (two parallel xoshiro engines) and SimdNormalDual (Ziggurat unrolled 2Ɨ over the dual streams). Measured against the single-stream SimdNormal::fill_slice on Apple Silicon (cargo bench --bench dual_stream_compare --features dual-stream-rng):

nsingle (SimdNormal)dual (SimdNormalDual)Ī”
64111.6 ns105.5 nsāˆ’5.5%
256444.8 ns418.3 nsāˆ’6.0%
4 0967.43 µs6.60 µsāˆ’11.2%
65 536113.9 µs106.6 µsāˆ’6.4%
1 048 5761.83 ms1.70 msāˆ’6.8%

The win comes from hiding the 16 scalar kn / wn table-lookup latencies behind the second engine's xoshiro state update on a modern out-of-order core. Uniform fills are not bottlenecked on the engine so they see no speedup. Trade-off: SimdRngDual::from_seed does not reproduce SimdRng::from_seed's bit-exact sequence (statistical properties are identical and KS-validated).

Contributing

Contributions are welcome — bug reports, feature suggestions, or PRs. Open an issue or start a discussion on GitHub. Per-feature recipes (add-diffusion-process, adding-distribution, calibration-pattern, docs-writing, …) live under .claude/skills/.

License

MIT — see LICENSE.