stochastic-rs
June 6, 2026 Ā· View on GitHub
stochastic-rs
A high-performance Rust library for stochastic process simulation,
quantitative finance, statistics, copulas, distributions, and
neural-network volatility surrogates. Generic over f32 / f64, with
SIMD acceleration on CPU and CUDA / Metal / Accelerate / cubecl backends
where they pay off, and first-class Python bindings via PyO3.
Documentation
š stochastic.rust-dd.com ā full docs site (Fumadocs + Next.js, deployed on Vercel).
Highlights:
- 120+ stochastic processes ā diffusion, jump, fractional / rough,
short-rate, HJM, LMM, fBM, Hawkes, LƩvy. Generic-precision
ProcessExt<T>impl, SIMD on CPU, optional CUDA / Metal for FGN / fBM. - Pricing & calibration ā closed-form (BSM, Bachelier, Black76, Bjerksund-Stensland, ā¦), Fourier (Heston / Bates / Merton-jump / Kou / VG / CGMY / HKDE / double-Heston), Monte Carlo (basket, rainbow, cliquet, autocallable, spread), finite difference, Bermudan LSM, Heston SLV. Heston / SABR / SVJ / LĆ©vy / rough Bergomi / double-Heston / Hull-White swaption-grid calibrators.
- Statistics & risk ā Hurst (Fukasawa), MLE for 1-D diffusions with 6 transition densities, ADF / KPSS / Phillips-Perron, realised variance with BNHLS bandwidth, HMM, changepoint, particle filter, UKF. VaR / CVaR / drawdown, Sharpe / Sortino / IR / Calmar.
- Fixed income & credit ā yield-curve bootstrapping, Nelson-Siegel / Svensson, multi-curve, IRS / inflation swaps, Vasicek / CIR / Hull-White / G2++ short-rate engines, Merton structural model, reduced-form survival curves, CDS pricing, JLT migration matrices.
- Microstructure ā Almgren-Chriss, Kyle (1985), Bouchaud propagator, full price-time priority order book.
- Distributions & copulas ā 19 SIMD distributions with closed-form pdf / cdf / cf / moments. Clayton / Frank / Gumbel / Independence bivariate; Gaussian / vine multivariate.
- Python bindings ā 210 entries (198 PyO3 classes + 12 functions) spanning every sub-crate except AI surrogates. Numpy-in / numpy-out.
Installation
Rust
[dependencies]
stochastic-rs = "2.0.0"
use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::gbm::Gbm;
use stochastic_rs::quant::pricing::heston::HestonPricer;
For per-sub-crate (lean) builds, OpenBLAS / CUDA / Metal / cubecl / Accelerate feature flags, native CPU optimisation, and SIMD details, see the installation guide on the docs site.
Python
pip install stochastic-rs
Source build (requires the Rust toolchain):
pip install maturin
maturin develop --release --manifest-path stochastic-rs-py/Cargo.toml
Linux (x86_64 / aarch64) and macOS (arm64 / x86_64) wheels ship with
the openblas feature on. The Windows wheel omits the 15
BLAS-backed classes; everything else (ā195 classes / 12 functions)
works identically. See the
Python bindings page for the parity
table and the source-build path with vcpkg.
Quickstart
use stochastic_rs::prelude::*;
use stochastic_rs::simd_rng::Unseeded;
use stochastic_rs::stochastic::diffusion::ou::Ou;
use stochastic_rs::quant::pricing::heston::HestonPricer;
fn main() {
// Mean-reverting Ornstein-Uhlenbeck path: Ou::new(theta, mu, sigma, n, x0, t, seed)
let ou = Ou::<f64>::new(2.0, 0.0, 1.0, 1_000, Some(0.0), Some(1.0), Unseeded);
let path = ou.sample();
println!("OU path points: {}", path.len());
// Heston (1993) European option, closed form. HestonPricer::new args:
// s, v0, k, r, q, rho, kappa, theta, sigma, lambda, tau, eval, expiration
let pricer = HestonPricer::new(
100.0, 0.04, 100.0, 0.03, Some(0.0),
-0.5, 2.0, 0.04, 0.3, Some(0.0),
Some(1.0), None, None,
);
let (call, put) = pricer.calculate_call_put();
println!("call={call:.4}, put={put:.4}");
}
import stochastic_rs as srs
# Mean-reverting OU path
p = srs.Ou(theta=2.0, mu=0.0, sigma=1.0, n=1000, x0=0.0, t=1.0)
path = p.sample() # numpy.ndarray, shape (1000,)
# Heston European call
pricer = srs.HestonPricer(
s0=100, k=100, tau=1.0, r=0.03, q=0.0,
v0=0.04, kappa=2.0, theta=0.04, sigma=0.3, rho=-0.5,
)
print("call =", pricer.price("call"))
g = pricer.greeks("call")
print(f"delta={g.delta:.4f}, vega={g.vega:.4f}")
More end-to-end recipes (Heston calibration, fBM Hurst estimation, vol-surface from quotes, Python interop) live in the tutorials section.
Benchmarks
FGN ā CPU vs CUDA native (f32, H = 0.7)
cargo bench --features cuda-native --bench fgn_cuda_native
Single path:
| n | CPU sample | CUDA .on(Device::CudaNative).sample() | Speedup |
|---|---|---|---|
| 1,024 | 8.1 µs | 46 µs | 0.18à |
| 4,096 | 35 µs | 84 µs | 0.42à |
| 16,384 | 147 µs | 110 µs | 1.3à |
| 65,536 | 850 µs | 227 µs | 3.7à |
Batch:
| n, m | CPU sample_par | CUDA .on(Device::CudaNative).sample_par | Speedup |
|---|---|---|---|
| 4,096, 32 | 147 µs | 117 µs | 1.3à |
| 4,096, 512 | 1.78 ms | 2.37 ms | 0.75Ć |
| 65,536, 128 | 12.6 ms | 10.5 ms | 1.2Ć |
| 65,536, 1 k | 102 ms | 93 ms | 1.1Ć |
CUDA wins for large n (ā„ 16 k); CPU rayon dominates for medium n
because of the GPU launch / transfer overhead.
Distribution sampling ā Normal vs upstream rand_distr
Single-thread fill_slice, median of 7 runs (cargo bench --bench dist_multicore). Comparison column:
rand_distr + SimdRngārand_distr::Normalconsuming ourSimdRng(same uniform stream, only the Normal algorithm differs).rand_distr + rand::rng()ā the out-of-box upstream pipeline.
| n | SimdNormal (µs) | rand_distr + SimdRng (µs) | speedup | rand_distr + rand::rng() (µs) | speedup |
|---|---|---|---|---|---|
| 4 | 0.008 | 0.013 | 1.73Ć | 0.032 | 4.22Ć |
| 8 | 0.014 | 0.026 | 1.78Ć | 0.065 | 4.52Ć |
| 16 | 0.029 | 0.051 | 1.79Ć | 0.128 | 4.47Ć |
| 64 | 0.109 | 0.208 | 1.90Ć | 0.508 | 4.64Ć |
| 256 | 0.432 | 0.840 | 1.94Ć | 2.029 | 4.70Ć |
| 4 096 | 6.975 | 13.176 | 1.89Ć | 32.382 | 4.64Ć |
| 65 536 | 113.458 | 212.406 | 1.87Ć | 520.219 | 4.59Ć |
Single-sample speedup vs prior release
Criterion dist.sample(rng) loop, vs the wide 1.3.0 baseline
(cargo bench --bench distributions -- --baseline before):
| distribution | f32 / large | f64 / large | f64 / small |
|---|---|---|---|
Uniform/simd | ā57% (ā 2.3Ć) | ā77% (ā 4.4Ć) | ā58% (ā 2.4Ć) |
Normal/simd | ā51% (ā 2.0Ć) | ā75% (ā 4.0Ć) | ā63% (ā 2.7Ć) |
Exp/simd N=64 | ā3% (n.s.) | ā73% (ā 3.7Ć) | ā |
LogNormal/simd | ā71% (ā 3.4Ć) | ā70% (ā 3.4Ć) | ā66% (ā 2.9Ć) |
Driven by SIMD u64āf64 / u32āf32 magic-number conversion in SimdRng
(direct-write fill_uniform_f64 / fill_uniform_f32 APIs that skip the
[f64; 8] return-by-value round-trip), fused Exp(Ī») scaling inside
fill_exp_scaled, and an 8-at-a-time main loop in fill_ziggurat so
copy_from_slice inlines to stp stores instead of a memcpy call.
Opt-in: dual-stream RNG (dual-stream-rng feature)
[dependencies]
stochastic-rs = { version = "2.1", features = ["dual-stream-rng"] }
Unlocks SimdRngDual (two parallel xoshiro engines) and SimdNormalDual
(Ziggurat unrolled 2Ć over the dual streams). Measured against the
single-stream SimdNormal::fill_slice on Apple Silicon
(cargo bench --bench dual_stream_compare --features dual-stream-rng):
| n | single (SimdNormal) | dual (SimdNormalDual) | Ī |
|---|---|---|---|
| 64 | 111.6 ns | 105.5 ns | ā5.5% |
| 256 | 444.8 ns | 418.3 ns | ā6.0% |
| 4 096 | 7.43 µs | 6.60 µs | ā11.2% |
| 65 536 | 113.9 µs | 106.6 µs | ā6.4% |
| 1 048 576 | 1.83 ms | 1.70 ms | ā6.8% |
The win comes from hiding the 16 scalar kn / wn table-lookup latencies
behind the second engine's xoshiro state update on a modern out-of-order
core. Uniform fills are not bottlenecked on the engine so they see no
speedup. Trade-off: SimdRngDual::from_seed does not reproduce
SimdRng::from_seed's bit-exact sequence (statistical properties are
identical and KS-validated).
Contributing
Contributions are welcome ā bug reports, feature suggestions, or PRs.
Open an issue or start a discussion on GitHub. Per-feature recipes
(add-diffusion-process, adding-distribution, calibration-pattern,
docs-writing, ā¦) live under .claude/skills/.
License
MIT ā see LICENSE.