HyperHessians.jl

February 27, 2026

HyperHessians.jl is a package for computing Hessians using forward-mode automatic differentiation.

It works similarly to ForwardDiff.hessian but should have better run-time and compile-time performance in all cases.
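
It is intended as a near drop-in alternative. A quick sketch (sum(abs2, x) is just an arbitrary test function):

using ForwardDiff, HyperHessians

f(x) = sum(abs2, x)
x = rand(4)
ForwardDiff.hessian(f, x) ≈ HyperHessians.hessian(f, x) # true: both compute the same Hessian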

API Summary

Methods

Function                                                   Description
hessian(f, x[, h_cfg])                                     Compute Hessian of f at x (scalar or vector)
hessian!(H, f, x, h_cfg)                                   In-place Hessian into pre-allocated H
hessian_gradient_value(f, x[, h_cfg])                      Compute Hessian, gradient, and value together
hessian_gradient_value!(H, G, f, x, h_cfg)                 In-place variant, returns the value
hvp(f, x, tangents[, hvp_cfg])                             Hessian–vector product(s) H(x) * tangent(s)
hvp!(hvs, f, x, tangents[, hvp_cfg])                       In-place Hessian–vector product
hvp_gradient_value(f, x, tangents[, hvp_cfg])              Value, gradient, and Hessian–vector product(s)
hvp_gradient_value!(hvs, g, f, x, tangents[, hvp_cfg])     In-place Hessian–vector product and gradient, returns value
vhvp(f, x, v[, vhvp_cfg])                                  Directional second derivative v' * H(x) * v
vhvp_gradient_value(f, x, v[, vhvp_cfg])                   Value, directional gradient, and v' * H(x) * v

Types

Type                       Description
HessianConfig(x, chunk)    Config for Hessian calls
HVPConfig(x, chunk)        Config for Hessian–vector products
VHVPConfig(x, v)           Config for vector-Hessian-vector products
Chunk{N}()                 Specify chunk size N

Usage

Basic

To compute the Hessian of a function f with respect to the vector x, the quick and dirty way is to call HyperHessians.hessian(f, x):

julia> x = [1.0,2.0,3.0,4.0];

julia> HyperHessians.hessian(x->sum(cumprod(x)), x)
4×4 Matrix{Float64}:
  0.0  16.0  10.0  6.0
 16.0   0.0   5.0  3.0
 10.0   5.0   0.0  2.0
  6.0   3.0   2.0  0.0

This also works for scalar input:

julia> f(x) = exp(x) / sqrt(sin(x)^3 + cos(x)^3);

julia> HyperHessians.hessian(f, 2.0)
82.55705026089272

For vector inputs, however, this basic usage does not give the best performance.

Advanced

For best performance we want to do the following things:

  1. Cache the input array of the custom numbers HyperHessians uses, so it can be reused across multiple calls to hessian.
  2. Decide on a good "chunk size", which is roughly the size of the block of the Hessian computed on every call to the function.
  3. Pre-allocate the output Hessian matrix.

Step 1 and 2 are done by creating a HessianConfig object:

x = rand(32); # input array
chunk_size = HyperHessians.Chunk{8}() # chosen chunk size
cfg = HyperHessians.HessianConfig(x, chunk_size) # creating the config object

The larger the chunk size, the larger the part of the Hessian computed on every call to f (if the chunk size equals the length of the input vector, the whole Hessian is computed in one call to f). However, a larger chunk size also means the special numbers HyperHessians uses become larger, and if they become too large this can lead to inefficient execution.

The choice of chunk size is therefore a trade-off, and the optimal one likely depends on the particular function being differentiated. A chunk size of 8 seems to be a decent overall choice. It is also generally a good idea to pick a chunk size that is a multiple of 4 to use SIMD effectively.
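
A minimal sketch for comparing chunk sizes on a given function (assuming BenchmarkTools.jl is installed; the function and sizes below are just examples):

using BenchmarkTools

x = rand(32)
H = similar(x, 32, 32)
g(x) = sum(cumprod(x)) # example function; substitute your own
for c in (4, 8, 16, 32)
    cfg = HyperHessians.HessianConfig(x, HyperHessians.Chunk{c}())
    print("chunk size ", c, ":")
    @btime HyperHessians.hessian!($H, $g, $x, $cfg) # time a full Hessian at this chunk size
end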

The chunk_size argument can also be left out, in which case HyperHessians will try to determine a reasonable choice:
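
cfg = HyperHessians.HessianConfig(x) # chunk size chosen automatically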

If the chunk size c is smaller than the length n of the input vector, the function will be called k*(k+1)÷2 times, where k = ceil(Int, n / c), with each call computing a part of the Hessian:

julia> mysum(x) = (@info "got called"; sum(x));

julia> x = rand(8); n = length(x); c = 4;

julia> cfg = HyperHessians.HessianConfig(x, HyperHessians.Chunk{c}());

julia> HyperHessians.hessian(mysum, x, cfg)
[ Info: got called
[ Info: got called
[ Info: got called

julia> k = ceil(Int, n / c); k*(k+1)÷2
3

Finally, it is also a good idea to pre-allocate the output Hessian and use the in-place hessian! function instead. Putting it all together, the result would look something like this:

julia> f(x) = sum(cumprod(x));

julia> x = rand(8);

julia> H = similar(x, 8, 8);

julia> cfg = HyperHessians.HessianConfig(x, HyperHessians.Chunk{8}());

julia> HyperHessians.hessian!(H, f, x, cfg);

Computing Hessian, Gradient, and Value Together

Use hessian_gradient_value to compute all three in a single pass:

julia> f(x) = sum(x.^3);

julia> x = [1.0, 2.0, 3.0, 4.0];

julia> result = HyperHessians.hessian_gradient_value(f, x);

julia> result.value
100.0

julia> result.gradient
4-element Vector{Float64}:
  3.0
 12.0
 27.0
 48.0

julia> result.hessian
4×4 Matrix{Float64}:
 6.0   0.0   0.0   0.0
 0.0  12.0   0.0   0.0
 0.0   0.0  18.0   0.0
 0.0   0.0   0.0  24.0

There is also an in-place variant, hessian_gradient_value!(H, G, f, x, cfg), that returns the value.
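
A minimal sketch, reusing f and x from above:

H = similar(x, 4, 4) # pre-allocated Hessian output
G = similar(x)       # pre-allocated gradient output
cfg = HyperHessians.HessianConfig(x)
val = HyperHessians.hessian_gradient_value!(H, G, f, x, cfg) # returns the value, here 100.0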

Hessian–Vector Products

Use hvp(f, x, v) (or hvp(f, x, (v,))[1]) to get H(x) * v without forming the full matrix:

julia> f(x) = sum(sin.(x) .* cos.(x));

julia> x = [1.0, 2.0, 3.0];

julia> v = [0.1, 0.2, 0.3];

julia> HyperHessians.hvp(f, x, v)
3-element Vector{Float64}:
 -0.18185948536513638
  0.3027209981231713
  0.1676492989193555

julia> HyperHessians.hessian(f, x) * v
3-element Vector{Float64}:
 -0.18185948536513638
  0.3027209981231713
  0.1676492989193555

You can pass a single tangent as a vector, or multiple tangents as a tuple of vectors; HyperHessians evaluates bundled tangents in one pass and returns the matching vector or tuple of Hessian–vector products. To reuse storage and tune the chunk size, supply an HVPConfig, e.g. cfg = HyperHessians.HVPConfig(x, v, HyperHessians.Chunk{8}()); hvp(f, x, v, cfg). For a single tangent this is equivalent to HVPConfig(x, HyperHessians.Chunk{8}()); passing v only matters when there are multiple tangents. For non-allocating use, call hvp!(hvs, f, x, tangents[, cfg]) with a pre-allocated output container (a vector for a single tangent, or a tuple of vectors for multiple) and a reused config.
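
A sketch of the non-allocating path with two bundled tangents, continuing the example above (the tuple-of-tangents constructor here follows the single-tangent pattern just described):

tangents = ([0.1, 0.2, 0.3], [1.0, 0.0, 0.0]) # two directions, evaluated in one pass
hvs = (similar(x), similar(x))                # one pre-allocated output per tangent
cfg = HyperHessians.HVPConfig(x, tangents, HyperHessians.Chunk{3}())
HyperHessians.hvp!(hvs, f, x, tangents, cfg)  # fills hvs with H(x) * each tangent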

To get the value, gradient, and Hessian–vector product(s) together, use hvp_gradient_value or hvp_gradient_value!; these reuse the same directional seeding, so the value and gradient come "for free" alongside the bundled H*v results.
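
For instance, continuing the example above:

res = HyperHessians.hvp_gradient_value(f, x, v) # bundles f(x), the gradient, and H(x) * v in one evaluation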

For the quadratic form v' * H(x) * v, use vhvp, which differentiates along the line x + t*v to avoid forming the intermediate H*v:

julia> HyperHessians.vhvp(f, x, v)  # equal to dot(v, hvp(f, x, v))
0.09265304076392727

For allocation-free repeated calls, use a pre-allocated VHVPConfig:

julia> cfg = HyperHessians.VHVPConfig(x, v);

julia> HyperHessians.vhvp(f, x, v, cfg)
0.09265304076392727

StaticArrays support

HyperHessians provides a non-allocating extension for StaticVector inputs:

julia> using StaticArrays

julia> x = @SVector rand(4);

julia> HyperHessians.hessian(x -> sum(abs2, x), x) isa SMatrix
true

julia> HyperHessians.hvp(x -> sum(sin.(x)), x, ones(SVector{4})) isa SVector
true

hessian, hessian_gradient_value, and hvp return StaticArrays (for hvp, a StaticVector for one tangent or a tuple for multiple) and avoid heap allocations. The in-place (bang) variants are intentionally omitted for StaticArrays, since mutation is less common for small immutable arrays.
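
The combined query stays static as well. A quick sketch (field access as in the hessian_gradient_value example above):

using StaticArrays

x = @SVector [1.0, 2.0, 3.0, 4.0]
res = HyperHessians.hessian_gradient_value(x -> sum(abs2, x), x)
res.hessian isa SMatrix # true: the Hessian comes back as a static matrix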

Performance

To get an estimate of the performance of HyperHessians, we benchmark it here against ForwardDiff.jl on two common benchmark functions, rosenbrock and ackley. The results can be reproduced with benchmark/fdiff.jl.
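
A minimal sketch of such a comparison (not the benchmark/fdiff.jl script itself; the rosenbrock definition below is the textbook one and may differ in detail from the benchmarked rosenbrock_1):

using BenchmarkTools, ForwardDiff, HyperHessians

rosenbrock(x) = sum(100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 for i in 1:length(x)-1)
x = rand(128)

@btime ForwardDiff.hessian($rosenbrock, $x)          # ForwardDiff, default settings

cfg = HyperHessians.HessianConfig(x)
@btime HyperHessians.hessian($rosenbrock, $x, $cfg)  # HyperHessians with a reused config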

Function          Input length    Time ForwardDiff    Time HyperHessians    Speedup
ackley            1               54.053 ns           36.725 ns             1.5
ackley            8               751.448 ns          461.276 ns            1.6
ackley            128             2.085 ms            957.194 μs            2.2
rosenbrock_1      1               24.666 ns           7.436 ns              3.3
rosenbrock_1      8               967.438 ns          436.330 ns            2.2
rosenbrock_1      128             3.175 ms            1.031 ms              3.1