SHaRC Integration Guide

April 29, 2026

SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of three passes: Update, Resolve, and Render/Query. The first pass uses sparse tracing to populate the world-space radiance cache using the existing path tracer. The second pass resolves and combines newly accumulated data with data from previous frames. The final pass traces rays while querying the cache on hits to accelerate rendering through early termination.

Image 1. Path traced output at 1 path per pixel (left) and with SHaRC cache usage (right)

Integration Steps

An implementation of SHaRC using the RTXGI SDK needs to perform the following steps:

At Load-Time

Create the resources:

Hash entries buffer - structured buffer with 8-byte entries that store the hashes
Accumulation buffer - structured buffer with 16-byte entries that store accumulated radiance and sample counts per frame
Resolved buffer - structured buffer with 16-byte entries holding cross-frame accumulated radiance, total samples, and some extra data used in 'Resolve' pass

All buffers should contain the same number of entries, representing the number of scene voxels used for radiance caching. A solid baseline for most scenes is $2^{22}$ elements. Power-of-two element counts are recommended. Use a higher element count for scenes with high depth complexity. A lower element count reduces memory pressure but increases the risk of hash collisions.

⚠️ Warning: All buffers should initially be cleared to zero.

At Render-Time

Populate cache data using sparse tracing against the scene
Combine new cache data with data accumulated from previous frames
Perform tracing with early path termination using cached data

Hash Grid Visualization

Hash grid visualization itself does not require any GPU resources. The simplest debug visualization uses the world-space position of the primary ray hit.

HashGridParameters gridParameters;
gridParameters.cameraPosition = g_Constants.cameraPosition;
gridParameters.logarithmBase = SHARC_GRID_LOGARITHM_BASE;
gridParameters.sceneScale = g_Constants.sharcSceneScale;
gridParameters.levelBias = SHARC_GRID_LEVEL_BIAS;

float3 color = HashGridDebugColoredHash(positionWorld, gridParameters);

Image 2. SHaRC hash grid visualization

The logarithm base controls the distribution of detail levels and the ratio of voxel sizes between neighboring levels. It does not affect the average voxel size. To control voxel size, use the sceneScale parameter instead. HashGridParameters::levelBias controls level selection near the camera. It can be used either to add more near-camera levels (finer detail) or to clamp the minimum voxel size to avoid overly fine levels.
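To make the roles of these three parameters concrete, here is a minimal C++ sketch of one plausible level-to-voxel-size mapping. The formulas are an assumption made for illustration and are not copied from HashGridCommon.h; `GridParams`, `LevelForDistance`, `VoxelSizeForLevel`, and the `maxLevel` default are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical mirror of the three tuning parameters discussed above.
struct GridParams {
    float logarithmBase;  // must be > 1
    float sceneScale;     // must be > 0
    float levelBias;
};

// Assumed model: the level grows with log_base(distance to camera) plus the bias,
// clamped to [1, maxLevel]. maxLevel is an illustrative value, not an SDK constant.
inline int LevelForDistance(float distance, const GridParams& p, int maxLevel = 1023) {
    assert(distance > 0.0f && p.logarithmBase > 1.0f);
    const float level = std::log(distance) / std::log(p.logarithmBase) + p.levelBias;
    return static_cast<int>(std::max(1.0f, std::min(level, static_cast<float>(maxLevel))));
}

// Assumed model: voxel size is base^(level - bias) / sceneScale, so neighboring
// levels differ by exactly a factor of logarithmBase.
inline float VoxelSizeForLevel(int level, const GridParams& p) {
    assert(p.sceneScale > 0.0f);
    return std::pow(p.logarithmBase, static_cast<float>(level) - p.levelBias) / p.sceneScale;
}
```

Under this model the voxel size at distance d is roughly d / sceneScale regardless of the base, the ratio between neighboring levels equals the base, and a larger levelBias lowers the smallest reachable voxel size, which matches the qualitative behavior described above.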

Implementation Details

Render Loop Change

Instead of the original trace call, we should have the following three passes with SHaRC:

SHaRC Update - RT call which updates the cache with the new data on each frame. Requires SHARC_UPDATE 1 shader define
SHaRC Resolve - Compute call which combines new cache data with data accumulated over previous frames
SHaRC Render/Query - RT call which traces scene paths and performs early termination using cached data. Requires SHARC_QUERY 1 shader define

Resource Binding

The SDK provides shader-side headers and code snippets that implement most of the steps above. Shader code should include SharcCommon.h, which already includes HashGridCommon.h.

SHaRC uses three main UAV buffers: hash entries, accumulation, and resolved data. These buffers are bound as UAVs in all passes, with different read/write usage depending on the pass (see table below).

UAV barriers are required between passes to ensure that writes from the previous pass are visible before the next pass reads or writes the cache.

| Render Pass | Hash Entries | Accumulation | Resolved | Lock Buffer* |
|---|---|---|---|---|
| SHaRC Update | RW | Write | Read | RW |
| SHaRC Resolve | RW | RW | RW | |
| SHaRC Render | Read | | Read | |

*Buffer is used if SHARC_ENABLE_64_BIT_ATOMICS is set to 0

SHaRC Update

⚠️ Warning: Requires SHARC_UPDATE 1 shader define

This pass runs a modified path tracer for a subset of screen pixels to populate the radiance cache. To reduce cost, only a fraction of pixels should be processed each frame (e.g., one random pixel per 5×5 block, ~4% of paths), with coverage distributed over time.

Each path segment (bounce) in the update pass is treated independently. For every new sample (path), call SharcInit().

On a miss event, call SharcUpdateMiss() and terminate the path. On a hit, evaluate radiance at the hit point and call SharcUpdateHit(). If SharcUpdateHit() returns false, the path can be terminated early.

After selecting a new ray direction, compute the segment throughput and pass it to SharcSetThroughput(). Once submitted, path throughput can be reset to 1.0, since each segment is accumulated independently. Accumulated radiance should also be reset per segment.

The selected pixel positions should vary between frames to ensure full-screen coverage over time.
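One way to pick "one random pixel per 5×5 block" with per-frame variation is sketched below in C++. This is an illustrative approach, not the SDK's sample code: `HashU32`, `Pixel`, and `SelectUpdatePixel` are hypothetical helpers, and the integer hash is just one common choice for decorrelating blocks and frames.

```cpp
#include <cassert>
#include <cstdint>

// Small integer hash used only to decorrelate blocks and frames (illustrative choice).
inline uint32_t HashU32(uint32_t x) {
    x = (x ^ 61u) ^ (x >> 16);
    x *= 9u;
    x ^= x >> 4;
    x *= 0x27d4eb2du;
    x ^= x >> 15;
    return x;
}

struct Pixel { uint32_t x, y; };

// Maps one update-pass thread (one thread per block) to the pixel it traces this frame.
// The caller still has to reject pixels outside the render target when the resolution
// is not a multiple of blockSize.
inline Pixel SelectUpdatePixel(uint32_t blockX, uint32_t blockY, uint32_t frameIndex,
                               uint32_t blockSize = 5) {
    assert(blockSize > 0);
    const uint32_t h    = HashU32(blockX ^ HashU32(blockY ^ HashU32(frameIndex)));
    const uint32_t cell = h % (blockSize * blockSize);  // index inside the block
    return Pixel{ blockX * blockSize + cell % blockSize,
                  blockY * blockSize + cell / blockSize };
}
```

With blockSize = 5 the update pass dispatches one thread per 25 pixels, i.e. 4% of the full-resolution path count, and the frame index changes which pixel of each block is traced.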

Figure 1. Path tracer loop during SHaRC Update pass
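The control flow described in this section can be sketched on the CPU as follows. This is a mock for illustration only: `MockSharc` and its methods are hypothetical stand-ins named after the SDK functions mentioned above, and they do not reproduce the real signatures, which take SHaRC parameter and state structures. `Segment` and `TraceUpdatePath` are likewise hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the cache; it only counts calls.
struct MockSharc {
    int initCalls = 0, hitCalls = 0, missCalls = 0, throughputCalls = 0;
    int hitBudget = 1 << 30;  // MockUpdateHit returns false once this many hits were recorded

    void MockInit() { ++initCalls; }
    bool MockUpdateHit(float /*radiance*/) { return ++hitCalls < hitBudget; }
    void MockUpdateMiss(float /*radiance*/) { ++missCalls; }
    void MockSetThroughput(float /*throughput*/) { ++throughputCalls; }
};

struct Segment {
    bool  hit;         // false means the ray left the scene
    float radiance;    // radiance evaluated at the hit, or sky radiance on a miss
    float throughput;  // throughput of the direction sampled after this hit
};

// Walks one update-pass path and returns the number of segments traced.
inline int TraceUpdatePath(MockSharc& sharc, const std::vector<Segment>& segments) {
    assert(!segments.empty());
    sharc.MockInit();                 // once per new path
    float throughput = 1.0f;
    float radiance   = 0.0f;
    int   traced     = 0;
    for (const Segment& s : segments) {
        ++traced;
        if (!s.hit) {                 // miss: record it and terminate the path
            sharc.MockUpdateMiss(s.radiance);
            break;
        }
        radiance = s.radiance;        // evaluate radiance at the hit point
        if (!sharc.MockUpdateHit(radiance))
            break;                    // the cache signals that the path can stop early
        throughput *= s.throughput;   // segment throughput for the new direction
        sharc.MockSetThroughput(throughput);
        throughput = 1.0f;            // each segment is accumulated independently
        radiance   = 0.0f;
    }
    return traced;
}
```

For a path that hits twice and then misses, the mock records one init call, two hit updates, two throughput submissions, and one miss.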

SHaRC Resolve

The Resolve pass is performed by a compute shader that runs SharcResolveEntry() for each element. This combines per-frame accumulation data with previously resolved data, performs temporal accumulation, handles stale entry eviction, and resets accumulation data for the next frame.

📝 Note: Check Resource Binding section for details on the required resources and their usage for each pass.

SharcResolveEntry() takes the maximum number of accumulated frames as an input parameter to control the quality and responsiveness of the cached data. Larger values can increase quality but also increase response times. The staleFrameNumMax parameter controls the lifetime of cached elements and, through it, cache occupancy.

⚠️ Warning: Small staleFrameNumMax values can negatively impact performance. The SHARC_STALE_FRAME_NUM_MIN constant is used to prevent such behavior.
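The trade-off between the two resolve parameters can be illustrated with a generic capped running average plus a stale-frame counter. The C++ sketch below is not the SDK's SharcResolveEntry() implementation; `ResolvedEntry`, `ResolveEntrySketch`, and the single-channel radiance are simplifying assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical single-channel cache entry.
struct ResolvedEntry {
    float    radiance      = 0.0f;  // temporally accumulated mean
    uint32_t historyFrames = 0;     // frames blended so far, capped at accumulationMax
    uint32_t staleFrames   = 0;     // consecutive frames without new samples
    bool     valid         = false;
};

// frameRadianceSum / frameSampleNum: what the Update pass accumulated this frame.
inline void ResolveEntrySketch(ResolvedEntry& e, float frameRadianceSum, uint32_t frameSampleNum,
                               uint32_t accumulationMax, uint32_t staleFrameNumMax) {
    assert(accumulationMax > 0);
    if (frameSampleNum == 0) {
        // No new data: age the entry and evict it once it is stale for too long.
        if (e.valid && ++e.staleFrames > staleFrameNumMax)
            e = ResolvedEntry{};
        return;
    }
    const float frameMean = frameRadianceSum / static_cast<float>(frameSampleNum);
    // Capping the history length keeps the blend weight at or above 1 / accumulationMax,
    // which bounds how slowly the entry reacts to lighting changes.
    e.historyFrames = std::min(e.historyFrames + 1, accumulationMax);
    e.radiance += (frameMean - e.radiance) / static_cast<float>(e.historyFrames);
    e.staleFrames = 0;
    e.valid = true;
}
```

A larger accumulationMax averages over more frames (less noise, slower response), while a smaller staleFrameNumMax frees entries sooner at the cost of re-populating them more often.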

SHaRC Render

⚠️ Warning: Requires SHARC_QUERY 1 shader define.

During rendering with SHaRC cache usage we should try obtaining cached data using SharcGetCachedRadiance() on each eligible hit (typically excluding the primary hit). Upon success, the path tracing loop should be immediately terminated.

Figure 2. Path tracer loop during SHaRC Render pass
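The early-termination logic can be sketched on the CPU as below. This is a mock under stated assumptions, not SDK code: `Bounce`, `PathResult`, and `ShadePathWithCache` are hypothetical, the cache lookup is reduced to a precomputed flag and value, and radiance is a single float.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-bounce data for a single path.
struct Bounce {
    bool  hitSurface;      // false means the ray missed the scene
    bool  cacheAvailable;  // stands in for a successful cache query at this hit
    float cachedRadiance;  // value the cache would return
    float localRadiance;   // radiance gathered at this vertex without the cache
    float throughput;      // throughput of the next sampled direction
};

struct PathResult {
    float radiance;
    int   bouncesTraced;
};

inline PathResult ShadePathWithCache(const std::vector<Bounce>& bounces) {
    assert(!bounces.empty());
    PathResult result{0.0f, 0};
    float throughput = 1.0f;
    for (std::size_t i = 0; i < bounces.size(); ++i) {
        const Bounce& b = bounces[i];
        ++result.bouncesTraced;
        if (!b.hitSurface) {                       // miss: add sky radiance and stop
            result.radiance += throughput * b.localRadiance;
            break;
        }
        const bool isPrimaryHit = (i == 0);        // the primary hit is typically excluded
        if (!isPrimaryHit && b.cacheAvailable) {
            result.radiance += throughput * b.cachedRadiance;
            break;                                 // cache hit: terminate the path immediately
        }
        result.radiance += throughput * b.localRadiance;
        throughput *= b.throughput;
    }
    return result;
}
```

In this mock, a path whose second hit finds cached data stops after two bounces even if more bounces are available, which is the behavior the bounce-count heatmap later in this guide is meant to reveal.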

To avoid rendering artifacts, two cases need care. First, if the path segment length is less than the voxel size (checked using GetVoxelSize()), continue tracing until a segment is long enough for the cache to be safely usable. Second, unlike diffuse lobes, specular lobes should be treated with care: for a glossy specular lobe, estimate its "effective" cone spread, and use the cache only if the spread exceeds the spatial resolution of the voxel grid. Cone spread can be estimated as:

$$2.0 \cdot \text{ray.length} \cdot \sqrt{\frac{0.5 \, a^2}{1 - a^2}}$$

where a is material roughness squared.
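As a worked example, the C++ sketch below evaluates the cone spread estimate and combines it with the segment-length check. `GlossyConeSpread` and `CanUseCacheOnGlossyHit` are hypothetical helper names, the voxel size is passed in as a plain number rather than obtained from GetVoxelSize(), and the clamp that avoids division by zero at roughness 1 is an added assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Cone spread estimate from the formula above, with a = roughness * roughness.
inline float GlossyConeSpread(float rayLength, float roughness) {
    assert(rayLength >= 0.0f && roughness >= 0.0f && roughness <= 1.0f);
    const float a  = roughness * roughness;
    const float a2 = std::min(a * a, 0.9999f);  // assumption: keep 1 - a^2 away from zero
    return 2.0f * rayLength * std::sqrt(0.5f * a2 / (1.0f - a2));
}

// Applies both conditions from the text to a glossy hit.
inline bool CanUseCacheOnGlossyHit(float rayLength, float roughness, float voxelSize) {
    if (rayLength < voxelSize)
        return false;  // segment too short: keep tracing
    return GlossyConeSpread(rayLength, roughness) > voxelSize;
}
```

For a ray of length 10 and roughness 0.5, a = 0.25 and the estimated spread is about 3.65, so a voxel of size 1 is fine enough for the cache to be used. At roughness 0.05 the spread is about 0.035 and the path should continue.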

Responsive lighting

Responsive lighting mode is intended for light sources that require fast temporal response and may exist only for a short duration (e.g., flashlights or rapidly changing lights). In such cases, standard SHaRC accumulation may react too slowly, leading to visible lag in lighting updates.

This mode can be enabled by defining SHARC_ENABLE_RESPONSIVE_LIGHTING 1. Responsive lighting shares the same cache as regular lighting: entries are stored together rather than in a separate structure. However, if a hit contains a responsive lighting component and is marked as responsive, the entire signal for that entry is treated as responsive and propagated accordingly through the cache.

When responsive lighting is enabled, the Resolve pass no longer clears accumulation buffer entries. Instead, a full resource clear is required to ensure the accumulation buffer is reset to zero before the Update pass begins.

Responsive signal processing introduces additional overhead and may reduce the benefits of long-term accumulation. It should only be enabled when the scene contains transient or rapidly changing light sources that require faster adaptation.

Parameters Selection and Debugging

By default, SHARC_PROPAGATION_DEPTH is set to 2 when cache resampling is enabled. This value can be increased if longer paths are required to reach light sources in the scene.

SHaRC radiance values are internally premultiplied by SHARC_RADIANCE_SCALE and accumulated using a 32-bit integer representation per component.
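Because accumulation happens in 32-bit integers, very bright samples combined with a large scale can overflow. The C++ sketch below estimates the headroom for a given scale. It is a generic fixed-point illustration, not SDK code: `QuantizeRadiance` and `MaxSamplesBeforeOverflow` are hypothetical, and the scale value used in the example is arbitrary rather than the SDK default.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converts one radiance component to its scaled integer form, saturating at UINT32_MAX.
inline uint32_t QuantizeRadiance(float radiance, float radianceScale) {
    assert(radiance >= 0.0f && radianceScale > 0.0f);
    const double scaled = static_cast<double>(radiance) * static_cast<double>(radianceScale);
    const double limit  = static_cast<double>(std::numeric_limits<uint32_t>::max());
    return scaled >= limit ? std::numeric_limits<uint32_t>::max()
                           : static_cast<uint32_t>(scaled + 0.5);  // round to nearest
}

// Number of identical samples that fit into a 32-bit accumulator without overflow.
inline uint32_t MaxSamplesBeforeOverflow(float radiance, float radianceScale) {
    const uint32_t q = QuantizeRadiance(radiance, radianceScale);
    if (q == 0)
        return std::numeric_limits<uint32_t>::max();  // zero contribution never overflows
    return std::numeric_limits<uint32_t>::max() / q;
}
```

With an example scale of 1000, a radiance of 10 quantizes to 10000, leaving room for roughly 429 thousand such samples per component before the accumulator wraps.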

SHaRC operates in world space, so HashGridParameters::sceneScale is generally independent of screen resolution and does not need to be adjusted with it.

During rendering, adding a debug heatmap of bounce count can help evaluate cache usage efficiency.

Image 3. Tracing depth heatmap with SHaRC off (left) and SHaRC on (right)
(green = 1 indirect bounce, red = 2+ indirect bounces)

💡 Tip: SharcCommon.h provides several methods to verify potential overflow in internal data structures. SharcDebugBitsOccupancySampleNum() and SharcDebugBitsOccupancyRadiance() can be used to verify consistency in the sample count and corresponding radiance values representation.

HashGridDebugOccupancy() should be used to validate cache occupancy. With a static camera, around 10-20% of elements should be used on average; during fast camera movement, occupancy will go up. Increased occupancy can negatively impact performance. To control it, increase the element count or decrease the stale-frame threshold so that outdated elements are evicted more aggressively.

Image 4. Debug overlay to visualize cache occupancy through HashGridDebugOccupancy()

Memory Usage

The hash entries buffer and the two voxel data buffers require a total of 40 (8 + 16 + 16) bytes per voxel. For $2^{22}$ cache elements this amounts to 160 MiB of video memory. The total number of elements may vary depending on the voxel size and scene scale. Larger buffer sizes may be needed to reduce potential hash collisions.
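The arithmetic can be checked with a few lines of C++. The constants repeat the entry sizes listed in this guide; `SharcCacheBytes` is a hypothetical helper, and the optional lock buffer is not included because its entry size is not stated here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kHashEntryBytes    = 8;   // hash entries buffer
constexpr uint64_t kAccumulationBytes = 16;  // accumulation buffer
constexpr uint64_t kResolvedBytes     = 16;  // resolved buffer

// Total bytes for the three main buffers with 2^log2ElementCount entries each.
inline uint64_t SharcCacheBytes(uint32_t log2ElementCount) {
    assert(log2ElementCount < 32);
    const uint64_t elementCount = uint64_t(1) << log2ElementCount;
    return elementCount * (kHashEntryBytes + kAccumulationBytes + kResolvedBytes);
}
```

For example, SharcCacheBytes(22) gives 4,194,304 × 40 = 167,772,160 bytes, which is 160 MiB, and each additional power of two doubles the footprint.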