RhythmGuassian: Repurposing Generalizable Gaussian Model for Remote Physiological Measurement
May 4, 2026
ICCV 2025 | Paper (CVF Open Access)
Official implementation of RhythmGuassian, which addresses the entanglement between motion / illumination interference and physiological signals in remote photoplethysmography (rPPG) by explicitly decoupling 4D chromatic and geometric components with a Generalizable Gaussian Model (GGM).
Highlight. The 4D Gaussian representation models geometry, motion, and chroma jointly under a single rendering formulation — without requiring real camera intrinsics/extrinsics, which the rPPG datasets do not provide.
Abstract
Remote photoplethysmography (rPPG) enables non-contact extraction of physiological signals, providing significant advantages in medical monitoring, emotion recognition, and face anti-spoofing. However, reliable rPPG extraction is hindered by motion variations in real-world environments, which entangle motion interference with the physiological signal. To address this challenge, we employ the Generalizable Gaussian Model (GGM) to disentangle geometry and chroma components with 4D Gaussian representations. Employing the GGM for robust rPPG estimation is non-trivial. First, rPPG datasets provide no camera parameters, so video cannot be rendered directly from the 4D Gaussian representation. We propose a "4D virtual camera" that constructs extra Gaussian parameters describing view and motion changes, which makes it possible to render video with fixed virtual camera parameters. Second, the chroma component is still not explicitly decoupled in the 4D Gaussian representation. Explicit Motion Modeling (EMM) is designed to decouple motion variation in an unsupervised manner, and Explicit Chroma Modeling (ECM) is tailored to decouple the specular, physiological, and noise signals, respectively.
Method
The pipeline takes an STMap derived from the input face video, encodes it with a shared backbone (a ResNet18 encoder), and then branches into two heads:
- Physiological Decoder: predicts the BVP signal and HR.
- 4D Gaussian Adapter: predicts a 4D Gaussian map $M_{gs} \in \mathbb{R}^{22 \times N \times T}$ consisting of:
  - depth $d_r \in \mathbb{R}^1$
  - specular reflection $v_s \in \mathbb{R}^3$
  - diffuse reflection (physiological) $v_d \in \mathbb{R}^3$
  - motion noise $v_n \in \mathbb{R}^3$
  - alpha $a \in \mathbb{R}^1$, scales $\in \mathbb{R}^3$, rotation $r \in \mathbb{R}^3$
  - motion flow $[\Delta h, \Delta w, \Delta s] \in \mathbb{R}^5$
Final color: $c = v_s + v_d + v_n$. Final position: $p = K E\,(u d,\ v d,\ d)$, where $K$ and $E$ are the intrinsics and extrinsics of the 4D Virtual Camera (face center as focal point, half the face size as focal length, identity extrinsics).
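The channel bookkeeping and the two formulas above can be checked with a small NumPy sketch. It rests on several assumptions:

- The adapter emits the 22 channels in the order listed.
- `u`, `v` are per-point image coordinates.
- "Face center as focal point" means the principal point of $K$. This is an interpretation.

`split_gaussian_map`, `virtual_intrinsics`, and `gaussian_positions` are hypothetical helpers, not functions from the repo. The position formula is transcribed literally as written.

```python
import numpy as np

# Channel widths of the 4D Gaussian map M_gs, in the order listed above.
# ASSUMPTION: the real adapter may order its output channels differently.
CHANNELS = {
    "depth": 1,
    "v_s": 3,          # specular reflection
    "v_d": 3,          # diffuse reflection (physiological)
    "v_n": 3,          # motion noise
    "alpha": 1,
    "scale": 3,
    "rotation": 3,
    "motion_flow": 5,  # [dh, dw, ds]
}


def split_gaussian_map(m_gs: np.ndarray) -> dict:
    """Split an array of shape (22, N, T) into named components."""
    total = sum(CHANNELS.values())
    if m_gs.shape[0] != total:
        raise ValueError(f"expected {total} channels, got {m_gs.shape[0]}")
    parts, start = {}, 0
    for name, width in CHANNELS.items():
        parts[name] = m_gs[start:start + width]
        start += width
    return parts


def virtual_intrinsics(face_center, face_size: float) -> np.ndarray:
    """Hypothetical virtual-camera intrinsics: focal length = half the face
    size, principal point = face center (u0, v0)."""
    f = face_size / 2.0
    u0, v0 = face_center
    return np.array([[f, 0.0, u0],
                     [0.0, f, v0],
                     [0.0, 0.0, 1.0]])


def gaussian_positions(K, u, v, d, E=None) -> np.ndarray:
    """Literal transcription of p = K E (u*d, v*d, d); E defaults to identity.
    u, v, d share a shape S; the result has shape (3, *S)."""
    E = np.eye(3) if E is None else E
    uvd = np.stack([u * d, v * d, d])      # (3, *S)
    p = (K @ E) @ uvd.reshape(3, -1)       # (3, prod(S))
    return p.reshape(3, *np.shape(d))


# Usage: split a random map and compose the final color c = v_s + v_d + v_n.
m_gs = np.random.rand(22, 4, 10)           # N = 4 points, T = 10 frames
parts = split_gaussian_map(m_gs)
color = parts["v_s"] + parts["v_d"] + parts["v_n"]   # shape (3, 4, 10)
```

The channel widths add up to 1 + 3 + 3 + 3 + 1 + 3 + 3 + 5 = 22, which the size check in `split_gaussian_map` enforces.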
Three core modules
- 4D Virtual Camera: compensates for the camera parameters that rPPG datasets lack, so that 4D Gaussian splatting becomes applicable.
- Explicit Motion Modeling (EMM): predicts a per-point motion flow $[\Delta h, \Delta w, \Delta s]$ and renders both the original points $p$ and the motion-corrected points $\hat{p} = K E\,((u+1)d,\ (v + \Delta w + \Delta h \cdot W)d,\ d)$. The corrected points are constrained to be free of chroma motion noise.
- Explicit Chroma Modeling (ECM): disentangles $v_d$, $v_s$, and $v_n$ with two unsupervised constraints:
  - $L_{st}$: a ContrastPhys-style spatio-temporal consistency loss on $v_d$, encouraging the physiological signal to be periodic and similar across face regions.
  - $L_m$: the negative Pearson correlation between the modulus of the motion flow and the modulus of $v_n$, tying chroma motion noise to the predicted motion flow.
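The two ECM constraints can be sketched in NumPy. The following are assumptions, not the repo's implementation:

- $L_m$ is computed per point over time, then averaged, in the common `1 - r` form. This differs from `-r` only by a constant.
- The stand-in for $L_{st}$ keeps only a positive-pair idea: power spectra should agree across face regions. ContrastPhys additionally uses negative pairs across videos, which are omitted here.

The function names are hypothetical, and the actual losses live in MyLoss.py.

```python
import numpy as np

EPS = 1e-8


def neg_pearson_motion_loss(motion_flow: np.ndarray, v_n: np.ndarray) -> float:
    """Sketch of L_m: 1 - Pearson(|motion flow|, |v_n|) along time.

    motion_flow: (5, N, T), v_n: (3, N, T). Moduli are taken over the channel
    axis; the correlation is computed per point over T frames and averaged.
    """
    x = np.linalg.norm(motion_flow, axis=0)      # (N, T)
    y = np.linalg.norm(v_n, axis=0)              # (N, T)
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    r = (xc * yc).sum(-1) / (np.sqrt((xc ** 2).sum(-1) * (yc ** 2).sum(-1)) + EPS)
    return float(1.0 - r.mean())


def normalized_psd(signal: np.ndarray) -> np.ndarray:
    """Power spectrum along the last axis, normalized to sum to 1."""
    centered = signal - signal.mean(axis=-1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=-1)) ** 2
    return power / (power.sum(axis=-1, keepdims=True) + EPS)


def psd_consistency_loss(v_d: np.ndarray) -> float:
    """Simplified, positive-pairs-only stand-in for L_st.

    v_d: (3, N, T). Averages RGB into one trace per point, then penalizes the
    squared distance of each point's spectrum from the mean spectrum, pushing
    all face regions toward the same dominant (pulse) frequency.
    """
    psd = normalized_psd(v_d.mean(axis=0))       # (N, T // 2 + 1)
    return float(((psd - psd.mean(axis=0)) ** 2).sum(-1).mean())
```

Both losses are near zero when the constraints hold. For $L_m$, that means the noise modulus tracks the motion modulus. For the spectral term, it means every region shares one pulse spectrum.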
Total objective
$$L_{all} = \lambda_{rec}\, L_{rec} + \lambda_{st}\, L_{st} + \lambda_{m}\, L_{m} + L_{physio}$$
L_physio follows the supervision protocol of NEST-rPPG; the three remaining terms are unsupervised, so RhythmGaussian also operates under the unsupervised protocol when L_physio is removed.
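The objective above can be written as a plain function, covering both protocols. It is a sketch under these assumptions:

- `total_objective` is a hypothetical name.
- $L_{physio}$ is taken to be a weighted sum of per-source supervised losses, with weights playing the role of the `-k1 … -k10` balance weights.
- The default λ values mirror the `--lambda_*` defaults in the Train flag table.

Passing no supervised losses, or all-zero weights, gives the unsupervised protocol.

```python
from typing import Optional, Sequence


def total_objective(l_rec: float, l_st: float, l_m: float,
                    physio_losses: Optional[Sequence[float]] = None,
                    k: Optional[Sequence[float]] = None,
                    lambda_rec: float = 0.1, lambda_st: float = 1.0,
                    lambda_m: float = 1.0) -> float:
    """L_all = lambda_rec*L_rec + lambda_st*L_st + lambda_m*L_m + L_physio.

    ASSUMPTION: L_physio = sum_i k_i * physio_losses[i] over source datasets.
    """
    loss = lambda_rec * l_rec + lambda_st * l_st + lambda_m * l_m
    if physio_losses is not None:
        weights = k if k is not None else [1.0] * len(physio_losses)
        if len(weights) != len(physio_losses):
            raise ValueError("need one balance weight per source dataset")
        loss += sum(w * l for w, l in zip(weights, physio_losses))
    return loss
```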
Datasets
We evaluate on the cross-domain rPPG benchmark from NEST-rPPG (VIPL, V4V, BUAA, UBFC, PURE) and additionally on MMPD, MR-NIRP, VV100, UCLA-rPPG, and Phys to cover diverse motion / illumination interferences. Pre-processed STMaps and labels follow the NEST-rPPG convention; see the NEST-rPPG repo for download and pre-processing details.
After download, organise the data root as:
```text
$ROOT/
  VIPL/<subject>/STMap/STMap_RGB_Align_CSI.png
  V4V/<subject>/STMap/STMap_RGB.png
  PURE/... STMap/STMap.png
  BUAA/... STMap/STMap_RGB.png
  UBFC/... STMap/STMap.png
  MMPD/... STMap/STMap_RGB.png
  MR-NIRP/... STMap/STMap_NIR.png
  VV100/... STMap/STMap_RGB.png
  UCLA-rPPG/... STMap/STMap_RGB.png
  Phys/... STMap/STMap_RGB.png
  STMap_Index/   # auto-generated on first run with -rD 1
```
Then update root_file in train.py:110 (or pass it via an environment variable) to point at $ROOT.
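Before training, a quick sanity check can confirm the layout matches. `count_stmaps` is a hypothetical helper, not part of the repo. The file names are copied from the tree above. The recursive `**` pattern is used because the tree elides some intermediate directories (`...`).

```python
from pathlib import Path

# Expected STMap file name per dataset, copied from the layout above.
STMAP_FILES = {
    "VIPL": "STMap_RGB_Align_CSI.png",
    "V4V": "STMap_RGB.png",
    "PURE": "STMap.png",
    "BUAA": "STMap_RGB.png",
    "UBFC": "STMap.png",
    "MMPD": "STMap_RGB.png",
    "MR-NIRP": "STMap_NIR.png",
    "VV100": "STMap_RGB.png",
    "UCLA-rPPG": "STMap_RGB.png",
    "Phys": "STMap_RGB.png",
}


def count_stmaps(root) -> dict:
    """Count STMap images per dataset under root.

    Matches <dataset>/**/STMap/<file> at any depth; datasets whose directory
    is missing are reported with a count of 0.
    """
    root = Path(root)
    counts = {}
    for name, fname in STMAP_FILES.items():
        base = root / name
        counts[name] = sum(1 for _ in base.glob(f"**/STMap/{fname}")) if base.is_dir() else 0
    return counts
```

For example, `print(count_stmaps("/path/to/ROOT"))` should report a non-zero count for every dataset you intend to use.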
Installation
```bash
conda create -n rhythmgs python=3.10 -y
conda activate rhythmgs
# PyTorch (CUDA 11.8 example; pick a build matching your GPU/driver)
pip install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
# Differentiable Gaussian rasterizer: RhythmGuassian uses 2D Gaussian Splatting
git clone --recursive https://github.com/hbb1/2d-gaussian-splatting.git
pip install ./2d-gaussian-splatting/submodules/diff-surfel-rasterization
```
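Note on the import name.
`models/gs.py` imports the rasterizer as `diff_gaussian_rasterization`. If your build of 2DGS exposes the module as `diff_surfel_rasterization` instead, either change the import line at the top of `models/gs.py` to `import diff_surfel_rasterization as diff_gaussian_rasterization`, or register `diff_surfel_rasterization` under the name `diff_gaussian_rasterization` in `sys.modules` before `models/gs.py` is imported. An `import ... as ...` statement only renames the module inside the file where it appears.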
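The `sys.modules` option can be wrapped in a small helper that leaves `models/gs.py` untouched. This is a sketch: `alias_module` is a hypothetical name, and where you call it (for example, at the top of a launcher script before any repo modules are imported) is an assumption about your setup.

```python
import importlib
import sys
from types import ModuleType


def alias_module(alias: str, *candidates: str) -> ModuleType:
    """Import the first importable module in `candidates` and register it in
    sys.modules under `alias`, so a later `import <alias>` or
    `from <alias> import ...` resolves to it."""
    if alias in sys.modules:
        return sys.modules[alias]
    errors = []
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError as err:
            errors.append(f"{name}: {err}")
            continue
        sys.modules[alias] = module
        return module
    raise ImportError(f"could not import any of {candidates}: {errors}")


# Assumed usage, before models/gs.py is imported:
# alias_module("diff_gaussian_rasterization",
#              "diff_gaussian_rasterization", "diff_surfel_rasterization")
```

This only redirects the top-level module name. Imports of submodules by their original path are not aliased.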
Tested with PyTorch 2.1 / CUDA 11.8 on a single A100. Any GPU supported by the 2DGS rasterizer should work.
Train
Build the STMap index on the first run (-rD 1), then pass -rD 0 on subsequent runs:
```bash
# Target = VIPL (others are sources)
python train.py -g 0 -t VIPL -rD 1
# Later runs reuse the index that is already built
python train.py -g 0 -t VIPL -rD 0
```
To launch one of the per-target trainer variants, run it as a module so that Python resolves the package layout:
```bash
python -m datasets.MMPD -g 0 -t MMPD -rD 0
python -m datasets.Phys -g 0 -t Phys -rD 0
# ...same for MR, UCLA, VV100
```
Useful flags (see utils.py for the full list):
| flag | default | meaning |
|---|---|---|
| -t, --tgt | VIPL | target domain (VIPL or V4V for train.py) |
| -b, --batch-size | 32 | per-source batch size |
| -mi, --max_iter | 20000 | total iterations |
| --lambda_rec | 0.1 | weight of L_rec |
| --lambda_st | 1.0 | weight of L_st |
| --lambda_m | 1.0 | weight of L_m |
| -sr, --spatial_aug_rate | 0.5 | spatial-shuffle augmentation rate |
| -tr, --temporal_aug_rate | 0.1 | temporal-shift augmentation rate |
| -k1 … -k10 | various | per-dataset balance weights for L_physio |
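The two augmentation rates can be illustrated with a minimal NumPy sketch, assuming an STMap array of shape (ROIs, frames, channels) and a per-frame BVP label. `spatial_shuffle`, `temporal_shift`, and `max_shift` are hypothetical. The circular `np.roll` is a simplification and may differ from what MyDataset.py actually does.

```python
import numpy as np


def spatial_shuffle(stmap: np.ndarray, rate: float,
                    rng: np.random.Generator) -> np.ndarray:
    """With probability `rate`, randomly permute the ROI axis (axis 0)."""
    if rng.random() < rate:
        stmap = stmap[rng.permutation(stmap.shape[0])]
    return stmap


def temporal_shift(stmap: np.ndarray, bvp: np.ndarray, rate: float,
                   rng: np.random.Generator, max_shift: int = 10):
    """With probability `rate`, circularly shift the STMap (frame axis 1) and
    the BVP label by the same random offset so the two stay aligned."""
    if rng.random() < rate:
        shift = int(rng.integers(-max_shift, max_shift + 1))
        stmap = np.roll(stmap, shift, axis=1)
        bvp = np.roll(bvp, shift)
    return stmap, bvp


# Usage with the default rates (-sr 0.5, -tr 0.1):
rng = np.random.default_rng(0)
stmap, bvp = np.random.rand(25, 256, 3), np.random.rand(256)
stmap = spatial_shuffle(stmap, 0.5, rng)
stmap, bvp = temporal_shift(stmap, bvp, 0.1, rng)
```

Shifting the label together with the STMap matters for the supervised protocol. Otherwise the augmentation would silently misalign the BVP target.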
Logs are written to ./Result_log/, predictions to ./Result/, models to ./Result_Model/.
Unsupervised protocol
Keep --lambda_rec, --lambda_st, and --lambda_m non-zero and set the L_physio balance weights (-k1 … -k10) to 0; the model then trains entirely on the three unsupervised constraints.
Evaluation
For VIPL / V4V (HR head):
```bash
python Eval.py
```
For BUAA / PURE / UBFC (BVP head): clip-level BVP .mat files are saved during training; downstream HR / HRV evaluation uses the same MATLAB pipeline as NEST-rPPG.
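The HR metrics reported for VIPL / V4V (ME / STD / MAE / RMSE / MER / Pearson, computed by MyEval in utils.py) can be sketched with common definitions. Two assumptions apply, and the repo's MyEval may differ on either:

- STD is the population standard deviation of the error.
- MER is the mean of |error| / ground truth.

`hr_metrics` is a hypothetical name.

```python
import numpy as np


def hr_metrics(pred, gt) -> dict:
    """ME, STD, MAE, RMSE, MER and Pearson r for per-video HR estimates (bpm).

    ASSUMPTIONS: STD uses np.std (ddof=0) on the error; MER = mean(|err| / gt).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    err = pred - gt
    return {
        "ME": err.mean(),
        "STD": err.std(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MER": (np.abs(err) / gt).mean(),
        "r": np.corrcoef(pred, gt)[0, 1],
    }
```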
Repo Layout
```text
4D-rPPG/
├── models/                # network modules
│   ├── model.py           # BaseNet: ResNet18 encoder + physio head + 4D GS adapter
│   ├── gs.py              # GaussianRenderer wrapping the 2DGS rasterizer
│   └── graphics_utils.py  # 4D virtual-camera math (R, t, projection)
├── datasets/              # data loading + per-target trainer variants
│   ├── MyDataset.py       # STMap dataset with spatial/temporal augmentation
│   ├── MMPD.py / MR.py / Phys.py / UCLA.py / VV100.py
│   │                      # per-target trainers: `python -m datasets.<name>`
├── train.py               # main entry: VIPL / V4V leave-one-out
├── Eval.py                # per-video aggregation + final HR metrics
├── dataSort.py            # split per-clip BVP outputs into per-subject files
├── MyLoss.py              # P_loss3, SP_loss, ST_loss, M_loss, get_loss
├── utils.py               # CLI args, Logger, MyEval (ME/STD/MAE/RMSE/MER/Pearson)
├── run.sh                 # example launcher
├── requirements.txt
├── LICENSE
└── README.md
```
Citation
If you find this work useful, please cite the ICCV 2025 paper:
```bibtex
@InProceedings{Lu_2025_ICCV,
  author    = {Lu, Hao and Zhang, Yuting and Tang, Jiaqi and Fu, Bowen and Ge, Wenhang and Wei, Wei and Wu, Kaishun and Chen, Yingcong},
  title     = {RhythmGuassian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {20780-20790}
}
```
We also build on prior cross-domain rPPG work; please cite NEST-rPPG and DOHA if you use the benchmark protocol:
```bibtex
@inproceedings{Lu_2023_CVPR,
  author    = {Lu, Hao and Yu, Zitong and Niu, Xuesong and Chen, Ying-Cong},
  title     = {Neuron Structure Modeling for Generalizable Remote Physiological Measurement},
  booktitle = {CVPR},
  year      = {2023}
}
@inproceedings{Sun_2023_DOHA,
  author    = {Sun, Weiyu and Zhang, Xinyu and Lu, Hao and Chen, Ying and Ge, Yun and Huang, Xiaolin and Yuan, Jie and Chen, Yingcong},
  title     = {Resolve Domain Conflicts for Generalizable Remote Physiological Measurement},
  booktitle = {ACM MM},
  year      = {2023}
}
```
Acknowledgements
- The cross-domain rPPG benchmark, STMap pre-processing, and `L_physio` formulation follow NEST-rPPG.
- The differentiable Gaussian rasterizer used for the 4D virtual camera is from 2D Gaussian Splatting (Huang et al.).
- `models/graphics_utils.py` reuses camera-math helpers from the original Inria 3D-GS project.
- The activation choices for RGB / alpha / scale / rotation follow LGM (Tang et al., 2024).
- The `L_st` formulation borrows from ContrastPhys.
Contact
Issues and pull requests are welcome.