README.md

June 12, 2025

vHeat

vHeat: Building Vision Models upon Heat Conduction

(Accepted by CVPR 2025)

ZhaoZhi Wang1,2*, Yue Liu1*, Yunfan Liu1, Hongtian Yu1,

Yaowei Wang2,3, Qixiang Ye1,2, Yunjie Tian1

1 University of Chinese Academy of Sciences, 2 Peng Cheng Laboratory,

3 Harbin Institute of Technology (Shenzhen)

* Equal contribution.

Paper: arXiv 2405.16555

Abstract

A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image. In this study, we propose vHeat, a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field. The essential idea, inspired by the physical principle of heat conduction, is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy. This mechanism is incorporated into deep models through the newly proposed module, the Heat Conduction Operator (HCO), which is physically plausible and can be efficiently implemented using DCT and IDCT operations with a complexity of O(N^1.5). Extensive experiments demonstrate that vHeat surpasses Vision Transformers (ViTs) across various vision tasks, while also providing higher inference speeds, reduced FLOPs, and lower GPU memory usage for high-resolution images.
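The mechanism described in the abstract can be illustrated with a small NumPy sketch. This is not the repository's HCO implementation: it treats a single-channel 2D array as the initial temperature field, applies an orthonormal DCT-II by matrix multiplication, damps each frequency component by exp(-k (wx^2 + wy^2) t), and transforms back with the inverse DCT. The function names `dct_matrix` and `heat_conduction`, the scalar diffusivity `k`, and the time `t` are illustrative assumptions.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II matrix C of shape (n, n), so C @ C.T == I."""
    freq = np.arange(n)[:, None]   # frequency index (rows)
    pos = np.arange(n)[None, :]    # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (pos + 0.5) * freq / n)
    c[0, :] /= np.sqrt(2.0)        # rescale the constant (DC) row
    return c


def heat_conduction(u0: np.ndarray, k: float = 1.0, t: float = 1.0) -> np.ndarray:
    """Diffuse a 2D field u0 for time t with diffusivity k (both hypothetical scalars)."""
    h, w = u0.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT: transform rows and columns by matrix multiplication.
    u_hat = ch @ u0 @ cw.T
    # Each cosine mode decays exponentially at a rate set by its squared frequency.
    wy = (np.pi * np.arange(h) / h)[:, None]
    wx = (np.pi * np.arange(w) / w)[None, :]
    decay = np.exp(-k * (wx ** 2 + wy ** 2) * t)
    # Inverse 2D DCT (the transpose of an orthonormal matrix is its inverse).
    return ch.T @ (u_hat * decay) @ cw


rng = np.random.default_rng(0)
u0 = rng.random((8, 8))
u1 = heat_conduction(u0, k=1.0, t=1.0)
print(u0.mean(), u1.mean())  # the mean temperature is preserved by the DC mode
```

With the transforms written as matrix products, an H x W grid with N = HW patches costs on the order of HW(H + W) multiply-adds, which is N^1.5 for a square grid and is consistent with the complexity quoted in the abstract.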

Main Results

:book: Checkpoint and log files will be released soon

Classification on ImageNet-1K with vHeat

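| name | pretrain | resolution | acc@1 | #params | FLOPs | Throughput | configs/logs/ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 29M | 4.5G | 1244 | |
| Swin-S | ImageNet-1K | 224x224 | 83.0 | 50M | 8.7G | 728 | |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 89M | 15.4G | 458 | |
| vHeat-T | ImageNet-1K | 224x224 | 82.2 | 29M | 4.6G | 1514 | config/log/ckpt |
| vHeat-S | ImageNet-1K | 224x224 | 83.6 | 50M | 8.5G | 945 | config/log/ckpt |
| vHeat-B | ImageNet-1K | 224x224 | 83.9 | 87M | 14.9G | 661 | config/log/ckpt |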
  • Models in this subsection are trained from scratch with random or manual initialization.
  • Throughput is tested with PyTorch 2.0 + CUDA 12.1 on an A100 GPU with an AMD EPYC 7542 CPU.
  • We use EMA because our model is still under development.
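For readers unfamiliar with the term, EMA refers to keeping an exponential moving average of the model weights alongside the trained weights. The sketch below shows the generic update rule on plain NumPy arrays; the function name `ema_update`, the dictionary layout, and the decay value are assumptions for illustration and are not taken from this repository's training code.

```python
import numpy as np


def ema_update(ema_weights: dict, model_weights: dict, decay: float = 0.999) -> dict:
    """Return new EMA weights: decay * old_ema + (1 - decay) * current_model."""
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * model_weights[name]
        for name in ema_weights
    }


# Toy example: the EMA copy moves slowly toward the current weights.
ema = {"w": np.zeros(3)}
model = {"w": np.ones(3)}
for _ in range(10):
    ema = ema_update(ema, model, decay=0.9)
print(ema["w"])  # each entry equals 1 - 0.9**10
```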

Object Detection on COCO with vHeat

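| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | configs/logs/ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-T | 48M | 267G | MaskRCNN@1x | 42.7 | 39.3 | -- |
| vHeat-T | 53M | 286G | MaskRCNN@1x | 45.1 | 41.2 | config/log/ckpt |
| Swin-S | 69M | 354G | MaskRCNN@1x | 44.8 | 40.9 | -- |
| vHeat-S | 74M | 377G | MaskRCNN@1x | 46.8 | 42.3 | config/log/ckpt |
| Swin-B | 107M | 496G | MaskRCNN@1x | 46.9 | 42.3 | -- |
| vHeat-B | 115M | 526G | MaskRCNN@1x | 47.7 | 43.0 | config/log/ckpt |
| Swin-T | 48M | 267G | MaskRCNN@3x | 46.0 | 41.6 | -- |
| vHeat-T | 53M | 286G | MaskRCNN@3x | 47.3 | 42.5 | config/log/ckpt |
| Swin-S | 69M | 354G | MaskRCNN@3x | 48.2 | 43.2 | -- |
| vHeat-S | 74M | 377G | MaskRCNN@3x | 48.8 | 43.7 | config/log/ckpt |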
  • Models in this subsection are initialized from the models trained in classification.

Semantic Segmentation on ADE20K with vHeat

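| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | configs/logs/ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-T | 512x512 | 60M | 945G | UperNet@160k | 44.4 | -- |
| vHeat-T | 512x512 | 62M | 948G | UperNet@160k | 47.0 | config/log/ckpt |
| Swin-S | 512x512 | 81M | 1039G | UperNet@160k | 47.6 | -- |
| vHeat-S | 512x512 | 82M | 1028G | UperNet@160k | 49.0 | config/log/ckpt |
| Swin-B | 512x512 | 121M | 1188G | UperNet@160k | 48.1 | -- |
| vHeat-B | 512x512 | 129M | 1219G | UperNet@160k | 49.6 | config/log/ckpt |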
  • Models in this subsection are initialized from the models trained in classification.
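The mIoU column is the mean intersection-over-union across classes. The following sketch shows the usual definition on small label arrays. It is a simplified illustration, not the mmsegmentation evaluator: the function name `mean_iou` is hypothetical, and details such as ignored labels and dataset-level accumulation are left out.

```python
import numpy as np


def mean_iou(pred, target, num_classes: int) -> float:
    """Mean IoU over the classes that appear in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] counts pixels with ground-truth class i predicted as class j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return float((intersection[valid] / union[valid]).mean())


# Class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```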

Getting Started

Installation

Step 1: Clone the vHeat repository:

To get started, first clone the vHeat repository and navigate to the project directory:

git clone https://github.com/MzeroMiko/vHeat.git
cd vHeat

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n vHeat
conda activate vHeat

Install Dependencies

pip install -r requirements.txt

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0

Model Training and Inference

Classification

To train vHeat models for classification on ImageNet, use the following command with the config of the model you want to train:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=16 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/to/dataset> --output /tmp
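When launching several configurations from a script, it can help to assemble the command above programmatically. The helper below is a convenience sketch only: `build_launch_command` is a hypothetical name, its defaults mirror the flags shown in the command above, and the path placeholders are left exactly as in this README.

```python
import shlex


def build_launch_command(cfg, data_path, output="/tmp",
                         nproc_per_node=16, batch_size=128, master_port=29501):
    """Return the single-node training command from this README as an argument list."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nnodes=1", "--node_rank=0",
        f"--nproc_per_node={nproc_per_node}",
        "--master_addr=127.0.0.1",
        f"--master_port={master_port}",
        "main.py",
        "--cfg", cfg,
        "--batch-size", str(batch_size),
        "--data-path", data_path,
        "--output", output,
    ]


cmd = build_launch_command("</path/to/config>", "</path/to/dataset>")
print(shlex.join(cmd))  # shell-quoted string, useful for logging
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the placeholders are replaced with real paths.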

If you only want to test the performance (together with params and FLOPs):

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/to/dataset> --output /tmp --resume </path/to/checkpoint> --eval --model_ema False

Please refer to modelcard for more details.

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

Use --tta to get the mIoU(ms) in segmentation.

To train with mmdetection or mmsegmentation:

bash ./tools/dist_train.sh </path/to/config> 8

For more information about detection and segmentation tasks, please refer to the manual of mmdetection and mmsegmentation. Remember to use the appropriate backbone configurations in the configs directory.

Before training on downstream tasks (detection/segmentation), please run interpolate4downstream.py to convert the classification pre-trained checkpoint so that it can be loaded for training.

Citation

@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Zhaozhi and Liu, Yue and Tian, Yunjie and Liu, Yunfan and Wang, Yaowei and Ye, Qixiang},
    title     = {Building Vision Models upon Heat Conduction},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {9707-9717}
}