March 13, 2026
SARNet: Go Closer to See Better
Camouflaged Object Detection via Object Area Amplification and Figure-Ground Conversion
Highlights • Architecture • Results • Visualization • Quick Start • Citation
Figure 1: The proposed Search-Amplify-Recognize (SAR) paradigm. Unlike previous Search-Identify approaches, SARNet introduces an Amplify stage via OAA modules and a Recognize stage via FGC modules to progressively detect well-camouflaged objects.
Why This Project?
New to Camouflaged Object Detection (COD)? You're in the right place!
SARNet is designed to be a beginner-friendly yet research-grade COD project. Whether you're a student exploring COD for the first time or a researcher looking for a solid baseline, this repo has everything you need:
- Clean & Well-Commented Code: Every module is clearly documented, making it easy to understand the full pipeline from data loading to model inference.
- Ready-to-Use Visualization Tools: We open-source the scripts to generate feature map heatmaps and prediction overlays (see Visualization), so you can see how the model works instead of only reading numbers.
- Modular Architecture: The OAA and FGC modules are self-contained and easy to plug into your own network for experimentation.
- End-to-End Workflow: Training, inference, evaluation, and visualization are all included. Just clone, configure paths, and run!
- Automatic Evaluation: Metrics (S-measure, F-measure, MAE, E-measure) are computed and saved to Excel automatically after inference.
If this is your first COD project, we recommend starting with the Quick Start section and then exploring the Visualization tools to build intuition about how camouflaged objects are detected.
Highlights
- Object Area Amplification (OAA): Fuses adjacent-level features to amplify target region representations, enabling the network to "go closer" to camouflaged objects.
- Figure-Ground Conversion (FGC): Progressively refines predictions by selectively attending to foreground/background regions that deeper layers missed.
- State-of-the-Art: Achieves competitive performance on 4 major COD benchmarks (CAMO, CHAMELEON, COD10K, NC4K).
- PVTv2 Backbone: Leverages Pyramid Vision Transformer V2 for powerful multi-scale feature extraction.
- Open-Source Visualization Tools: We provide ready-to-use scripts for feature map heatmap generation and prediction overlay visualization (see Visualization).
Architecture
Figure 3: Overall architecture of SARNet. The PVTv2 backbone extracts multi-scale features, which are then processed by Object Area Amplification (OAA) modules to fuse and amplify target features. Figure-Ground Conversion modules (FFGC, EFGC) progressively refine predictions by attending to foreground/background regions.
Key Design Insights:
| Module | Role | Mechanism |
|---|---|---|
| OAA | Object Area Amplification | Fuses current-level & deeper features via dual-branch Conv+Upsample+Concat |
| FGC (bg mode) | Background-aware Refinement | Morphological dilation → prediction → attends to missed regions |
| FGC (fg mode) | Foreground-aware Refinement | Uses prediction map as attention weights for foreground enhancement |
| CBR | Channel Reduction | Conv → BatchNorm → ReLU on the deepest features |
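The foreground-mode row above describes using the prediction map as attention weights. As a rough illustration only, the sketch below multiplies every feature channel by a coarse prediction map, element by element. The function name `apply_foreground_attention` and the nested-list tensors are assumptions made for readability; the actual FGC module in SARNet.py works on PyTorch tensors and may add steps (for example a sigmoid or a residual connection) that are not shown here.

```python
def apply_foreground_attention(features, pred):
    """Weight every channel of `features` by the prediction map `pred`.

    features: list of C channels, each an H x W list of floats
    pred:     H x W list of floats in [0, 1] (hypothetical coarse prediction)
    """
    return [
        [[value * weight for value, weight in zip(f_row, p_row)]
         for f_row, p_row in zip(channel, pred)]
        for channel in features
    ]


# One 2x2 channel and a prediction map: confident, unsure, background, confident.
features = [[[1.0, 2.0], [3.0, 4.0]]]
pred = [[1.0, 0.5], [0.0, 1.0]]
print(apply_foreground_attention(features, pred))
```

Locations the prediction marks as background (weight 0) are suppressed, while confident foreground locations pass through unchanged.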
Results
Quantitative Comparison on COD Benchmarks
All metrics are reported using the same evaluation protocol. ↑ means higher is better, ↓ means lower is better.
| Dataset | S-measure ↑ | weighted F ↑ | MAE ↓ | mean E ↑ | mean F ↑ |
|---|---|---|---|---|---|
| CAMO | 0.796 | 0.700 | 0.075 | 0.850 | 0.754 |
| CHAMELEON | 0.888 | 0.830 | 0.032 | 0.945 | 0.859 |
| COD10K | 0.815 | 0.667 | 0.037 | 0.886 | 0.720 |
| NC4K | 0.843 | 0.752 | 0.048 | 0.897 | 0.787 |
Please refer to the paper for full comparison tables with other methods.
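For readers new to these metrics, MAE is the simplest one in the table: the average absolute difference between the predicted map and the ground-truth mask, both scaled to [0, 1]. The snippet below is a minimal pure-Python sketch of that definition. The name `mean_absolute_error` and the toy 2x2 maps are illustrative assumptions; the repository computes its metrics through metric_caller.py, which is not reproduced here.

```python
def mean_absolute_error(pred, gt):
    """Average per-pixel |pred - gt| for two H x W maps with values in [0, 1]."""
    total = 0.0
    count = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += abs(p - g)
            count += 1
    return total / count


# Toy example: a soft prediction against a binary ground-truth mask.
pred = [[0.9, 0.1], [0.8, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
print(round(mean_absolute_error(pred, gt), 3))
```

A dataset-level MAE is typically the mean of this per-image value over all test images.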
Qualitative Comparison
Figure 6: Visual comparison with state-of-the-art methods. SARNet produces more accurate and complete segmentation masks, especially for objects with complex camouflage patterns. Our method effectively handles challenging cases such as small objects, objects with similar texture to the background, and multiple camouflaged instances.
Visualization
We open-source all visualization tools used in the paper! You can reproduce the feature heatmaps and prediction overlays shown below using the provided scripts in the display_heatmaps/ directory.
Feature Map Heatmaps
Figure 7: Feature map visualization at different stages. The heatmaps demonstrate how OAA and FGC modules progressively focus on camouflaged objects. Warmer colors indicate higher activation, showing that deeper features attend to broader regions while refined features precisely localize object boundaries.
Generate feature map heatmaps with the open-source script:
cd display_heatmaps && python heatmap.py
The script loads intermediate feature maps, applies colormap transformations, and overlays heatmaps on the original images. See display_heatmaps/heatmap.py for details.
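To build intuition for the three steps just described (normalize, colorize, blend), here is a small dependency-free sketch. It is not the code in heatmap.py: the helper names `normalize`, `to_color`, and `overlay`, the toy blue-to-red colormap, and the default `alpha` are all assumptions for illustration, and a real script would most likely rely on an image library.

```python
def normalize(feature_map):
    """Min-max scale an H x W map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / span for v in row] for row in feature_map]


def to_color(t):
    """Toy blue-to-red colormap: t=0 gives blue, t=1 gives red (RGB, 0-255)."""
    return (round(255 * t), 0, round(255 * (1 - t)))


def overlay(image, feature_map, alpha=0.5):
    """Alpha-blend the colorized feature map over an H x W RGB image."""
    heat = normalize(feature_map)
    return [
        [tuple(round((1 - alpha) * px + alpha * c)
               for px, c in zip(pixel, to_color(t)))
         for pixel, t in zip(img_row, heat_row)]
        for img_row, heat_row in zip(image, heat)
    ]


# A 1x2 gray image with a low activation on the left and a high one on the right.
image = [[(100, 100, 100), (100, 100, 100)]]
feature_map = [[0.0, 4.0]]
print(overlay(image, feature_map))
```

The left pixel is tinted blue (low activation) and the right pixel red (high activation), which matches the "warmer colors indicate higher activation" reading of the figures.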
Feature Visualization Analysis
Figure 8: Detailed feature visualization showing the effect of OAA and FGC modules. (a-b) Features before/after OAA demonstrate amplified object area attention. (c-d) Features before/after FGC show refined figure-ground separation.
Prediction Overlay
Overlay prediction maps on original images for qualitative analysis:
cd display_heatmaps && python combine.py
The script generates side-by-side comparisons of input images, ground truth masks, and model predictions. See display_heatmaps/combine.py for details.
Pretrained Models & Prediction Maps
| Resource | Backbone | Download |
|---|---|---|
| Pretrained Model | PVTv2-B3 | |
| Prediction Maps | - | |
Quick Start
1. Environment Setup
# Clone the repository
git clone https://github.com/Haozhe-Xing/SARNet.git
cd SARNet
# Create conda environment
conda create -n sarnet python=3.8.13 -y
conda activate sarnet
# Install dependencies
pip install -r requirements.txt
Requirements: Python 3.8, PyTorch, PVTv2 pretrained weights (download)
2. Dataset Preparation
Download COD10K and organize as:
<your_data_root>/
└── COD10K/
    ├── TrainDataset1/
    │   ├── Imgs/          # Training images (.jpg)
    │   └── GT/            # Ground truth masks (.png)
    └── TestDataset/
        ├── CHAMELEON/
        │   ├── Imgs/
        │   └── GT/
        ├── CAMO/
        │   ├── Imgs/
        │   └── GT/
        ├── COD10K/
        │   ├── Imgs/
        │   └── GT/
        └── NC4K/
            ├── Imgs/
            └── GT/
Then update the root path in config.py.
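Before launching training, it can help to confirm that the folders are where the code expects them. The sketch below is a hypothetical helper, not part of the repository: `EXPECTED` and `missing_folders` are illustrative names, and the list assumes that TrainDataset1/ and TestDataset/ both sit inside COD10K/ as in the tree above. Adjust the paths if your layout differs.

```python
from pathlib import Path

# Expected sub-folders, assuming TrainDataset1/ and TestDataset/ live under COD10K/.
EXPECTED = [
    "COD10K/TrainDataset1/Imgs",
    "COD10K/TrainDataset1/GT",
] + [
    f"COD10K/TestDataset/{name}/{sub}"
    for name in ("CHAMELEON", "CAMO", "COD10K", "NC4K")
    for sub in ("Imgs", "GT")
]


def missing_folders(data_root):
    """Return the expected folders that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


# Replace "." with your own data root; an empty list means the layout is complete.
print(missing_folders("."))
```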
3. Training
python train.py
Training Configuration
| Parameter | Default | Description |
|---|---|---|
| pvt_name | pvt_v2_b3 | PVTv2 backbone variant |
| args['scale'] | 384 | Input image resolution |
| args['epoch_num'] | 100 | Number of training epochs |
| args['lr'] | 1e-3 | Initial learning rate |
| args['optimizer'] | SGD | Optimizer (SGD / Adam) |
| args['train_batch_size'] | 2 | Training batch size |
| args['lr_decay'] | 0.9 | Polynomial LR decay power |
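To make the last row concrete: a polynomial ("poly") schedule usually shrinks the learning rate from its initial value to zero following a power curve. The sketch below shows the common form of that schedule with the defaults from the table. The function name `poly_lr`, and whether train.py steps the schedule per iteration or per epoch, are assumptions; check train.py for the exact rule used.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Common polynomial decay: base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1 - step / total_steps) ** power


# Learning rate at the start, middle, and end of a 100-step run.
for step in (0, 50, 100):
    print(step, poly_lr(1e-3, step, 100))
```

With a power below 1, the rate stays relatively high for most of training and drops quickly near the end.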
4. Inference & Evaluation
python new_infer.py
Evaluation metrics (S-measure, weighted F-measure, MAE, E-measure, F-measure) are automatically saved to an Excel file.
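As a reference point for the F-measure family, the sketch below computes a single-threshold F-beta score with beta squared set to 0.3, the value conventionally used in salient and camouflaged object detection. It is only an illustration: `f_measure` and its parameters are hypothetical names, and the mean and weighted variants reported by the repository (via metric_caller.py) aggregate or weight pixels differently and are not reproduced here.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-beta score of a thresholded H x W prediction against a binary mask."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            predicted = p >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# One hit, one false alarm, one miss: precision = recall = 0.5.
pred = [[0.9, 0.6], [0.2, 0.1]]
gt = [[1.0, 0.0], [1.0, 0.0]]
print(f_measure(pred, gt))
```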
5. Visualization
We provide open-source visualization tools for reproducing all visual results in the paper. See the Visualization section for detailed examples and instructions.
# Each command runs in a subshell, so you stay in the repository root afterwards.
# Feature map heatmaps (reproduces Fig. 7 & Fig. 8 in the paper)
(cd display_heatmaps && python heatmap.py)
# Overlay predictions on images (reproduces visual comparisons)
(cd display_heatmaps && python combine.py)
Project Structure
SARNet/
├── SARNet.py              # Core network (OAA + FGC modules)
├── pvtv2.py               # PVTv2 backbone encoder
├── train.py               # Training pipeline
├── new_infer.py           # Inference & evaluation
├── config.py              # Path configuration
├── datasets.py            # Dataset utilities
├── loss.py                # Loss functions (Structure, IoU, Dice)
├── joint_transforms.py    # Joint image-mask augmentations
├── metric_caller.py       # Metric computation
├── excel_recorder.py      # Excel metric recording
├── misc.py                # Utility functions
├── excel.py               # LaTeX table formatter
├── requirements.txt       # Python dependencies
└── display_heatmaps/      # Visualization tools
    ├── heatmap.py         # Feature map heatmap generation
    └── combine.py         # Prediction overlay visualization
Citation
If you find this work helpful for your research, please consider citing our paper and giving the repository a star:
@article{xing2023go,
title = {Go Closer to See Better: Camouflaged Object Detection via Object Area Amplification and Figure-Ground Conversion},
author = {Xing, Haozhe and Wang, Haiyu and Li, Yanye and Ling, Haibin},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
volume = {33},
number = {10},
pages = {5595--5608},
year = {2023},
publisher = {IEEE},
doi = {10.1109/TCSVT.2023.3255304}
}
Acknowledgments
We sincerely thank the following open-source projects:
- PVTv2: Pyramid Vision Transformer backbone
- PFNet: Loss functions and utility code
- py-sod-metrics: Evaluation metrics library
Contact
If you have any questions, please feel free to open an issue or contact us.
If you find this project useful, please consider giving it a star.
It helps others discover this work!