Indiscernible Object Counting in Underwater Scenes (CVPR 2023)

February 28, 2026

Paper, Supp, Poster, IOCfish5K Dataset, Code/models

Authors: Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan*, Luc Van Gool.

RGB-D Indiscernible Object Counting in Underwater Scenes (journal extension)

Paper, IOCfish5k-D Dataset, Code/models

Authors: Guolei Sun, Xiaogang Cheng, Zhaochong An, Xiaokang Wang, Yun Liu*, Deng-Ping Fan, Ming-Ming Cheng, Luc Van Gool.

1. Object Counting Tasks

Existing object counting tasks include Generic Object Counting (GOC) and Dense Object Counting (DOC). In this paper, we propose a new challenge termed "Indiscernible Object Counting (IOC)", which focuses on counting foreground objects in indiscernible scenes. The following figure compares the different tasks.


Figure 1: Illustration of different counting tasks. Top left: Generic Object Counting (GOC), which counts objects of various classes in natural scenes. Top right: Dense Object Counting (DOC), which counts objects of a foreground class in scenes packed with instances. Bottom: Indiscernible Object Counting (IOC), which counts objects of a foreground class in indiscernible scenes. Can you find all the fish in the given examples? For GOC, DOC, and IOC, the images shown are from PASCAL VOC, ShanghaiTech, and the new IOCfish5K dataset, respectively.

Due to the lack of an appropriate IOC dataset, we present IOCfish5K, a large-scale dataset containing 5,637 high-resolution images with 659,024 annotated center points. Because of limited visibility and active mimicry, underwater scenes contain many indiscernible objects (e.g., sea horses, reef stonefish, lionfish, and leafy sea dragons), so we focus on underwater scenes for our dataset.

2. The Proposed Datasets

A comparison between our dataset and existing datasets is shown below.

| Dataset | Year | Indiscernible Scene | #Ann. IMG | Avg. Resolution | Free View | Total Count | Min Count | Ave Count | Max Count | Web |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UCSD | 2008 | ✗ | 2,000 | 158×238 | ✗ | 49,885 | 11 | 25 | 46 | Link |
| Mall | 2012 | ✗ | 2,000 | 480×640 | ✗ | 62,325 | 13 | 31 | 53 | Link |
| UCF_CC_50 | 2013 | ✗ | 50 | 2101×2888 | ✓ | 63,974 | 94 | 1,279 | 4,543 | Link |
| WorldExpo'10 | 2016 | ✗ | 3,980 | 576×720 | ✗ | 199,923 | 1 | 50 | 253 | Link |
| ShanghaiTech B | 2016 | ✗ | 716 | 768×1024 | ✗ | 88,488 | 9 | 123 | 578 | Link |
| ShanghaiTech A | 2016 | ✗ | 482 | 589×868 | ✓ | 241,677 | 33 | 501 | 3,139 | Link |
| UCF-QNRF | 2018 | ✗ | 1,535 | 2013×2902 | ✓ | 1,251,642 | 49 | 815 | 12,865 | Link |
| Crowd_surv | 2019 | ✗ | 13,945 | 840×1342 | ✗ | 386,513 | 2 | 35 | 1,420 | Link |
| GCC (synthetic) | 2019 | ✗ | 15,212 | 1080×1920 | ✓ | 7,625,843 | 0 | 501 | 3,995 | Link |
| JHU-CROWD++ | 2019 | ✗ | 4,372 | 910×1430 | ✓ | 1,515,005 | 0 | 346 | 25,791 | Link |
| NWPU-Crowd | 2020 | ✗ | 5,109 | 2191×3209 | ✓ | 2,133,375 | 0 | 418 | 20,033 | Link |
| NC4K | 2021 | ✓ | 4,121 | 530×709 | ✓ | 4,584 | 1 | 1 | 8 | Link |
| CAMO++ | 2021 | ✓ | 5,500 | N/A | ✓ | 32,756 | N/A | 6 | N/A | Link |
| COD | 2022 | ✓ | 5,066 | 737×964 | ✓ | 5,899 | 1 | 1 | 8 | Link |
| IOCfish5K (Ours) | 2023 | ✓ | 5,637 | 1080×1920 | ✓ | 659,024 | 0 | 117 | 2,371 | Link |
| IOCfish5K-D (with depth maps, Ours) | 2023 | ✓ | 5,637 | 1080×1920 | ✓ | 659,024 | 0 | 117 | 2,371 | Link |

Table 1: Statistics of existing datasets for dense object counting (DOC) and indiscernible object counting (IOC).

Our IOCfish5K dataset can be downloaded from here. It is organized as follows:

    IOCfish5K
    ├── images
    │   ├── ****.jpg
    │   └── ****.jpg
    ├── annotations
    │   ├── ****.xml
    │   └── ****.xml
    ├── train_id.txt
    ├── val_id.txt
    └── test_id.txt

The image ids for train/val/test are in train_id.txt, val_id.txt, and test_id.txt, respectively.
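Given the layout above, each id can be resolved into an image/annotation pair. A minimal sketch, assuming the directory structure shown; the `load_split` helper and the id-to-filename mapping are illustrative, not part of the official release:

```python
from pathlib import Path

def load_split(root, split):
    """Pair each image id listed in e.g. train_id.txt with its image
    and annotation paths, following the layout shown above.
    NOTE: this helper and the id-to-filename mapping are assumptions."""
    root = Path(root)
    ids = (root / f"{split}_id.txt").read_text().split()
    return [
        (root / "images" / f"{i}.jpg", root / "annotations" / f"{i}.xml")
        for i in ids
    ]
```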

The annotations are in XML format. Each object instance is annotated with a single point (its x, y coordinates). A point annotation in the XML looks as follows:

    <object>
        <point>
            <x>x_coor</x>
            <y>y_coor</y>
        </point>
    </object>
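The ground-truth count for an image is simply the number of such points. A minimal sketch of parsing them with the Python standard library, assuming each `<object>` holds one `<point>` as in the snippet above (the `load_points` helper is illustrative):

```python
import xml.etree.ElementTree as ET

def load_points(xml_path):
    """Return a list of (x, y) point annotations from one XML file.
    NOTE: the helper name is an assumption, not an official API."""
    root = ET.parse(xml_path).getroot()
    points = []
    for obj in root.iter("object"):
        pt = obj.find("point")
        if pt is not None:
            # float() also tolerates fractional coordinates
            points.append((float(pt.findtext("x")), float(pt.findtext("y"))))
    return points
```

The per-image count is then `len(load_points(path))`.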

Our IOCfish5K-D dataset contains depth maps for all images in IOCfish5K. The depth maps can be downloaded from here.

3. Benchmarking

For benchmarking purposes on IOCfish5K, we select 14 mainstream unimodal methods for object counting and carefully evaluate them.

For benchmarking purposes on IOCfish5K-D, we select 4 mainstream multimodal methods for object counting and carefully evaluate them.

4. The Proposed Method

4.1. Overview

We propose IOCFormer, a strong new baseline that combines density-based and regression-based branches in a unified framework and effectively tackles object counting in indiscernible scenes.

4.2. Usage

For training/inference, please refer to here.

5. Results

5.1. Quantitative Results for Unimodal Methods

The results for various methods are shown below.

| Method | Publication | Val MAE | Val MSE | Val NAE | Test MAE | Test MSE | Test NAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MCNN | CVPR'16 | 81.62 | 152.09 | 3.53 | 72.93 | 129.43 | 4.90 |
| CSRNet | CVPR'18 | 43.05 | 78.46 | 1.91 | 38.12 | 69.75 | 2.48 |
| LCFCN | ECCV'18 | 31.99 | 81.12 | 0.77 | 28.05 | 68.24 | 1.12 |
| CAN | CVPR'19 | 47.77 | 83.67 | 2.10 | 42.02 | 74.46 | 2.58 |
| DSSI-Net | ICCV'19 | 33.77 | 80.08 | 1.25 | 31.04 | 69.11 | 1.68 |
| BL | ICCV'19 | 19.67 | 44.21 | 0.39 | 20.03 | 46.08 | 0.55 |
| NoisyCC | NeurIPS'20 | 19.48 | 41.76 | 0.39 | 19.73 | 46.85 | 0.46 |
| DM-Count | NeurIPS'20 | 19.65 | 42.56 | 0.42 | 19.52 | 45.52 | 0.55 |
| GL | CVPR'21 | 18.13 | 44.57 | 0.33 | 18.80 | 46.19 | 0.47 |
| P2PNet | ICCV'21 | 21.38 | 45.12 | 0.39 | 20.74 | 47.90 | 0.48 |
| KDMG | TPAMI'22 | 22.79 | 47.32 | 0.90 | 22.79 | 49.94 | 1.17 |
| MPS | ICASSP'22 | 34.68 | 59.46 | 2.06 | 33.55 | 55.02 | 2.61 |
| MAN | CVPR'22 | 24.36 | 40.65 | 2.39 | 25.82 | 45.82 | 3.16 |
| CLTR | ECCV'22 | 17.47 | 37.06 | 0.29 | 18.07 | 41.90 | 0.43 |
| IOCFormer (Ours) | CVPR'23 | 15.91 | 34.08 | 0.26 | 17.12 | 41.25 | 0.38 |
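For reference, the three metrics reported above follow the usual counting conventions: MAE is the mean absolute error between predicted and ground-truth counts, MSE here denotes the root of the mean squared error (as is standard in the counting literature), and NAE normalizes each absolute error by the ground-truth count. A minimal sketch; the exact handling of zero-count images in NAE may differ between papers:

```python
import math

def counting_metrics(preds, gts):
    """Compute MAE, MSE (root mean squared error, per counting
    convention), and NAE over paired predicted/ground-truth counts.
    NOTE: zero-count images are skipped for NAE; papers may differ."""
    n = len(preds)
    abs_err = [abs(p - g) for p, g in zip(preds, gts)]
    mae = sum(abs_err) / n
    mse = math.sqrt(sum(e * e for e in abs_err) / n)
    nonzero = [(e, g) for e, g in zip(abs_err, gts) if g > 0]
    nae = sum(e / g for e, g in nonzero) / len(nonzero)
    return mae, mse, nae
```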

5.2. Quantitative Results for Multimodal Methods

The results for various methods are shown below.

| Method | Publication | Val MAE | Val MSE | Val NAE | Test MAE | Test MSE | Test NAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RDNet | CVPR'19 | 25.79 | 60.85 | 1.29 | 25.27 | 56.69 | 1.43 |
| IADM | CVPR'21 | 20.27 | 41.32 | 0.80 | 20.67 | 44.93 | 0.95 |
| CSCA | ACCV'22 | 24.42 | 55.04 | 0.98 | 24.10 | 50.97 | 1.24 |
| BM | ECCV'24 | 18.77 | 43.69 | 0.74 | 18.45 | 40.88 | 0.86 |
| IOCFormer-D (Ours) | - | 15.19 | 32.89 | 0.24 | 16.80 | 40.60 | 0.33 |

5.3. Qualitative Results

Qualitative comparisons of various algorithms (NoisyCC, MAN, CLTR, and ours). The ground-truth or estimated count for each case is shown in the lower-left corner.


6. Citations

@inproceedings{sun2023ioc,
    title={Indiscernible Object Counting in Underwater Scenes},
    author={Sun, Guolei and An, Zhaochong and Liu, Yun and Liu, Ce and Sakaridis, Christos and Fan, Deng-Ping and Van Gool, Luc},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}

7. Contact