Indiscernible Object Counting in Underwater Scenes (CVPR 2023)

February 28, 2026

Paper, Supp, Poster, IOCfish5K Dataset, Code/models

Authors: Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan*, Luc Van Gool.

RGB-D Indiscernible Object Counting in Underwater Scenes (journal extension)

Paper, IOCfish5k-D Dataset, Code/models

Authors: Guolei Sun, Xiaogang Cheng, Zhaochong An, Xiaokang Wang, Yun Liu*, Deng-Ping Fan, Ming-Ming Cheng, Luc Van Gool.

1. Object Counting Tasks

Existing object counting tasks include Generic Object Counting (GOC) and Dense Object Counting (DOC). In this paper, we propose a new challenge termed "Indiscernible Object Counting (IOC)", which focuses on counting foreground objects in indiscernible scenes. The following figure compares the different tasks.


Figure 1: Illustration of different counting tasks. Top left: Generic Object Counting (GOC), which counts objects of various classes in natural scenes. Top right: Dense Object Counting (DOC), which counts objects of a foreground class in scenes packed with instances. Bottom: Indiscernible Object Counting (IOC), which counts objects of a foreground class in indiscernible scenes. Can you find all the fish in the given examples? For GOC, DOC, and IOC, the images shown are from PASCAL VOC, ShanghaiTech, and the new IOCfish5K dataset, respectively.

Due to the lack of an appropriate IOC dataset, we present IOCfish5K, a large-scale dataset containing 5,637 high-resolution images with 659,024 annotated center points. Because of limited visibility and active mimicry, underwater scenes contain many indiscernible objects (e.g., sea horses, reef stonefish, lionfish, and leafy sea dragons), so we focus on underwater scenes for our dataset.

2. The Proposed Datasets

A comparison between our dataset and existing datasets is shown below.

| Dataset | Year | Indiscernible Scene | #Ann. IMG | Avg. Resolution | Free View | Total Count | Min Count | Ave Count | Max Count | Web |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UCSD | 2008 | ✗ | 2,000 | 158×238 | ✗ | 49,885 | 11 | 25 | 46 | Link |
| Mall | 2012 | ✗ | 2,000 | 480×640 | ✗ | 62,325 | 13 | 31 | 53 | Link |
| UCF_CC_50 | 2013 | ✗ | 50 | 2101×2888 | ✓ | 63,974 | 94 | 1,279 | 4,543 | Link |
| WorldExpo'10 | 2016 | ✗ | 3,980 | 576×720 | ✗ | 199,923 | 1 | 50 | 253 | Link |
| ShanghaiTech B | 2016 | ✗ | 716 | 768×1024 | ✗ | 88,488 | 9 | 123 | 578 | Link |
| ShanghaiTech A | 2016 | ✗ | 482 | 589×868 | ✓ | 241,677 | 33 | 501 | 3,139 | Link |
| UCF-QNRF | 2018 | ✗ | 1,535 | 2013×2902 | ✓ | 1,251,642 | 49 | 815 | 12,865 | Link |
| Crowd_surv | 2019 | ✗ | 13,945 | 840×1342 | ✗ | 386,513 | 2 | 35 | 1,420 | Link |
| GCC (synthetic) | 2019 | ✗ | 15,212 | 1080×1920 | ✓ | 7,625,843 | 0 | 501 | 3,995 | Link |
| JHU-CROWD++ | 2019 | ✗ | 4,372 | 910×1430 | ✓ | 1,515,005 | 0 | 346 | 25,791 | Link |
| NWPU-Crowd | 2020 | ✗ | 5,109 | 2191×3209 | ✓ | 2,133,375 | 0 | 418 | 20,033 | Link |
| NC4K | 2021 | ✓ | 4,121 | 530×709 | ✓ | 4,584 | 1 | 1 | 8 | Link |
| CAMO++ | 2021 | ✓ | 5,500 | N/A | ✓ | 32,756 | N/A | 6 | N/A | Link |
| COD | 2022 | ✓ | 5,066 | 737×964 | ✓ | 5,899 | 1 | 1 | 8 | Link |
| IOCfish5K (Ours) | 2023 | ✓ | 5,637 | 1080×1920 | ✓ | 659,024 | 0 | 117 | 2,371 | Link |
| IOCfish5K-D (with depth maps, Ours) | 2023 | ✓ | 5,637 | 1080×1920 | ✓ | 659,024 | 0 | 117 | 2,371 | Link |

Table 1: Statistics of existing datasets for dense object counting (DOC) and indiscernible object counting (IOC).

Our IOCfish5K dataset can be downloaded from here. It is organized as follows:

    IOCfish5K
    ├── images
    │   ├── ****.jpg
    │   └── ****.jpg
    ├── annotations
    │   ├── ****.xml
    │   └── ****.xml
    ├── train_id.txt
    ├── val_id.txt
    └── test_id.txt

The image ids for train/val/test are in train_id.txt, val_id.txt, and test_id.txt, respectively.
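Given the layout above, each id can be resolved into an image/annotation pair. A minimal sketch, assuming the directory structure shown; the `load_split` helper and the id-to-filename mapping are illustrative, not part of the official release:

```python
from pathlib import Path

def load_split(root, split):
    """Pair each image id listed in e.g. train_id.txt with its image
    and annotation paths, following the layout shown above.
    NOTE: this helper and the id-to-filename mapping are assumptions."""
    root = Path(root)
    ids = (root / f"{split}_id.txt").read_text().split()
    return [
        (root / "images" / f"{i}.jpg", root / "annotations" / f"{i}.xml")
        for i in ids
    ]
```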

The annotations are in XML format. Each object instance is annotated with a single point (its x, y coordinates). A point annotation in the XML looks as follows:

    <object>
        <point>
            <x>x_coor</x>
            <y>y_coor</y>
        </point>
    </object>
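The ground-truth count for an image is simply the number of such points. A minimal sketch of parsing them with the Python standard library, assuming each `<object>` holds one `<point>` as in the snippet above (the `load_points` helper is illustrative):

```python
import xml.etree.ElementTree as ET

def load_points(xml_path):
    """Return a list of (x, y) point annotations from one XML file.
    NOTE: the helper name is an assumption, not an official API."""
    root = ET.parse(xml_path).getroot()
    points = []
    for obj in root.iter("object"):
        pt = obj.find("point")
        if pt is not None:
            # float() also tolerates fractional coordinates
            points.append((float(pt.findtext("x")), float(pt.findtext("y"))))
    return points
```

The per-image count is then `len(load_points(path))`.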

Our IOCfish5K-D dataset contains depth maps for all images in IOCfish5K. The depth maps can be downloaded from here.

3. Benchmarking

For benchmarking purposes on IOCfish5K, we select 14 mainstream unimodal methods for object counting and carefully evaluate them.

For benchmarking purposes on IOCfish5K-D, we select 4 mainstream multimodal methods for object counting and carefully evaluate them.

4. The Proposed Method

4.1. Overview

We propose IOCFormer, a strong new baseline that combines density-based and regression-based branches in a unified framework and effectively tackles object counting in indiscernible scenes.

4.2. Usage

For training/inference, please refer to here.

5. Results

5.1. Quantitative Results for Unimodal Methods

The results for various methods are shown below.

| Method | Publication | Val MAE | Val MSE | Val NAE | Test MAE | Test MSE | Test NAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MCNN | CVPR'16 | 81.62 | 152.09 | 3.53 | 72.93 | 129.43 | 4.90 |
| CSRNet | CVPR'18 | 43.05 | 78.46 | 1.91 | 38.12 | 69.75 | 2.48 |
| LCFCN | ECCV'18 | 31.99 | 81.12 | 0.77 | 28.05 | 68.24 | 1.12 |
| CAN | CVPR'19 | 47.77 | 83.67 | 2.10 | 42.02 | 74.46 | 2.58 |
| DSSI-Net | ICCV'19 | 33.77 | 80.08 | 1.25 | 31.04 | 69.11 | 1.68 |
| BL | ICCV'19 | 19.67 | 44.21 | 0.39 | 20.03 | 46.08 | 0.55 |
| NoisyCC | NeurIPS'20 | 19.48 | 41.76 | 0.39 | 19.73 | 46.85 | 0.46 |
| DM-Count | NeurIPS'20 | 19.65 | 42.56 | 0.42 | 19.52 | 45.52 | 0.55 |
| GL | CVPR'21 | 18.13 | 44.57 | 0.33 | 18.80 | 46.19 | 0.47 |
| P2PNet | ICCV'21 | 21.38 | 45.12 | 0.39 | 20.74 | 47.90 | 0.48 |
| KDMG | TPAMI'22 | 22.79 | 47.32 | 0.90 | 22.79 | 49.94 | 1.17 |
| MPS | ICASSP'22 | 34.68 | 59.46 | 2.06 | 33.55 | 55.02 | 2.61 |
| MAN | CVPR'22 | 24.36 | 40.65 | 2.39 | 25.82 | 45.82 | 3.16 |
| CLTR | ECCV'22 | 17.47 | 37.06 | 0.29 | 18.07 | 41.90 | 0.43 |
| IOCFormer (Ours) | CVPR'23 | 15.91 | 34.08 | 0.26 | 17.12 | 41.25 | 0.38 |
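For reference, the three metrics reported above follow the usual counting conventions: MAE is the mean absolute error between predicted and ground-truth counts, MSE here denotes the root of the mean squared error (as is standard in the counting literature), and NAE normalizes each absolute error by the ground-truth count. A minimal sketch; the exact handling of zero-count images in NAE may differ between papers:

```python
import math

def counting_metrics(preds, gts):
    """Compute MAE, MSE (root mean squared error, per counting
    convention), and NAE over paired predicted/ground-truth counts.
    NOTE: zero-count images are skipped for NAE; papers may differ."""
    n = len(preds)
    abs_err = [abs(p - g) for p, g in zip(preds, gts)]
    mae = sum(abs_err) / n
    mse = math.sqrt(sum(e * e for e in abs_err) / n)
    nonzero = [(e, g) for e, g in zip(abs_err, gts) if g > 0]
    nae = sum(e / g for e, g in nonzero) / len(nonzero)
    return mae, mse, nae
```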

5.2. Quantitative Results for Multimodal Methods

The results for various methods are shown below.

| Method | Publication | Val MAE | Val MSE | Val NAE | Test MAE | Test MSE | Test NAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RDNet | CVPR'19 | 25.79 | 60.85 | 1.29 | 25.27 | 56.69 | 1.43 |
| IADM | CVPR'21 | 20.27 | 41.32 | 0.80 | 20.67 | 44.93 | 0.95 |
| CSCA | ACCV'22 | 24.42 | 55.04 | 0.98 | 24.10 | 50.97 | 1.24 |
| BM | ECCV'24 | 18.77 | 43.69 | 0.74 | 18.45 | 40.88 | 0.86 |
| IOCFormer-D (Ours) | - | 15.19 | 32.89 | 0.24 | 16.80 | 40.60 | 0.33 |

5.3. Qualitative Results

Qualitative comparisons of various algorithms (NoisyCC, MAN, CLTR, and ours). The ground-truth or estimated count for each case is shown in the lower-left corner.


6. Citations

@inproceedings{sun2023ioc,
    title={Indiscernible Object Counting in Underwater Scenes},
    author={Sun, Guolei and An, Zhaochong and Liu, Yun and Liu, Ce and Sakaridis, Christos and Fan, Deng-Ping and Van Gool, Luc},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}

7. Contact