PRVQL: Progressive Knowledge-Guided Refinement for Robust Egocentric Visual Query Localization
July 10, 2025
This work has been accepted to ICCV 2025!
Overview
Egocentric Visual Query Localization (EgoVQL) aims to locate a target object in both space and time within first-person videos based on a given visual query. However, existing methods often struggle with significant appearance variations and cluttered backgrounds, leading to reduced localization accuracy.
To overcome these challenges, PRVQL introduces a progressive knowledge-guided refinement approach. By dynamically extracting and refining knowledge from the video itself, PRVQL continuously enhances query and video features across multiple stages, resulting in more accurate localization.
Core Idea
PRVQL employs appearance and spatial knowledge extraction modules at each stage to iteratively refine the query and video features. This progressive refinement leads to increasingly accurate localization results.
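The stage-by-stage control flow described above can be illustrated with a toy sketch. This is not the PRVQL implementation: features are plain lists of floats instead of tensors, `toy_stage` and `progressive_refine` are hypothetical names, and the specific update rules (appearance knowledge nudging the query, spatial knowledge re-weighting the video features) are assumptions chosen only to make the loop concrete.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]  # toy stand-in for a feature tensor
Stage = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def toy_stage(query: Vector, video: Vector) -> Tuple[Vector, Vector]:
    """One hypothetical refinement stage operating on toy features."""
    # Assumed "appearance knowledge": how the video features differ from the query.
    appearance = [v - q for q, v in zip(query, video)]
    # Assumed "spatial knowledge": keep positions close to the query (weight 1.0),
    # down-weight the rest (weight 0.5) as a crude background suppression.
    spatial = [1.0 if abs(v - q) < 1.0 else 0.5 for q, v in zip(query, video)]
    # Assumption: appearance knowledge refines the query, spatial knowledge the video.
    refined_query = [q + 0.5 * a for q, a in zip(query, appearance)]
    refined_video = [v * s for v, s in zip(video, spatial)]
    return refined_query, refined_video


def progressive_refine(
    query: Vector, video: Vector, stages: Sequence[Stage]
) -> Tuple[Vector, Vector]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        query, video = stage(query, video)
    return query, video
```

Calling `progressive_refine(query, video, [toy_stage, toy_stage])` runs two stages in sequence; the point is only that each stage consumes the features refined by the previous one.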
Model Framework
Environment Setup
Set up your environment with the following commands:
conda create --name prvql python=3.8 -y
conda activate prvql
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
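After installation, a quick sanity check can confirm that the pinned versions were actually installed. The sketch below is not part of the repository; it assumes the packages expose distribution metadata under the names `torch`, `torchvision`, and `torchaudio`, and it reads that metadata without importing the packages.

```python
from importlib import metadata

# Versions pinned by the conda command above.
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0", "torchaudio": "0.12.0"}


def installed_versions(names):
    """Return {name: version or None} without importing the packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed, expected=EXPECTED):
    """List human-readable problems; an empty list means everything matches."""
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (expected {want})")
        # Ignore local build tags such as "+cu116" when comparing.
        elif have.split("+")[0] != want:
            problems.append(f"{name}: found {have}, expected {want}")
    return problems


if __name__ == "__main__":
    report = version_mismatches(installed_versions(EXPECTED))
    for line in report or ["all pinned versions match"]:
        print(line)
```

Run it inside the activated `prvql` environment; any line other than the success message points at a package to reinstall.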
Pretrained Weights
Download the pretrained model weights from Google Drive and place them in:
./output/ego4d_vq2d/train/train
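To confirm the weights landed in the right place, a small helper can list checkpoint-like files in that directory. The checkpoint file names are not stated here, so the suffixes below (`.pth`, `.pt`, `.tar`) are an assumption, and `list_checkpoints` is a hypothetical helper rather than a repository function.

```python
from pathlib import Path

# Directory named in the instructions above.
WEIGHTS_DIR = Path("./output/ego4d_vq2d/train/train")


def list_checkpoints(directory=WEIGHTS_DIR, suffixes=(".pth", ".pt", ".tar")):
    """Return sorted names of files that look like checkpoints (assumed suffixes)."""
    directory = Path(directory)
    if not directory.is_dir():
        return []  # directory missing: nothing was downloaded there yet
    return sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.suffix in suffixes
    )
```

An empty result means either the directory does not exist or the downloaded files use a different extension.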
Dataset Preparation
1. Process the Dataset
Follow the instructions in the VQLoC repository to process the dataset into video clips and images.
2. Organize the Dataset
Ensure the dataset is structured as follows:
./your/dataset/path/
├── datav2
├── clips
│   ├── 1.mp4
│   └── ...
├── images
│   ├── 1
│   │   ├── 1.mp4
│   │   └── ...
│   └── ...
├── train_annot.json
├── val_annot.json
├── vq_test_unannotated.json
├── vq_train.json
└── vq_val.json
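Before training or evaluation, the layout can be checked programmatically. The sketch below assumes that every entry listed above sits directly under the dataset root; `missing_entries` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# Top-level entries taken from the layout above (assumed to be direct children of the root).
EXPECTED_ENTRIES = [
    "datav2",
    "clips",
    "images",
    "train_annot.json",
    "val_annot.json",
    "vq_test_unannotated.json",
    "vq_train.json",
    "vq_val.json",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For example, `missing_entries("./your/dataset/path/")` returns an empty list when the layout is complete, and the names of anything still missing otherwise.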
3. Update Configuration Files
Modify the dataset path in the following configuration files:
- config/eval.yaml
- config/train.yaml
- config/val.yaml
Update the dataset root path:
root: './your/dataset/path/'
Training & Evaluation
We will release the model and code soon.
Benchmark Results on Ego4D Validation and Test Sets
Validation Set
| Method | tAP | stAP | rec% | Succ |
|---|---|---|---|---|
| STARK (ICCV'21) | 0.10 | 0.04 | 12.41 | 18.70 |
| SiamRCNN (CVPR'22) | 0.22 | 0.15 | 32.92 | 43.24 |
| NFM (VQ2D'22) | 0.26 | 0.19 | 37.88 | 47.90 |
| CocoFormer (CVPR'23) | 0.26 | 0.19 | 37.67 | 47.68 |
| VQLoC (NeurIPS'23) | 0.31 | 0.22 | 47.05 | 55.89 |
| PRVQL (Ours) | 0.35 | 0.27 | 47.87 | 57.93 |
Test Set
| Method | tAP | stAP | rec% | Succ |
|---|---|---|---|---|
| STARK (ICCV'21) | - | - | - | - |
| SiamRCNN (CVPR'22) | 0.20 | 0.13 | - | - |
| NFM (VQ2D'22) | 0.24 | 0.17 | - | - |
| CocoFormer (CVPR'23) | 0.25 | 0.18 | - | - |
| VQLoC (NeurIPS'23) | 0.32 | 0.24 | 45.11 | 55.88 |
| PRVQL (Ours) | 0.37 | 0.28 | 45.70 | 59.43 |
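To read the two tables at a glance, the absolute gains of PRVQL over VQLoC (the strongest listed baseline) can be computed directly from the reported numbers. The snippet only restates values from the tables; `absolute_gain` is an illustrative helper.

```python
# Numbers copied from the validation and test tables above.
RESULTS = {
    "validation": {
        "VQLoC": {"tAP": 0.31, "stAP": 0.22, "rec%": 47.05, "Succ": 55.89},
        "PRVQL": {"tAP": 0.35, "stAP": 0.27, "rec%": 47.87, "Succ": 57.93},
    },
    "test": {
        "VQLoC": {"tAP": 0.32, "stAP": 0.24, "rec%": 45.11, "Succ": 55.88},
        "PRVQL": {"tAP": 0.37, "stAP": 0.28, "rec%": 45.70, "Succ": 59.43},
    },
}


def absolute_gain(split, metric, ours="PRVQL", baseline="VQLoC"):
    """Difference between two methods on one metric, rounded to two decimals."""
    scores = RESULTS[split]
    return round(scores[ours][metric] - scores[baseline][metric], 2)


if __name__ == "__main__":
    for split in RESULTS:
        gains = {m: absolute_gain(split, m) for m in RESULTS[split]["PRVQL"]}
        print(split, gains)
```

On the test set this gives gains of 0.05 tAP, 0.04 stAP, 0.59 rec%, and 3.55 Succ; on the validation set, 0.04, 0.05, 0.82, and 2.04 respectively.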
Citation
If you find this repository useful, please consider starring ⭐ it and citing our work:
@article{fan2025prvql,
title={PRVQL: Progressive Knowledge-Guided Refinement for Robust Egocentric Visual Query Localization},
author={Fan, Bing and Feng, Yunhe and Tian, Yapeng and Lin, Yuewei and Huang, Yan and Fan, Heng},
journal={arXiv preprint arXiv:2502.07707},
year={2025}
}