[NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies

April 14, 2025

This is the official repository for Hawk.

Jiaqi Tang^, Hao Lu^, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang,
Bin Guo, Jiangbo Lu, Qifeng Chen and Ying-Cong Chen*

^: Equal contribution. *: Corresponding Author.

📢 Updates

✅ Feb 25, 2025 - A Hugging Face demo of Hawk is available HERE.
✅ Feb 25, 2025 - We release the training and demo code of Hawk.
✅ Feb 25, 2025 - We release the dataset (video + annotation) of Hawk. Check this Hugging Face link for DOWNLOAD.
✅ Sep 26, 2024 - Hawk is accepted by NeurIPS 2024.
✅ June 29, 2024 - We release the dataset (annotation) of Hawk. Check this Google Cloud link for DOWNLOAD.

🔍 Motivation - Have eyes like a Hawk!

🚩 Current video anomaly detection (VAD) systems are often limited by their superficial semantic understanding of scenes and minimal user interaction.
🚩 Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios.

▶️ Getting Started

🪒 Installation

Create the environment with the following steps:

apt install ffmpeg
conda env create -f environment.yml
conda activate hawk
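
After activating the environment, a quick sanity check can confirm that the pieces installed above are visible. The following is a minimal sketch, not part of the Hawk repository: `check_environment` is a hypothetical helper, and the assumption that `torch` is among the packages installed by `environment.yml` is ours (the training commands use `torchrun`, which ships with PyTorch).

```python
import importlib.util
import shutil


def check_environment(modules=("torch",), tools=("ffmpeg",)):
    """Return a dict mapping each requirement name to True (found) or False."""
    status = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec locates a top-level module without importing it.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the `hawk` environment; any entry reported as `MISSING` points back to the corresponding installation step.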

🏰 Pretrained and Fine-tuned Model

The following checkpoints are used to run Hawk:

Checkpoint	Link	Note
Video-LLaMA-2-7B-Finetuned	link	Used as initial weights for training.
Hawk_Pretrained	link	Pretrained on WebVid.
Hawk_Finetuned	link	Fine-tuned on the Hawk dataset.

If you want to use the pretrained model, use the Hawk_Pretrained checkpoint.
If you want to use the model for anomaly understanding, use the Hawk_Finetuned checkpoint.

🖥️ Training

🔨 Configuration

Training has two stages (pretraining and fine-tuning), and each stage has its own configuration file.

In each configuration file, replace the following entries with your own paths:

llama_model: ".../Video-LLaMA-2-7B-Finetuned/llama-2-7b-chat-hf"

# Checkpoint of the vision branch produced by stage 1 pretraining (only needed for stage 2)
ckpt: ".../checkpoint.pth"
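
If you prefer to script the path replacement rather than edit the files by hand, something like the sketch below may help. It is an illustration, not code from the repository: `set_config_paths` is a hypothetical helper, the paths in the usage snippet are placeholders, and it assumes each key sits on its own line in the YAML file. It rewrites every matching line and drops any trailing inline comment on that line, so a real YAML parser would be more robust for complex configs.

```python
import re


def set_config_paths(yaml_text, llama_model, ckpt=None):
    """Rewrite the `llama_model:` and (optionally) `ckpt:` values in YAML text."""

    def replace_value(text, key, value):
        # Match "<indent>key: <anything>" at the start of a line; comment lines
        # start with "#" after the indent, so they are left untouched.
        pattern = re.compile(rf"^([ \t]*){re.escape(key)}:.*$", flags=re.MULTILINE)
        # A function replacement avoids backslash-escape issues in paths.
        return pattern.sub(lambda m: f'{m.group(1)}{key}: "{value}"', text)

    text = replace_value(yaml_text, "llama_model", llama_model)
    if ckpt is not None:
        text = replace_value(text, "ckpt", ckpt)
    return text


if __name__ == "__main__":
    # Placeholder snippet and paths, for illustration only.
    sample = 'llama_model: ".../llama-2-7b-chat-hf"\nckpt: ".../checkpoint.pth"\n'
    print(set_config_paths(sample, "/my/models/llama-2-7b-chat-hf", "/my/ckpts/stage1.pth"))
```

For stage 1, call it without `ckpt` so that only `llama_model` is changed.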

🖥️ To Train

Then, run the script:

# for pretraining
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port='10000' train.py --cfg-path  ./configs/train_configs/stage1_pretrain.yaml

# for fine-tuning
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port='12001' train.py --cfg-path  ./configs/train_configs/stage2_finetune.yaml

Resource usage: training (stage 1 and stage 2) uses 4 × RTX A6000 (48 GB) GPUs.
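
The two commands above differ only in the master port and the configuration file, and the GPU count appears in both `CUDA_VISIBLE_DEVICES` and `--nproc_per_node`. The sketch below shows one way to keep those values in sync from Python. `build_launch` and `launch` are hypothetical helpers, not part of the repository, and only the launch line is adapted: whether training fits on a different number or type of GPUs is not something this README states.

```python
import os
import subprocess

# Config paths as they appear in the commands above.
STAGE_CONFIGS = {
    "pretrain": "./configs/train_configs/stage1_pretrain.yaml",
    "finetune": "./configs/train_configs/stage2_finetune.yaml",
}


def build_launch(stage, gpu_ids, master_port):
    """Return (command, extra_env) mirroring the torchrun calls above."""
    command = [
        "torchrun",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per visible GPU
        f"--master_port={master_port}",
        "train.py",
        "--cfg-path",
        STAGE_CONFIGS[stage],
    ]
    extra_env = {
        "NCCL_P2P_DISABLE": "1",
        "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids),
    }
    return command, extra_env


def launch(stage, gpu_ids, master_port):
    """Run the training command from the repository root."""
    command, extra_env = build_launch(stage, gpu_ids, master_port)
    # Add the two variables on top of the current environment for the child process.
    return subprocess.run(command, env={**os.environ, **extra_env}, check=True)
```

For example, `launch("pretrain", [0, 1, 2, 3], 10000)` corresponds to the pretraining command, and `launch("finetune", [0, 1, 2, 3], 12001)` to the fine-tuning command.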

🌐 Citations

The following is a BibTeX reference:

@inproceedings{atang2024hawk,
  title = {Hawk: Learning to Understand Open-World Video Anomalies},
  author = {Tang, Jiaqi and Lu, Hao and Wu, Ruizheng and Xu, Xiaogang and Ma, Ke and Fang, Cheng and Guo, Bin and Lu, Jiangbo and Chen, Qifeng and Chen, Ying-Cong},
  year = {2024},
  booktitle = {Neural Information Processing Systems (NeurIPS)}
}

📧 Connecting with Us?

If you have any questions, please feel free to send an email to jtang092@connect.hkust-gz.edu.cn.

📜 Acknowledgment

This work is supported by the Guangdong Provincial Key Lab of Integrated Communication, Sensing and Computation for Ubiquitous Internet of Things (No. 2023B1212010007), the Innovation and Technology Fund of HKSAR under grant number GHX/054/21GD, the Natural Science Foundation of Zhejiang Province, China, under No. LD24F020002, and the National Science Fund for Distinguished Young Scholars (62025205).

Also, this project is inspired by Video-LLaMA.
