January 22, 2024

Learning Unseen Modality Interaction (NeurIPS 2023)

Yunhua Zhang, Hazel Doughty, Cees G.M. Snoek

This is the demo code for the video classification task using EPIC-Kitchens, with RGB and audio modalities.

Demo Code

Environment

  • Python 3.8.5
  • torch 1.12.1+cu113
  • torchaudio 0.12.1+cu113
  • torchvision 0.13.1+cu113
  • mmcv-full 1.7.0
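
To confirm that a local setup matches the versions listed above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the helper names `base_version` and `check_environment` are hypothetical, and the comparison ignores local build tags such as `+cu113`.

```python
from importlib import metadata

# Versions taken from the Environment list above.
# Local build tags such as "+cu113" are stripped before comparing.
EXPECTED = {
    "torch": "1.12.1",
    "torchaudio": "0.12.1",
    "torchvision": "0.13.1",
    "mmcv-full": "1.7.0",
}


def base_version(version):
    """Drop a local build tag, e.g. '1.12.1+cu113' becomes '1.12.1'."""
    return version.split("+", 1)[0]


def check_environment(expected=EXPECTED):
    """Return {package: (installed_or_None, wanted, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            report[package] = (None, wanted, False)
            continue
        report[package] = (installed, wanted, base_version(installed) == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, wanted, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: installed={installed} wanted={wanted} [{status}]")
```

A mismatch does not necessarily mean the code will fail; these are simply the versions listed for this demo.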

Dataset

We download the RGB and optical flow frames from the official EPIC-Kitchens website and extract the audio from the videos ourselves with extract_audio.py.
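
For readers who want a rough idea of what audio extraction involves, the sketch below calls ffmpeg on each video in a folder. It is an illustration only and may differ from what extract_audio.py actually does: the function names, the `*.MP4` pattern, the mono 16 kHz WAV output and the flat output layout are all assumptions, and ffmpeg must be installed separately.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_path, sample_rate=16000):
    """Build an ffmpeg call that writes the audio track as mono 16-bit PCM WAV.

    The sample rate and mono down-mix are assumptions for this sketch.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM
        str(audio_path),
    ]


def extract_audio(video_dir, audio_dir, pattern="*.MP4"):
    """Extract audio from every matching video under video_dir into audio_dir."""
    video_dir, audio_dir = Path(video_dir), Path(audio_dir)
    audio_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for video_path in sorted(video_dir.rglob(pattern)):
        audio_path = audio_dir / (video_path.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
        written.append(audio_path)
    return written
```

A call such as `extract_audio("videos", "audio")` would write one WAV file per video; both folder names here are placeholders.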

Run Demo

  • We provide the splits for training, validation and testing in the epic-annotations folder.

  • To run the code: python train.py --lr 1e-1 --batch_size 96 --save_name 1e-1

  • We fine-tuned the model with reduced learning rates, as specified in bash.sh.
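
As an illustration of the run command above, the sketch below prints one command per learning rate, reusing only the flags shown there (`--lr`, `--batch_size`, `--save_name`). The values `1e-2` and `1e-3` are hypothetical examples of reduced learning rates, and the sketch does not show how a previous checkpoint is loaded; bash.sh remains the reference for the actual schedule.

```python
import shlex


def train_command(lr, batch_size=96):
    """Build the train.py call from the Run Demo section for a given learning rate.

    The learning rate string is reused as the save name, as in the original command.
    """
    return [
        "python", "train.py",
        "--lr", lr,
        "--batch_size", str(batch_size),
        "--save_name", lr,
    ]


# "1e-1" comes from the README; the smaller values are hypothetical placeholders.
LEARNING_RATES = ["1e-1", "1e-2", "1e-3"]

if __name__ == "__main__":
    for lr in LEARNING_RATES:
        # Print rather than execute, so the commands can be reviewed first.
        print(shlex.join(train_command(lr)))
```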