FactVC: Factual consistency for Video Captioning
December 26, 2023
This repository contains the data and code for the paper "Models See Hallucinations: Evaluating the Factuality in Video Captioning".
File Structure
FactVC-main/
├── data/
│ ├── activitynet/
│ │ ├── videos/ # sampled ActivityNet videos
│ │ ├── frames/ # extracted video frames
│ │ ├── captions/ # ground-truth and model-generated captions
│ │ ├── vids.txt # video ids
│ │ └── factuality_annotation.json # human factuality annotation
│ ├── youcook2/
│ │ ├── videos/ # sampled YouCook2 videos
│ │ ├── frames/ # extracted video frames
│ │ ├── captions/ # ground-truth and model-generated captions
│ │ ├── vids.txt # video ids
│ │ └── factuality_annotation.json # human factuality annotation
│ └── extract_frames.py
├── metric/
│ ├── clip/
│ ├── emscore/
│ └── factvc_corr.py # code to compute FactVC score and correlation
└── pretrained_models/
    └── factvc_video.pth # our pretrained metric model
Usage
First, download the sampled ActivityNet and YouCook2 videos and unzip them into the corresponding videos/ folders. Then download the pretrained FactVC metric model and put it in the pretrained_models/ folder.
Then, extract video frames at 1fps (used for computing FactVC metric scores):
cd data/
python extract_frames.py --dataset activitynet
python extract_frames.py --dataset youcook2
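For readers who want to see what 1 fps sampling involves, here is a minimal sketch that builds an ffmpeg command using the fps filter. It is an illustration only and not necessarily how extract_frames.py works: the helper names, the output file pattern, and the example paths are all hypothetical, and it assumes ffmpeg is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, out_dir, fps=1):
    """Build an ffmpeg command that samples `fps` frames per second as JPEGs.

    Hypothetical helper: the output naming scheme (%06d.jpg) is an assumption,
    not the layout used by this repository.
    """
    out_pattern = str(Path(out_dir) / "%06d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vf", f"fps={fps}",     # ffmpeg's fps filter resamples to the given rate
        out_pattern,             # numbered output frames
    ]


def extract_frames(video_path, out_dir, fps=1):
    """Create the output folder and run ffmpeg (requires ffmpeg on the PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)


if __name__ == "__main__":
    # Example paths are made up for illustration; only the command is printed.
    print(" ".join(build_ffmpeg_cmd("videos/example.mp4", "frames/example")))
```

Sampling at a fixed rate keeps the number of frames proportional to video length, which is convenient when each frame is later embedded by an image encoder.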
Now you can compute the FactVC scores and their correlation with the human annotations (run these commands from the repository root):
cd metric/
python factvc_corr.py --dataset activitynet
python factvc_corr.py --dataset youcook2
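To make the idea of correlating metric scores with human annotation concrete, the sketch below computes Pearson and Spearman correlations in plain Python. It is an assumption-laden illustration: factvc_corr.py may use other correlation measures or a library implementation, the function names are hypothetical, and the numbers in the example are invented rather than taken from the annotation files.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Invented values for illustration only (not from factuality_annotation.json).
    metric_scores = [0.21, 0.35, 0.30, 0.52, 0.47]
    human_scores = [1.0, 2.0, 2.0, 3.0, 3.0]
    print("Pearson:", pearson(metric_scores, human_scores))
    print("Spearman:", spearman(metric_scores, human_scores))
```

A higher correlation means the metric ranks captions more like human annotators do, which is the criterion used to compare factuality metrics.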
Acknowledgements
We acknowledge the EMScore project, on which our work is based.