README.md

April 5, 2025 · View on GitHub

Preparation

We train Key Frame Selector based on GLUS. Please follow GLUS_README to prepare data and train the GLUS model.

Data annotation

Please follow Evaluation section of GLUS_README to get inference masks and predicted ious on Mevis_train and Youtube_train. An example

scripts/inference.sh
python utils/eval_mevis.py \
  --mevis_exp_path='$GLUS_ROOT/data/mevis/train/meta_expressions.json' \
  --mevis_mask_path='$GLUS_ROOT/data/mevis/train/mask_dict.json'
  --mevis_pred_path='$GLUS_ROOT/generated'
  --iou_path='$GLUS_ROOT/data/train_iou/mevis'

Run this in and predicted ious of train dataset will be saved to iou_path.

Use python kfs/utils/merge_folder.py to copy Mevis annotation and RefYoutube annotation into one folder.

Set the RVOS_ROOT in kfs/utils/dataset_config.py before next steps.

Model Weights

We use ChatUniVi as an init model. Follow the guidance to prepare for the pretrained weights of ChatUniVi:

  1. Download the pretrained weights of Chat-UniVi from Chat-UniVi-7b.
Then organize them in the following architecture:
WEIGHTS_ROOT
├── Chat-UniVi

We recommend to set WEIGHTS_ROOT to GLUS_ROOT/checkpoints.

Training

Set the paths in the scripts and then run kfs/scripts/train_kfs.sh. The scripts will automatically start the training, and transform the saved checkpoint into hugging-face format when the training finished.

Inference

Set the paths in kfs/scripts/inference_kfs.sh, and then run it. It will predict score for each frame in validation set and then choose the most confident (important) frame and save it in json files.

For final inference of RVOS task, please follow evaluation with key frame in GLUS_README.

Note:

  • We use intermediate saved checkpoint for data annotation (about 15-20 Epoch training of GLUS).
  • We only test KFS on Mevis and Ref-Youtube-VOS from GLUSS setting, feel free to test the pipeline on more dataset and models.

Todo list

  • Release KFS pipeline

  • Release KFS json for inference (GLUS_S)

  • Release KFS model checkpoint

Acknowledgement

This pipeline is built upon Chat-UniVi and VISA.