Preparing Datasets for InstructSeg

December 20, 2024

Expected dataset structure for COCO:

coco/
    train2014/
        # image files

Expected dataset structure for RES:

RES/
    refcoco/
        refcoco_train.json
        refcoco_val.json
        refcoco_testA.json
        refcoco_testB.json
    refcoco+/
        refcoco+_train.json
        refcoco+_val.json
        refcoco+_testA.json
        refcoco+_testB.json
    refcocog/
        refcocog_train.json
        refcocog_val.json
        refcocog_test.json
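
A quick way to confirm that the RES annotations are in place is to walk the layout above and report anything missing. This is a minimal sketch, not part of InstructSeg: the function name `missing_res_files` and the `RES_SPLITS` table are hypothetical, and the file names are copied from the tree above.

```python
from pathlib import Path

# Splits per dataset, copied from the directory tree above.
RES_SPLITS = {
    "refcoco": ["train", "val", "testA", "testB"],
    "refcoco+": ["train", "val", "testA", "testB"],
    "refcocog": ["train", "val", "test"],
}


def missing_res_files(root):
    """Return the expected RES annotation files that are absent under root.

    Hypothetical helper: it only checks that each file exists, not its content.
    """
    root = Path(root)
    missing = []
    for name, splits in RES_SPLITS.items():
        for split in splits:
            path = root / name / f"{name}_{split}.json"
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `missing_res_files("RES")` from the directory that contains `RES/` returns an empty list when all annotation files listed above are present.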

Expected dataset structure for ReasonSeg:

ReasonSeg/
    train/
        image_1.jpg, image_1.json
        image_2.jpg, image_2.json
    val/
        image_1.jpg, image_1.json
        image_2.jpg, image_2.json
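
Since each ReasonSeg image sits next to a JSON file with the same base name, a split can be checked for unpaired samples. The sketch below assumes `.jpg` images and `.json` annotations as shown in the tree above; `unpaired_reasonseg_samples` is a hypothetical name, not a function from the repository.

```python
from pathlib import Path


def unpaired_reasonseg_samples(split_dir):
    """Return base names that have an image or an annotation, but not both.

    Hypothetical helper; assumes .jpg images and .json annotations
    stored side by side in the same split directory.
    """
    split_dir = Path(split_dir)
    images = {p.stem for p in split_dir.glob("*.jpg")}
    annotations = {p.stem for p in split_dir.glob("*.json")}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(images ^ annotations)
```

Running it on `ReasonSeg/train` and `ReasonSeg/val` should give an empty list for a complete download.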

Expected dataset structure for R-VOS, including the corresponding JSON annotation files:

rvos/
    DAVIS/
        train/
            JPEGImages/
        valid/
            JPEGImages/
            refdavis_valid.json
    YouTube/
        train/
            JPEGImages/
            refyoutube_train.json
        valid/
            JPEGImages/
            refyoutube_valid.json

Expected dataset structure for ReVOS:

ReVOS/
    JPEGImages/
        <video1>/
        <video2>/
        <video...>/
    mask_dict.json
    mask_dict_foreground.json
    meta_expressions_train_.json
    meta_expressions_valid_.json

Expected dataset structure for LLaVA-1.5 training data:

llava_dataset/
    gqa/
        images/
    ocr_vqa/
        images/
    textvqa/
        train_images/
    vg/
        VG_100K/
        VG_100K_2/
    llava_v1_5_mix665k.json
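
To see which image folders the annotation file actually refers to, the entries can be grouped by the first component of their image path. This is a rough sketch under two assumptions that are not stated in this document: that `llava_v1_5_mix665k.json` is a JSON list of records, and that records with an image carry a relative path under an `"image"` key. The function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def image_folders_in_annotations(json_path):
    """Count annotation records per top-level image folder.

    Assumes the file is a JSON list of dicts and that the relative image
    path, when present, is stored under the "image" key (an assumption).
    """
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    counts = Counter()
    for record in records:
        image = record.get("image")  # records without an image are skipped
        if image:
            counts[Path(image).parts[0]] += 1
    return counts
```

Comparing the keys of the returned counter with the subfolders of `llava_dataset/` shows whether any referenced image source has not been downloaded yet.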