Data Format
February 20, 2023
Contents
The CholecT50 dataset folder includes:
- videos/: 50 cholecystectomy videos
- labels/: triplet annotations for the 50 videos
- label_mapping.txt: a text file mapping triplet IDs to their component IDs
- LICENSE: the license file
- README.md: a README file
The dataset directory is structured as follows:
──CholecT50
├───videos
│   ├───VID01
│   │   ├───000000.png
│   │   ├───000001.png
│   │   ├───000002.png
│   │   ├─── . . .
│   │   └───N.png
│   ├───VID02
│   │   ├───000000.png
│   │   ├───000001.png
│   │   ├───000002.png
│   │   ├─── . . .
│   │   └───N.png
│   ├─── . . .
│   └───VIDN
│       ├───000000.png
│       ├───000001.png
│       ├───000002.png
│       ├─── . . .
│       └───N.png
│
├───labels
│   ├───VID01.json
│   ├───VID02.json
│   ├───VID03.json
│   ├─── . . .
│   └───VIDNN.json
│
├───label_mapping.txt
├───LICENSE
└───README.md
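A minimal sketch of how paths in this layout could be built in Python. It assumes the root folder is named `CholecT50`, that video folders are `VID` followed by the video ID padded to at least two digits (as in `VID01` above), and that frames are 6-digit zero-padded PNGs. The helper names (`frame_path`, `label_path`, `list_frames`) are hypothetical and not part of any official dataset tooling.

```python
from pathlib import Path

# Hypothetical helpers that follow the directory tree above.

def frame_path(root: Path, video_id: int, frame_id: int) -> Path:
    """Path to one frame, e.g. CholecT50/videos/VID01/000023.png."""
    # Video folders: at least two digits (VID01); frames: six digits (000023.png).
    return root / "videos" / f"VID{video_id:02d}" / f"{frame_id:06d}.png"

def label_path(root: Path, video_id: int) -> Path:
    """Path to a video's label file, e.g. CholecT50/labels/VID01.json."""
    return root / "labels" / f"VID{video_id:02d}.json"

def list_frames(root: Path, video_id: int) -> list[Path]:
    """All frame files of one video, ordered by frame ID."""
    video_dir = root / "videos" / f"VID{video_id:02d}"
    # Zero-padded names sort correctly as plain strings.
    return sorted(video_dir.glob("*.png"))

if __name__ == "__main__":
    root = Path("CholecT50")
    print(frame_path(root, 1, 23).as_posix())  # CholecT50/videos/VID01/000023.png
    print(label_path(root, 1).as_posix())      # CholecT50/labels/VID01.json
```

Relying on the zero padding means a plain lexicographic sort already yields frames in temporal order, so no numeric parsing is needed when listing a video.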
Videos and Images
The videos directory contains the surgical videos, one subdirectory per video. Each video directory contains N image frames, where N varies from video to video, extracted sequentially from the video at 1 FPS. Each frame is 854x480 pixels with 3 (RGB) channels.
Frames are named imageID.png, where the image ID is padded with leading zeros to 6 digits; hence, 000001.png is image 1, 000023.png is image 23, 021563.png is image 21563, and so on.
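The naming convention and the 1 FPS sampling rate can be expressed as a few small conversions. This is a sketch with hypothetical function names; `frame_time_seconds` additionally assumes that frame 0 corresponds to the start of the video.

```python
import re

FPS = 1  # frames were extracted at 1 FPS

def image_id_to_filename(image_id: int) -> str:
    """Zero-pad the image ID to 6 digits: 23 -> '000023.png'."""
    return f"{image_id:06d}.png"

def filename_to_image_id(filename: str) -> int:
    """Inverse mapping: '021563.png' -> 21563."""
    match = re.fullmatch(r"(\d{6})\.png", filename)
    if match is None:
        raise ValueError(f"unexpected frame filename: {filename!r}")
    return int(match.group(1))

def frame_time_seconds(image_id: int, fps: float = FPS) -> float:
    """Approximate offset into the video (assumes frame 0 is at t = 0)."""
    return image_id / fps

if __name__ == "__main__":
    print(image_id_to_filename(23))            # 000023.png
    print(filename_to_image_id("021563.png"))  # 21563
    print(frame_time_seconds(90))              # 90.0
```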
To ensure anonymity, frames corresponding to out-of-body views are entirely blacked out (RGB 0, 0, 0).
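Since anonymized frames are uniformly black and PNG is lossless, one simple way to skip them is to check that every pixel is exactly (0, 0, 0). The function below is a hypothetical stdlib-only sketch; obtaining the pixels through Pillow's `Image.getdata()` is an assumption about your tooling, shown only in a comment.

```python
from collections.abc import Iterable, Sequence

def is_blacked_out(pixels: Iterable[Sequence[int]]) -> bool:
    """True if every pixel is (0, 0, 0), i.e. an anonymized out-of-body frame.

    Note: an empty pixel sequence also returns True.
    """
    return all(tuple(p) == (0, 0, 0) for p in pixels)

# Assumed usage with Pillow installed:
#   from PIL import Image
#   black = is_blacked_out(Image.open(path).convert("RGB").getdata())
```

For large-scale filtering a vectorized check (e.g. over an array of the whole frame) would be faster; the pure-Python version only illustrates the criterion.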
Labels
The labels directory contains one JSON file per video, holding the per-frame labels for triplets, instruments, verbs, targets, and phase.
The JSON file is structured as follows:
──VIDX.json
├───video: (int) - video ID.
├───fps: (int) - frame rate (usually 1).
├───num_frames: (int) - number of sampled frames.
├───categories: list[] - containing category dictionary per task.
│   ├───triplet: dict() - ID to triplet categories.
│   ├───instrument: dict() - ID to instrument categories.
│   ├───verb: dict() - ID to verb categories.
│   ├───target: dict() - ID to target categories.
│   └───phase: dict() - ID to phase categories.
│
├───annotations: dict() - mapping frame ID to frame annotations:
│   ├───0: list[] - triplet instance variables in frame 0.
│   ├───1: list[] - triplet instance variables in frame 1.
│   ├───2: list[] - triplet instance variables in frame 2.
│   ├─── . . .
│   └───N: list[] - triplet instance variables in frame N.
│
├───info: text - dataset description: name, date, version, bbox format, copyright, etc.
└───licenses: text - license info including the ID, name and URL.
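A minimal sketch of reading one label file with the standard `json` module. Two points are assumptions rather than documented facts: that `categories` can be indexed by task name (`"instrument"`, `"verb"`, ...) as the tree suggests, and that each category dictionary maps string IDs to names. Because JSON object keys are always strings, the sketch converts the frame-ID keys of `annotations` to integers. The function names are hypothetical.

```python
import json
from pathlib import Path

def load_video_labels(json_path: Path) -> dict:
    """Read one VIDX.json file and convert frame-ID keys to ints."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # JSON keys are strings: "0", "1", ... become 0, 1, ...
    data["annotations"] = {int(k): v for k, v in data["annotations"].items()}
    return data

def category_name(data: dict, task: str, category_id: int) -> str:
    """Look up a category name, e.g. category_name(data, "instrument", 0).

    Assumes data["categories"][task] maps string IDs to names.
    """
    return data["categories"][task][str(category_id)]

# Example (path assumed):
#   data = load_video_labels(Path("CholecT50/labels/VID01.json"))
#   frame0 = data["annotations"][0]  # list of triplet instance vectors
```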
Annotation format
A frame can contain zero, one, or multiple triplet instances, so each frame ID is mapped to a list of all triplet instances in that frame. Each triplet instance is a vector of 15 items describing the triplet, instrument, verb, target, and phase information for that instance. The vector values can differ across triplet instances; however, the phase label is the same for every instance within a frame.
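The per-frame properties described above (a variable number of instances, one shared phase label) can be checked with a short sketch. It assumes the phase ID is the last (15th) item of each instance vector; that position is not stated in this section, so verify it against the data before relying on it. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

PHASE_INDEX = 14  # ASSUMED position of the phase ID in the 15-item vector

def frame_phase(instances: List[List[float]]) -> Optional[int]:
    """Phase ID shared by all instances in a frame, or None if the frame is empty."""
    if not instances:
        return None  # zero triplet instances: no vector to read the phase from
    phases = {int(vec[PHASE_INDEX]) for vec in instances}
    if len(phases) != 1:
        raise ValueError(f"inconsistent phase labels in one frame: {sorted(phases)}")
    return phases.pop()

def instances_per_frame(annotations: Dict[int, List[List[float]]]) -> Counter:
    """Histogram: how many frames contain 0, 1, 2, ... triplet instances."""
    return Counter(len(instances) for instances in annotations.values())
```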
Keys:
- ID: (int) category identity
- SC: (float) confidence score: 1 for ground truths, a probability in [0, 1] for predictions
- BX: (float) bounding box x1 coordinate (left), normalized by the image width
- BY: (float) bounding box y1 coordinate (top), normalized by the image height
- BW: (float) bounding box width, normalized by the image width
- BH: (float) bounding box height, normalized by the image height
A value of -1 denotes a null or absent entry.
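The sketch below turns one instance vector into named fields and converts a box to pixel corners. The item order is an ASSUMPTION not spelled out in this section: triplet ID; instrument ID, SC, BX, BY, BW, BH; verb ID; target ID, SC, BX, BY, BW, BH; phase ID (1 + 6 + 1 + 6 + 1 = 15 items). It also reads "scaled by the image width/height" as coordinates normalized to [0, 1], and uses the 854x480 frame size given earlier. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# ASSUMED layout of the 15-item vector:
# [triplet ID,
#  instrument ID, SC, BX, BY, BW, BH,
#  verb ID,
#  target ID, SC, BX, BY, BW, BH,
#  phase ID]
FRAME_W, FRAME_H = 854, 480
Box = Tuple[float, float, float, float]  # normalized (x1, y1, w, h)

@dataclass
class TripletInstance:
    triplet: Optional[int]
    instrument: Optional[int]
    instrument_score: Optional[float]
    instrument_box: Optional[Box]
    verb: Optional[int]
    target: Optional[int]
    target_score: Optional[float]
    target_box: Optional[Box]
    phase: Optional[int]

def _id(value: float) -> Optional[int]:
    return None if value == -1 else int(value)  # -1 marks null/absence

def _score(value: float) -> Optional[float]:
    return None if value == -1 else float(value)

def _box(values: Sequence[float]) -> Optional[Box]:
    if any(v == -1 for v in values):  # treat the box as absent if any item is -1
        return None
    x, y, w, h = (float(v) for v in values)
    return (x, y, w, h)

def parse_instance(vec: Sequence[float]) -> TripletInstance:
    if len(vec) != 15:
        raise ValueError(f"expected 15 items, got {len(vec)}")
    return TripletInstance(
        triplet=_id(vec[0]),
        instrument=_id(vec[1]), instrument_score=_score(vec[2]), instrument_box=_box(vec[3:7]),
        verb=_id(vec[7]),
        target=_id(vec[8]), target_score=_score(vec[9]), target_box=_box(vec[10:14]),
        phase=_id(vec[14]),
    )

def to_pixels(box: Box, width: int = FRAME_W, height: int = FRAME_H) -> Tuple[int, int, int, int]:
    """Normalized (x1, y1, w, h) -> pixel corners (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (round(x * width), round(y * height),
            round((x + w) * width), round((y + h) * height))
```

Mapping -1 to `None` keeps absent values from silently entering arithmetic such as box conversion.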
Label Mapping
The label_mapping.txt file contains a 6-column table that maps each triplet ID to its component IDs.
This is useful for decomposing a triplet into its constituent components.
The columns are:
- column 1: triplet ID (i.e., the instrument-verb-target pairing ID)
- column 2: instrument ID
- column 3: verb ID
- column 4: target ID
- column 5: instrument-verb pairing ID
- column 6: instrument-target pairing ID
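A minimal sketch of parsing this six-column, comma-separated table into a dictionary keyed by triplet ID. Whether the file starts with a header line is not stated here, so the parser simply skips any line that does not begin with a digit (an assumption). The names `TripletComponents` and `parse_label_mapping` are hypothetical.

```python
from typing import Dict, NamedTuple

class TripletComponents(NamedTuple):
    instrument: int
    verb: int
    target: int
    instrument_verb: int
    instrument_target: int

def parse_label_mapping(text: str) -> Dict[int, TripletComponents]:
    """Parse label_mapping.txt content into {triplet ID: component IDs}."""
    mapping: Dict[int, TripletComponents] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and any non-numeric header or comment line.
        if not line or not line[0].isdigit():
            continue
        triplet, *components = (int(x) for x in line.split(","))
        mapping[triplet] = TripletComponents(*components)  # exactly 5 components expected
    return mapping

# Example (path assumed):
#   with open("CholecT50/label_mapping.txt", encoding="utf-8") as f:
#       mapping = parse_label_mapping(f.read())
```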
Example usage:
The first row in label_mapping.txt shows:
1,0,2,0,2,0
This means that triplet ID 1 maps to the instrument-verb-target IDs <0, 2, 0>, i.e., {grasper, dissect, gallbladder}; its instrument-verb pairing ID is 2 and its instrument-target pairing ID is 0.
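One common use of the mapping is deriving instrument, verb, and target presence labels from the triplet IDs in a frame. The sketch below assumes the mapping is already in memory as `{triplet ID: (instrument, verb, target, instrument-verb, instrument-target)}` and skips null triplets (ID -1). The function name and the small name dictionary are illustrative; the only data used is the example row above.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory form of label_mapping.txt:
# triplet ID -> (instrument ID, verb ID, target ID, instrument-verb ID, instrument-target ID)
Mapping = Dict[int, Tuple[int, int, int, int, int]]

def decompose(triplet_ids: Iterable[int], mapping: Mapping) -> Dict[str, Set[int]]:
    """Collect the instrument, verb, and target IDs implied by a set of triplet IDs."""
    present: Dict[str, Set[int]] = {"instrument": set(), "verb": set(), "target": set()}
    for tid in triplet_ids:
        if tid == -1:  # null triplet: nothing to decompose
            continue
        instrument, verb, target, _iv, _it = mapping[tid]
        present["instrument"].add(instrument)
        present["verb"].add(verb)
        present["target"].add(target)
    return present

if __name__ == "__main__":
    mapping: Mapping = {1: (0, 2, 0, 2, 0)}  # the example row "1,0,2,0,2,0"
    names = {"instrument": {0: "grasper"}, "verb": {2: "dissect"}, "target": {0: "gallbladder"}}
    present = decompose([1], mapping)
    print({task: [names[task][i] for i in sorted(ids)] for task, ids in present.items()})
    # {'instrument': ['grasper'], 'verb': ['dissect'], 'target': ['gallbladder']}
```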