Data Format

February 20, 2023

The CholecT50 dataset folder includes:

videos/: 50 cholecystectomy videos
labels/: triplet annotations on 50 videos
a label mapping text file
a LICENSE file
a README file

The dataset directory structure is as follows:

  ──CholecT50
      ├───videos
      │   ├───VID01
      │   │   ├───000000.png
      │   │   ├───000001.png
      │   │   ├───000002.png
      │   │   ├───
      │   │   └───N.png
      │   ├───VID02
      │   │   ├───000000.png
      │   │   ├───000001.png
      │   │   ├───000002.png
      │   │   ├───
      │   │   └───N.png
      │   ├───
      │   └───VIDN
      │       ├───000000.png
      │       ├───000001.png
      │       ├───000002.png
      │       ├───
      │       └───N.png
      │
      ├───labels
      │   ├───VID01.json
      │   ├───VID02.json
      │   ├───VID03.json
      │   ├───
      │   └───VIDN.json
      │
      ├───label_mapping.txt        
      ├───LICENSE
      └───README.md

Videos and Images

This directory contains the surgical videos as extracted frames. Each video directory holds N image frames, where N varies from video to video, extracted sequentially from the video at 1 FPS. Each frame is 854x480 pixels with 3 color channels. Files are named imageID.png, where the imageID is padded with leading zeros to 6 digits; hence 000001.png is frame 1, 000023.png is frame 23, 021563.png is frame 21563, and so on.
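The mapping between an image ID and its filename can be sketched in Python as follows. The helper names frame_filename and frame_id are illustrative only and are not part of any dataset tooling.

```python
from pathlib import Path


def frame_filename(image_id: int) -> str:
    """Return the filename for an image ID, zero-padded to 6 digits."""
    if image_id < 0:
        raise ValueError("image ID must be non-negative")
    return f"{image_id:06d}.png"


def frame_id(filename: str) -> int:
    """Recover the integer image ID from a frame filename or path."""
    # int() ignores the leading zeros of the stem, e.g. "000023" -> 23.
    return int(Path(filename).stem)


print(frame_filename(23))       # 000023.png
print(frame_id("021563.png"))   # 21563
```

Parsing through Path(...).stem means the same helper works on bare filenames and on full paths such as videos/VID01/000023.png.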

To ensure anonymity, frames showing out-of-body views are blacked out entirely (RGB 0, 0, 0).

Labels

The labels directory contains one JSON file per video, holding per-frame labels for triplets, instruments, verbs, targets, and phases.

The structure of each JSON file is as follows:

      ──VIDX.json
          ├───video: (int) - video ID.
          ├───fps: (int) - frame rate (usually 1).
          ├───num_frames: (int) - number of sampled frames.
          ├───categories: list[] - containing category dictionary per task.
          │   ├───triplet: dict() - ID to triplet categories.
          │   ├───instrument: dict() - ID to instrument categories.
          │   ├───verb: dict() - ID to verb categories.
          │   ├───target: dict() - ID to target categories.
          │   └───phase: dict() - ID to phase categories.
          │
          ├───annotations: dict()  mapping frame ID to frame annotations:
          │   ├───0: list[]  - triplet instance variables in frame 0.
          │   ├───1: list[]  - triplet instance variables in frame 1.
          │   ├───2: list[]  - triplet instance variables in frame 2.
          │   ├─── . . .
          │   └───N: list[]  - triplet instance variables in frame N.
          │
          ├───info: text -  dataset description: name, date, version, bbox format, copyright, etc.
          └───licenses: text - license info including the ID, name and url.
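As a minimal sketch, a label file with this structure could be read using Python's standard json module. The function names and the example path are assumptions for illustration. Note that JSON object keys are always strings, so an integer frame ID is converted to a string before the lookup.

```python
import json
from pathlib import Path


def load_video_labels(path):
    """Load one per-video label file and return the parsed dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def frame_instances(labels, frame_id):
    """Return the list of triplet instances for a frame (empty if absent)."""
    # JSON keys are strings, so frame 0 is stored under the key "0".
    return labels["annotations"].get(str(frame_id), [])


if __name__ == "__main__":
    path = Path("CholecT50/labels/VID01.json")  # assumed location
    if path.exists():
        labels = load_video_labels(path)
        print(labels["video"], labels["fps"], labels["num_frames"])
        print(len(frame_instances(labels, 0)), "instance(s) in frame 0")
```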

Annotation format

A frame can contain zero, one, or multiple triplet instances. Each frame ID maps to a list of all triplet instances in that frame. Each triplet instance is a vector of 15 items describing the triplet, instrument, verb, target, and phase information for that instance. The vector values can vary across the triplet instances of a frame; the phase label, however, is the same for every instance within a frame.
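The sketch below counts triplet instances per frame and checks that every instance has exactly 15 items. It assumes only what is stated above (a mapping of frame ID to a list of 15-item vectors). The function name and the toy values are illustrative placeholders, not real labels.

```python
from collections import Counter

VECTOR_LENGTH = 15  # items per triplet instance, as stated above


def instances_per_frame(annotations):
    """Map each frame ID to its number of triplet instances.

    Raises ValueError if any instance does not have exactly 15 items.
    """
    counts = {}
    for frame_id, instances in annotations.items():
        for instance in instances:
            if len(instance) != VECTOR_LENGTH:
                raise ValueError(
                    f"frame {frame_id}: expected {VECTOR_LENGTH} items, "
                    f"got {len(instance)}"
                )
        counts[frame_id] = len(instances)
    return counts


# Toy annotations with placeholder values (not real labels).
toy = {
    "0": [],
    "1": [[-1] * VECTOR_LENGTH],
    "2": [[-1] * VECTOR_LENGTH, [-1] * VECTOR_LENGTH],
}
counts = instances_per_frame(toy)
print(counts)                    # {'0': 0, '1': 1, '2': 2}
print(Counter(counts.values()))  # number of frames with 0, 1, 2, ... instances
```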

Keys:

ID: (int) category identity
SC: (float) confidence score: 1 for ground truths, a probability in [0, 1] for predictions
BX: (float) bounding box x1 coordinate (left), scaled by the image width
BY: (float) bounding box y1 coordinate (top), scaled by the image height
BW: (float) bounding box width, scaled by the image width
BH: (float) bounding box height, scaled by the image height

A value of -1 indicates null or absence.
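Because the box fields are scaled by the image size, recovering pixel coordinates means multiplying by the frame width and height. The sketch below assumes the 854x480 frame size given earlier and treats -1 in any field as a missing box. The function name box_to_pixels is hypothetical.

```python
def box_to_pixels(bx, by, bw, bh, width=854, height=480):
    """Convert a scaled (x1, y1, w, h) box to pixel (x1, y1, x2, y2).

    Returns None when any field is -1, the marker for a missing box.
    """
    if -1 in (bx, by, bw, bh):
        return None
    x1 = bx * width
    y1 = by * height
    x2 = (bx + bw) * width
    y2 = (by + bh) * height
    return (round(x1), round(y1), round(x2), round(y2))


print(box_to_pixels(0.5, 0.25, 0.5, 0.5))  # (427, 120, 854, 360)
print(box_to_pixels(-1, -1, -1, -1))       # None
```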

Label Mapping

The label_mapping.txt file contains a table of 6 columns that maps each triplet ID to its component IDs. This is useful for decomposing a triplet into its constituent components. The columns are:

Column 1: triplet ID (that is, the instrument-verb-target pairing ID)
Column 2: instrument ID
Column 3: verb ID
Column 4: target ID
Column 5: instrument-verb pairing ID
Column 6: instrument-target pairing ID

Example usage:

The first row in label_mapping.txt reads 1,0,2,0,2,0. This means that triplet ID 1 maps to <0, 2, 0>, which is {grasper, dissect, gallbladder}, with instrument-verb pairing ID 2 and instrument-target pairing ID 0.
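A minimal parsing sketch for the mapping table follows. It assumes each data row holds 6 comma-separated integers, as in the example row, and that blank lines or lines starting with '#' are not data. That second assumption, the function name, and the dictionary key names are illustrative choices, not part of the dataset.

```python
def parse_label_mapping(lines):
    """Parse rows of 6 comma-separated integers into a dict keyed by triplet ID."""
    columns = ("instrument", "verb", "target",
               "instrument_verb", "instrument_target")
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank or '#' lines are not data rows
        values = [int(v) for v in line.split(",")]
        if len(values) != 6:
            raise ValueError(f"expected 6 columns, got {len(values)}: {line!r}")
        triplet_id, *components = values
        mapping[triplet_id] = dict(zip(columns, components))
    return mapping


mapping = parse_label_mapping(["1,0,2,0,2,0"])
print(mapping[1]["instrument"], mapping[1]["verb"], mapping[1]["target"])  # 0 2 0
```

To parse the real file, an open file handle can be passed in place of the list, since the function only iterates over lines.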
