Tiny MSCOCO2017 Dataset

May 1, 2024 ยท View on GitHub

Welcome to the Tiny COCO Dataset repository :blush:. This project aims to provide a simplified and fast-to-use version of the extensive COCO dataset for quick debugging and development of image processing models. The base version of this dataset contains exactly one image per category, making it lightweight and perfect for testing algorithms quickly.

About COCO

The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. COCO is widely used in the machine learning community for benchmarking state-of-the-art models.

About Tiny COCO

The Tiny COCO Dataset is a subset of the full COCO dataset and has been structured to provide immediate access to a smaller, more manageable collection of images across all categories. This is ideal for:

  • Rapid Prototyping: Quickly test and debug models without the overhead of working with tens of gigabytes of data.
  • Educational Purposes: Learn model building with a real-world dataset without significant hardware limitations.

Base Version

The base version of this dataset includes:

  • 1 image per category for both training (train) and validation (val) sets.
  • Annotations in COCO format that correspond to these images.

Data Customization

Users are encouraged to generate customized versions of the dataset with more images per category, depending on specific requirements. The repository includes Python scripts to facilitate this process.

Usage

:wink: Note that the Tiny COCO dataset is a subset of the original COCO dataset. To use the Tiny COCO dataset, you must first download the full COCO2017 dataset from the official website.

The downloaded dataset should be structured as follows:

train2017/
    000000000009.jpg
    000000000025.jpg
    ...
val2017/
    000000000139.jpg
    000000000285.jpg
    ...
annotations/
    instances_train2017.json
    instances_val2017.json
    ...

Clone this repository

git clone https://github.com/zzowenzz/COCO_Tiny.git
cd COCO_Tiny

Set up the environment

pip install -r requirements.txt

Run the scripts

python make_dataset.py \\
    --coco_path [path to the original MSCOCO2017 dataset] \\
    --mini_path data/COCO_Tiny [path to save the Tiny COCO dataset] \\
    --num_img [number of images per category] 

For example, to generate a Tiny COCO dataset with 1 image per category:

python make_dataset.py \\
    --coco_path [path to the original MSCOCO2017 dataset] \\
    --mini_path data/COCO_Tiny [path to save the Tiny COCO dataset] \\
    --num_img 1

Contributions

Contributions to this project are welcome! Please consider the following ways to contribute:

  • Improvements: Suggestions for improving the dataset or scripts.
  • Bug Reports: Identify and report issues in the dataset generation script.
  • Documentation: Enhancements to the README or additional guidelines.

License

No license for this dataset. Feel free to use it for any purpose. For more, please refer to the Terms of Use of the COCO dataset.