Graph200K Dataset

June 4, 2025

  1. Introduction
  2. Usage
  3. Preprocessing

Introduction

In natural language processing, tasks overlap significantly, which facilitates strong cross-task learning. In contrast, visual tasks are inherently distinct, making it difficult for vision models to achieve similar generalization through instruction tuning. To address this, we introduce a graph-structured multi-task dataset named Graph200K.

Graph200K is built upon the Subjects200K dataset. Each image is annotated for five meta-tasks: 1) conditional generation, 2) image restoration, 3) image editing, 4) IP preservation, and 5) style transfer. These meta-tasks can also be combined into a wide range of more complex tasks, as shown in the figure below.

[Figure: complex tasks composed from the five meta-tasks]

Usage

The dataset can be downloaded and used through the Datasets library, as follows:

import datasets

# Download (or load from the local cache) the dataset from the Hugging Face Hub
graph200k = datasets.load_dataset("VisualCloze/Graph200K")

train = graph200k['train']
test = graph200k['test']

# Read the depth map (a PIL.Image) of the first item in the train set and save it
train[0]['depth'].save('depth.jpg')

Each item in the dataset contains the following annotations. Examples can be found on Hugging Face.

| Item | Meaning |
| --- | --- |
| ref | Inherited from Subjects200K; depicts the subject object in the target image. |
| target | The original image inherited from Subjects200K. |
| InstantStyle_image_{0-3} | Stylized images with invariant semantics. |
| InstantStyle_ref_{0-3} | Style reference for InstantStyle. |
| ReduxStyle_image_{0-3} | Stylized images with variant semantics. |
| ReduxStyle_ref_{0-3} | Style reference for ReduxStyle. |
| FillEdit_image_{0-5} | Edited image with invariant background. |
| FillEdit_meta | The name and description of the new subject object after editing. |
| DepthEdit | Edited image with variant background. |
| qwen_2_5_mask | A high-quality segmentation mask generated by Qwen-2.5-VL and SAM2. |
| qwen_2_5_bounding_box | The bounding boxes generated by Qwen-2.5-VL. |
| qwen_2_5_meta | The coordinates and object name of each bounding box, and the mask color corresponding to each box. |
| sam2_mask | A mask generated by the SAM2 model. |
| uniformer | The semantic segmentation generated by UniFormer. |
| foreground | The foreground mask generated by RMBG-2.0. |
| normal | Surface normal estimation generated by DSINE. |
| depth | The depth estimation generated by Depth Anything V2. |
| canny | Edge detection using the Canny edge detector. |
| hed | Edge detection using the HED detector. |
| mlsd | Line segments generated using M-LSD. |
| openpose | Human keypoints generated by OpenPose. |
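Several entries above use a shorthand such as `InstantStyle_image_{0-3}`. The helper below is a minimal sketch for expanding that shorthand into individual field names. It assumes that `{a-b}` denotes an inclusive integer range and that the concrete columns are named with a plain numeric suffix (for example `InstantStyle_image_0`). The function name `expand_field` is hypothetical and not part of the dataset or the Datasets library.

```python
import re

# Assumption: "{a-b}" in a field name stands for the inclusive integer range a..b.
_RANGE = re.compile(r"\{(\d+)-(\d+)\}")


def expand_field(pattern: str) -> list[str]:
    """Expand a name like 'InstantStyle_image_{0-3}' into concrete field names."""
    match = _RANGE.search(pattern)
    if match is None:
        # No range shorthand: the name already refers to a single field.
        return [pattern]
    low, high = int(match.group(1)), int(match.group(2))
    return [_RANGE.sub(str(i), pattern, count=1) for i in range(low, high + 1)]


print(expand_field("InstantStyle_image_{0-3}"))
# ['InstantStyle_image_0', 'InstantStyle_image_1', 'InstantStyle_image_2', 'InstantStyle_image_3']
```

If the column names do follow this convention, the expanded names can be used to index an item directly, e.g. `[train[0][name] for name in expand_field("FillEdit_image_{0-5}")]`.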

Preprocessing

To use Graph200K for training and inference in our VisualCloze, we extract each image file and generate a JSON file that records the path and meta information of each file.

python processing.py \
--target_path "where will the images and the json be saved" \
--split "train or test"
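For readers who want a feel for what such an export involves, here is a rough sketch of the general idea: write every image-like field to disk and collect the resulting paths and the remaining metadata in a JSON manifest. This is not the actual `processing.py`. The function name `export_items`, the file naming scheme, the PNG format, and the manifest layout are all assumptions made for illustration.

```python
import json
from pathlib import Path


def export_items(items, target_path, split):
    """Save image-like fields to disk and write a JSON manifest (illustrative only)."""
    root = Path(target_path) / split
    root.mkdir(parents=True, exist_ok=True)

    manifest = []
    for index, item in enumerate(items):
        record = {}
        for key, value in item.items():
            if hasattr(value, "save"):
                # Duck-typed check: PIL.Image objects expose a .save(path) method.
                file_path = root / f"{index:06d}_{key}.png"  # hypothetical naming scheme
                value.save(file_path)
                record[key] = str(file_path)
            else:
                # Assumed to be JSON-serializable metadata (strings, numbers, dicts).
                record[key] = value
        manifest.append(record)

    manifest_path = root / f"{split}.json"  # hypothetical manifest location
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Under these assumptions, `export_items(train, "output_dir", "train")` would write one image file per image field and a single `train.json` next to them. The real script may organize files and metadata differently.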