Graph200K Dataset

June 4, 2025

Introduction
Usage
Preprocessing

Introduction

In natural language processing, tasks overlap substantially, which supports strong cross-task learning. Visual tasks, in contrast, are inherently distinct, so it is hard for vision models to reach similar generalization through instruction tuning. To address this, we introduce Graph200K, a graph-structured multi-task dataset.

Graph200K is built on the Subjects200K dataset. Each image is annotated for five meta-tasks: 1) conditional generation, 2) image restoration, 3) image editing, 4) IP preservation, and 5) style transfer. Combining these meta-tasks yields a wide range of more complex tasks, as shown in the figure below.
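One way to picture the graph structure is to treat each kind of image (the target, a depth map, a stylized version, and so on) as a node and each meta-task as a directed edge between two nodes; chaining edges then gives a composite task. The sketch below is purely illustrative: the node names (including the hypothetical `degraded` input for restoration) and the edge list are assumptions loosely based on the annotation fields listed under Usage, not the dataset's own task definition.

```python
from __future__ import annotations

from collections import deque

# Hypothetical task graph: nodes are image types, edges are meta-tasks.
# These edges are illustrative assumptions, not Graph200K's official definition.
EDGES = {
    ("depth", "target"): "conditional generation",
    ("canny", "target"): "conditional generation",
    ("degraded", "target"): "image restoration",
    ("target", "edited"): "image editing",
    ("ref", "target"): "IP preservation",
    ("target", "stylized"): "style transfer",
}


def compose(source: str, goal: str) -> list[str] | None:
    """Return the meta-tasks along a shortest path from source to goal, or None."""
    graph: dict[str, list[str]] = {}
    for src, dst in EDGES:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search, so the first path that reaches the goal is a shortest one.
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, tasks = queue.popleft()
        if node == goal:
            return tasks
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, tasks + [EDGES[(node, nxt)]]))
    return None


# Depth map -> target -> stylized image: a two-step composite task.
print(compose("depth", "stylized"))
# → ['conditional generation', 'style transfer']
```

Breadth-first search is used only so that the shortest chain of meta-tasks is reported; any path-finding method would serve the same illustrative purpose.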

Usage

The dataset can be downloaded and loaded with the Hugging Face Datasets library:

import datasets

graph200k = datasets.load_dataset("VisualCloze/Graph200K")

train = graph200k['train']
test = graph200k['test']

# Read the depth map (a PIL.Image) of the first image in the train set and save it to disk
train[0]['depth'].save('depth.jpg')
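To dump several annotations of one sample at once, a small helper along these lines may be convenient. It is a sketch that assumes image fields are `PIL.Image` objects (as `depth` is above) and skips anything without a `save` method, such as text metadata; the function name `save_annotations` and the `.png` output format are illustrative choices, not part of the dataset.

```python
import os
from typing import Any, Iterable, List, Mapping


def save_annotations(item: Mapping[str, Any], keys: Iterable[str], out_dir: str) -> List[str]:
    """Save the image-valued fields `keys` of one dataset item as PNG files.

    Fields that are missing, None, or not image-like (no `.save` method,
    e.g. text metadata) are skipped. Returns the paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for key in keys:
        value = item.get(key)
        # Only objects that behave like PIL images can be written to disk here.
        if value is None or not hasattr(value, "save"):
            continue
        path = os.path.join(out_dir, f"{key}.png")
        value.save(path)
        written.append(path)
    return written
```

For example, `save_annotations(train[0], ["depth", "canny", "hed", "openpose"], "sample_0")` would write up to four PNG files into `sample_0/`. PNG is chosen here because, unlike JPEG, it also stores single-channel and 16-bit images without conversion.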

Each item in the dataset contains the following annotations. Examples can be found on Hugging Face.

Item	Meaning
ref	Inherited from Subjects200K, it depicts the subject object in the target image.
target	The original image inherited from Subjects200K.
InstantStyle_image_{0-3}	Stylized images with invariant semantics.
InstantStyle_ref_{0-3}	Style reference for InstantStyle.
ReduxStyle_image_{0-3}	Stylized images with variant semantics.
ReduxStyle_ref_{0-3}	Style reference for ReduxStyle.
FillEdit_image_{0-5}	Edited images with invariant background.
FillEdit_meta	The name and description of the new subject object after editing.
DepthEdit	Edited image with variant background.
qwen_2_5_mask	A high-quality segmentation mask generated by Qwen-2.5-VL and SAM2.
qwen_2_5_bounding_box	The bounding boxes generated by Qwen-2.5-VL.
qwen_2_5_meta	The coordinates and object name of each bounding box, and the mask color corresponding to each box.
sam2_mask	A mask generated by the SAM2 model.
uniformer	The semantic segmentation generated by UniFormer.
foreground	The foreground mask generated by RMBG-2.0.
normal	Surface normal estimation generated by DSINE.
depth	Depth estimation generated by Depth Anything V2.
canny	Edges detected with the Canny edge detector.
hed	Edges detected with the HED detector.
mlsd	Line segments generated using M-LSD.
openpose	Human keypoints generated by OpenPose.
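Several rows above use a range notation such as `InstantStyle_image_{0-3}`, which presumably denotes the columns `InstantStyle_image_0` through `InstantStyle_image_3`. The sketch below expands that notation into concrete column names. The grouping of columns by meta-task (`TASK_COLUMNS`) is a hypothetical reading of the table, not an official mapping, and image restoration is left out because the table lists no restoration-specific column.

```python
from __future__ import annotations

import re

# Column patterns copied from the table; "{a-b}" is read as an inclusive index range.
# The grouping by meta-task is an illustrative assumption, not an official mapping.
TASK_COLUMNS = {
    "conditional generation": ["depth", "canny", "hed", "mlsd", "openpose",
                               "normal", "uniformer", "foreground"],
    "image editing": ["FillEdit_image_{0-5}", "DepthEdit"],
    "IP preservation": ["ref"],
    "style transfer": ["InstantStyle_image_{0-3}", "ReduxStyle_image_{0-3}"],
}

# Captures the prefix and both range bounds of a pattern such as "name_{0-3}".
_RANGE = re.compile(r"^(.*)\{(\d+)-(\d+)\}$")


def expand(pattern: str) -> list[str]:
    """Expand 'name_{a-b}' into ['name_a', ..., 'name_b']; plain names pass through."""
    match = _RANGE.match(pattern)
    if match is None:
        return [pattern]
    prefix, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(start, end + 1)]


def columns_for(task: str) -> list[str]:
    """Concrete column names associated with a meta-task under the mapping above."""
    return [name for pattern in TASK_COLUMNS[task] for name in expand(pattern)]


print(expand("InstantStyle_image_{0-3}"))
# → ['InstantStyle_image_0', 'InstantStyle_image_1', 'InstantStyle_image_2', 'InstantStyle_image_3']
```

Comparing the expanded names against `train.column_names` is a quick way to check this reading of the notation against the actual dataset.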

Preprocessing

To use Graph200K for training and inference in VisualCloze, we extract each image file and generate a JSON file that records the path and meta information of each file.
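The sketch below is a rough, hypothetical illustration of this kind of export, not the actual `processing.py`: it saves every image-valued field of each item to disk and records the resulting paths, plus any non-image fields, in one JSON index per split. The function name `export_split`, the per-item directory layout, and the JSON schema are all assumptions.

```python
from __future__ import annotations

import json
import os
from typing import Any, Iterable, Mapping


def export_split(items: Iterable[Mapping[str, Any]], target_path: str, split: str) -> str:
    """Write image fields to disk and index them in <target_path>/<split>.json.

    Hypothetical layout: <target_path>/<split>/<index>/<field>.png
    """
    records = []
    for idx, item in enumerate(items):
        item_dir = os.path.join(target_path, split, str(idx))
        os.makedirs(item_dir, exist_ok=True)
        record: dict[str, Any] = {"index": idx, "images": {}, "meta": {}}
        for key, value in item.items():
            if value is None:
                continue
            if hasattr(value, "save"):
                # Image-like field (e.g. a PIL.Image): save it and keep the path.
                path = os.path.join(item_dir, f"{key}.png")
                value.save(path)
                record["images"][key] = path
            else:
                # Non-image field (assumed JSON-serializable, e.g. a string): store it inline.
                record["meta"][key] = value
        records.append(record)

    index_path = os.path.join(target_path, f"{split}.json")
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return index_path
```

With the split loaded as in the Usage section, `export_split(train.select(range(10)), "./graph200k", "train")` would export only the first ten items, which is a cheap way to inspect the output before processing a full split.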

python processing.py \
--target_path "where will the images and the json be saved" \
--split "train or test"