Graph200K Dataset
June 4, 2025
Introduction
In natural language processing, tasks overlap significantly, which facilitates strong cross-task learning. Visual tasks, in contrast, are inherently distinct, making it hard for vision models to reach similar generalization through instruction tuning. To address this, we introduce a graph-structured multi-task dataset named Graph200K.
Graph200K is built upon the Subjects200K dataset. Each image is annotated for five meta-tasks: 1) conditional generation, 2) image restoration, 3) image editing, 4) IP preservation, and 5) style transfer. These meta-tasks can also be combined to form a wide range of more complex tasks, as shown in the figure below.

Usage
The dataset can be downloaded and used through the Datasets library, as follows:
import datasets
# Download the dataset from the Hugging Face Hub (or load it from the local cache)
graph200k = datasets.load_dataset("VisualCloze/Graph200K")
train = graph200k['train']
test = graph200k['test']
# Read the depth map (PIL.Image) of the first image in the train set and save it
train[0]['depth'].save('depth.jpg')
Each item in the dataset contains the annotations listed below. Examples can be found on Hugging Face.
| Item | Meaning |
|---|---|
| ref | Inherited from Subjects200K, it depicts the subject object in the target image. |
| target | The original image inherited from Subjects200K. |
| InstantStyle_image_{0-3} | Stylized images with invariant semantics. |
| InstantStyle_ref_{0-3} | Style reference for InstantStyle. |
| ReduxStyle_image_{0-3} | Stylized images with variant semantics. |
| ReduxStyle_ref_{0-3} | Style reference for ReduxStyle. |
| FillEdit_image_{0-5} | Edited image with invariant background. |
| FillEdit_meta | The name and description of the new subject object after editing. |
| DepthEdit | Edited image with variant background. |
| qwen_2_5_mask | A high-quality segmentation mask generated by Qwen-2.5-VL and SAM2. |
| qwen_2_5_bounding_box | The bounding boxes generated by Qwen-2.5-VL. |
| qwen_2_5_meta | The coordinates and object name of each bounding box, and the mask color corresponding to each box. |
| sam2_mask | A mask generated by SAM2. |
| uniformer | The semantic segmentation generated by UniFormer. |
| foreground | The foreground mask generated by RMBG-2.0. |
| normal | Surface normal estimation generated by DSINE. |
| depth | Depth estimation generated by Depth Anything V2. |
| canny | Edge detection in images, using the Canny edge detector. |
| hed | Edge detection in images, using the HED detector. |
| mlsd | Line segments generated using M-LSD. |
| openpose | Human keypoints generated by OpenPose. |
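Several rows in the table use a range suffix such as `{0-3}`. The sketch below expands that notation into individual key names. It assumes the range is inclusive and that the real column names simply carry the integer suffix (for example `InstantStyle_image_0`); this is an interpretation of the table, not something the dataset card confirms. The helper name `expand_keys` is hypothetical.

```python
import re

# Matches a "{lo-hi}" range placeholder such as "{0-3}".
_RANGE = re.compile(r"\{(\d+)-(\d+)\}")


def expand_keys(template: str) -> list[str]:
    """Expand 'name_{lo-hi}' into ['name_lo', ..., 'name_hi'].

    Assumption: the range is inclusive on both ends. A template
    without a range placeholder is returned unchanged as a single key.
    """
    match = _RANGE.search(template)
    if match is None:
        return [template]
    lo, hi = int(match.group(1)), int(match.group(2))
    prefix, suffix = template[:match.start()], template[match.end():]
    return [f"{prefix}{i}{suffix}" for i in range(lo, hi + 1)]
```

With a loaded split, the expanded names can be checked against `train.column_names` before indexing, for example `[k for k in expand_keys("FillEdit_image_{0-5}") if k in train.column_names]`, so that a wrong guess about the naming yields an empty list rather than a `KeyError`.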
Preprocessing
To use Graph200K for training and inference with VisualCloze, we extract each image file and generate a JSON file that records the path and meta information of each file.
python processing.py \
--target_path "where will the images and the json be saved" \
--split "train or test"
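For readers who want a feel for what such an extraction step involves, here is a minimal sketch. It is not the repository's `processing.py`: the function name `export_split`, the file naming scheme, and the JSON layout are all hypothetical. It assumes image fields are PIL-style objects exposing `.save(path)` and that every other field is metadata that can be written to JSON.

```python
import json
from pathlib import Path


def export_split(items, target_path, split):
    """Save image-like fields to disk and write a JSON index of paths and metadata.

    Hypothetical layout: images go to <target_path>/<split>/ and the index
    is written to <target_path>/<split>.json.
    """
    root = Path(target_path)
    image_dir = root / split
    image_dir.mkdir(parents=True, exist_ok=True)

    records = []
    for index, item in enumerate(items):
        record = {"id": index}
        for key, value in item.items():
            if hasattr(value, "save"):  # duck-typed check for a PIL.Image
                path = image_dir / f"{index:06d}_{key}.png"
                value.save(path)
                # Store paths relative to the root so the folder can be moved.
                record[key] = str(path.relative_to(root))
            else:
                record[key] = value  # assumed to be plain metadata
        records.append(record)

    index_path = root / f"{split}.json"
    with index_path.open("w", encoding="utf-8") as f:
        # default=str keeps the dump from failing on unexpected value types.
        json.dump(records, f, ensure_ascii=False, indent=2, default=str)
    return index_path
```

The sketch writes PNG rather than JPEG because masks and edge maps degrade under lossy compression. Calling `export_split(train, "some/output/dir", "train")` on a loaded split would iterate over every item, so it is worth trying on a small slice first.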