Embodied Family Code Base
April 26, 2024
We will update the instructions for this codebase as soon as possible.
Installation
See INSTALLATION.md
Data Preparation
- Download the EgoCOT dataset.
- Download the COCO-2017 dataset.
Download the Pretrained Model
Download the testing model Embodied_family_7btiny.
Prepare the Text Data Paired with Video and Image
- Unzip datasets_share.zip, which contains the text part of the multi-modal dataset, to the ./datasets/ directory.
Overview
Major Features
Usage
This repo can be used in conjunction with PyTorch's Dataset and DataLoader for training models on heterogeneous
data. Here's a brief overview of the classes and their functionalities:
BaseDataset
The BaseDataset class extends PyTorch's Dataset and is designed to handle different media types (images, videos, and
text). It applies a transformation pipeline to standardize the input data and uses a processor for task-specific
handling of each sample.
Example
from torch.utils.data import DataLoader

from robohusky.base_dataset_uni import BaseDataset

# Initialize the dataset with the required parameters
dataset = BaseDataset(
    dataset,    # Your dataset here
    processor,  # Your processor here
    image_path="path/to/images",
    input_size=224,
    num_segments=8,
    norm_type="openai",
    media_type="image",
)

# Use the dataset with a PyTorch DataLoader
data_loader = DataLoader(dataset, batch_size=32, shuffle=True)
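The num_segments=8 argument in the example above suggests that videos are reduced to a fixed number of frames. How BaseDataset actually picks those frames is not documented here, so the following is only a minimal sketch of one common scheme (one frame from the middle of each equal-length span). The function name segment_frame_indices is hypothetical and not part of robohusky.

```python
from typing import List


def segment_frame_indices(num_frames: int, num_segments: int = 8) -> List[int]:
    """Pick one frame index from the middle of each of num_segments equal spans.

    Hypothetical helper for illustration; not taken from robohusky.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_size = num_frames / num_segments
    # Middle of each span, clamped to the last valid frame index.
    return [
        min(num_frames - 1, int(seg_size * i + seg_size / 2))
        for i in range(num_segments)
    ]


# An 80-frame clip reduced to 8 evenly spaced frames.
indices = segment_frame_indices(num_frames=80, num_segments=8)
short_clip = segment_frame_indices(num_frames=4, num_segments=8)
```

Clips shorter than num_segments simply repeat frames under this scheme, which keeps the output length fixed for batching.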
WeightedConcatDataset
The WeightedConcatDataset class extends PyTorch's ConcatDataset and allows for the creation of a unified dataset by
concatenating multiple datasets with specified weights.
Example
from torch.utils.data import DataLoader

from robohusky.base_dataset_uni import BaseDataset, WeightedConcatDataset

# Assume we have multiple datasets for different tasks
dataset1 = BaseDataset(...)
dataset2 = BaseDataset(...)
dataset3 = BaseDataset(...)

# Define the weights for each dataset (one weight per dataset, in the same order)
weights = [0.5, 0.3, 0.2]

# Create a weighted concatenated dataset
weighted_dataset = WeightedConcatDataset([dataset1, dataset2, dataset3], weights=weights)

# Use the weighted dataset with a PyTorch DataLoader
data_loader = DataLoader(weighted_dataset, batch_size=32, shuffle=True)
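To make the effect of per-dataset weights concrete, here is a minimal pure-Python sketch of one way such weights can be turned into per-sample sampling weights over the concatenated index range. It is an assumption about the general idea, not a description of how WeightedConcatDataset is implemented. The helper per_sample_weights and the dataset sizes are hypothetical.

```python
import random
from typing import List, Sequence


def per_sample_weights(lengths: Sequence[int], dataset_weights: Sequence[float]) -> List[float]:
    """Spread each dataset's weight evenly over its samples, in concatenation order.

    Hypothetical helper for illustration; not part of robohusky.
    """
    if len(lengths) != len(dataset_weights):
        raise ValueError("need exactly one weight per dataset")
    weights: List[float] = []
    for n, w in zip(lengths, dataset_weights):
        if n <= 0:
            raise ValueError("every dataset must contain at least one sample")
        # Each of the n samples gets w / n, so the dataset as a whole
        # is drawn with probability proportional to w regardless of its size.
        weights.extend([w / n] * n)
    return weights


lengths = [1000, 200, 50]  # hypothetical dataset sizes
sample_weights = per_sample_weights(lengths, [0.5, 0.3, 0.2])

# Draw a batch of global indices into the concatenated dataset (with replacement).
rng = random.Random(0)
batch = rng.choices(range(sum(lengths)), weights=sample_weights, k=32)
```

In a PyTorch pipeline, a list like sample_weights could be passed to torch.utils.data.WeightedRandomSampler and the sampler given to DataLoader through its sampler argument; in that case shuffle must be left unset, since DataLoader treats the two options as mutually exclusive.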
Customization
The package is designed to be flexible and customizable. You can implement your own transformation and processing logic
by subclassing BaseDataset and overriding the necessary methods.
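The subclass-and-override pattern can be sketched as follows. Because the methods BaseDataset exposes are not listed here, the sketch uses a hypothetical stand-in class, SketchBaseDataset, with an assumed transform hook. The class names, the hook name, and the record layout are all illustrative assumptions rather than the robohusky API.

```python
from typing import Any, Callable, Dict, List


class SketchBaseDataset:
    """Hypothetical stand-in for BaseDataset; not the real robohusky class."""

    def __init__(
        self,
        records: List[Dict[str, Any]],
        processor: Callable[[Dict[str, Any]], Dict[str, Any]],
    ) -> None:
        self.records = records
        self.processor = processor

    def transform(self, media: Any) -> Any:
        # Default behavior: pass the media through unchanged.
        return media

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Copy the record, transform its media, then apply the processor.
        record = dict(self.records[index])
        record["media"] = self.transform(record["media"])
        return self.processor(record)


class ScaledDataset(SketchBaseDataset):
    """Subclass that overrides only the transformation step."""

    def transform(self, media: List[int]) -> List[float]:
        # Scale 0-255 pixel values into the range [0, 1].
        return [value / 255.0 for value in media]


def add_length(record: Dict[str, Any]) -> Dict[str, Any]:
    # Toy processor: annotate the record with the media length.
    record["length"] = len(record["media"])
    return record


dataset = ScaledDataset(
    [{"media": [0, 255], "text": "pick up the cup"}],
    processor=add_length,
)
item = dataset[0]
```

When adapting this to the real class, check robohusky/base_dataset_uni.py for the actual method names before overriding anything.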
License
This project is released under the Apache 2.0 license.
Citation
If you find this project useful in your research, please consider citing:
@article{mu2024embodiedgpt,
title={Embodiedgpt: Vision-language pre-training via embodied chain of thought},
author={Mu, Yao and Zhang, Qinglong and Hu, Mengkang and Wang, Wenhai and Ding, Mingyu and Jin, Jun and Wang, Bin and Dai, Jifeng and Qiao, Yu and Luo, Ping},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}