README.md

May 7, 2024

ChatSpot

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

Liang Zhao*, En Yu*, Zheng Ge, Jinrong Yang, Haoran Wei, Hongyu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, and Xiangyu Zhang

ChatSpot is a multimodal large language model (MLLM) designed to improve human-AI interactivity by supporting precise referring instructions. Conventional end-to-end MLLMs accept only language instructions, which limits how accurately and efficiently users can interact with them. Alongside language, ChatSpot accepts reference representations such as points and bounding boxes, which direct the model to specific regions of interest within an image or scene.
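ChatSpot's exact referring-prompt format is not described here, so the following is only a minimal sketch of the general idea: turning a mouse click (a point) or a drag (a box) in pixel coordinates into text that can be embedded in an instruction. The `<point>`/`<box>` tags, the `<ref>` placeholder, normalizing coordinates to [0, 1], and every function and class name are hypothetical assumptions for illustration, not ChatSpot's released interface.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Point:
    """A hypothetical click location in pixel coordinates."""
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    """A hypothetical dragged box given by two opposite corners, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float


Reference = Union[Point, Box]


def _norm(v: float, size: int) -> float:
    # Clamp the coordinate to the image extent, then scale it to [0, 1]
    # (an assumed convention), rounded to 3 decimal places.
    return round(min(max(v, 0.0), size) / size, 3)


def encode_reference(ref: Reference, width: int, height: int) -> str:
    """Render a point or box as an assumed textual referring token."""
    if isinstance(ref, Point):
        return f"<point>[{_norm(ref.x, width)}, {_norm(ref.y, height)}]</point>"
    # Sort corners so (x1, y1) is top-left no matter which way the user dragged.
    x1, x2 = sorted((ref.x1, ref.x2))
    y1, y2 = sorted((ref.y1, ref.y2))
    coords = [_norm(x1, width), _norm(y1, height), _norm(x2, width), _norm(y2, height)]
    return "<box>[" + ", ".join(str(c) for c in coords) + "]</box>"


def build_instruction(question: str, ref: Reference, width: int, height: int) -> str:
    """Substitute the hypothetical <ref> placeholder with the encoded region."""
    return question.replace("<ref>", encode_reference(ref, width, height))


if __name__ == "__main__":
    print(build_instruction("What is the object at <ref>?", Point(320, 240), 640, 480))
    # prints: What is the object at <point>[0.5, 0.5]</point>?
```

Normalizing coordinates keeps the text independent of image resolution, and sorting box corners lets a drag in any direction produce the same region; whether ChatSpot itself makes these choices is not stated in this README.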

Dataset, code, and demo will be released soon.

Datasets
Code

@article{zhao2023chatspot,
  title={{ChatSpot}: Bootstrapping Multimodal {LLMs} via Precise Referring Instruction Tuning},
  author={Zhao, Liang and Yu, En and Ge, Zheng and Yang, Jinrong and Wei, Haoran and Zhou, Hongyu and Sun, Jianjian and Peng, Yuang and Dong, Runpei and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2307.09474},
  year={2023}
}
