Model_cards.md

April 17, 2024

| Models | MLLM Architecture | GitHub Stars | Hugging Face Downloads |
| --- | --- | --- | --- |
| LLaVA-v1.5-13B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 333.7K |
| LVIS-Instruct4v-LLaVA-7B | Pretrained Vision Encoder + Projector + LLM | 122 | 5 |
| MiniGPT-v2 | Pretrained Vision Encoder + Projector + LLM | 24.7K | / |
| LLaVA-v1.5-7B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 703K |
| LLaVA-v1.6-Vicuna-7B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 1.2M |
| LLaVA-v1.6-Vicuna-13B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 100.1K |
| LLaVA-v1.6-34B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 592.8K |
| Yi-VL-6B | Pretrained Vision Encoder + Projector + LLM | 7K | 17.2K |
| ALLaVA | Pretrained Vision Encoder + Projector + LLM | 134 | 93 |
| kosmos2 | Pretrained Vision Encoder + Grounded LLM | 18.1K | 29.2K |
| LWM | Pretrained Vision Encoder + Projector + Long-Context LLM | 6.6K | / |
| BLIP2-Flan-T5-XL | Query tokens + LM | 8.5K | 35.4K |
| Qwen-Vl-Chat | Query tokens + LLM | 3.4K | 289.9K |
| InstructBLIP-Vicuna-13B | Query tokens + LLM | 8.5K | 5.4K |
| mPLUG-Owl2 | Query tokens + LLM with Modality-Adaptive Module | 1.9K | 9.7K |
| Cheetor | Query tokens + VPG-C + LLM | 308 | / |
| Fuyu-8B | Linear Vision Encoder + LLM | / | 17.9K |
| SEED-LLaMA | VQ-based Vision Encoder + LLM | 445 | / |
| OpenFlamingo | Perceiver Resampler + LLM with Gated Cross-Attention Layers | 3.4K | / |