The Large Language Models Survey repository is a comprehensive compendium dedicated to the exploration and understanding of Large Language Models (LLMs). It houses an assortment of resources including research papers, blog posts, tutorials, code examples, and more to provide an in-depth look at the progression, methodologies, and applications of LLMs. This repo is an invaluable resource for AI researchers, data scientists, or enthusiasts interested in the advancements and inner workings of LLMs. We encourage contributions from the wider community to promote collaborative learning and continue pushing the boundaries of LLM research.

| Language Model | Organization | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
|---|---|---|---|---|---|---|---|---|
| 2025 Latest Models | | | | | | | | |
| Grok 3 / Grok 3 Mini | xAI | 2025/02 | Grok 3, Grok 3 Mini | Grok 3 Beta - The Age of Reasoning Agents | 314 active (1M+ total) / Smaller variant | 1M tokens | Proprietary | xAI Platform |
| Llama 4 Scout | Meta | 2025/04 | Llama 4 Scout | The Llama 4 herd: The beginning of a new era | 17B active (109B total) | 10M tokens | Llama 4 Community License | HuggingFace |
| Llama 4 Maverick | Meta | 2025/04 | Llama 4 Maverick | The Llama 4 herd | 17B active (400B total) | 1M tokens | Llama 4 Community License | HuggingFace |
| Llama 4 Behemoth | Meta | 2025/04 (Training) | In Training | The Llama 4 herd | 288B active (~2T total) | TBD | TBD | TBD |
| Qwen 3 Family | Alibaba | 2025/04 | Qwen 3 Family | Alibaba unveils Qwen3 | 0.6B - 235B (22B active) | 32K - 131K tokens | Apache 2.0 | Qwen Chat |
| DeepSeek-R1 Family | DeepSeek | 2025/01-05 | DeepSeek-R1, R1-Zero, R1-0528 | DeepSeek-R1: Incentivizing Reasoning Capability | 37B active (671B total) | 128K tokens | MIT | DeepSeek Platform |
| o3 / o3-mini / o4-mini | OpenAI | 2025/01-04 | o3, o3-mini, o4-mini | Introducing OpenAI o3 and o4-mini | Undisclosed | 200K tokens | Proprietary | ChatGPT |
| Claude 4 (Sonnet & Opus) | Anthropic | 2025/05 | Claude Sonnet 4, Claude Opus 4 | Introducing Claude 4 | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Gemini 2.5 Family | Google | 2025/03-06 | Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite | Gemini 2.5: Our newest Gemini model with thinking | Undisclosed | 1M tokens | Proprietary | Gemini |
| Major 2024 Models | | | | | | | | |
| GPT-4o / GPT-4o mini | OpenAI | 2024/05-07 | GPT-4o, GPT-4o mini | Hello GPT-4o, GPT-4o mini: advancing cost-efficient intelligence | Undisclosed | 128K tokens | Proprietary | ChatGPT |
| o1 / o1-mini | OpenAI | 2024/09 | o1, o1-mini | Learning to Reason with LLMs | Undisclosed | 200K / 128K tokens | Proprietary | ChatGPT |
| Claude 3 Family | Anthropic | 2024/03 | Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus | Introducing the next generation of Claude | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Claude 3.5 Sonnet | Anthropic | 2024/06 | Claude 3.5 Sonnet | Claude 3.5 Sonnet | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Claude 3.7 Sonnet | Anthropic | 2025/02 | Claude 3.7 Sonnet | Claude 3.7 Sonnet | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Gemini 1.5 Pro / Flash | Google | 2024/02-05 | Gemini 1.5 Pro, Gemini 1.5 Flash | Our next-generation model: Gemini 1.5 | Undisclosed | 1M-2M / 1M tokens | Proprietary | Gemini |
| Gemini 2.0 Flash | Google | 2024/12 | Gemini 2.0 Flash | Gemini 2.0 Flash | Undisclosed | 1M tokens | Proprietary | Gemini |
| Gemma 2 | Google | 2024/06 | Gemma 2 Family | Gemma 2: Improving Open Language Models at a Practical Size | 9B, 27B | 8K tokens | Gemma Terms of Use | HuggingFace |
| Llama 3 Family | Meta | 2024/04 | Llama 3 Weights | Introducing Meta Llama 3 | 8B, 70B | 8K tokens | Custom | HuggingChat |
| Llama 3.1 | Meta | 2024/07 | Llama 3.1 Weights | The Llama 3 Herd of Models | 8B, 70B, 405B | 128K tokens | Custom | HuggingChat |
| Llama 3.2 | Meta | 2024/09 | Llama 3.2 Models | Llama 3.2: Revolutionizing edge AI and vision with open, customizable models | 1B, 3B, 11B, 90B | 128K tokens | Custom | HuggingChat |
| Llama 3.3 | Meta | 2024/12 | Llama 3.3 70B | Llama 3.3 70B | 70B | 128K tokens | Custom | HuggingChat |
| Phi-3 Family | Microsoft | 2024/04-08 | Phi-3 Mini, Phi-3 Small, Phi-3 Medium, Phi-3.5 Mini | Phi-3 Technical Report | 3.8B - 14B | 4K-128K tokens | MIT | Azure AI Studio |
| IBM Granite 3.0 / 3.1 | IBM | 2024/10-12 | Granite 3.0, Granite 3.1 | IBM Introduces Granite 3.0 | 2B, 8B | 4K / 128K tokens | Apache 2.0 | IBM watsonx |
| Command R / R+ | Cohere | 2024/03-04 | Command R, Command R+ | Command R: Cohere's scalable generative model | 35B / 104B | 128K tokens | CC BY-NC 4.0 | Cohere Platform |
| DeepSeek-V3 Family | DeepSeek | 2024/12-2025/03 | DeepSeek-V3, DeepSeek-V3-0324 | DeepSeek-V3 Technical Report | 37B active (671B total) | 128K tokens | MIT | DeepSeek Platform |
| Qwen 2.5 Family | Alibaba | 2024/09-2025/01 | Qwen 2.5 Family, Qwen 2.5-Max | Qwen2.5: A Party of Foundation Models | 0.5B - 72B / Undisclosed | 32K-128K tokens | Apache 2.0 / Proprietary | Qwen Chat |
| QwQ-32B | Alibaba | 2024/11 | QwQ-32B-Preview | QwQ-32B Technical Report | 32B | 32K tokens | Apache 2.0 | Qwen Chat |
| Mistral Family | Mistral AI | 2023/09-2025/05 | Mistral-7B, Mistral Large 2, Mistral Medium | Mistral 7B | 7B - 123B / Undisclosed | 4K-128K tokens | Apache 2.0 / Proprietary | Mistral Platform |
| Previous Generation Models | | | | | | | | |
| GPT-4 / GPT-4.5 | OpenAI | 2023/03-2025/02 | API Access | GPT-4 Technical Report | Undisclosed | 8K-128K tokens | Proprietary | ChatGPT |
| LLaMA 2 | Meta | 2023/07 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7B - 70B | 4K tokens | Custom | HuggingChat |
| PaLM 2 | Google | 2023/05 | PaLM 2 | PaLM 2 Technical Report | Undisclosed | 8K tokens | Proprietary | Bard |
| Bard | Google | 2023/03 | Bard | Bard: An experiment by Google | Undisclosed | 8K tokens | Proprietary | Bard |
| Chinchilla | DeepMind | 2022/03 | Chinchilla | Training Compute-Optimal Large Language Models | 70B | 2K tokens | Proprietary | [Research Only] |
| Sparrow | DeepMind | 2022/09 | Sparrow | Improving alignment of dialogue agents via targeted human judgements | 70B | 4K tokens | Proprietary | [Research Only] |
| Gopher | DeepMind | 2021/12 | Gopher | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 280B | 2K tokens | Proprietary | [Research Only] |
| YaLM | Yandex | 2022/06 | YaLM 100B | YaLM 100B | 100B | 2K tokens | Apache 2.0 | GitHub |
| OPT | Meta | 2022/05 | OPT Family | OPT: Open Pre-trained Transformer Language Models | 0.125B - 175B | 2K tokens | MIT | HuggingFace |
| BLOOM | BigScience | 2022/11 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176B | 2K tokens | OpenRAIL-M v1 | HuggingFace |
| Jurassic-1 / Jurassic-2 | AI21 Labs | 2021/08 / 2023/03 | AI21 Studio | Jurassic-1: Technical Details And Evaluation | 178B | 2K / 8K tokens | Proprietary | AI21 Studio |
| Anthropic LM (v4-s3) | Anthropic | 2022/12 | Anthropic LM | Constitutional AI: Harmlessness from AI Feedback | 52B | 4K tokens | Proprietary | [Research Only] |
| GLaM | Google | 2021/12 | GLaM | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 1.2T (64B active) | 2K tokens | Proprietary | [Research Only] |
| GPT-J / GPT-NeoX | EleutherAI | 2021/06 / 2022/04 | GPT-J-6B, GPT-NeoX-20B | GPT-J-6B: 6B JAX-Based Transformer | 6B / 20B | 2K tokens | Apache 2.0 | HuggingFace |
| Minerva | Google | 2022/06 | Minerva | Solving Quantitative Reasoning Problems with Language Models | 540B | 2K tokens | Proprietary | [Research Only] |
| Galactica | Meta | 2022/11 | Galactica | Galactica: A Large Language Model for Science | 120B | 2K tokens | Apache 2.0 | [Removed] |
| Vicuna | LMSYS | 2023/03 | Vicuna | Vicuna: An Open-Source Chatbot Impressing GPT-4 | 7B, 13B, 33B | 2K tokens | Custom | FastChat |
| Alpaca | Stanford | 2023/03 | Stanford Alpaca | Stanford Alpaca: An Instruction-following LLaMA Model | 7B | 2K tokens | Custom | GitHub |
| Coding-Specialized Models | | | | | | | | |
| Code Llama | Meta | 2023/08 | Code Llama Models | Code Llama: Open Foundation Models for Code | 7B - 34B | 4K tokens | Custom | HuggingChat |
| StarCoder / StarChat | BigCode | 2023/05 | StarCoder, StarChat | StarCoder: A State-of-the-Art LLM for Code | 1.1B - 16B | 8K tokens | OpenRAIL-M v1 | HuggingFace |
| CodeGen2 / CodeGen2.5 | Salesforce | 2023/04-07 | CodeGen2, CodeGen2.5 | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1B - 16B | 2K tokens | Apache 2.0 | HuggingFace |
| CodeT5+ | Salesforce | 2023/05 | CodeT5+ | CodeT5+: Open Code Large Language Models for Code Understanding and Generation | 0.22B - 16B | 512 tokens | BSD-3-Clause | GitHub |
| Replit Code | Replit | 2023/05 | replit-code-v1-3b | Training a SOTA Code LLM in 1 week | 2.7B | Infinity (ALiBi) | CC BY-SA-4.0 | HuggingFace |
| SantaCoder | BigCode | 2023/01 | SantaCoder | SantaCoder: don't reach for the stars! | 1.1B | 2K tokens | OpenRAIL-M v1 | HuggingFace |
| DeciCoder | Deci AI | 2023/08 | DeciCoder-1B | Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation | 1.1B | 2K tokens | Apache 2.0 | HuggingFace |
| Additional Historical Models | | | | | | | | |
| T5 / Flan-T5 | Google | 2019/10 | T5 & Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06B - 11B | 512 tokens | Apache 2.0 | HuggingFace |
| UL2 / Flan-UL2 | Google | 2022/10 | UL2 & Flan-UL2 | UL2 20B: An Open Source Unified Language Learner | 20B | 512-2K tokens | Apache 2.0 | HuggingFace |
| InstructGPT | OpenAI | 2022/03 | API Access | Training language models to follow instructions with human feedback | 1.3B - 175B | 2K tokens | Proprietary | [OpenAI API] |
| ChatGPT | OpenAI | 2022/11 | API Access | ChatGPT: Optimizing Language Models for Dialogue | ~175B | 4K tokens | Proprietary | ChatGPT |
| Pythia | EleutherAI | 2023/04 | Pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07B - 12B | 2K tokens | Apache 2.0 | HuggingFace |
| Dolly | Databricks | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3B, 7B, 12B | 2K tokens | MIT | HuggingFace |
| RedPajama-INCITE | Together | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models | 3B - 7B | 2K tokens | Apache 2.0 | HuggingFace |
| Falcon | TII | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM | 7B, 40B, 180B | 2K tokens | Apache 2.0 | HuggingFace |
| MPT Family | MosaicML | 2023/05-06 | MPT-7B, MPT-30B | Introducing MPT-7B | 7B, 30B | 2K-8K tokens | Apache 2.0 | MosaicML |
| OpenLLaMA | OpenLM Research | 2023/05 | OpenLLaMA Models | OpenLLaMA: An Open Reproduction of LLaMA | 3B, 7B, 13B | 2K tokens | Apache 2.0 | HuggingFace |
| h2oGPT | H2O.ai | 2023/05 | h2oGPT | Building the World's Best Open-Source Large Language Model | 12B - 20B | 256-2K tokens | Apache 2.0 | h2oGPT |
| FastChat-T5 | LMSYS | 2023/04 | fastchat-t5-3b-v1.0 | FastChat-T5: Compact and Commercial-friendly Chatbot | 3B | 512 tokens | Apache 2.0 | HuggingFace |
| StableLM | Stability AI | 2023/04 | StableLM-Alpha | Stability AI Launches StableLM Suite | 3B - 65B | 4K tokens | CC BY-SA-4.0 | HuggingFace |
| Koala | UC Berkeley | 2023/04 | Koala | Koala: A Dialogue Model for Academic Research | 13B | 4K tokens | Custom | BAIR |
| OpenHermes | Nous Research | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research OpenHermes | 7B, 13B | 4K tokens | MIT | HuggingFace |
| SOLAR | Upstage | 2023/12 | Solar-10.7B | SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-scaling | 10.7B | 4K tokens | Apache 2.0 | HuggingFace |
| Phi-2 | Microsoft | 2023/12 | phi-2 | Phi-2: The surprising power of small language models | 2.7B | 2K tokens | MIT | HuggingFace |
| OpenLM | MLFoundations | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling repository | 1B, 7B | 2K tokens | MIT | HuggingFace |
| RWKV | BlinkDL | 2021/08 | RWKV Models | The RWKV Language Model | 0.1B - 14B | Infinite (RNN) | Apache 2.0 | HuggingFace |
| DLite | AI Squared | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs | 0.124B - 1.5B | 1K tokens | Apache 2.0 | HuggingFace |
| Open Assistant | LAION | 2023/03 | OA-Pythia-12B | Democratizing Large Language Model Alignment | 12B | 2K tokens | Apache 2.0 | HuggingFace |
| Cerebras-GPT | Cerebras | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models | 0.111B - 13B | 2K tokens | Apache 2.0 | HuggingFace |
| XGen | Salesforce | 2023/06 | XGen-7B-8K-Base | Long Sequence Modeling with XGen | 7B | 8K tokens | Apache 2.0 | HuggingFace |
The year 2024 was transformative for the LLM landscape, with multiple breakthrough releases that established new benchmarks and capabilities:
OpenAI's Major Releases: GPT-4o, launched in May 2024, brought native multimodal capabilities with audio response times as low as 232 ms. In September, o1 and o1-mini introduced reasoning models that spend more time "thinking" through problems, with o1 solving 83% of problems on a qualifying exam for the International Mathematics Olympiad compared to GPT-4o's 13%.
Anthropic's Claude 3 Family: The Claude 3 series (Haiku, Sonnet, Opus), launched in March 2024, were the first models to challenge GPT-4's dominance on leaderboards. Claude 3.5 Sonnet followed in June, with an upgraded version in October, and became particularly popular for coding tasks. Claude 3.7 Sonnet followed in February 2025.
Google's Gemini Evolution: Gemini 1.5 Pro debuted in February 2024 with up to 2M token context windows, followed by Gemini 1.5 Flash in May for faster performance, and Gemini 2.0 Flash in December 2024.
Meta's Llama Progression: Llama 3 (8B, 70B) launched in April 2024, followed in July by the Llama 3.1 series, including the 405B parameter model, one of the largest openly available models at the time. Llama 3.2 brought multimodal capabilities in September, and Llama 3.3 concluded the year in December.
Microsoft's Phi Revolution: Microsoft's Phi-3 family proved that smaller models could punch above their weight, with Phi-3 Mini (3.8B parameters) matching much larger models on benchmarks. The series expanded with Phi-3 Small (7B), Phi-3 Medium (14B), and Phi-3.5 Mini throughout 2024.
Enterprise-Focused Models: IBM Granite 3.0, launched in October 2024, focused on enterprise use cases, while Cohere's Command R and Command R+ models excelled in retrieval-augmented generation tasks.
Google's Open Models: Gemma 2 (9B, 27B parameters), launched in June 2024, became highly popular in the open-source community, consistently ranking high in community evaluations.
The year 2025 has been marked by several breakthrough releases in the LLM landscape. Grok 3, launched by xAI in February 2025, introduced a 1 million token context window and achieved an Elo score of 1402 in the Chatbot Arena, making it the first model to surpass the 1400 mark. The model was trained on 12.8 trillion tokens and boasts 10x the computational power of its predecessor.
Meta's Llama 4 family represents a major leap forward as Meta's first models built on a Mixture-of-Experts (MoE) architecture. Llama 4 Scout features an unprecedented 10 million token context window, while an experimental chat version of Llama 4 Maverick achieves an Elo score of 1417 on LMArena (formerly LMSYS Chatbot Arena), outperforming GPT-4o and Gemini 2.0 Flash.
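To give a feel for what the Arena scores quoted above mean, the sketch below applies the classic Elo expected-score formula to two ratings. This is an illustration only: the function name `elo_expected_score` is hypothetical, and the assumption that leaderboard ratings behave like textbook Elo ratings is a simplification, since arena-style leaderboards may fit their scores with a different statistical model.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, counting ties as half) of A against B
    under the classic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the text: 1417 (Llama 4 Maverick) vs 1402 (Grok 3).
p_higher = elo_expected_score(1417, 1402)
p_lower = elo_expected_score(1402, 1417)

print(f"Expected score of the higher-rated model: {p_higher:.3f}")
print(f"Expected score of the lower-rated model:  {p_lower:.3f}")
```

Under this model a 15-point gap corresponds to an expected head-to-head score of roughly 52% for the higher-rated model, so small rating differences near the top of a leaderboard indicate closely matched systems.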
DeepSeek-R1 emerged as the first major open-source reasoning model. Its precursor, DeepSeek-R1-Zero, was trained purely through reinforcement learning without supervised fine-tuning, while DeepSeek-R1 adds a small cold-start supervised stage before reinforcement learning. The model demonstrates performance comparable to OpenAI's o1 across math, code, and reasoning tasks, with weights released under the MIT license.
Cursor, an AI-powered code editor, emerged as a popular platform for "vibe coding".
Qwen 3, released by Alibaba in April 2025, features a family of "hybrid" reasoning models ranging from 0.6B to 235B parameters, supporting 119 languages and trained on over 36 trillion tokens. The models seamlessly integrate thinking and non-thinking modes, offering users flexibility to control the thinking budget.
OpenAI continued its reasoning model series with o3 and o4-mini in April 2025, while Anthropic launched Claude 4 (Opus 4 and Sonnet 4) in May 2025, setting new standards for coding and advanced reasoning with extended thinking capabilities and tool use.
Google's Gemini 2.5 Pro debuted as a thinking model with a 1 million token context window, leading on LMArena leaderboards and excelling in coding, math, and multimodal understanding tasks.
- Reasoning Models: The emergence of models that can "think" through problems step-by-step, with extended reasoning capabilities becoming standard.
- Massive Context Windows: Models now support context windows ranging from 1M to 10M tokens, enabling processing of entire codebases and documents.
- Mixture-of-Experts (MoE) Architecture: More efficient model architectures that activate only a subset of parameters during inference.
- Open-Source Reasoning: DeepSeek-R1's success has democratized access to reasoning capabilities previously available only in proprietary models.
- Multimodal Integration: Native multimodality becoming standard, with models trained on text, images, audio, and video from the ground up.
- Tool Use and Agentic Capabilities: Enhanced ability to use tools, execute code, and perform complex multi-step tasks autonomously.
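The Mixture-of-Experts item above says that only a subset of parameters is active per token. A minimal sketch of that idea is top-k routing: a router scores every expert, and only the k best-scoring experts are evaluated and mixed. Everything here is a toy assumption for illustration (the names `route_top_k` and `moe_forward`, the scalar "experts", and the hard-coded router logits); it does not describe the routing used by any specific model listed in this survey.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.
    Weights are a softmax over the selected experts' logits only."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the k selected experts and mix their outputs."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Toy experts (hypothetical): each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router output for a single token.
router_logits = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(router_logits, k=2)
y = moe_forward(1.0, experts, router_logits, k=2)
print(selected, y)
```

With k=2 out of 4 experts, half of the expert parameters are never touched for this token, which is the same effect behind the "active" versus "total" parameter counts in the table.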
- Grok 3: 93.3%
- DeepSeek-R1-0528: 87.5%
- Gemini 2.5 Pro: 86.7%
- o3-mini: 86.5%
- Claude Opus 4: 72.5%
- Claude Sonnet 4: 72.7%
- OpenAI Codex 1: 72.1%
- Llama 4 Maverick: ~70%
- Llama 4 Scout: 10M tokens
- Grok 3: 1M tokens
- Gemini 2.5 Pro: 1M tokens
- Llama 4 Maverick: 1M tokens
- ChatGPT revolutionized conversational AI
- InstructGPT introduced instruction following
- Large proprietary models dominated (GPT-3, PaLM, Chinchilla)
- LLaMA sparked the open-source revolution
- Claude introduced constitutional AI
- Specialized coding models emerged (Code Llama, StarCoder)
- Model sizes optimized for efficiency (Phi, Mistral)
- GPT-4o achieved true multimodality
- o1 introduced step-by-step reasoning
- Claude 3 challenged GPT-4 dominance
- Llama 3.1 405B became largest open model
- Gemini 1.5 pushed context limits to 2M tokens
- Grok 3 achieved highest Arena scores
- DeepSeek-R1 democratized reasoning capabilities
- Llama 4 introduced 10M token contexts
- Claude 4 set new coding standards
- Qwen 3 pioneered hybrid reasoning modes
If you find our survey useful for your research, please cite the following paper:
@article{hadi2024large,
title={Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects},
author={Hadi, Muhammad Usman and Al Tashi, Qasem and Shah, Abbas and Qureshi, Rizwan and Muneer, Amgad and Irfan, Muhammad and Zafar, Anas and Shaikh, Muhammad Bilal and Akhtar, Naveed and Wu, Jia and others},
journal={Authorea Preprints},
year={2024},
publisher={Authorea}
}
Proprietary Models:
- OpenAI: GPT-4, GPT-4.5, GPT-4o, o1, o3, o4-mini, ChatGPT, InstructGPT
- Anthropic: Claude 3 Family, Claude 3.5, Claude 3.7, Claude 4, Anthropic LM
- Google/DeepMind: Gemini 2.5, Gemini 2.0, Gemini 1.5, PaLM 2, Bard, T5, UL2, Chinchilla, Sparrow, Gopher, GLaM, Minerva
- xAI: Grok 3, Grok 3 Mini
- AI21 Labs: Jurassic-1, Jurassic-2
- Mistral AI: Mistral 7B, Mistral Large 2, Mistral Medium
Open Source Models:
- Meta: Llama 4, Llama 3.x, Llama 2, OPT, Code Llama, Galactica
- Alibaba: Qwen 3, Qwen 2.5, QwQ-32B
- DeepSeek: DeepSeek-R1, DeepSeek-V3
- Microsoft: Phi-3 Family, Phi-2
- IBM: Granite 3.0, Granite 3.1
- Google: Gemma 2
- Cohere: Command R, Command R+
- BigScience: BLOOM
- EleutherAI: GPT-J, GPT-NeoX, Pythia
- BigCode: StarCoder, StarChat, SantaCoder
- Salesforce: CodeGen2, CodeT5+, XGen
- TIIUAE: Falcon
- Upstage: SOLAR
Academic/Research:
- LMSYS: Vicuna, FastChat-T5
- Stanford: Alpaca
- UC Berkeley: Koala
- LAION: Open Assistant
- OpenLM Research: OpenLLaMA
- MLFoundations: OpenLM
Other Companies:
- Yandex: YaLM
- Replit: Replit Code
- H2O.ai: h2oGPT
- Databricks: Dolly
- Together: RedPajama-INCITE
- MosaicML: MPT Family
- Stability AI: StableLM
- Nous Research: OpenHermes
- Cerebras: Cerebras-GPT
- Deci AI: DeciCoder
- AI Squared: DLite
- BlinkDL: RWKV
Reasoning Models (2024-2025):
- OpenAI: o1, o1-mini, o3, o3-mini, o4-mini
- DeepSeek: DeepSeek-R1 Family
- Alibaba: QwQ-32B, Qwen 3 (hybrid reasoning)
- Google: Gemini 2.5 (thinking models)
Conversational Models:
- OpenAI: ChatGPT, GPT-4o
- Anthropic: Claude 3/4 Family
- Google: Bard, Gemini
- xAI: Grok 3
Code-Specialized:
- Meta: Code Llama
- BigCode: StarCoder, SantaCoder
- Salesforce: CodeGen2, CodeT5+
- Replit: Replit Code
- Deci AI: DeciCoder
Multimodal:
- OpenAI: GPT-4o
- Google: Gemini 2.0/2.5
- Meta: Llama 4, Llama 3.2
Efficient/Small:
- Microsoft: Phi-3 Family, Phi-2
- Google: Gemma 2
- AI Squared: DLite
- Upstage: SOLAR
Last updated: July 2025
Original repository: https://www.techrxiv.org/doi/full/10.36227/techrxiv.23589741.v3