Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, And Future Prospects (Updated July 2025)

July 21, 2025

The Large Language Models Survey repository collects resources for exploring and understanding Large Language Models (LLMs): research papers, blog posts, tutorials, code examples, and more, covering the progression, methodologies, and applications of LLMs. It is intended for AI researchers, data scientists, and enthusiasts interested in how LLMs work and how they are advancing. Contributions from the wider community are welcome and help keep the collection current.

Timeline of LLMs

[Figure: evolution timeline of LLMs]

List of LLMs (Updated July 2025)

| Language Model | Organization | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
|---|---|---|---|---|---|---|---|---|
| 2025 Latest Models | | | | | | | | |
| Grok 3 / Grok 3 Mini | xAI | 2025/02 | Grok 3, Grok 3 Mini | Grok 3 Beta — The Age of Reasoning Agents | 314 active (1M+ total) / Smaller variant | 1M tokens | Proprietary | xAI Platform |
| Claude 3.7 Sonnet | Anthropic | 2025/02 | Claude 3.7 Sonnet | Claude 3.7 Sonnet | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Llama 4 Scout | Meta | 2025/04 | Llama 4 Scout | The Llama 4 herd: The beginning of a new era | 17B active (109B total) | 10M tokens | Llama 4 Community License | HuggingFace |
| Llama 4 Maverick | Meta | 2025/04 | Llama 4 Maverick | The Llama 4 herd | 17B active (400B total) | 1M tokens | Llama 4 Community License | HuggingFace |
| Llama 4 Behemoth | Meta | 2025/04 (Training) | In Training | The Llama 4 herd | 288B active (~2T total) | TBD | TBD | TBD |
| Qwen 3 Family | Alibaba | 2025/04 | Qwen 3 Family | Alibaba unveils Qwen3 | 0.6B - 235B (22B active) | 32K - 131K tokens | Apache 2.0 | Qwen Chat |
| DeepSeek-R1 Family | DeepSeek | 2025/01-05 | DeepSeek-R1, R1-Zero, R1-0528 | DeepSeek-R1: Incentivizing Reasoning Capability | 37B active (671B total) | 128K tokens | MIT | DeepSeek Platform |
| o3 / o3-mini / o4-mini | OpenAI | 2025/01-04 | o3, o3-mini, o4-mini | Introducing OpenAI o3 and o4-mini | Undisclosed | 200K tokens | Proprietary | ChatGPT |
| Claude 4 (Sonnet & Opus) | Anthropic | 2025/05 | Claude Sonnet 4, Claude Opus 4 | Introducing Claude 4 | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Gemini 2.5 Family | Google | 2025/03-06 | Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite | Gemini 2.5: Our newest Gemini model with thinking | Undisclosed | 1M tokens | Proprietary | Gemini |
| Major 2024 Models | | | | | | | | |
| GPT-4o / GPT-4o mini | OpenAI | 2024/05-07 | GPT-4o, GPT-4o mini | Hello GPT-4o, GPT-4o mini: advancing cost-efficient intelligence | Undisclosed | 128K tokens | Proprietary | ChatGPT |
| o1 / o1-mini | OpenAI | 2024/09 | o1, o1-mini | Learning to Reason with LLMs | Undisclosed | 200K / 128K tokens | Proprietary | ChatGPT |
| Claude 3 Family | Anthropic | 2024/03 | Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus | Introducing the next generation of Claude | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Claude 3.5 Sonnet | Anthropic | 2024/06 | Claude 3.5 Sonnet | Claude 3.5 Sonnet | Undisclosed | 200K tokens | Proprietary | Claude.ai |
| Gemini 1.5 Pro / Flash | Google | 2024/02-05 | Gemini 1.5 Pro, Gemini 1.5 Flash | Our next-generation model: Gemini 1.5 | Undisclosed | 1M-2M / 1M tokens | Proprietary | Gemini |
| Gemini 2.0 Flash | Google | 2024/12 | Gemini 2.0 Flash | Gemini 2.0 Flash | Undisclosed | 1M tokens | Proprietary | Gemini |
| Gemma 2 | Google | 2024/06 | Gemma 2 Family | Gemma 2: Improving Open Language Models at a Practical Size | 9B, 27B | 8K tokens | Apache 2.0 | HuggingFace |
| Llama 3 Family | Meta | 2024/04 | Llama 3 Weights | Introducing Meta Llama 3 | 8B, 70B | 8K tokens | Custom | HuggingChat |
| Llama 3.1 | Meta | 2024/07 | Llama 3.1 Weights | The Llama 3 Herd of Models | 8B, 70B, 405B | 128K tokens | Custom | HuggingChat |
| Llama 3.2 | Meta | 2024/09 | Llama 3.2 Models | Llama 3.2: Revolutionizing edge AI and vision with open, customizable models | 1B, 3B, 11B, 90B | 128K tokens | Custom | HuggingChat |
| Llama 3.3 | Meta | 2024/12 | Llama 3.3 70B | Llama 3.3 70B | 70B | 128K tokens | Custom | HuggingChat |
| Phi-3 Family | Microsoft | 2024/04-08 | Phi-3 Mini, Phi-3 Small, Phi-3 Medium, Phi-3.5 Mini | Phi-3 Technical Report | 3.8B - 14B | 4K-128K tokens | MIT | Azure AI Studio |
| IBM Granite 3.0 / 3.1 | IBM | 2024/10-12 | Granite 3.0, Granite 3.1 | IBM Introduces Granite 3.0 | 2B, 8B | 4K / 128K tokens | Apache 2.0 | IBM watsonx |
| Command R / R+ | Cohere | 2024/03-04 | Command R, Command R+ | Command R: Cohere's scalable generative model | 35B / 104B | 128K tokens | CC BY-NC 4.0 | Cohere Platform |
| DeepSeek-V3 Family | DeepSeek | 2024/12-2025/03 | DeepSeek-V3, DeepSeek-V3-0324 | DeepSeek-V3 Technical Report | 37B active (671B total) | 128K tokens | MIT | DeepSeek Platform |
| Qwen 2.5 Family | Alibaba | 2024/09-2025/01 | Qwen 2.5 Family, Qwen 2.5-Max | Qwen2.5: A Party of Foundation Models | 0.5B - 72B / Undisclosed | 32K-128K tokens | Apache 2.0 / Proprietary | Qwen Chat |
| QwQ-32B | Alibaba | 2024/11 | QwQ-32B-Preview | QwQ-32B Technical Report | 32B | 32K tokens | Apache 2.0 | Qwen Chat |
| Mistral Family | Mistral AI | 2023/09-2025/05 | Mistral-7B, Mistral Large 2, Mistral Medium | Mistral 7B | 7B - 123B / Undisclosed | 4K-128K tokens | Apache 2.0 / Proprietary | Mistral Platform |
| Previous Generation Models | | | | | | | | |
| GPT-4 / GPT-4.5 | OpenAI | 2023/03-2024/06 | API Access | GPT-4 Technical Report | Undisclosed | 8K-128K tokens | Proprietary | ChatGPT |
| LLaMA 2 | Meta | 2023/06 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7B - 70B | 4K tokens | Custom | HuggingChat |
| PaLM 2 | Google/DeepMind | 2023/05 | PaLM 2 | PaLM 2 Technical Report | Undisclosed | 8K tokens | Proprietary | Bard |
| Bard | Google/DeepMind | 2023/03 | Bard | Bard: An experiment by Google | Undisclosed | 8K tokens | Proprietary | Bard |
| Chinchilla | Google/DeepMind | 2022/03 | Chinchilla | Training Compute-Optimal Large Language Models | 70B | 2K tokens | Proprietary | [Research Only] |
| Sparrow | Google/DeepMind | 2022/09 | Sparrow | Improving alignment of dialogue agents via targeted human judgements | 70B | 4K tokens | Proprietary | [Research Only] |
| Gopher | Google/DeepMind | 2021/12 | Gopher | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 280B | 2K tokens | Proprietary | [Research Only] |
| YaLM | Yandex | 2022/06 | YaLM 100B | YaLM 100B | 100B | 2K tokens | Apache 2.0 | GitHub |
| OPT | Meta | 2022/05 | OPT Family | OPT: Open Pre-trained Transformer Language Models | 0.125B - 175B | 2K tokens | MIT | HuggingFace |
| BLOOM | BigScience | 2022/11 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176B | 2K tokens | OpenRAIL-M v1 | HuggingFace |
| Jurassic-1 / Jurassic-2 | AI21 Labs | 2021/08 / 2023/03 | AI21 Studio | Jurassic-1: Technical Details And Evaluation | 178B | 2K / 8K tokens | Proprietary | AI21 Studio |
| Anthropic LM (v4-s3) | Anthropic | 2022/12 | Anthropic LM | Constitutional AI: Harmlessness from AI Feedback | 52B | 4K tokens | Proprietary | [Research Only] |
| GLaM | Google/DeepMind | 2021/12 | GLaM | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 1.2T (64B active) | 2K tokens | Proprietary | [Research Only] |
| GPT-J / GPT-NeoX | EleutherAI | 2021/06 / 2022/04 | GPT-J-6B, GPT-NeoX-20B | GPT-J-6B: 6B JAX-Based Transformer | 6B / 20B | 2K tokens | Apache 2.0 | HuggingFace |
| Minerva | Google/DeepMind | 2022/06 | Minerva | Solving Quantitative Reasoning Problems with Language Models | 540B | 2K tokens | Proprietary | [Research Only] |
| Galactica | Meta | 2022/11 | Galactica | Galactica: A Large Language Model for Science | 120B | 2K tokens | Apache 2.0 | [Removed] |
| Vicuna | LMSYS | 2023/03 | Vicuna | Vicuna: An Open-Source Chatbot Impressing GPT-4 | 7B, 13B, 33B | 2K tokens | Custom | FastChat |
| Alpaca | Stanford | 2023/03 | Stanford Alpaca | Stanford Alpaca: An Instruction-following LLaMA Model | 7B | 2K tokens | Custom | GitHub |
| Coding-Specialized Models | | | | | | | | |
| Code Llama | Meta | 2023/08 | Code Llama Models | Code Llama: Open Foundation Models for Code | 7B - 34B | 4K tokens | Custom | HuggingChat |
| StarCoder / StarChat | BigCode | 2023/05 | StarCoder, StarChat | StarCoder: A State-of-the-Art LLM for Code | 1.1B - 16B | 8K tokens | OpenRAIL-M v1 | HuggingFace |
| CodeGen2 / CodeGen2.5 | Salesforce | 2023/04-07 | CodeGen2, CodeGen2.5 | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1B - 16B | 2K tokens | Apache 2.0 | HuggingFace |
| CodeT5+ | Salesforce | 2023/05 | CodeT5+ | CodeT5+: Open Code Large Language Models for Code Understanding and Generation | 0.22B - 16B | 512 tokens | BSD-3-Clause | GitHub |
| Replit Code | Replit | 2023/05 | replit-code-v1-3b | Training a SOTA Code LLM in 1 week | 2.7B | Infinity (ALiBi) | CC BY-SA-4.0 | HuggingFace |
| SantaCoder | BigCode | 2023/01 | SantaCoder | SantaCoder: don't reach for the stars! | 1.1B | 2K tokens | OpenRAIL-M v1 | HuggingFace |
| DeciCoder | Deci AI | 2023/08 | DeciCoder-1B | Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation | 1.1B | 2K tokens | Apache 2.0 | HuggingFace |
| Additional Historical Models | | | | | | | | |
| T5 / Flan-T5 | Google/DeepMind | 2019/10 | T5 & Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06B - 11B | 512 tokens | Apache 2.0 | HuggingFace |
| UL2 / Flan-UL2 | Google/DeepMind | 2022/10 | UL2 & Flan-UL2 | UL2 20B: An Open Source Unified Language Learner | 20B | 512-2K tokens | Apache 2.0 | HuggingFace |
| InstructGPT | OpenAI | 2022/03 | API Access | Training language models to follow instructions with human feedback | 1.3B - 175B | 2K tokens | Proprietary | [OpenAI API] |
| ChatGPT | OpenAI | 2022/11 | API Access | ChatGPT: Optimizing Language Models for Dialogue | ~175B | 4K tokens | Proprietary | ChatGPT |
| Pythia | EleutherAI | 2023/04 | Pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07B - 12B | 2K tokens | Apache 2.0 | HuggingFace |
| Dolly | Databricks | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3B, 7B, 12B | 2K tokens | MIT | HuggingFace |
| RedPajama-INCITE | Together | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models | 3B - 7B | 2K tokens | Apache 2.0 | HuggingFace |
| Falcon | TIIUAE | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM | 7B, 40B, 180B | 2K tokens | Apache 2.0 | HuggingFace |
| MPT Family | MosaicML | 2023/05-06 | MPT-7B, MPT-30B | Introducing MPT-7B | 7B, 30B | 2K-8K tokens | Apache 2.0 | MosaicML |
| OpenLLaMA | OpenLM Research | 2023/05 | OpenLLaMA Models | OpenLLaMA: An Open Reproduction of LLaMA | 3B, 7B, 13B | 2K tokens | Apache 2.0 | HuggingFace |
| h2oGPT | H2O.ai | 2023/05 | h2oGPT | Building the World's Best Open-Source Large Language Model | 12B - 20B | 256-2K tokens | Apache 2.0 | h2oGPT |
| FastChat-T5 | LMSYS | 2023/04 | fastchat-t5-3b-v1.0 | FastChat-T5: Compact and Commercial-friendly Chatbot | 3B | 512 tokens | Apache 2.0 | HuggingFace |
| StableLM | Stability AI | 2023/04 | StableLM-Alpha | Stability AI Launches StableLM Suite | 3B - 65B | 4K tokens | CC BY-SA-4.0 | HuggingFace |
| Koala | UC Berkeley | 2023/04 | Koala | Koala: A Dialogue Model for Academic Research | 13B | 4K tokens | Custom | BAIR |
| OpenHermes | Nous Research | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research OpenHermes | 7B, 13B | 4K tokens | MIT | HuggingFace |
| SOLAR | Upstage | 2023/12 | Solar-10.7B | SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-scaling | 10.7B | 4K tokens | Apache 2.0 | HuggingFace |
| Phi-2 | Microsoft | 2023/12 | phi-2 | Phi-2: The surprising power of small language models | 2.7B | 2K tokens | MIT | HuggingFace |
| OpenLM | MLFoundations | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling repository | 1B, 7B | 2K tokens | MIT | HuggingFace |
| RWKV | BlinkDL | 2021/08 | RWKV Models | The RWKV Language Model | 0.1B - 14B | Infinite (RNN) | Apache 2.0 | HuggingFace |
| DLite | AI Squared | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs | 0.124B - 1.5B | 1K tokens | Apache 2.0 | HuggingFace |
| Open Assistant | LAION | 2023/03 | OA-Pythia-12B | Democratizing Large Language Model Alignment | 12B | 2K tokens | Apache 2.0 | HuggingFace |
| Cerebras-GPT | Cerebras | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models | 0.111B - 13B | 2K tokens | Apache 2.0 | HuggingFace |
| XGen | Salesforce | 2023/06 | XGen-7B-8K-Base | Long Sequence Modeling with XGen | 7B | 8K tokens | Apache 2.0 | HuggingFace |
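Several rows in the list above report both an active and a total parameter count. As a rough illustration of what those pairs imply, the sketch below computes the share of parameters used per token. The figures are copied from the list; the names `MOE_MODELS` and `active_fraction` are hypothetical helpers introduced here, not part of any library.

```python
# Active vs. total parameter counts in billions, copied from the list above.
# The dictionary and function names are illustrative assumptions.
MOE_MODELS = {
    "Llama 4 Scout": (17, 109),
    "Llama 4 Maverick": (17, 400),
    "Qwen 3 (largest)": (22, 235),
    "DeepSeek-R1": (37, 671),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per token, between 0 and 1."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_b / total_b


if __name__ == "__main__":
    for name, (active_b, total_b) in MOE_MODELS.items():
        share = active_fraction(active_b, total_b)
        print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```

The ratio only describes parameter counts. It says nothing about memory footprint, since all experts typically still need to be loaded.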

Key Developments in 2024

The year 2024 was transformative for the LLM landscape, with multiple breakthrough releases that established new benchmarks and capabilities:

OpenAI's Major Releases: GPT-4o, launched in May 2024, brought native multimodal input and output with audio response latency as low as 232 ms. In September, o1 and o1-mini introduced reasoning models that spend more time "thinking" through problems; OpenAI reported that o1 solved 83% of problems on a qualifying exam for the International Mathematics Olympiad, compared with 13% for GPT-4o.

Anthropic's Claude 3 Family: The Claude 3 series (Haiku, Sonnet, Opus), launched in March 2024, were the first models to challenge GPT-4's dominance on leaderboards. Claude 3.5 Sonnet followed in June, with an upgraded version in October, and became particularly popular for coding tasks. Claude 3.7 Sonnet arrived later, in February 2025.

Google's Gemini Evolution: Gemini 1.5 Pro debuted in February 2024 with a long context window that was later extended to 2M tokens, followed by Gemini 1.5 Flash in May for faster performance and Gemini 2.0 Flash in December 2024.

Meta's Llama Progression: Llama 3 (8B, 70B) launched in April 2024, followed in July by the Llama 3.1 series, including the 405B parameter model, the largest openly available model at the time. Llama 3.2 brought multimodal capabilities in September, and Llama 3.3 closed out the year in December.

Microsoft's Phi Revolution: Microsoft's Phi-3 family proved that smaller models could punch above their weight, with Phi-3 Mini (3.8B parameters) matching much larger models on benchmarks. The series expanded with Phi-3 Small (7B), Phi-3 Medium (14B), and Phi-3.5 Mini throughout 2024.

Enterprise-Focused Models: IBM Granite 3.0, launched in October 2024, focused on enterprise use cases, while Cohere's Command R and Command R+ models excelled in retrieval-augmented generation tasks.

Google's Open Models: Gemma 2 (9B, 27B parameters), launched in June 2024, became highly popular in the open-source community and consistently ranked high in community evaluations.

Key Developments in 2025

The year 2025 has been marked by several breakthrough releases in the LLM landscape. Grok 3, launched by xAI in February 2025, introduced a 1 million token context window and reached an Elo score of 1402 in the Chatbot Arena, making it the first model to pass the 1400 mark. The model was trained on 12.8 trillion tokens with roughly 10x the compute of its predecessor.

Meta's Llama 4 family is the first Llama generation built on a Mixture-of-Experts (MoE) architecture. Llama 4 Scout features a 10 million token context window, while Llama 4 Maverick achieves an Elo score of 1417 on LMSYS Chatbot Arena, outperforming GPT-4o and Gemini 2.0 Flash.
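To put the Arena scores quoted above in perspective, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a 400-point scale; the leaderboard's actual rating procedure may differ, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Classic Elo expected score of A against B (assumed formula, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Scores quoted in the text: Llama 4 Maverick 1417, Grok 3 1402.
    p = expected_score(1417, 1402)
    print(f"Expected win rate of 1417 vs 1402: {p:.3f}")
```

Under this assumption a 15-point gap corresponds to an expected win rate only slightly above one half, so small differences between top Arena scores should be read with care.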

DeepSeek-R1 emerged as the first major open-source reasoning model. Its sibling, DeepSeek-R1-Zero, was trained purely through reinforcement learning without supervised fine-tuning, while R1 itself adds a small supervised "cold start" stage before reinforcement learning. R1 demonstrates performance comparable to OpenAI's o1 across math, code, and reasoning tasks and is released under the MIT license.

Cursor emerged as a popular "vibe coding" platform.

Qwen 3, released by Alibaba in April 2025, is a family of "hybrid" reasoning models ranging from 0.6B to 235B parameters, supporting 119 languages and trained on over 36 trillion tokens. The models combine thinking and non-thinking modes in a single model and let users control the thinking budget.

OpenAI continued its reasoning model series with o3 and o4-mini in April 2025, while Anthropic launched Claude 4 (Opus 4 and Sonnet 4) in May 2025, setting new standards for coding and advanced reasoning with extended thinking capabilities and tool use.

Google's Gemini 2.5 Pro debuted as a thinking model with a 1 million token context window, leading on LMArena leaderboards and excelling in coding, math, and multimodal understanding tasks.

Several trends run through these releases:

  1. Reasoning Models: The emergence of models that can "think" through problems step-by-step, with extended reasoning capabilities becoming standard.

  2. Massive Context Windows: Models now support context windows ranging from 1M to 10M tokens, enabling processing of entire codebases and documents.

  3. Mixture-of-Experts (MoE) Architecture: More efficient model architectures that activate only a subset of parameters during inference.

  4. Open-Source Reasoning: DeepSeek-R1's success has democratized access to reasoning capabilities previously available only in proprietary models.

  5. Multimodal Integration: Native multimodality becoming standard, with models trained on text, images, audio, and video from the ground up.

  6. Tool Use and Agentic Capabilities: Enhanced ability to use tools, execute code, and perform complex multi-step tasks autonomously.
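The Mixture-of-Experts item above says that only a subset of parameters is activated during inference. The toy sketch below shows one common way this is done, top-k routing: a router scores every expert, only the k best run, and their outputs are mixed by renormalized gate weights. It is a minimal illustration in pure Python with scalar "experts"; the function names are hypothetical, and the routing used in any specific model listed here may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy experts; each just scales its input by a different factor.
EXPERTS = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token
    print(route_top_k(logits, k=2))
    print(moe_forward(1.0, EXPERTS, logits, k=2))
```

With k=2 out of four experts, half of the expert parameters are skipped for this token, which is the source of the "active" versus "total" parameter distinction.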

Performance Benchmarks (2025)

Reasoning Benchmarks (AIME 2025)

  • Grok 3: 93.3%
  • DeepSeek-R1-0528: 87.5%
  • Gemini 2.5 Pro: 86.7%
  • o3-mini: 86.5%

Coding Benchmarks (SWE-bench Verified)

  • Claude Opus 4: 72.5%
  • Claude Sonnet 4: 72.7%
  • OpenAI Codex 1: 72.1%
  • Llama 4 Maverick: ~70%

Long Context Windows (1M+ tokens)

  • Llama 4 Scout: 10M tokens
  • Grok 3: 1M tokens
  • Gemini 2.5 Pro: 1M tokens
  • Llama 4 Maverick: 1M tokens

Model Evolution Timeline

2022: Foundation Era

  • ChatGPT revolutionized conversational AI
  • InstructGPT introduced instruction following
  • Large proprietary models dominated (GPT-3, PaLM, Chinchilla)

2023: Open Source Awakening

  • LLaMA sparked the open-source revolution
  • Claude introduced Constitutional AI
  • Specialized coding models emerged (Code Llama, StarCoder)
  • Model sizes optimized for efficiency (Phi, Mistral)

2024: Multimodal & Reasoning Breakthrough

  • GPT-4o achieved true multimodality
  • o1 introduced step-by-step reasoning
  • Claude 3 challenged GPT-4 dominance
  • Llama 3.1 405B became the largest open model
  • Gemini 1.5 pushed context limits to 2M tokens

2025: The Reasoning Revolution

  • Grok 3 became the first model to pass 1400 on Chatbot Arena
  • DeepSeek-R1 democratized reasoning capabilities
  • Llama 4 introduced 10M token contexts
  • Claude 4 set new coding standards
  • Qwen 3 pioneered hybrid reasoning modes

Citation

If you find our survey useful for your research, please cite the following paper:

@article{hadi2024large,
  title={Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects},
  author={Hadi, Muhammad Usman and Al Tashi, Qasem and Shah, Abbas and Qureshi, Rizwan and Muneer, Amgad and Irfan, Muhammad and Zafar, Anas and Shaikh, Muhammad Bilal and Akhtar, Naveed and Wu, Jia and others},
  journal={Authorea Preprints},
  year={2024},
  publisher={Authorea}
}

Model Organization Summary

By Company/Organization:

🔴 Proprietary Models:

  • OpenAI: GPT-4, GPT-4.5, GPT-4o, o1, o3, o4-mini, ChatGPT, InstructGPT
  • Anthropic: Claude 3 Family, Claude 3.5, Claude 3.7, Claude 4, Anthropic LM
  • Google/DeepMind: Gemini 2.5, Gemini 2.0, Gemini 1.5, PaLM 2, Bard, T5, UL2, Chinchilla, Sparrow, Gopher, GLaM, Minerva
  • xAI: Grok 3, Grok 3 Mini
  • AI21 Labs: Jurassic-1, Jurassic-2
  • Mistral AI: Mistral 7B, Mistral Large 2, Mistral Medium

🟢 Open Source Models:

  • Meta: Llama 4, Llama 3.x, Llama 2, OPT, Code Llama, Galactica
  • Alibaba: Qwen 3, Qwen 2.5, QwQ-32B
  • DeepSeek: DeepSeek-R1, DeepSeek-V3
  • Microsoft: Phi-3 Family, Phi-2
  • IBM: Granite 3.0, Granite 3.1
  • Google: Gemma 2
  • Cohere: Command R, Command R+
  • BigScience: BLOOM
  • EleutherAI: GPT-J, GPT-NeoX, Pythia
  • BigCode: StarCoder, StarChat, SantaCoder
  • Salesforce: CodeGen2, CodeT5+, XGen
  • TIIUAE: Falcon
  • Upstage: SOLAR

🎓 Academic/Research:

  • LMSYS: Vicuna, FastChat-T5
  • Stanford: Alpaca
  • UC Berkeley: Koala
  • LAION: Open Assistant
  • OpenLM Research: OpenLLaMA
  • MLFoundations: OpenLM

🏢 Other Companies:

  • Yandex: YaLM
  • Replit: Replit Code
  • H2O.ai: h2oGPT
  • Databricks: Dolly
  • Together: RedPajama-INCITE
  • MosaicML: MPT Family
  • Stability AI: StableLM
  • Nous Research: OpenHermes
  • Cerebras: Cerebras-GPT
  • Deci AI: DeciCoder
  • AI Squared: DLite
  • BlinkDL: RWKV

By Model Type:

🧠 Reasoning Models (2024-2025):

  • OpenAI: o1, o1-mini, o3, o3-mini, o4-mini
  • DeepSeek: DeepSeek-R1 Family
  • Alibaba: QwQ-32B, Qwen 3 (hybrid reasoning)
  • Google: Gemini 2.5 (thinking models)

💬 Conversational Models:

  • OpenAI: ChatGPT, GPT-4o
  • Anthropic: Claude 3/4 Family
  • Google: Bard, Gemini
  • xAI: Grok 3

💻 Code-Specialized:

  • Meta: Code Llama
  • BigCode: StarCoder, SantaCoder
  • Salesforce: CodeGen2, CodeT5+
  • Replit: Replit Code
  • Deci AI: DeciCoder

🌐 Multimodal:

  • OpenAI: GPT-4o
  • Google: Gemini 2.0/2.5
  • Meta: Llama 4, Llama 3.2

⚡ Efficient/Small:

  • Microsoft: Phi-3 Family, Phi-2
  • Google: Gemma 2
  • AI Squared: DLite
  • Upstage: SOLAR

Last updated: July 2025
Original repository: https://www.techrxiv.org/doi/full/10.36227/techrxiv.23589741.v3