LLM4IR-Survey
November 13, 2025
This is a collection of papers on large language models for information retrieval. The papers are organized according to our survey paper, Large Language Models for Information Retrieval: A Survey.
Feel free to contact us if you find a mistake or have any advice. Email: yutaozhu94@gmail.com and dou@ruc.edu.cn.
Citation
Please cite our paper if it helps your research:
@article{LLM4IRSurvey,
author={Yutao Zhu and
Huaying Yuan and
Shuting Wang and
Jiongnan Liu and
Wenhan Liu and
Chenlong Deng and
Haonan Chen and
Zhicheng Dou and
Ji-Rong Wen},
title={Large Language Models for Information Retrieval: A Survey},
journal={CoRR},
volume={abs/2308.07107},
year={2023},
url={https://arxiv.org/abs/2308.07107},
eprinttype={arXiv},
eprint={2308.07107}
}
Update Log
- Version 4 [2025-09-17]
  - Search Agent: We reformulated the search agent section.
  - Reranker: We added several listwise rerankers and a new section, "Reasoning-intensive Rerankers".
- Version 3 [2024-09-03]
  - We refined the background to focus more on IR.
  - Rewriter: We added a new section, "Formats of Rewritten Queries", to provide a clearer classification, and incorporated up-to-date methods.
  - Retriever: We incorporated up-to-date methods that use LLMs to enlarge the dataset used for training retrievers or to improve the overall structure and design of retriever systems.
  - Reranker: We added some unsupervised rerankers, several studies focusing on training data augmentation, and discussions of the limitations of LLM rerankers.
  - Reader: We added the latest studies on readers, particularly enriching the works in the active reader section.
  - Search Agent: We added the latest studies on static and dynamic search agents, particularly enriching the works in benchmarking and self-planning.
- Version 2 [2024-01-19]
  - We added a new section to introduce search agents, which represent an innovative approach to integrating LLMs with IR systems.
  - Rewriter: We added recent works on LLM-based query rewriting, most of which focus on conversational search.
  - Retriever: We added the latest techniques that leverage LLMs to expand the training corpus for retrievers or to enhance retrievers' architectures.
  - Reranker: We added recent LLM-based ranking works to each of the three parts: Utilizing LLMs as Supervised Rerankers, Utilizing LLMs as Unsupervised Rerankers, and Utilizing LLMs for Training Data Augmentation.
  - Reader: We added the latest studies in the LLM-enhanced reader area, including a section introducing the reference compression technique, a section discussing the applications of LLM-enhanced readers, and a section analyzing the characteristics of LLM-enhanced readers.
  - Future Direction: We added a section about search agents and a section discussing the bias caused by integrating LLMs into IR systems.
Paper List
Query Rewriter
Prompting Methods
- Query2doc: Query Expansion with Large Language Models, Wang et al., arXiv 2023. [Paper]
- Generative and Pseudo-Relevant Feedback for Sparse, Dense and Learned Sparse Retrieval, Mackie et al., arXiv 2023. [Paper]
- Generative Relevance Feedback with Large Language Models, Mackie et al., SIGIR 2023 (short paper). [Paper]
- GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval, Mackie et al., arXiv 2023. [Paper]
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search, Mao et al., arXiv 2023. [Paper]
- Precise Zero-Shot Dense Retrieval without Relevance Labels, Gao et al., ACL 2023. [Paper]
- Query Expansion by Prompting Large Language Models, Jagerman et al., arXiv 2023. [Paper]
- Large Language Models are Strong Zero-Shot Retriever, Shen et al., arXiv 2023. [Paper]
- Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting, Ye et al., EMNLP 2023 (Findings). [Paper]
- Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study, Alaofi et al., SIGIR 2023 (short paper). [Paper]
- Corpus-Steered Query Expansion with Large Language Models, Lei et al., EACL 2024 (Short Paper). [Paper]
- Large Language Model based Long-tail Query Rewriting in Taobao Search, Peng et al., WWW 2024. [Paper]
- Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?, Li et al., SIGIR 2024. [Paper]
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models, Meng et al., arXiv 2024. [Paper]
- RaFe: Ranking Feedback Improves Query Rewriting for RAG, Mao et al., arXiv 2024. [Paper]
- Crafting the Path: Robust Query Rewriting for Information Retrieval, Baek et al., arXiv 2024. [Paper]
- Query Rewriting for Retrieval-Augmented Large Language Models, Ma et al., arXiv 2023. [Paper]
Fine-tuning Methods
- QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation, Srinivasan et al., EMNLP 2022 (Industry). [Paper] (This paper explores fine-tuning methods in its baseline experiments.)
Knowledge Distillation Methods
- QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation, Srinivasan et al., EMNLP 2022 (Industry). [Paper]
- Knowledge Refinement via Interaction Between Search Engines and Large Language Models, Feng et al., arXiv 2023. [Paper]
- Query Rewriting for Retrieval-Augmented Large Language Models, Ma et al., arXiv 2023. [Paper]
Retriever
Leveraging LLMs to Generate Search Data
- InPars: Data Augmentation for Information Retrieval using Large Language Models, Bonifacio et al., arXiv 2022. [Paper]
- Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval, Ma et al., arXiv 2023. [Paper]
- InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval, Jeronymo et al., arXiv 2023. [Paper]
- Promptagator: Few-shot Dense Retrieval From 8 Examples, Dai et al., ICLR 2023. [Paper]
- AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation, Meng et al., arXiv 2023. [Paper]
- UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers, Saad-Falcon et al., arXiv 2023. [Paper]
- Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models, Peng et al., arXiv 2023. [Paper]
- CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation, Huang et al., ACL 2023. [Paper]
- Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval, Thakur et al., arXiv 2023. [Paper]
- Questions Are All You Need to Train a Dense Passage Retriever, Sachan et al., ACL 2023. [Paper]
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators, Chen et al., EMNLP 2023. [Paper]
- Gecko: Versatile Text Embeddings Distilled from Large Language Models, Lee et al., arXiv 2024. [Paper]
- Improving Text Embeddings with Large Language Models, Wang et al., ACL 2024. [Paper]
Employing LLMs to Enhance Model Architecture
- Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arXiv 2022. [Paper]
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Ma et al., arXiv 2023. [Paper]
- Large Dual Encoders Are Generalizable Retrievers, Ni et al., EMNLP 2022. [Paper]
- Task-aware Retrieval with Instructions, Asai et al., ACL 2023 (Findings). [Paper]
- Transformer Memory as a Differentiable Search Index, Tay et al., NeurIPS 2022. [Paper]
- Large Language Models are Built-in Autoregressive Search Engines, Ziems et al., ACL 2023 (Findings). [Paper]
- ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval, Mao et al., arXiv. [Paper]
- How Does Generative Retrieval Scale to Millions of Passages?, Pradeep et al., ACL 2023. [Paper]
- CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks, Li et al., SIGIR. [Paper]
Reranker
Utilizing LLMs as Supervised Rerankers
- Multi-Stage Document Ranking with BERT, Nogueira et al., arXiv 2019. [Paper]
- Document Ranking with a Pretrained Sequence-to-Sequence Model, Nogueira et al., EMNLP 2020 (Findings). [Paper]
- Text-to-Text Multi-view Learning for Passage Re-ranking, Ju et al., SIGIR 2021 (Short Paper). [Paper]
- The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models, Pradeep et al., arXiv 2021. [Paper]
- RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, Zhuang et al., SIGIR 2023 (Short Paper). [Paper]
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Ma et al., arXiv 2023. [Paper]
- A Two-Stage Adaptation of Large Language Models for Text Ranking, Zhang et al., ACL 2024 (Findings). [Paper]
- Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models, Zhang et al., arXiv 2023. [Paper]
- ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval, Yoon et al., ACL 2024. [Paper]
- Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models, Peng et al., arXiv 2024. [Paper]
- Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models, Liu et al., arXiv 2024. [Paper]
Utilizing LLMs as Unsupervised Rerankers
- Holistic Evaluation of Language Models, Liang et al., arXiv 2022. [Paper]
- Improving Passage Retrieval with Zero-Shot Question Generation, Sachan et al., EMNLP 2022. [Paper]
- Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker, Cho et al., ACL 2023 (Findings). [Paper]
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking, Zhuang et al., EMNLP 2023 (Findings). [Paper]
- PaRaDe: Passage Ranking using Demonstrations with Large Language Models, Drozdov et al., EMNLP 2023 (Findings). [Paper]
- Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels, Zhuang et al., arXiv 2023. [Paper]
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent, Sun et al., EMNLP 2023. [Paper]
- Zero-Shot Listwise Document Reranking with a Large Language Model, Ma et al., arXiv 2023. [Paper]
- Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models, Tang et al., arXiv 2023. [Paper]
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting, Qin et al., NAACL 2024 (Findings). [Paper]
- A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models, Zhuang et al., SIGIR 2024. [Paper]
- InstUPR: Instruction-based Unsupervised Passage Reranking with Large Language Models, Huang and Chen, arXiv 2024. [Paper]
- Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers, Guo et al., arXiv 2024. [Paper]
- DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task, Liu et al., arXiv 2024. [Paper]
- An Investigation of Prompt Variations for Zero-shot LLM-based Rankers, Sun et al., arXiv 2024. [Paper]
- TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy, Chen et al., arXiv 2024. [Paper]
- Top-Down Partitioning for Efficient List-Wise Ranking, Parry et al., arXiv 2024. [Paper]
- PRP-Graph: Pairwise Ranking Prompting to LLMs with Graph Aggregation for Effective Text Re-ranking, Luo et al., ACL 2024. [Paper]
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models, Liu et al., ACL 2025. [Paper]
- CoRanking: Collaborative Ranking with Small and Large Ranking Agents, Liu et al., EMNLP 2025 (Findings). [Paper]
- APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking, Jin et al., WWW 2025. [Paper]
- Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing, Yan et al., EMNLP 2024. [Paper]
Utilizing LLMs for Training Data Augmentation
- ExaRanker: Explanation-Augmented Neural Ranker, Ferraretto et al., SIGIR 2023 (Short Paper). [Paper]
- InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers, Boytsov et al., arXiv 2023. [Paper]
- Generating Synthetic Documents for Cross-Encoder Re-Rankers, Askari et al., arXiv 2023. [Paper]
- Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers, Sun et al., arXiv 2023. [Paper]
- RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models, Pradeep et al., arXiv 2023. [Paper]
- RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!, Pradeep et al., arXiv 2023. [Paper]
- ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs, Ferraretto et al., arXiv 2024. [Paper]
- Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking, Askari et al., EMNLP 2023. [Paper]
- FIRST: Faster Improved Listwise Reranking with Single Token Decoding, Reddy et al., arXiv 2024. [Paper]
Reasoning-intensive Rerankers
- ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability, Liu et al., arXiv 2025. [Paper]
- Rank1: Test-Time Compute for Reranking in Information Retrieval, Weller et al., arXiv 2025. [Paper]
- Rank-K: Test-Time Reasoning for Listwise Reranking, Yang et al., arXiv 2025. [Paper]
- REARANK: Reasoning Re-ranking Agent via Reinforcement Learning, Zhang et al., arXiv 2025. [Paper]
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning, Zhuang et al., arXiv 2025. [Paper]
- TFRank: Think-Free Reasoning Enables Practical Pointwise LLM Ranking, Fan et al., arXiv 2025. [Paper]
Reader
Passive Reader
- REALM: Retrieval-Augmented Language Model Pre-Training, Guu et al., ICML 2020. [Paper]
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., NeurIPS 2020. [Paper]
- REPLUG: Retrieval-Augmented Black-Box Language Models, Shi et al., arXiv 2023. [Paper]
- Atlas: Few-shot Learning with Retrieval Augmented Language Models, Izacard et al., JMLR 2023. [Paper]
- Internet-augmented Language Models through Few-shot Prompting for Open-domain Question Answering, Lazaridou et al., arXiv 2022. [Paper]
- Rethinking with Retrieval: Faithful Large Language Model Inference, He et al., arXiv 2023. [Paper]
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Vu et al., arXiv 2023. [Paper]
- Enabling Large Language Models to Generate Text with Citations, Gao et al., EMNLP 2023. [Paper]
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Yu et al., arXiv 2023. [Paper]
- Improving Retrieval-Augmented Large Language Models via Data Importance Learning, Lyu et al., arXiv 2023. [Paper]
- Search Augmented Instruction Learning, Luo et al., EMNLP 2023 (Findings). [Paper]
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning, Lin et al., arXiv 2023. [Paper]
- Improving Language Models by Retrieving from Trillions of Tokens, Borgeaud et al., ICML 2022. [Paper]
- In-Context Retrieval-Augmented Language Models, Ram et al., arXiv 2023. [Paper]
- Interleaving Retrieval with Chain-of-thought Reasoning for Knowledge-intensive Multi-step Questions, Trivedi et al., ACL 2023. [Paper]
- Improving Language Models via Plug-and-Play Retrieval Feedback, Yu et al., arXiv 2023. [Paper]
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy, Shao et al., EMNLP 2023 (Findings). [Paper]
- Retrieval-Generation Synergy Augmented Large Language Models, Feng et al., arXiv 2023. [Paper]
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Asai et al., arXiv 2023. [Paper]
- Active Retrieval Augmented Generation, Jiang et al., EMNLP 2023. [Paper]
Active Reader
- Measuring and Narrowing the Compositionality Gap in Language Models, Press et al., arXiv 2022. [Paper]
- DEMONSTRATE-SEARCH-PREDICT: Composing Retrieval and Language Models for Knowledge-intensive NLP, Khattab et al., arXiv 2022. [Paper]
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought, Yoran et al., arXiv 2023. [Paper]
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, Lee et al., arXiv 2024. [Paper]
- Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs, Wang et al., arXiv 2024. [Paper]
Compressor
- LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs, Arefeen et al., arXiv 2023. [Paper]
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation, Xu et al., arXiv 2023. [Paper]
- TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction, Liu et al., EMNLP 2023 (Findings). [Paper]
- Learning to Filter Context for Retrieval-Augmented Generation, Wang et al., arXiv 2023. [Paper]
Analysis
- Lost in the Middle: How Language Models Use Long Contexts, Liu et al., arXiv 2023. [Paper]
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation, Ren et al., arXiv 2023. [Paper]
- Exploring the Integration Strategies of Retriever and Large Language Models, Liu et al., arXiv 2023. [Paper]
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models, Aksitov et al., arXiv 2023. [Paper]
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Mallen et al., ACL 2023. [Paper]
Applications
- Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering, Wang et al., arXiv 2023. [Paper]
- ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science, Munikoti et al., arXiv 2023. [Paper]
- Crosslingual Retrieval Augmented In-context Learning for Bangla, Li et al., arXiv 2023. [Paper]
- Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature, Lozano et al., arXiv 2023. [Paper]
- Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models, Zhang et al., ICAIF 2023. [Paper]
- Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models, Louis et al., arXiv 2023. [Paper]
- RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit, Liu et al., arXiv 2023. [Paper]
- Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models, Jiang et al., arXiv 2023. [Paper]
- RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models, Hoshi et al., EMNLP 2023. [Paper]
- Don't forget private retrieval: distributed private similarity search for large language models, Zyskind et al., arXiv 2023. [Paper]
Search Agent
Information Seeking Module
- A Cognitive Writing Perspective for Constrained Long-Form Text Generation, Wan et al., ACL 2025 (Findings). [Paper]
- CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models, Gong et al., SIGIR 2024. [Paper]
- Search-o1: Agentic Search-Enhanced Large Reasoning Models, Li et al., arXiv 2025. [Paper]
- Agent Laboratory: Using LLM Agents as Research Assistants, Schmidgall et al., arXiv 2025. [Paper]
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, Lu et al., arXiv 2024. [Paper]
- Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models, Yu et al., arXiv 2024. [Paper]
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis, Sun et al., arXiv 2025. [Paper]
- Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning, Dong et al., arXiv 2025. [Paper]
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching, Sun et al., arXiv 2025. [Paper]
- Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution, Qiu et al., arXiv 2025. [Paper]
Benchmarks and Resources
- TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, Joshi et al., ACL 2017. [Paper]
- Measuring Short-Form Factuality in Large Language Models, Wei et al., arXiv 2024. [Paper]
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Mallen et al., ACL 2023. [Paper]
- Natural Questions: a Benchmark for Question Answering Research, Kwiatkowski et al., ACL 2019. [Paper]
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, Yang et al., EMNLP 2018. [Paper]
- Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps, Ho et al., COLING 2020. [Paper]
- Humanity's Last Exam, Phan et al., arXiv 2025. [Paper]
- BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents, Wei et al., arXiv 2025. [Paper]
- GAIA: a benchmark for General AI Assistants, Mialon et al., ICLR 2024. [Paper]
- AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?, Yoran et al., EMNLP 2024. [Paper]
- Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks, Fourney et al., arXiv 2024. [Paper]
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, Jimenez et al., arXiv 2023. [Paper]
- OctoPack: Instruction Tuning Code Large Language Models, Muennighoff et al., ICLR 2024. [Paper]
- MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering, Chan et al., ICLR 2025. [Paper]
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation, Huang et al., ICML 2024. [Paper]
- RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts, Wijk et al., arXiv 2024. [Paper]
- ResearchTown: Simulator of Human Research Community, Yu et al., arXiv 2024. [Paper]
- WebArena: A Realistic Web Environment for Building Autonomous Agents, Zhou et al., ICLR 2024. [Paper]
- Spa-Bench: a comprehensive Benchmark for Smartphone Agent Evaluation, Chen et al., ICLR 2025. [Paper]
- WebWalker: Benchmarking LLMs in Web Traversal, Wu et al., ACL 2025. [Paper]
- WebDancer: Towards Autonomous Information Seeking Agency, Wu et al., arXiv 2025. [Paper]
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization, Tao et al., arXiv 2025. [Paper]
- WebSailor: Navigating Super-human Reasoning for Web Agent, Li et al., arXiv 2025. [Paper]