26 of April 2023
May 18, 2026
Autonomous Agents
Autonomous Agents-research papers. Updated daily. See also the Resources-section.
Research papers: 2023
2026 (5/5), 2026 (4/5), 2026 (3/5), 2026 (2/5), 2026 (1/5), 2025 (4/4), 2025 (3/4), 2025 (2/4), 2025 (1/4), 2024, 2023, Earlier
Chronological order.
22nd of December 2023
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
- Pangu-Agent: Introduces a generic RL-based objective to improve the agent's intrinsic and extrinsic functions.
21st of December 2023
AppAgent: Multimodal Agents as Smartphone Users
- Multimodal VLM agents learn to operate popular smartphone apps by building a knowledge base through autonomous exploration and human demonstrations.
- Includes: Exploration phase and Deployment phase.
- In the Exploration phase, the agent learns smartphone functionality through trial and error: it saves records of the effects of its actions and stops if the current view is unrelated to the assigned task. Exploration stops when the task is finished. Alternatively, these behaviours are shown through human demonstrations, which keeps the agent's exploration streamlined and efficient.
- In the Deployment phase, the VLM agent has access to the UI screenshot and the potential actions. The agent generates a summary of the actions taken and the interaction history, which is passed to the next step.
Capture the Flag: Uncovering Data Insights with Large Language Models
- Explores two types of Data Science Agents: Explorer agent and Aggregator agent.
20th of December 2023
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
- AgentCoder: Multi-agent code generation assistant made up of a Programmer agent, a Test designer agent and a Test executor agent.
- Uses Self-Refine with CoT in a Multi-Agent System.
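The three-agent loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `agent_coder_loop`, `run_tests`, `fake_programmer` and `fake_test_designer` are hypothetical names, and the two LLM agents are replaced by canned functions.

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Test executor: run candidate code plus tests, return an error string or None."""
    namespace: dict = {}
    try:
        exec(code, namespace)       # defines the candidate function(s)
        for test in tests:
            exec(test, namespace)   # each test is a plain assert statement
    except Exception as exc:        # AssertionError is an Exception subclass
        return f"{type(exc).__name__} while running tests: {exc}"
    return None


def agent_coder_loop(
    task: str,
    programmer: Callable[[str, Optional[str]], str],
    test_designer: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Iterate programmer -> executor until the tests pass or rounds run out."""
    tests = test_designer(task)     # tests are designed independently of the code
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = programmer(task, feedback)
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, round_no, True
    return code, max_rounds, False


# Canned stand-ins for the two LLM agents (hypothetical, for illustration only).
def fake_programmer(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"   # buggy first draft
    return "def add(a, b):\n    return a + b\n"       # revised after feedback


def fake_test_designer(task: str) -> List[str]:
    return ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]


final_code, rounds_used, passed = agent_coder_loop(
    "Write add(a, b).", fake_programmer, fake_test_designer
)
```

Keeping the test designer separate from the programmer means the tests are not biased by the draft code, which is the point of splitting the roles.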
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
- LM Assertions: Integrates with DSPy, which combines reasoning, self-improvement, augmentation, retrieval and tools (DSPy is something of a challenger to LangChain).
- Supports runtime self-refinement in LM pipelines with boolean conditions: Assert (hard, critical condition) and Suggest (soft condition).
- A hard Assert halts the LM pipeline if the condition is still not met after the maximum number of attempts, whereas a Suggest lets the pipeline continue.
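The difference between hard and soft conditions can be sketched in plain Python. This does not use the DSPy API: `run_with_constraints`, `AssertionFailed` and the toy generator are hypothetical names chosen for illustration, and the retry-with-feedback loop is an assumption about how self-refinement might be wired.

```python
from typing import Callable, List, Tuple


class AssertionFailed(RuntimeError):
    """Raised when a hard condition is still unmet after all attempts."""


# (check, feedback message, is_hard): is_hard=True mimics Assert, False mimics Suggest.
Constraint = Tuple[Callable[[str], bool], str, bool]


def run_with_constraints(
    generate: Callable[[List[str]], str],
    constraints: List[Constraint],
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Retry generation with feedback; halt on hard failures, tolerate soft ones."""
    feedback: List[str] = []
    output = ""
    for _ in range(max_attempts):
        output = generate(feedback)  # earlier failures are fed back in
        feedback = [msg for check, msg, _ in constraints if not check(output)]
        if not feedback:
            return output, []
    for check, msg, is_hard in constraints:
        if is_hard and not check(output):
            raise AssertionFailed(msg)   # hard condition: the pipeline halts
    return output, feedback              # soft conditions: continue with warnings


def toy_generate(feedback: List[str]) -> str:
    # Stand-in for an LM that only adds the question mark once it gets feedback.
    return "What is DSPy?" if feedback else "what is dspy"


constraints: List[Constraint] = [
    (lambda s: s.endswith("?"), "Query must end with '?'", True),
    (lambda s: len(s) <= 5, "Keep the query under 6 characters", False),
]
output, warnings = run_with_constraints(toy_generate, constraints)
```

Here the hard condition is satisfied on the second attempt, while the soft one never is, so the pipeline continues and only reports a warning.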
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
- ASSISTGUI: Window mouse / keyboard management with LLM.
- Explores generative agents in urban environments: includes memory module, movement module, visual inference module and an LLM module.
- Discrete Information Retrieval (dIR): Text-queries of SQL databases using LLMs.
19th of December 2023
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
- Plays StarCraft II better than an average player by using Chain of Summarization (CoS), python-sc2 and the TextStarCraft II-environment (Observation-to-Text Adapter and Text-to-Action Adapter).
- Chain of Summarization (CoS): Improves the LLM's capability to extract / analyze information using two components: Single-frame summarization and Multi-frame summarization.
- TextStarCraft II-environment processes game information into textual format for the LLM, defining macro-actions and a rule-based method for micro-actions.
- System prompt includes: Situation Overview, Situation Analysis, Strategic Planning, Opponent Strategy Analysis, Strategic Recommendations, Decision-Making Process.
- Reduces the need for LLM API calls by 10x and improves strategic, analytical and judging capabilities.
Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives
- LLM empowered agent-based modeling and simulation framework: surveys the landscape of utilizing LLMs in agent-based modeling and simulation.
- Framework examines challenges, future directions, motivation for applying LLMs, environment perception, human alignment, action generation, evaluation, cyber, physical, social, and hybrid domains.
- This framework provides a comprehensive overview of recent works in this interdisciplinary field.
- Reviews LLM-based agents on their ability to simulate various human-like capabilities.
18th of December 2023
Agent Assessment of Others Through the Lens of Self
- Discusses the concept of self-awareness in autonomous agents.
Evaluating Language-Model Agents on Realistic Autonomous Tasks
- Autonomous Replication and Adaptation (ARA) framework: reviews the ability of LLM agents to acquire resources, create copies of themselves and adapt to novel situations in the real world.
- Tests LLM-agents using Scaffolding programs to interact with LLMs.
- Defines implications of potentially ARA-level agents.
LLM-ARK: Knowledge Graph Reasoning Using Large Language Models via Deep Reinforcement Learning
- LLM-ARK: LLM reasons from Knowledge Graphs with DRL.
17th of December 2023
Learning to Act without Actions
- LAPO (Latent Action Policy).
16th of December 2023
ProTIP: Progressive Tool Retrieval Improves Planning
- Progressive Tool Retrieval Improves Planning (ProTIP): Multi-step planning with external tools, where tasks are decomposed without explicit definition of the sub-tasks.
- Addresses the issue that single-step tool retrieval fails to handle dependencies between tools.
15th of December 2023
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
- Self-improving LLM agent that needs no human-assisted data for fine-tuning. When the synthetic data is used to distill a smaller model, the smaller model achieves significantly better reasoning results.
- Fine-tunes the LLM with ReST using ReAct-style reasoning-action trajectories.
14th of December 2023
Practices for Governing Agentic AI Systems
- OpenAI's research on Agentic AI systems with definition of Agentic AI system.
- Includes level of "Agenticness": the degree of goal complexity, environment complexity, adaptability and independence.
TinyGSM: achieving >80% on GSM8k with small language models
- First student LLM to reach the performance of the teacher LLM (GPT-3.5) in mathematical reasoning using synthetic data from the teacher model.
- TinyGSM: Two 1.3B LLMs (a generator and a verifier) achieve SOTA-level 81.5% accuracy on GSM8k. The approach combines the high-quality TinyGSM dataset with a verifier that selects the final answer from multiple output generations.
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent
- Planner-Reasoner-Executor-Reflector (PRER) / MathAgent: Planner, Reasoner, Executor and Reflector.
- Systematic process for solving zero-shot mathematical reasoning with LLM agents.
Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory
- Self-presentation with Lamb: Uses a semantic label to set the tone for the conversation.
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
- LiFT: Significantly outperforms VPT and other models in the MineDojo-environment.
- LLM provides task instruction.
- VLM is used to learn the policy and act as a reward model.
LLMind: Orchestrating AI and IoT with LLMs for Complex Task Execution
- LLMind: Includes a coordinator that updates short-term memory and retrieves the required AI (IoT) modules. It checks whether a script exists for the module and generates it if missing. The coordinator retrieves error / output messages from the executed script, which is handled by the script executor.
Holodeck: Language Guided Generation of 3D Embodied AI Environments
- HoloDeck: Generates 3D embodied environments with an LLM: Floor-wall module, doorway-window module, object selection module and layout design module.
- Personalized Path Recourse (PPR): Personalized path of actions to achieve a certain goal with an agent.
Adaptive parameter sharing for multi-agent reinforcement learning
- AdaPS: Maps agents to different regions of brain/shared network based on identity vectors obtained with VAE and clusters agents to K classes.
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
- RL agent using LLM to act as a Reward designer, Reward critic and a Trajectory designer.
Vision-Language Models as a Source of Rewards
- VLMs work as reward models and larger scale improves performance of the reward model.
Learning Coalition Structures with Games
- Coalition Structure Learning (CSL): Learns coalitions of agents via set of games.
13th of December 2025
KVDirect: Distributed Disaggregated LLM Inference
- KVDirect: Framework optimizes KV cache transfer to enable distributed disaggregated LLM inference.
- Tensor-centric communication mechanism, custom communication library, dynamic GPU resource scheduling, pull-based KV cache transfer strategy, reduces synchronization overhead.
- KVDirect reduces per-request latency and improves resource utilization in disaggregated LLM inference.
12th of December 2023
- Medprompt+ extends the Medprompt-method by additionally asking whether a scratch-pad is needed and by increasing the number of ensembled calls from 5 to 20.
diff History for Long-Context Language Agents
- Compresses consecutive text observations from the environment with the Unix "diff"-command, which leads to a 700% improvement in game score and outperforms existing agents, which use visual observations, by 40%.
- Similar approach may enable building vastly more generic embodied LLM agents.
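The idea of replacing full observations with diffs can be sketched with Python's standard `difflib` module, used here as a stand-in for the Unix `diff` command. The function name `diff_history` and the toy observations are assumptions made for illustration.

```python
import difflib
from typing import List


def diff_history(observations: List[str]) -> List[str]:
    """Keep the first observation in full, then only diffs between consecutive ones."""
    if not observations:
        return []
    history = [observations[0]]
    for prev, curr in zip(observations, observations[1:]):
        delta = difflib.unified_diff(
            prev.splitlines(), curr.splitlines(), lineterm="", n=0
        )
        # The first two yielded lines are the '---'/'+++' file headers; skip them.
        # An empty diff (identical observations) yields an empty string.
        history.append("\n".join(list(delta)[2:]))
    return history


obs_t0 = "HP: 10\nGold: 0\nYou are in a stone corridor.\nA wooden door leads north."
obs_t1 = "HP: 10\nGold: 5\nYou are in a stone corridor.\nA wooden door leads north."
history = diff_history([obs_t0, obs_t1])
print(history[1])
```

Only the changed `Gold` line appears in the second history entry, so unchanged context no longer consumes prompt tokens at every step.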
Sequential Planning in Large Partially Observable Environments guided by LLMs
- Neoplanner: builds state space model of the environment by testing different actions, observations and rewards. Builds a graph memory of learnings from all previous trials using Learner agent.
- Model provides anytime best policy given the knowledge at that moment. Balances exploration and exploitation.
11th of December 2023
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
- ReSTEM (Expectation-Maximization): The LLM generates samples (E-step/Expectation-step) using temperature sampling, the samples are filtered using binary feedback/reward, and the LLM is fine-tuned on the filtered samples (M-step/Maximization-step). This is repeated for a few rounds. Significantly improves coding and math benchmark results.
- Ability to generate multiple correct solutions compared against human-generated data.
- ReSTEM uses temperature sampling (diverse/creative), compared to STaR-method based on greedy sampling (most-likely), where the rationalization-process leads to false-positive solutions.
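The generate, filter and fine-tune cycle described above can be illustrated with a toy model. This is only a sketch under heavy assumptions: the "LLM" is a table of logits over candidate answers, the "fine-tuning" is a simple logit increment, and all names (`rest_em`, `sample`, `prob`, `lr`) are hypothetical.

```python
import math
import random
from typing import Callable, Dict


def sample(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Temperature sampling from a softmax over candidate answers."""
    answers = list(logits)
    weights = [math.exp(logits[a] / temperature) for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


def prob(logits: Dict[str, float], answer: str) -> float:
    """Probability of an answer under the softmax at temperature 1."""
    total = sum(math.exp(v) for v in logits.values())
    return math.exp(logits[answer]) / total


def rest_em(
    logits: Dict[str, float],
    reward: Callable[[str], int],
    rounds: int = 3,
    samples_per_round: int = 20,
    temperature: float = 1.0,
    lr: float = 0.2,
    seed: int = 0,
) -> Dict[str, float]:
    rng = random.Random(seed)
    logits = dict(logits)  # do not mutate the caller's model
    for _ in range(rounds):
        # E-step: sample candidate solutions and keep those with reward 1.
        drawn = [sample(logits, temperature, rng) for _ in range(samples_per_round)]
        kept = [a for a in drawn if reward(a) == 1]
        # M-step: crude stand-in for fine-tuning on the filtered samples.
        for a in kept:
            logits[a] += lr
    return logits


start = {"40": 0.0, "41": 0.0, "42": 0.0, "43": 0.0}   # toy model for "6 * 7 = ?"
trained = rest_em(start, reward=lambda a: int(a == "42"))
```

Because only rewarded samples contribute to the update, the probability of the correct answer can only stay equal or grow from round to round in this toy setting.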
8th of December 2023
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
- KwaiAgents: an autonomous agent loop with three key components: System (KAgentSys), LLMs (KAgentLMs) and Benchmarks (KAgentBench).
- System includes: Memorybank (Knowledge, Conversation and Task), Tool-library (Factuality-aware, Time-aware and Custom tools) used with Memory update, Task plan, Tool execution and Finish & Conclude-steps.
- LLM-component includes templates for LLMs, Meta-Agent Tuning (MAT)-framework and LLM services. Benchmarks include both human and LLM-driven profiling.
- MAT includes six key components to generate prompt templates: system profile, instructions/constraints, tool specification, goal placement, memory allocation and output format.
7th of December 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
- Creates answer in two steps: Starts by creating pseudo-code to solve the question, then runs the pseudo-code in code interpreter or LM emulating code, in case no code interpreter is available.
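The interplay between a real interpreter and an LM emulator can be sketched as below. It is a simplified illustration, not the paper's implementation: falling back line by line, handling only simple `name = expression` lines, and the names `chain_of_code` and `toy_lm_emulate` are all assumptions, with the LM replaced by a canned function.

```python
from typing import Any, Callable, Dict, List


def chain_of_code(
    lines: List[str],
    lm_emulate: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Run each line in Python; if that fails, let the LM emulate its result."""
    state: Dict[str, Any] = {}
    for line in lines:
        try:
            exec(line, state)                    # executable code runs for real
        except Exception:
            # Assumption: the line has the form 'name = <something semantic>'.
            target = line.split("=", 1)[0].strip()
            state[target] = lm_emulate(line, state)
    # exec() adds '__builtins__' to the namespace; hide it from the caller.
    return {k: v for k, v in state.items() if k != "__builtins__"}


def toy_lm_emulate(line: str, state: Dict[str, Any]) -> Any:
    # Stand-in for an LM call that "executes" the semantic function.
    if "classify_sentiment" in line:
        return "positive"
    raise ValueError(f"cannot emulate: {line}")


program = [
    'text = "I loved this movie"',
    'words = len(text.split())',
    'sentiment = classify_sentiment(text)',      # not real Python: needs the LM
    'answer = f"{sentiment} review with {words} words"',
]
result = chain_of_code(program, toy_lm_emulate)
```

Exact computation such as counting words stays in the interpreter, while the judgment call is delegated to the language model.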
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making
- Autonomous Visualization Agents (AVAs): User instructions are converted with Visualization agent into actions and the taken actions are converted back to language within visualization tasks.
- Components include: Visual perception, Action planning and Memory components, working within visualization-perception-action-loop.
Generating Illustrated Instructions
- StackedDiffusion: Generates illustrated instructions based on text, which helps to train SOTA level multi modal models preferred over human generated articles.
- Introduces "Attention Buckets", which enable a 7B open source model to achieve GPT-4 level tool use performance by compensating attention peaks between parallel processes in specific context.
6th of December 2023
- Concordia-library: Simulation environment made of multiple agents and a Game Master (GM), inspired by the Dungeons and Dragons game.
- Agents consume observations and GM agent actions. Agent produces actions and GM event statements (such as physical grounding).
- Includes long and short term memory, which include state of the world.
LLM as OS (llmao), Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem
- AIOS-Agent Ecosystem: Envisions LLMs as OS, Agents as Applications, Natural Language as Programming language and Tools as Devices/Libraries.
5th of December 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
- Answers visual questions by creating programs, that can review the image such as count number of specific types of objects and use tools.
- Answer is provided with CoT reasoning based on filtered program from many programs executed.
Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction
- Uses three LLM agents for entity, event and relation extraction to build knowledge graph.
Large Knowledge Model: Perspectives and Challenges
- Large Knowledge Models: Reviews combination of LLMs (neural representation) and Knowledge graphs (symbolic representation) through usage of knowledge graph embeddings and text embeddings with LLMs.
4th of December 2023
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
- Exchange-of-Thought (EoT): Improvement over CoT and Self-Consistency, where thoughts from other LLMs are considered, outperforming CoT with Self-Consistency in mathematical reasoning.
- Proposes four communication paradigms to define the setup of the Exchange-of-Thought: Memory, Report, Relay and Debate.
- For example, in Debate-mode two LLM agents first each answer the question, and the two rationalizations are given to a third LLM agent, which debates these solutions in order to provide the right answer.
LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics
- LLM A*: Includes current node, goal node and optimal action; these three make up the plan.
- The chat-environment with user defines user inputs: Setting up environment, Setting up Action model, Start and Target Nodes, Heuristic and Rules.
- Demonstrates the possibility of achieving very good path planning results using mobile embodied agents.
Towards Learning a Generalist Model for Embodied Navigation
- NaviLLM: Embodied navigation with LLMs using schema-based instruction (task, history, observation and output hint), which generalizes well to unseen navigation tasks.
- Uses the following Multi-task learning modules: Visual-Language Navigation, Object localization, Trajectory Summarization and 3D Question Summarization.
OpenVoice: Versatile Instant Voice Cloning
- OpenVoice: Near-instant voice cloning from a short voice recording.
29th of November 2023
Universal Self-Consistency for Large Language Model Generation
- Universal Self-Consistency (USC): Uses LLMs to select the most consistent answer among multiple candidates. It works in mathematical reasoning and code generation and, unlike the original Self-Consistency, also in open-ended questions.
- This can be used as a more capable component in the STaR-method, which then generalizes to Q&A with open-ended answers, not only precise answers.
28th of November 2023
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
- Medprompt: Generalist LLM using MedPrompt outperforms SOTA specialist model.
- Uses SOTA prompting methods: CoT, Choice Shuffle and Self-Consistency prompting.
- Introduces the Choice Shuffle-technique, which increases the diversity of the reasoning paths.
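Choice Shuffle combined with Self-Consistency voting can be sketched as follows. This is a minimal illustration, assuming the model returns the index of its chosen option and that votes are counted on option content instead of position; `choice_shuffle_vote` and `toy_model` are hypothetical names, and the model is a canned function.

```python
import random
from collections import Counter
from typing import Callable, List


def choice_shuffle_vote(
    question: str,
    options: List[str],
    answer_fn: Callable[[str, List[str]], int],
    runs: int = 5,
    seed: int = 0,
) -> str:
    """Ask the same question with shuffled options and majority-vote the answers."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(runs):
        shuffled = options[:]          # leave the caller's list untouched
        rng.shuffle(shuffled)          # a new option order for every call
        picked = answer_fn(question, shuffled)
        votes[shuffled[picked]] += 1   # vote on the option text, not its position
    return votes.most_common(1)[0][0]


def toy_model(question: str, shuffled: List[str]) -> int:
    # Stand-in for an LLM call; it returns the index of its chosen option.
    return shuffled.index("Paris")


options = ["Berlin", "Madrid", "Paris", "Rome"]
winner = choice_shuffle_vote("What is the capital of France?", options, toy_model)
```

Shuffling the option order between calls means that a model with a bias towards a fixed position does not keep voting for the same wrong option.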
27th of November 2023
Some intuitions about large language models
- Jason Wei Blog post / Presentation.
- Learning the relationship from Input to Output is as well Next-word prediction learning.
- Next-word prediction is massively multi-task learning.
22nd of November 2023
- Identifies two types of LLM agents: "Agents-as-workers" and "Agents-as-coordinators".
21st of November 2023
System 2 Attention (is something you might need too)
- System 2 Attention (S2A): Generate interim user question and interim context from the original user input. Finally, generate the final answer by answering to the interim user question from the interim context.
- Reduces hallucination from irrelevant context by first defining the question and the context and this way separating irrelevant facts from impacting the response generation.
20th of November 2023
- Systematic review of research from Chain-of-Thought (CoT) to LLM Agents and identifies gaps in generalization, redundant interactions and customization and more.
17th of November 2023
A Language Agent for Autonomous Driving
- Agent-Driver: Uses LLM agent for human-like intelligence for autonomous driving.
- Tool library provides input for: detection, prediction, occupancy and mapping functions. Memory includes Commonsense memory and Experience memory. Historical trajectories and ego-states are handled separately.
- The reasoning engine includes: CoT reasoning, Task planning, Motion planning and Self-Reflection. These lead to actions and again to environment update.
16th of November 2023
Digital Socrates: Evaluating LLMs through explanation critiques
- Digital Socrates: evaluates reasoning flaws: giving feedback on why and where?
15th of November 2023
Divergences between Language Models and Human Brains
- Reviews differences measured with MEG in human brain vs. language models.
- The study reveals that LLMs are weaker at social/emotional intelligence and physical commonsense reasoning.
- Finetuning helps to align LLMs to act more in human brain-like manner.
AutoMix: Automatically Mixing Language Models
- AutoMix: Uses a smaller LLM to generate the initial response and a Meta-Verifier to roughly check its trustworthiness. If the answer is trustworthy, the small LLM answer is used; otherwise a larger LLM is consulted.
- Uses the Incremental Benefit Per Unit Cost (IBC) metric to assess the effectiveness of this approach.
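The routing logic can be sketched as below. The models and the verifier are canned stand-ins, the `threshold` parameter is hypothetical, and the `ibc` helper assumes that IBC is the performance gain over the small model divided by the extra cost; check the paper for the exact definition.

```python
from typing import Callable, Tuple


def automix(
    question: str,
    small_lm: Callable[[str], str],
    large_lm: Callable[[str], str],
    verifier: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Answer with the small model unless the verifier distrusts its draft."""
    draft = small_lm(question)
    confidence = verifier(question, draft)   # rough trust score in [0, 1]
    if confidence >= threshold:
        return draft, "small"
    return large_lm(question), "large"       # escalate only when needed


def ibc(perf: float, cost: float, perf_small: float, cost_small: float) -> float:
    """Assumed form: performance gained per unit of extra cost over the small model."""
    return (perf - perf_small) / (cost - cost_small)


# Canned stand-ins for the two models and the verifier (illustration only).
def toy_small_lm(question: str) -> str:
    return "4" if question == "2 + 2" else "not sure"


def toy_large_lm(question: str) -> str:
    return "Paris"


def toy_verifier(question: str, answer: str) -> float:
    return 0.1 if answer == "not sure" else 0.9


easy = automix("2 + 2", toy_small_lm, toy_large_lm, toy_verifier)
hard = automix("Capital of France?", toy_small_lm, toy_large_lm, toy_verifier)
```

The large model is only paid for on questions where the verifier distrusts the draft, which is where the cost savings come from.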
14th of November 2023
DeepThought: An Architecture for Autonomous Self-motivated Systems
- DeepThought: An architecture for cognitive language agents possessing agency, self-motivation, and partly meta-cognition.
- Includes supervisor module, Deep Reinforcement Learning module, Attention Schema (long-term memory), Language/Auditory/Vision modules and Embedding store.
9th of November 2023
LLM Augmented Hierarchical Agents
- Hierarchical agent uses an LLM to evaluate when to use a specific skill to complete a specific sub-level task with a long horizon.
- The resulting model works without the need for an LLM after training.
Prompt Engineering a Prompt Engineer
- Guides an LLM to prompt engineer prompts automatically.
- The metaprompt uses: prompt engineering tutorial, two-step task description, step-by-step reasoning template and context specification.
8th of November 2023
ADaPT: As-Needed Decomposition and Planning with Language Models
- ADaPT: Plans and decomposes dynamically complex tasks with LLMs, if the executor is not able to complete the task.
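As-needed decomposition can be sketched as a short recursion: try to execute the task directly and only call the planner when execution fails. This is a simplified illustration in which all sub-tasks must succeed; `adapt`, `toy_execute`, `toy_plan` and `max_depth` are hypothetical names, and both LLM roles are replaced by canned functions.

```python
from typing import Callable, List


def adapt(
    task: str,
    execute: Callable[[str], bool],
    plan: Callable[[str], List[str]],
    depth: int = 0,
    max_depth: int = 3,
) -> bool:
    """Try the task directly; decompose it only if the executor fails."""
    if execute(task):
        return True
    if depth >= max_depth:
        return False                   # give up instead of recursing forever
    subtasks = plan(task)
    if not subtasks:
        return False                   # the planner could not split the task
    return all(adapt(s, execute, plan, depth + 1, max_depth) for s in subtasks)


attempts: List[str] = []


def toy_execute(task: str) -> bool:
    attempts.append(task)
    return " and " not in task         # the toy executor handles atomic steps only


def toy_plan(task: str) -> List[str]:
    parts = task.split(" and ")
    return parts if len(parts) > 1 else []


solved = adapt("boil water and add pasta and serve", toy_execute, toy_plan)
```

Simple tasks never reach the planner, so planning cost is only spent where the executor actually struggles.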
2nd of November 2023
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
- RoboGen: Agent using LLMs to define new tasks to learn, create their simulation environments, train on them to acquire diverse & new skills.
- Agent includes: Task proposal, Scene generation, Training Supervision Generation & Skill learning.
Youtube. Adam Kalai presents "Recursive Self-improving Code Generation" - talk 2.11.2023
- Adam Kalai's talk on "Self-Taught Optimizers (STOP): Recursively Self-Improving code generation", which in essence attempts to build code that lets LLMs improve their own code.
- I recommend to check this especially from safety-aspects on the point "sandbox-flag" and to better understand the
1st of November 2023
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
- Introduces plug-and-play dialogue policy planner (PPDPP).
- Plans dialogues using self-play with three LLM agents: one acting to achieve a goal such as buying a product at a cheaper price, a second negotiating as the seller for a higher price, and a third scoring performance as the reward model.
SAGE: Smart home Agent with Grounded Execution
- SAGE (Smart home Agent with Grounded Execution).
- Device interaction: Interaction planner, Attribute retriever, API documentation retriever, Device disambiguation, Device command execution.
- Personalization: Long-term memory, User profile & Personalization tool.
- Includes Physical grounding such as light bulbs and External grounding (such as weather forecast) & Personalization.
Efficient Human-AI Coordination via Preparatory Language-based Convention
- HAPLAN: Human-AI coordination using Conventions. Humans communicate the roles & tasks of individuals before starting a task to be completed. Humans create Conventions.
- Builds a Convention (an action-plan) to guide AI/human using task requirements, human preferences, number of agents and other information for a better understanding of tasks & responsibilities of each agent/human.
- Assigns sub-problems to own sessions. Convention is first confirmed with human.
31st of October 2023
Generating Sequences by Learning to Self-Correct
- Self-Correction: A generative LLM, which includes two modules: Generator and Corrector.
Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback
- Autonomously explores real world
- Guided Exploration for Autonomous Reinforcement learning (GEAR): approaches the objective by reaching a promising sub-goal that is close to the final target (Goal Selector), but reachable from the current position using the current policy (Density model).
- Crowdsourced & occasional comparative feedback regarding the user objective vs. available correct/incorrect states.
Towards A Natural Language Interface for Flexible Multi-Agent Task Assignment
- Programs constraints into task assignments system based on natural language using Multi-agent LLMs.
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
- DEEP: Uses aggressive (truthful) and conservative (disguising) modes to play a spy game that assesses the ability of LLMs to describe a target word without stating the word explicitly.
Multi-Agent Consensus Seeking via Large Language Models
- In multi-agent consensus seeking, the agents mainly reason about and update their numerical state using an average-based consensus strategy.
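The average strategy can be illustrated without any LLM: each agent repeatedly moves to the mean of its own value and the values of its neighbours. This is a minimal numeric sketch, assuming a fixed ring topology; `consensus_step`, `seek_consensus` and the starting values are hypothetical.

```python
from typing import Dict, List


def consensus_step(states: List[float], neighbours: Dict[int, List[int]]) -> List[float]:
    """Each agent moves to the average of its own state and its neighbours' states."""
    new_states = []
    for i, own in enumerate(states):
        group = [own] + [states[j] for j in neighbours[i]]
        new_states.append(sum(group) / len(group))
    return new_states


def seek_consensus(
    states: List[float], neighbours: Dict[int, List[int]], rounds: int = 50
) -> List[float]:
    for _ in range(rounds):
        states = consensus_step(states, neighbours)
    return states


# Four agents on a ring: every agent only talks to its two neighbours.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final_states = seek_consensus([0.0, 10.0, 20.0, 30.0], ring)
```

On this symmetric ring the states converge to the mean of the initial values (15), even though no agent ever sees all the other agents.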
26th of October 2023
CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents
- Studies competition of LLM agents and identifies research on competition of LLM agents, as important as co-operation.
- The initial advantage of an LLM agent leads to a feedback cycle that produces the Matthew effect.
- LLM Agents can operate in competitive environment.
- LLM Agents learn to imitate and differentiate with other LLM agents.
25th of October 2023
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
- PromptAgent: Optimizes prompts using planning algorithms such as MCTS.
- Creates intermediate prompts, updates them based on error feedback, simulates future rewards and searches higher reward paths.
- Prompts generated include: Domain knowledge, Task description, Term clarification, Solution Guidance, Exception handling, Priority & Emphasis, Formatting.
24th of October 2023
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
- Key-value store for observation retrieval, parsed actions are executed by RCAgent or by Expert Agent.
Diverse Conventions for Human-AI Collaboration
- Mixed-play: generates diverse conventions (arbitrary solutions to recurring cooperation problems) by randomly switching between self-play (maximize reward) and cross-play (minimize reward) actions to maximize mixed-play.
- Explains the CoMeDi (Cross-play optimized, Mixed-play enforced Diversity) algorithm.
Woodpecker: Hallucination Correction for Multimodal Large Language Models
- Woodpecker: Extracts key concepts, formulates questions, validates visual knowledge and generates visual claims using Multimodal Large Language Models (MLLMs) to control hallucinations in LLM responses.
In-Context Learning Creates Task Vectors
- Training data used with LLMs is compressed into task vectors within LLM. Task vectors are used in 18 tasks.
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction
- On Demand Information Extraction (ODIE): Extracting information using LLMs from text to present it in structured tabular format.
23rd of October 2023
Function Vectors in Large Language Models
- LLMs include Function Vectors (FVs) to trigger functions in different contexts.
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay
- Explores the social behaviour of LLMs in the Avalon-game regarding teamwork and other collaboration.
20th of October 2023
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
- ToolChain*: Uses the A* search algorithm to navigate an action space as a tree-like structure with an LLM agent.
- Selects most promising path, Expand follow up actions in the selected path, Update the tree-structure.
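The select, expand and update cycle can be sketched as a generic A* search over a tree of tool calls. This is a minimal illustration with hand-written costs and heuristic values; in the paper these scores come from the LLM agent, and the tool names and the function `a_star_plan` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


def a_star_plan(
    start: str,
    is_goal: Callable[[str], bool],
    expand: Callable[[str], List[Tuple[str, float]]],
    heuristic: Callable[[str], float],
) -> Optional[List[str]]:
    """Return the first goal-reaching path in order of f = g + h."""
    counter = 0  # tie-breaker so the heap never has to compare paths
    frontier = [(heuristic(start), counter, 0.0, [start])]
    while frontier:
        _, _, g, path = heapq.heappop(frontier)       # select: lowest f = g + h
        node = path[-1]
        if is_goal(node):
            return path
        for child, step_cost in expand(node):         # expand: follow-up actions
            counter += 1
            g_child = g + step_cost
            f_child = g_child + heuristic(child)
            heapq.heappush(frontier, (f_child, counter, g_child, path + [child]))  # update
    return None


# Toy action tree: (next action, cost of taking it).
tree: Dict[str, List[Tuple[str, float]]] = {
    "start": [("search_flights", 1.0), ("search_hotels", 1.0)],
    "search_flights": [("book_flight", 1.0)],
    "search_hotels": [("book_hotel", 3.0)],
    "book_flight": [("done", 1.0)],
    "book_hotel": [("done", 1.0)],
}
# Hand-written estimate of the remaining cost to the goal.
remaining = {"start": 3.0, "search_flights": 2.0, "search_hotels": 2.0,
             "book_flight": 1.0, "book_hotel": 1.0, "done": 0.0}

plan = a_star_plan("start", lambda n: n == "done",
                   lambda n: tree.get(n, []), lambda n: remaining[n])
```

Because the frontier is ordered by accumulated cost plus estimated remaining cost, the expensive `book_hotel` branch is never expanded in this example.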
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
- The student LM takes an "exam" to gather the mistakes it makes. The teacher LM generates training data based on the mistakes and customizes each "exam" and its feedback. The student LM learns to improve through self-reflection on its mistakes and from the new training data provided by the teacher LM. These steps are repeated until the student LM has reached the teacher LM's capability.
19th of October 2023
AgentTuning: Enabling Generalized Agent Abilities for LLMs
- AgentTuning: Improves LLM agent capability through instruction tuning on the AgentInstruct-dataset, producing AgentLM.
18th of October 2023
Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale
- Language agent to automatically identify and quantify the extent of stereotypes in generated images.
- Planning and Reasoning. Tool usage: Intent understanding, Instruction generation, Instruction retrieval, Prompt optimization & Stereotype score generation.
17th of October 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
- Set-of-Mark (SoM)-visual prompting technique answers questions by partitioning the image into regions at different levels of granularity and inserting a number for each region.
- Studies VLM model prompting techniques.
The next grand challenge for AI
- Foundational Agent: Agents which scale along all three axes: skills, embodiment and realities. If ChatGPT was scaled with data, foundational agents are scaled with realities.
16th of October 2023
Character-LLM: A Trainable Agent for Role-Playing
- Character-LLM: simulates historical figures using LLMs, which mimic the profile / experiences and emotional states of specific individuals.
- Applies "Experience Reconstruction" with detailed experiences and memories.
- Specialises a base model for character generation.
- Evaluates using a step-by-step LLM-judge approach, evaluating one dimension at each step.
OpenAgents: An Open Platform for Language Agents in the Wild
- OpenAgents-platform: Data agent, Plugin/Tools and Web agent
- Automatic tool selection from over 200 tools
Improving Large Language Model Fine-tuning for Solving Math Problems
- Introduces multi-task sequential fine-tuning method, where solution generation is improved by including solution evaluation as part of the fine-tuning objective together with the generated solution to provide higher-quality guidance to solution generator.
- Quality and style of the step-by-step solutions used for fine-tuning impact model performance. Solution re-ranking and Majority voting used together are effective way to improve model performance with fine-tuning.
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- A Continually Learning Generative Agent from Interactions (CLIN): Memory generator updates memory, Controller manages tasks and Executor converts it into actions towards the goal.
Theory of Mind for Multi-Agent Collaboration via Large Language Models
- LLM-based agent manages complex multi-agent collaboration task with performance level comparable with RL agent.
13th of October 2023
A Zero-Shot Language Agent for Computer Control with Structured Reflection
- Zero-shot agent plans executable actions in the environment and iteratively progresses by learning from mistakes using self-reflection and structured thoughts management.
- Better generalization, outperforms best iterative-planning agents
12th of October 2023
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems
- AgentCF: LLM agent-based recommender system with User and Item Agents.
- User & Item Agents interact autonomously and the discrepancies between the two are stored in the memory to help guide better future recommendations.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
- Octopus: Uses Vision-Language Model with Reinforcement Learning from Environmental Feedback (RLEF).
- Generates action sequences and executable code.
MemGPT: Towards LLMs as Operating Systems
- MemGPT: OS-based design in which an LLM-processor manages its own context and long-term memory, using functions to make changes and events to manage the order of processing data.
- Promptor: Automatic prompt generation.
- Builds prompts based on: User goals, User Profiles, Data Profile, Contextual information & Output constraints.
- System prompt includes: instructions, Actions, Facts and Examples.
Towards Robust Multi-Modal Reasoning via Model Selection
- Dynamic model selection by taking into account input & sub-task dependencies.
11th of October 2023
- Evidence about strong correlation between layers activated in Deep Language Models (DLMs) and human brain high-order language areas: auditory, syntactic and semantic areas.
- Brain and DLMs both process input into multi dimensional vector embeddings, processed as sequences taking into account the context.
- Identifies differences. One difference is that the human brain does not perform straightforward linear interpolation between the previous and current words, suggesting RNNs may better mimic human brain language processing. The other difference is that humans do not learn only by reading text, but use data from multiple modalities.
- Diagnosis-of-Thought: Cognitive distortion detection through prompting: Subjective assessment, contrastive reasoning and schema analysis.
LangNav: Language as a Perceptual Representation for Navigation
- Uses BLIP for image captioning and DETR for object detection on image views to obtain text descriptions, which an LLM agent uses to generate navigation instructions.
10th of October 2023
Towards Mitigating Hallucination in Large Language Models via Self-Reflection
- Self-Reflection: Introduces self-reflection prompting, similar to "Reflection"-prompting. Evaluates in an LLM-loop whether the knowledge in the answer is factual enough and, in a second loop, whether the answer is consistent enough.
- Human reviewers are asked to evaluate whether each sentence in the answer is generic, fact-inconsistent or fact-consistent. The reviewer is also asked to categorise the answer as question-inconsistent (inconsistent), tangential (consistent, but not on topic) or answerable (consistent and answers).
9th of October 2023
FireAct: Toward Language Agent Fine-tuning
- Fine-tuning LLMs with agent trajectories for better autonomous agents.
8th of October 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- MemWalker: navigates long context iteratively and constructs memory as a tree-like structure.
7th of October 2023
Crystal: Introspective Reasoners Reinforced with Self-Feedback
- Introspective reasoning of the knowledge.
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
- PathCrawling: Crawl all paths leading to reward (train LLM with these paths) and evaluate generality on unseen tasks. Continue crawling the most general paths.
6th of October 2023
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
- Language Agents Tree Search (LATS): Self-Refine, Memory, Reasoning, Decision Making & Planning.
- Uses multiple reasoning paths and learns from experience by integrating external feedback & self-reflection.
BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity
- BrainScuba (Semantic Captioning Using Brain Alignments): LLM generates interpretable captions.
- Aligns brain activity pattern with semantic content to generate captions to explain how brain processes visual information.
- Collects fMRI brain imaging data while a human views visual stimuli and uses BERT to obtain a semantic representation in natural language, which is based on an alignment process. This process maps images to voxel-wise brain activations.
5th of October 2023
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
- AgentInstruct: generates instructions for the problem and then solves it using these instructions, improving the Chain of Thought (CoT) zero-shot reasoning.
5th of October 2023
- Characteristics of Autonomous Agents: Goal-driven task management, Intelligent Agents with LLMs, Multi-Agents collaboration, Context interaction, Balancing Autonomy vs. Alignment.
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- DSPy programs (think Langchain as comparison) help create LLM pipelines, which can outperform few-shot prompting techniques.
- Helps improve results on math word problems and complex question answering, and manages chaining / loops.
3rd of October 2023
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
- Self-Taught Optimizer (STOP): Ask LLM to improve initial program by providing improvement candidates and then output best solution.
Lyfe Agents: Generative agents for low-cost real-time social interactions
- LyfeAgents Brain: Sensory processing, Internal states, Self-monitor, Action selection and Memory.
- Internal states are text based: current goal, memory, recent events and sensory inputs.
- Cognitive controller selects high-level actions. Action model selects actions until termination condition is reached.
- Self-monitoring maintains and emphasizes recent and novel events towards agent goals
- Memories are clustered and summarized before moving them to long-term storage (vector database)
EcoAssistant: Using LLM Assistant More Affordably and Accurately
- EcoAssistant: Enables an LLM agent to converse with a code executor to iteratively produce answers based on the code produced. Hierarchical structure, where a cheaper and weaker LLM is used before trying the stronger and more expensive LLM.
- Surpasses GPT-4 by 10% in performance at 50% less cost.
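The cheaper-model-first hierarchy above can be sketched as a simple cascade. This is a minimal illustration and not the EcoAssistant implementation: `cascade`, `cheap_model`, `strong_model` and `is_acceptable` are hypothetical names, and the toy functions stand in for real LLM calls and for the code-executor check.

```python
from typing import Callable, List, Optional, Tuple

Model = Callable[[str], str]


def cascade(
    query: str,
    models: List[Tuple[str, Model]],
    is_acceptable: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try models from cheapest to most expensive; return the first accepted answer."""
    for name, model in models:
        answer = model(query)
        if is_acceptable(answer):
            return name, answer
    return None, None  # no model produced an acceptable answer


# Hypothetical stand-ins for real LLM calls.
def cheap_model(query: str) -> str:
    return "42" if "easy" in query else "unsure"


def strong_model(query: str) -> str:
    return "42"


# Hypothetical acceptance check; a code executor could play this role.
def is_acceptable(answer: str) -> bool:
    return answer != "unsure"


MODELS = [("cheap", cheap_model), ("strong", strong_model)]
print(cascade("easy question", MODELS, is_acceptable))
print(cascade("hard question", MODELS, is_acceptable))
```

The expensive model is only called when the cheaper one fails the check, which is where the cost saving would come from in this sketch.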
Large Language Models as Analogical Reasoners
- LLM self-generates examples/knowledge related to the task.
Conceptual Framework for Autonomous Cognitive Entities
- Conceptual framework for Autonomous entities.
OceanGPT: A Large Language Model for Ocean Science Tasks
- DoInstruct (Domain Instruction): Automatically gathers large amount of domain specific instruction data for multi-agent collaboration.
- Domain Instruction generation: Agents used as experts in each topic. Instructions are augmented rapidly through agent collaboration, which are annotated and finally inspected for high quality fine-tuning dataset.
2nd of October 2023
Enabling Language Models to Implicitly Learn Self-Improvement
- ImPlicit Self-ImprovemenT (PIT)-framework: introduces self-improvement, where LLMs improve their response quality with human preference data without extensive human annotation.
SmartPlay : A Benchmark for LLMs as Intelligent Agents
- SmartPlay: a benchmark to test LLM-based agents from 9 perspectives.
- Tests: Reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness.
GRID: A Platform for General Robot Intelligence Development
- GRID: General Robot Intelligence Development
- Solves complex tasks using simulation and/or real-world data
- Task specification, robot configuration and sensor/API.
- Foundation Mosaic: a neural architecture.
1st of October 2023
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- RoleLLM: Role-profile constructor, Context-based Instruction generation, Role-based Prompting (RoleGPT), Role-conditioned Instruction-tuning.
29th of September 2023
AutoAgents: A Framework for Automatic Agent Generation
- AutoAgents: Planner agent receives user input and converts it into a plan. Multiple agent roles take actions in this plan to convert into a result.
- Observers: Observer agent reviews, if the created agent roles meet the requirements. Plan observer agent reviews, if the plan meets expectations. Action observer reviews, if the action response meets expectations.
- Includes drafting stage (with agent observer and plan observer agents) and Execution stage (with action observer).
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
- Motif: Trains a reward function/model from pairs of gameplay captions and LLM observations of these game actions. Then trains an agent using RL with the reward model.
- Diverse behaviours triggered with the LLM improve performance in a specific domain: for example, Gold Collector collects more gold.
28th of September 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- Promptbreeder uses thinking styles and mutation-prompts and is able to improve mutation/task prompts.
24th of September 2023
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning
- Heuristic Greedy Search for Process-Supervised Reward Model (HGS-PRM): the reward model evaluates each new reasoning step generated by the LLM and decides whether to accept it or generate a new one, until the reasoning path is identified.
- Creates the PRM-Code dataset with Code-LLaMA-7B using a mutation testing technique.
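The accept-or-regenerate loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: `greedy_prm_search`, the acceptance `threshold`, the retry limit and the toy `propose_step` / `score_step` / `is_final` functions are all hypothetical stand-ins for the LLM and the step-level reward model.

```python
from typing import Callable, List


def greedy_prm_search(
    question: str,
    propose_step: Callable[[str, List[str], int], str],
    score_step: Callable[[str, List[str], str], float],
    is_final: Callable[[str], bool],
    threshold: float = 0.5,
    max_steps: int = 10,
    max_retries: int = 3,
) -> List[str]:
    """Build a reasoning path one step at a time, keeping only accepted steps."""
    path: List[str] = []
    for _ in range(max_steps):
        accepted = None
        for attempt in range(max_retries):
            step = propose_step(question, path, attempt)
            # The reward model decides whether this step is kept.
            if score_step(question, path, step) >= threshold:
                accepted = step
                break
        if accepted is None:
            break  # no acceptable step found within the retry budget
        path.append(accepted)
        if is_final(accepted):
            break
    return path


# Toy stand-ins (assumptions): the first proposal for every step is wrong.
GOOD_STEPS = ["2 + 3 = 5", "5 * 4 = 20", "answer: 20"]


def propose_step(question: str, path: List[str], attempt: int) -> str:
    return "wrong step" if attempt == 0 else GOOD_STEPS[len(path)]


def score_step(question: str, path: List[str], step: str) -> float:
    return 0.0 if step == "wrong step" else 1.0


def is_final(step: str) -> bool:
    return step.startswith("answer:")


print(greedy_prm_search("What is (2 + 3) * 4?", propose_step, score_step, is_final))
```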
23rd of September 2023
Natural Language based Context Modeling and Reasoning with LLMs: A Tutorial
- LLM-driven Context-aware Computing (LCaC) approach.
20th of September 2023
You only look at the screens: Multimodal Chain-of-Action Agents
- Multimodal Chain-of-Actions Agents (Auto-UI) interacts directly with the UI
- Chain-of-Action technique using series of action histories and future action plans.
18th of September 2023
MindAgent: Emergent Gaming Interaction
- MindAgent: Planning skills and Tools use (Agent location, Tool state, Agent holdings, Pending dishes, Timer), LLM dispatcher, Memory history (Environment, Agent State, Actions and Feedback) and Action module (Controller, Human actions, Action validator, Action Types/Patterns/Names).
- Introduces CuisineWorld-benchmark, where multiple agents play game simultaneously through multi-agent collaboration.
14th of September 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
- A conceptual framework for LLM-based agents with three components: brain, perception, and action.
Agents: An Open-source Framework for Autonomous Language Agents
- Multi-agent: Planning, memory, tool usage, multi-agent communication & symbolic control.
- Open source library.
13th of September 2023
Physically Grounded Vision-Language Models for Robotic Manipulation
- PhysObjects dataset for physical grounding.
- VLMs trained with PhysObjects improve their understanding of physical objects.
- Improves task success rate.
12th of September 2023
Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents
- Interoceptive AI: monitoring own internal state of the artificial agent.
- Sebastien Bubeck explains the insights from the research on Phi-1 regarding coding tasks and Phi-1.5 regarding reasoning tasks, and how the models are able to outperform 1000 times larger LLMs.
- The talk highlights that the key ingredients are textbook-like training data followed by exercises.
- Explains that the key ingredient of the "Textbooks are all you need" paper, the data, is largely based on the TinyStories paper, whose dataset was used to train a high-performing model to generate fluent and consistent stories in English.
8th of September 2023
Unleashing the Power of Graph Learning through LLM-based Autonomous Agents
- AutoGraph procedure: data, configuration, searching and tuning agents.
28th of August 2023
RecMind: Large Language Model Powered Agent For Recommendation
- RecMind: a recommender-focused LLM agent with reasoning, planning into sub-tasks, memory & tools.
22nd of August 2023
A Survey on Large Language Model based Autonomous Agents
- Systematic review of LLM based Autonomous Agents.
- Use cases and evaluation strategies and future use cases.
21st of August 2023
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
- AgentVerse: multi-agent collaboration and individual agents' social behaviours.
18th of August 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- Graph-of-Thoughts (GoT): Reasoning with LLM using graph-structure with intermediate steps.
- Introduces Volume-of-Thought metric to inform the scope of information carried by the LLM output.
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- AutoGen: An open source framework, where LLM agents converse with one or many other LLM agents, chat with humans and use tools.
- LLM agents are able to create new chats with other LLM agents.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- Improves math reasoning with Reinforcement Learning from Evol-Instruct Feedback (RLEIF): Upward and Downward evolution improve instructions by making questions easier or harder based on their difficulty level.
17th of August 2023
Reinforced Self-Training (ReST) for Language Modeling
- Introduces Reinforced Self-Training (ReST).
- Grow step generates data from LLM, Improve step uses this filtered data to fine-tune the LLM. Repeat.
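The Grow / Improve cycle above can be sketched as a small loop. This is a toy illustration only: `rest`, `generate`, `reward` and `fine_tune` are hypothetical names, the rising threshold schedule is an assumption of this sketch, and the "policy" is just a list of strings rather than an LLM.

```python
from typing import Callable, List, Sequence


def rest(
    policy: List[str],
    generate: Callable[[List[str]], List[str]],
    reward: Callable[[str], float],
    fine_tune: Callable[[List[str], List[str]], List[str]],
    thresholds: Sequence[float],
    grow_steps: int = 2,
) -> List[str]:
    """Alternate Grow (sample data) and Improve (filter, then fine-tune)."""
    for _ in range(grow_steps):
        dataset = generate(policy)  # Grow: sample data from the current policy
        for threshold in thresholds:  # Improve: filter with a rising threshold
            kept = [x for x in dataset if reward(x) >= threshold]
            if kept:
                policy = fine_tune(policy, kept)
    return policy


# Toy stand-ins (assumptions): the policy "generates" what it contains,
# reward is string length, and fine-tuning replaces the policy with the kept data.
def generate(policy: List[str]) -> List[str]:
    return list(policy)


def reward(sample: str) -> float:
    return float(len(sample))


def fine_tune(policy: List[str], data: List[str]) -> List[str]:
    return list(data)


print(rest(["a", "bb", "ccc", "dddd"], generate, reward, fine_tune, thresholds=[2, 3]))
```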
Never-ending Learning of User Interfaces
- Never-ending UI Learner: automatically installs apps from an appstore and crawls them to learn difficult training examples
3rd of August 2023
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Proposes Rejection sampling Fine-Tuning (RFT), which generates reasoning paths and collects the correct ones to augment the fine-tuning dataset.
25th of July 2023
WebArena: A Realistic Web Environment for Building Autonomous Agents
- A realistic web environment for testing autonomous agents with tools and external knowledge.
20th of July 2023
- Addresses LLM training data to be "text-book-like": clear, self-contained, instructive, and balanced. The method is used in Phi-models.
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
- BuboGPT: Feeds text input to the Vicuna LLM together with visual and audio inputs, each inserted separately through a Q-former. The Vicuna output is then processed using the SAM model for visual grounding.
- Achieves coherent and grounded descriptions
16th of July 2023
Communicative Agents for Software Development
- ChatDev: Define task and automatically generate SW designing, coding, testing, and documentation using "Chat Chains", where LLM-based chats include different roles for each sub-task: CEO, programmer, CTO etc.
- Includes role-assignment, memory and self-reflection.
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
- Protein Language Model: xTrimoPGLM.
14th of July 2023
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
- EmotionPrompt: adds an emotional stimulus to the prompt, which improves performance by 10.9%.
- An example of an emotional stimulus is to state that the work is important for the user's career.
23rd of June 2023
- Lilian Weng from OpenAI article / blog post
- Covers Planning, Memory and Tool usage of LLM-powered agents
8th June 2023
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
- Builds a multi-agent simulation environment to generate a dataset of usage examples for many real-world APIs.
- Small models can achieve comparable performance to larger models on tool usage.
6th of June 2023
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
- When2Ask: RL agent, which learns when to query LLM for high-level plans to complete a task.
- Planner, Actor and Mediator.
5th June 2023
SELFEVOLVE: A Code Evolution Framework via Large Language Models
- Generates intermediate code based on input prompt.
- Use LLM to act as expert programmer to debug the generated code by receiving errors from Python interpreter.
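The debug step above, where interpreter errors are fed back to the model, can be sketched as below. This is a minimal sketch, not SELFEVOLVE's code: `run_and_capture`, `debug_loop` and `toy_fix` are hypothetical names, and `toy_fix` stands in for the LLM acting as the expert programmer.

```python
import traceback
from typing import Callable, Optional


def run_and_capture(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return the traceback text, or None on success."""
    try:
        exec(code, {})
        return None
    except Exception:
        return traceback.format_exc()


def debug_loop(code: str, fix: Callable[[str, str], str], max_rounds: int = 3) -> str:
    """Repeatedly run the code and ask `fix` to repair it using the error message."""
    for _ in range(max_rounds):
        error = run_and_capture(code)
        if error is None:
            return code  # runs cleanly
        code = fix(code, error)
    return code  # last attempt; not re-verified after the final fix


# Hypothetical stand-in for the LLM debugger: repairs one known typo.
def toy_fix(code: str, error: str) -> str:
    if "NameError" in error:
        return code.replace("valeus", "values")
    return code


buggy = "values = [1, 2, 3]\ntotal = sum(valeus)\n"
print(debug_loop(buggy, toy_fix))
```

Running untrusted generated code with `exec` is unsafe outside a sandbox; the sketch only shows the control flow.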
3rd June 2023
Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
- Human AI collaborative intelligence methodology & technical practices, where the idea is not to have "full Auto-GPT" from user input to direct resolution by LLM, but rather human reviews steps between.
- User inputs objective, LLM asks for clarification. The user then adds clarifications and the LLM constructs an AI chain for the human to review. Finally the LLM executes the AI chain with user acceptance tests.
3rd June 2023
Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions
- Auto-GPTs outperform supervised state-of-the-art Imitation Learning (IL) models with GPT-4 in the WebShop and ALFWorld benchmarks in unknown external environments.
- Additional opinions algorithm improves performance, which takes into account additional opinions from external expert models.
2nd of June 2023
- MathChat: Describes a conversational approach to solving MATH problems in a four-step process.
- Describes the prompts used.
26th of May 2023
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Graph-of-Thought (GoT) reasoning: To model human thought process as graph instead of chain to improve LLM reasoning capability.
- Uses low-quality LM to generate High-quality dataset (more diverse and more effective for generalization in unseen domains) to train a high quality model: 770 million parameter model outperforms GPT-3 in multiple tasks evaluated by humans.
25th of May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
- Voyager: open-ended embodied agent with LLM
24th May 2023
Reasoning with Language Model is Planning with World Model
- RAP (Reasoning via Planning): Uses LLM as both world model and reasoning LLM-agent. Integrates MCTS search planning algorithm.
- Incrementally generates reasoning tree with LLM in domains of plan generation, math reasoning and logical inference.
Gorilla: Large Language Model Connected with Massive APIs
- Gorilla is a retrieval-aware finetuned LLaMA-7B model for API calls using self-instruct to generate Instruction-API pairs.
Better speech synthesis through scaling
- TorToise (TorToise an expressive, multi-voice text-to-speech system): introduces text-to-speech synthesis framework utilizing autoregressive transformer and diffusion decoder with conditioning inputs and CLVP re-ranking for improved speech quality.
- This framework comprises autoregressive transformer for speech token prediction, diffusion decoder for converting tokens to MEL spectrograms, and vocoder for waveform generation from spectrograms.
- TorToise incorporates conditioning MEL from reference audio and CLVP discriminator to enhance speech synthesis expressiveness and enable speaker cloning capabilities.
18th of May 2023
Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation
- Brainstorm: uses brainstorming step to generate and select diverse thoughts in code generation.
- Uses three steps: brainstorming, thought selection (trains a thought ranker for this) and writing code.
17th May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Tree of Thoughts (ToT)-technique makes decisions using multiple different reasoning paths, self-evaluating choices to decide next action with ability to look back/forward for global decisions.
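The idea of expanding several reasoning paths and self-evaluating them can be sketched as a breadth-first search with a beam. This is a simplified illustration rather than the paper's implementation: `tree_search`, `expand` and `evaluate` are hypothetical names, and the toy task (pick numbers whose sum hits a target) replaces LLM-generated thoughts and LLM self-evaluation.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]


def tree_search(
    root: State,
    expand: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Expand every kept state, score the children, keep the best `beam_width`."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        candidates.sort(key=evaluate, reverse=True)  # highest score first
        frontier = candidates[:beam_width]
    return frontier[0]


# Toy stand-ins (assumptions): a "thought" appends 1, 2 or 3 to the path,
# and the evaluation prefers paths whose sum is close to TARGET.
TARGET = 6


def expand(state: State) -> List[State]:
    return [state + (n,) for n in (1, 2, 3)]


def evaluate(state: State) -> float:
    return -abs(TARGET - sum(state))


print(tree_search((), expand, evaluate, depth=2, beam_width=2))  # (3, 3)
```

Keeping more than one state per level is what distinguishes this from a single chain: a path that looks second-best early on can still be continued.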
Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
13th of May 2023
- BabyCatAGI: a modified BabyAGI by replacing task manager in BabyBeeAGI with task creation agent running once.
- Uses Intelligent Agent Tool to combine tools and extract only the information relevant to the next step, such as looping web search and scraping results to pull only a specific part into another task.
12th of May 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- A breakthrough paper, where synthetic data generated by Teacher-Student LLM is used to train a high-performing model to generate fluent and consistent English stories.
- Demonstrated the effectiveness of synthetic data in smaller LLMs challenging large SOTA models in domain of English language.
- Uses GPT-4 to grade content generated by the models as if created by student and being graded by the GPT-4 teacher.
9th of May 2023
ImageBind: One Embedding Space To Bind Them All
- ImageBind: a joint embedding space for images, text, audio, depth, thermal and IMU data modalities.
3rd of May 2023
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings
- Introduces Visual Chain of Thought (VCoT) for data augmentation, where between reasoning steps multimodal data is infilled to obtain better reasoning results.
30th of April 2023
BabyBeeAGI: Task Management and Functionality Expansion on top of BabyAGI
- BabyBeeAGI: a modification of BabyAGI that tracks task statuses, task dependencies, identification of required new tasks, assigned tools and results in JSON format.
26 of April 2023
"Inside OpenAI Entire Talk" by Stanford eCorner
- Interview of Ilya Sutskever, where he defines a way to perform "a consciousness test" from a very controlled dataset, see "minute 15".
21st of April 2023
- LLM agent self-help with LLM to complete IGLU tasks using clarifying questions.
13th of April 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- RAFT-finetuning: Samples a batch of data from the LLM, a reward function scores the samples, and high-reward examples are filtered as data to finetune the LLM.
11th of April 2023
ChemCrow: Augmenting large-language models with chemistry tools
- Uses LLM and chemistry tools to plan and execute different chemical tasks.
- Tools include web and literature search, Python, human-tool to interact with the end user and various molecule tools, safety tools and chemical reaction tools.
Teaching Large Language Models to Self-Debug
- The model generates new code together with code explanation. The code is then executed and this executed code is sent back as feedback together with the code explanation. This feedback
7th of April 2023
ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
- ChatPipe - Iterative, data preparation program with ChatGPT using 1. Operation Recommendation, 2. Program generation, 3. Version management.
- Recommends the next data preparation operation. Easy roll-back to a previous program for version control.
6th April 2023
Generative Agents: Interactive Simulacra of Human Behavior
- Enable believable human behavior: observation, planning, and reflection.
- An agent wants to throw a Valentine’s Day party. The agents autonomously spread invitations, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.
- GPTeam is inspired by this approach.
31 March 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
- CAMEL attempts to facilitate autonomous cooperation among communicative agents through role-playing framework.
- The approach manages to complete tasks with minimal human input.
30th of March 2023
Self-Refine: Iterative Refinement with Self-Feedback
- Self-Refine refers to Iterative refinement with self-feedback: use the LLM to get Feedback to original output, which is passed back to LLM to Refine a new output.
- The concept is best understood here in the blog by : Self-Refine: Iterative Refinement with Self-Feedback with GIFs and code examples.
- Improves base-model performance in tasks like math reasoning and code generation.
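The generate, feedback, refine cycle above can be sketched as a short loop. This is a minimal sketch, not the authors' code: `self_refine` and the three toy functions are hypothetical, and in practice each of them would be a prompt to the same LLM.

```python
from typing import Callable, Optional


def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 3,
) -> str:
    """Generate an output, then alternate feedback and refinement until no feedback remains."""
    output = generate(task)
    for _ in range(max_iters):
        fb = feedback(task, output)
        if fb is None:  # the critic has nothing left to improve
            break
        output = refine(task, output, fb)
    return output


# Toy stand-ins (assumptions) for the three LLM prompts.
def generate(task: str) -> str:
    return "hello"


def feedback(task: str, output: str) -> Optional[str]:
    if not output[0].isupper():
        return "capitalize"
    if not output.endswith("!"):
        return "add exclamation mark"
    return None


def refine(task: str, output: str, fb: str) -> str:
    if fb == "capitalize":
        return output.capitalize()
    return output + "!"


print(self_refine("write a greeting", generate, feedback, refine))  # Hello!
```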
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
- An LLM (such as ChatGPT) accesses the HuggingFace community to look for AI models to complete the given task.
- It can read multi modalities by outsourcing tasks like image recognition to the specific image model.
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- Dialog-Enabled Resolving Agents (DERA) uses two roles: Researcher and Decider to perform discussion between these two agents.
- Researcher role processes information and Decider role uses judgement.
29th of March 2023
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
- Multimodal conversational foundation model (MCFM). MCFM generates a textual solution outline, then API selector chooses most relevant API from collection of APIs (with API name, parameter list, description, usage example and example when combining it with another API).
- MCFM generates action code using recommended API and the API call is executed. Finally, output is provided back to developer.
28th March 2023
Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications
- Task-driven autonomous agent, with vector database and Langchain. BabyAGI includes: Execution, creation and prioritization
- Takes objective, pulls an item from task queue and moves it to execution agent with access to memory.
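The execution, creation and prioritization cycle above can be sketched with a task queue. This is an illustrative sketch and not BabyAGI's source: `run_agent` and the toy agents are hypothetical, and a plain list stands in for the vector database used as memory.

```python
from collections import deque
from typing import Callable, List, Tuple


def run_agent(
    objective: str,
    first_task: str,
    execute: Callable[[str, str, List[Tuple[str, str]]], str],
    create_tasks: Callable[[str, str, str, List[str]], List[str]],
    prioritize: Callable[[str, List[str]], List[str]],
    max_steps: int = 5,
) -> List[Tuple[str, str]]:
    """Pull a task, execute it with access to memory, add new tasks, re-prioritize."""
    queue = deque([first_task])
    memory: List[Tuple[str, str]] = []  # stand-in for a vector database
    for _ in range(max_steps):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task, memory)
        memory.append((task, result))
        queue.extend(create_tasks(objective, task, result, list(queue)))
        queue = deque(prioritize(objective, list(queue)))
    return memory


# Toy stand-ins (assumptions) for the three LLM-backed agents.
def execute(objective: str, task: str, memory: List[Tuple[str, str]]) -> str:
    return f"done: {task}"


def create_tasks(objective: str, task: str, result: str, pending: List[str]) -> List[str]:
    return ["write", "research"] if task == "plan" else []


def prioritize(objective: str, pending: List[str]) -> List[str]:
    return sorted(pending)


history = run_agent("publish a report", "plan", execute, create_tasks, prioritize)
print([task for task, _ in history])  # ['plan', 'research', 'write']
```

The `max_steps` cap is a design choice of this sketch: without it, a creation agent that keeps proposing tasks would loop indefinitely.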
Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Raises the argument that GPT-4 model capabilities should be viewed as an early and incomplete version of Artificial General Intelligence (AGI) systems, due to the multiple metrics comparing against human-level performance.
- Raises the argument, that LLMs need to move beyond "next-word prediction" to overcome linear reasoning limitation, which often is possible to solve as incremental tasks with few iterations.
20th March 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
- Reflexion agents reflect on task feedback, use it from memory to make better decisions and new attempts.
Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference
- EcoOptiGen: Hyperparameter tuning of LLMs.
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
27th of February 2023
Reward Design with Language Models
- LLM-RL: framework uses a LLM as a proxy reward function to train reinforcement learning (RL) agents.
- User specifies objective with natural language prompt, LLM evaluates agent's behavior, and framework is agnostic to RL algorithm.
- This approach simplifies reward design and enables training of agents aligned with user objectives.
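Using an LLM as a proxy reward function, as described above, can be sketched as a wrapper that turns a yes/no judgement into a scalar. This is a hedged sketch: `make_llm_reward`, the prompt wording and `toy_llm` are hypothetical, and a real setup would call an actual LLM and plug the reward into an RL algorithm of choice.

```python
from typing import Callable


def make_llm_reward(objective_prompt: str, llm: Callable[[str], str]) -> Callable[[str], float]:
    """Return a reward function that asks the LLM whether an episode meets the objective."""

    def reward(episode_description: str) -> float:
        prompt = (
            f"{objective_prompt}\n"
            f"Episode: {episode_description}\n"
            "Does the behaviour satisfy the objective? Answer yes or no."
        )
        answer = llm(prompt).strip().lower()
        return 1.0 if answer.startswith("yes") else 0.0

    return reward


# Hypothetical stand-in for an LLM: approves episodes that mention an even split.
def toy_llm(prompt: str) -> str:
    return "Yes" if "50/50" in prompt else "No"


reward_fn = make_llm_reward("The agent should negotiate fairly.", toy_llm)
print(reward_fn("Agent offered a 50/50 split."))
print(reward_fn("Agent kept everything."))
```

Because the reward is just a function of the episode description, the same wrapper works with any RL algorithm, which matches the algorithm-agnostic framing above.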
Citation
How to cite my work?
@misc{MaattaAutonomousAgents2023,
author = {Teemu Maatta},
title = {Autonomous Agents},
year = {2023},
howpublished = {\url{https://github.com/tmgthb/Autonomous-Agents}},
note = {Accessed: YYYY-MM-DD}
}