Autonomous_Agents_Research_Papers_Earlier.md

May 18, 2026

Autonomous Agents

Autonomous Agents research papers. Updated daily. See the Resources section.

Research papers: 2022 and earlier

2026 (5/5), 2026 (4/5), 2026 (3/5), 2026 (2/5), 2026 (1/5), 2025 (4/4), 2025 (3/4), 2025 (2/4), 2025 (1/4), 2024, 2023, Earlier

Chronological order.

8th of December 2022

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

LLM-Planner: Uses an LLM for few-shot planning with embodied agents, based on natural language and visual perception of the environment.
Improves planning with physical grounding to create and update plans.
Includes task introduction / goal instruction / step-by-step instructions / plan list / object list / retrieval message (next plan).

20th of October 2022

Large Language Models Can Self-Improve

Demonstrates that an LLM can self-improve using only unlabeled datasets: it generates solutions with CoT and Self-Consistency prompting, and is then fine-tuned with these self-generated solutions as target outputs.
This research by Google effectively performs Self-Recursive Learning not only at inference time (as with CoT or In-Context Learning alone), but during training as well.
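
The data-collection step described above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: `build_self_training_set` and `toy_sample_paths` are hypothetical names, the sampler is a canned stand-in for an LLM, and the fine-tuning step itself is omitted.

```python
from collections import Counter
from typing import Callable, List, Tuple

Path = Tuple[str, str]          # (rationale, final answer)
Example = Tuple[str, str, str]  # (question, rationale, final answer)

def build_self_training_set(
    questions: List[str],
    sample_paths: Callable[[str, int], List[Path]],
    n_paths: int = 4,
) -> List[Example]:
    """Keep only the reasoning paths that agree with the majority answer."""
    examples: List[Example] = []
    for question in questions:
        paths = sample_paths(question, n_paths)
        # The majority answer acts as a pseudo-label; no human label is used.
        majority, _ = Counter(answer for _, answer in paths).most_common(1)[0]
        examples.extend((question, r, a) for r, a in paths if a == majority)
    return examples

def toy_sample_paths(question: str, n: int) -> List[Path]:
    # Hypothetical stand-in for sampling n chain-of-thought paths from an LLM.
    canned = {
        "What is 3 * 4?": [
            ("3 * 4 = 12", "12"),
            ("3 + 4 = 7", "7"),
            ("4 + 4 + 4 = 12", "12"),
        ],
    }
    return canned[question][:n]

training_set = build_self_training_set(["What is 3 * 4?"], toy_sample_paths, n_paths=3)
for example in training_set:
    print(example)
```

The resulting examples would then serve as fine-tuning targets for the same model.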

12th of October 2022

Interactive Language: Talking to Robots in Real Time

Interactive Language: introduces a framework for real-time language-instructable robots, with Teleoperated Data Collection, Hindsight Language Relabeling, Language Conditioned Behavioral Cloning (LCBC), Robot Policy, Real-time Language Guidance, ResNet CNN, CLIP Text Encoder, Vision-Language Transformer, Temporal Transformer, and Policy MLP.
The Interactive Language framework uses behavioral cloning on a large language-annotated dataset to train a real-time language-guided robot policy.
This framework facilitates interactive robot control for complex manipulation tasks and demonstrates a high success rate on diverse language commands.

31st of August 2022

Emergent Abilities of Large Language Models

Formally defines the term "Emergent Abilities": "An ability is emergent if it is not present in smaller models but is present in larger models."
Emergent abilities had already been observed with GPT-3, but here the term is clearly defined as an ability that appears only beyond a certain scale.
Identifies a list of emergent abilities not detected in a specific smaller model but identified in a larger model.
I like the paper because an increasing number of task patterns are learned from the single learning objective of next-word prediction as scale increases.

12th of May 2022

A Generalist Agent

Gato: A multi-modal, multi-task, multi-embodiment generalist policy agent.
Learns to play Atari, caption images, chat, stack blocks with a robot arm, etc.
Includes text tokens, image patch tokens, agent timesteps and action tokens.
Argues for "a generalist agent that can adapt to new embodiments and learn new tasks with few data."

19th of April 2022

Deep learning, reinforcement learning, and world models

Reviews Deep learning, Reinforcement learning and World models.
Claims that humans use world models as simulators in the brain, learned through sensorimotor interaction with the environment. It is possible to learn a world model using deep generative models.

28th of March 2022

STaR: Bootstrapping Reasoning With Reasoning

Introduces the "Self-Taught Reasoner" (STaR), where an LLM improves its reasoning by learning from its own reasoning: the model is asked to generate rationales for questions. If a rationale leads to a wrong answer, the model tries again with the correct answer provided as a hint (rationalization). All rationales that lead to the correct answer are used to fine-tune the LLM. The process is repeated, and each iteration improves the LLM's reasoning capability.
The paper does not refer to Self-Recursive Learning, but we could argue it is an example of this process in the context of reasoning.
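
One pass of the loop described above might look roughly like this. It is a minimal sketch under stated assumptions: `star_iteration` and `toy_generate` are hypothetical names, the generator is a hard-coded stand-in for an LLM, and the fine-tuning step that would follow each pass is left out.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]      # (question, correct answer)
Kept = Tuple[str, str, str]    # (question, rationale, answer)
Generator = Callable[[str, Optional[str]], Tuple[str, str]]

def star_iteration(generate: Generator, dataset: List[Example]) -> List[Kept]:
    """Collect rationales that end in the correct answer (one outer-loop pass)."""
    kept: List[Kept] = []
    for question, gold in dataset:
        rationale, answer = generate(question, None)
        if answer != gold:
            # Rationalization: retry with the correct answer given as a hint.
            rationale, answer = generate(question, gold)
        if answer == gold:
            kept.append((question, rationale, answer))
    return kept

def toy_generate(question: str, hint: Optional[str] = None) -> Tuple[str, str]:
    # Hypothetical model: solves "2+2" unaided, needs the hint for anything else.
    if question == "2+2":
        return ("2 plus 2 is 4", "4")
    if hint is not None:
        return (f"the steps must arrive at {hint}", hint)
    return ("not sure", "?")

fine_tuning_data = star_iteration(toy_generate, [("2+2", "4"), ("3+5", "8")])
print(fine_tuning_data)
```

In the full procedure the model would be fine-tuned on `fine_tuning_data` and the pass repeated with the improved model.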

21st of March 2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Enables reasoning with LLMs using CoT and Self-Consistency, where multiple different reasoning paths are sampled and vote for the most consistent answer.
Improves reasoning and math problem solving.
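
The voting step can be illustrated with a short sketch. The names `self_consistent_answer` and `make_fake_sampler` are assumptions for illustration, and the sampler replays canned final answers instead of calling a real LLM.

```python
from collections import Counter
from typing import Callable, Iterable

def self_consistent_answer(
    sample_answer: Callable[[str], str], question: str, n_samples: int = 5
) -> str:
    """Sample several reasoning paths and return the most frequent final answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    # Majority vote over final answers; ties resolve to the answer seen first.
    return Counter(answers).most_common(1)[0][0]

def make_fake_sampler(outputs: Iterable[str]) -> Callable[[str], str]:
    # Hypothetical stand-in for an LLM sampled with temperature > 0:
    # each call returns the final answer of one reasoning path.
    iterator = iter(outputs)
    return lambda question: next(iterator)

sampler = make_fake_sampler(["18", "26", "18", "18", "17"])
print(self_consistent_answer(sampler, "How many eggs are left?"))  # prints 18
```

Only the final answers are compared; the reasoning paths themselves may differ.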

Chain of Hindsight Aligns Language Models with Feedback

Chain of Hindsight (CoH): Humans learn from feedback. Here, feedback is converted into sequences of sentences, ranked by human preference and used to fine-tune the LLM.

7th of March 2022

Shared computational principles for language processing in humans and deep language models

Provides evidence for three computational principles shared by Deep Language Models (DLMs) and the human brain when processing language.
The three principles are: continuous next-word prediction, contextual embeddings and surprise prediction error.

28th of January 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Defines Chain-of-Thought (CoT).
CoT is an emergent ability: it is not present in smaller models, but is present in larger models.
CoT can be seen as Self-Recursive Learning, where the LLM improves its own output by using intermediate steps to solve a complex task.
The approach effectively demonstrates the LLM's capability to perform Self-Recursive Learning, although the output is not integrated back into the model as training data.

12th of April 2021

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

RAG (Retrieval-Augmented Generation): introduces retrieval-augmented generation models, with Query Encoder, Retriever, Document Index, and Generator, for knowledge-intensive NLP tasks.
The RAG framework combines parametric memory (a pre-trained seq2seq model) and non-parametric memory (a Wikipedia index) to improve generation quality.
RAG models achieve state-of-the-art results on open-domain question answering tasks, outperforming parametric and task-specific architectures.
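
The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration only: it assumes a bag-of-words retriever and a caller-supplied `generate` function, whereas the paper uses a learned dense retriever and a seq2seq generator. All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

def embed(text: str) -> Counter:
    # Toy "query encoder": bag-of-words counts instead of a dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, index: List[str], k: int = 1) -> List[str]:
    """Return the k documents from the index most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def rag_answer(
    query: str, index: List[str], generate: Callable[[str], str], k: int = 1
) -> str:
    """Condition the generator on the retrieved passages plus the query."""
    passages = retrieve(query, index, k)
    prompt = "\n".join(passages) + "\nQuestion: " + query
    return generate(prompt)

documents = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
]
# Hypothetical generator that simply echoes the first retrieved passage.
echo_generator = lambda prompt: prompt.splitlines()[0]
print(rag_answer("capital of France", documents, echo_generator))
```

The document index plays the role of non-parametric memory: it can be swapped or updated without retraining the generator.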

26th of March 2021

Alignment of Language Agents

Defines the term "Language Agent".

8th of February 2021

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

Q* search algorithm: An improved version of the A* search algorithm that reduces computation time and the number of nodes that need to be computed.

28th of May 2020

Language Models are Few-Shot Learners

Uses, for the first time, the term "In-Context Learning" for an LLM's ability to learn a task from contextual information.
This ability is another example of Self-Recursive Learning, although it is not integrated back into the model as training data.
This paper also identified the capability of LLMs to learn multiple tasks after being trained only to predict the next word. See Jason Wei's presentation included below, where he covers the "Massively Multi-task learning" of LLMs. I think it helps to gain better insight into LLMs than thinking of them as simply "statistical models".
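
As a rough illustration of in-context learning, the task is specified entirely in the prompt through a few demonstrations, with no weight updates. The prompt layout and the name `build_few_shot_prompt` are assumptions made for this sketch, not the format used in the paper.

```python
from typing import List, Tuple

def build_few_shot_prompt(task: str, demos: List[Tuple[str, str]], query: str) -> str:
    """Assemble a task description, K demonstrations and the new query."""
    blocks = [task]
    for source, target in demos:
        blocks.append(f"Input: {source}\nOutput: {target}")
    # The final block is left open so the model completes the output.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("mint", "menthe")],
    "sea otter",
)
print(prompt)
```

Passing an empty `demos` list gives the zero-shot variant of the same prompt.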

22nd of May 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Defines Retrieval-Augmented Generation (RAG).

12th of November 2020

Reward is enough

Reward is sufficient to drive intelligent behaviours instead of requiring special formulations.
Agents could learn various intelligent behaviours through trial-and-error experience aimed at maximizing reward.
Sophisticated intelligence may emerge from a simple objective: think of what an animal is able to learn to do just by being hungry.

24th of November 2019

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

MARL: Introduces Multi-Agent Reinforcement Learning (MARL).

28th of July 2005

The Emotion Machine. Draft.

According to Minsky, the human mind consists of a "Cloud of Resources" that can be turned on and off.
An important theory, because LLM agents can construct such resources, as observed in the human brain, although only years after this theory was proposed.

12th of August 1996

Is it an Agent, or Just a Program?: A Taxonomy for Autonomous Agents.

"Autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future."
Definition includes: 1. Operate within an environment, 2. Sense and Act, 3. Over time, 4. Control its own agenda (Autonomous).
Studies multiple earlier definitions of Agents / Autonomous Agents, although the perspective is more than 27 years old and predates LLMs.
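
The four parts of the definition map naturally onto a sense-act loop. The thermostat below is a minimal sketch: the class names, the 0.5-degree dynamics and the `run` helper are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Room:
    """1. The environment the agent is situated in."""
    temperature: float

    def step(self, heating: bool) -> None:
        # Assumed toy dynamics: warm up when heated, cool down otherwise.
        self.temperature += 0.5 if heating else -0.5

@dataclass
class ThermostatAgent:
    target: float  # 4. The agent's own agenda.

    def sense(self, env: Room) -> float:
        return env.temperature          # 2. Sense the environment.

    def act(self, reading: float) -> bool:
        return reading < self.target    # 2. Act on it (heat if too cold).

def run(agent: ThermostatAgent, env: Room, steps: int) -> List[float]:
    history = []
    for _ in range(steps):              # 3. Over time.
        env.step(agent.act(agent.sense(env)))
        history.append(env.temperature)
    return history

print(run(ThermostatAgent(target=20.0), Room(temperature=18.0), steps=6))
# prints [18.5, 19.0, 19.5, 20.0, 19.5, 20.0]
```

Each action changes what the agent senses on the next step, which is the "effect what it senses in the future" part of the quoted definition.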

Prediction and Adaptation in an Evolving Chaotic Environment

Defines the concept of "Predictive Agent" as adaptive predictors.

A Learning Algorithm that Mimics Human Learning

Reviews Artificial Agents learning like humans.

24th of November 1967

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

A* search algorithm.
Defines the A* search algorithm for the first time; it is widely used as a planning algorithm in RL.
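
A compact grid-world version of A* is sketched below. The grid encoding (`#` for walls, equal-length rows), unit step costs and the Manhattan-distance heuristic are assumptions chosen for the example, not details from the paper.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

def a_star(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest 4-connected path from start to goal, or None."""
    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    g_cost: Dict[Cell, int] = {start: 0}
    came_from: Dict[Cell, Cell] = {}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_cost[current]:
            continue  # Stale heap entry; a cheaper route was already found.
        row, col = current
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = current
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

grid = ["..#",
        "..#",
        "..."]
print(a_star(grid, (0, 0), (2, 2)))
```

Nodes are expanded in order of f = g + h, so with an admissible heuristic the first time the goal is popped its path is optimal.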

Citation

How to cite my work?

@misc{MaattaAutonomousAgents2023,
  author = {Teemu Maatta},
  title = {Autonomous Agents},
  year = {2023},
  howpublished = {\url{https://github.com/tmgthb/Autonomous-Agents}},
  note = {Accessed: YYYY-MM-DD}
}