MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents

March 29, 2025

This repo provides the source code of our paper: MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents. [PDF][Twitter] If you discuss or use MLR-Copilot in your research, please cite us!

@misc{li2024mlrcopilotautonomousmachinelearning,
      title={MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents}, 
      author={Ruochen Li and Teerth Patel and Qingyun Wang and Xinya Du},
      year={2024},
      eprint={2408.14033},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.14033}, 
}

MLR-Copilot is a framework where LLMs mimic researchers’ thought processes, designed to enhance the productivity of machine learning research by automating the generation and implementation of research ideas.

Starting from a research paper, it autonomously generates and validates research ideas, incorporating human feedback to help reach executable research outcomes.

Framework Overview

MLR-Copilot operates in three integrated phases:

Research Idea Generation: LLM-powered agents generate research hypotheses and experimental plans based on existing research papers.
Experiment Implementation: Translates experimental plans into executable experiments using retrieved prototype code and models.
Implementation Execution: Runs the experiments with mechanisms for human feedback and iterative debugging.

Demo Recording

GUI Demo with Pre-defined Examples

https://github.com/user-attachments/assets/1259e2ad-efc8-4a3c-bd4d-d604c46ebd55

Quick Start

Open in Colab

Setup

Begin by cloning this repository.

LLM Configuration

Place the following in a .env file at the root of this project:
- CLAUDE_API_KEY
- OPENAI_API_KEY
Configure the Hugging Face Token as needed so that huggingface_hub.login() works if you intend to use Llama.
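Before launching a run, it can help to confirm that the .env file actually contains the keys listed above. The following is a minimal sketch, not part of the repository: it assumes the file uses plain KEY=VALUE lines, and the helper names read_env_file and missing_keys are hypothetical. Depending on which LLM you use, you may only need one of the two keys.

```python
from pathlib import Path

# Key names taken from the list above; treating both as required is an assumption.
REQUIRED_KEYS = ("CLAUDE_API_KEY", "OPENAI_API_KEY")


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env", required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env file."""
    env = read_env_file(path)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project root, where the .env file is expected to live.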

Local Version

Install requirements: pip install -r requirements.txt

Docker Version

Obtain the Docker image tortcode/nlp-coresearcher:
- Build: docker build . -t 'tortcode/nlp-coresearcher'
- Or pull from Docker Hub: docker pull 'tortcode/nlp-coresearcher'
Run bash container.sh to start the container.

Experimentation

Task Creation

Place the research idea in the file problems/<task_name>.
Run any preparation scripts as needed.
Place all starter code in the directory workspaces/<task_name>.
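The layout described above can be created by hand, or with a small helper like the sketch below. The function name scaffold_task is hypothetical and not part of the repository, and storing the research idea as plain text is an assumption. Only the problems/<task_name> and workspaces/<task_name> paths come from the steps above.

```python
from pathlib import Path


def scaffold_task(root, task_name, idea_text):
    """Create problems/<task_name> (a file) and workspaces/<task_name> (a directory)."""
    root = Path(root)
    problem_file = root / "problems" / task_name
    workspace_dir = root / "workspaces" / task_name

    # Create the parent directories if they do not exist yet.
    problem_file.parent.mkdir(parents=True, exist_ok=True)
    workspace_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the research idea is stored as plain UTF-8 text.
    problem_file.write_text(idea_text, encoding="utf-8")
    return problem_file, workspace_dir
```

After calling scaffold_task(".", "<task_name>", "..."), copy your starter code into the returned workspace directory and run any preparation scripts the task needs.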

Task Execution

To run the agent with a specific task and LLM (Claude, GPT-4, or Llama), execute bash run_demo.sh <task_name> <llm_name>.
- You must have access to the Meta Llama 3.1 models in Hugging Face to run Llama.
To ignore error logging, redirect stderr to /dev/null: bash run_demo.sh <task_name> <llm_name> 2>/dev/null.
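If you prefer to launch runs from Python, for example to queue several tasks, the same command can be wrapped with subprocess. This is a sketch only: build_run_command and run_task are hypothetical helpers, and the exact strings accepted for <llm_name> are not specified here, so pass whatever run_demo.sh expects.

```python
import subprocess


def build_run_command(task_name, llm_name):
    """Return the argument list equivalent to: bash run_demo.sh <task_name> <llm_name>."""
    return ["bash", "run_demo.sh", task_name, llm_name]


def run_task(task_name, llm_name, quiet=False):
    """Run the demo script from the project root and return its exit code.

    With quiet=True, stderr is discarded, mirroring the 2>/dev/null redirect.
    """
    stderr = subprocess.DEVNULL if quiet else None
    completed = subprocess.run(
        build_run_command(task_name, llm_name),
        stderr=stderr,
        check=False,  # Report the exit code instead of raising on failure.
    )
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems when a task name contains spaces or special characters.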

Task Logs

Full logs are under logs/<task_name>/<start_timestamp>/agent_log/full_log.jsonl.
Other logs are under logs/<task_name>/<start_timestamp>/env_log/.
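To inspect a run programmatically, the full log can be read one JSON object per line. The sketch below uses hypothetical helper names and makes two assumptions: that <start_timestamp> directory names sort chronologically when sorted as strings, and that each non-empty line of full_log.jsonl is a standalone JSON value. The fields inside each record are not documented here, so the code does not interpret them.

```python
import json
from pathlib import Path


def latest_run_dir(task_name, logs_root="logs"):
    """Return the last run directory under logs/<task_name>/, or None if there is none."""
    task_dir = Path(logs_root) / task_name
    # Assumption: timestamp directory names sort chronologically as strings.
    runs = sorted(p for p in task_dir.iterdir() if p.is_dir())
    return runs[-1] if runs else None


def read_full_log(run_dir):
    """Load every record from <run_dir>/agent_log/full_log.jsonl."""
    log_path = Path(run_dir) / "agent_log" / "full_log.jsonl"
    records = []
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # Skip blank lines.
                records.append(json.loads(line))
    return records
```

A typical use is read_full_log(latest_run_dir("<task_name>")), followed by printing or filtering the returned records.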

GIFs

Figure 1: The autonomous machine learning research task. We take the research paper as input and output the research idea (i.e., research hypothesis and experiment plan) with execution results.

Figure 2: Our MLR-Copilot Framework. LLM IdeaAgent (leftmost grey component) performs research idea generation, including hypothesis and experimental design (Stage 1). ExperimentAgent implements and executes the experiments.

MLR-Copilot Process

License

MLR-Copilot incorporates components from MLAgentBench (under the MIT License) and Prompt2Model (under the Apache License 2.0). Files and API calls from these projects have been modified.