AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems

May 12, 2026

AgentSlimming is a research framework for optimizing graph-structured LLM agent workflows. It searches over executable workflow graphs, evaluates them on task benchmarks, and then reduces cost with pruning, quantization, and optional finetuning.

The implementation is centered on GraphFlow: a directed acyclic graph of nodes connected by typed edges. Generated workflows are stored as compact Python artifacts under workspace/<dataset>/..., while common execution and node behavior live in src/core/.
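As a rough illustration of the "directed acyclic graph with typed edges" idea, the sketch below builds a tiny graph and derives an execution order. The `Edge` and `Graph` classes, the node names, and the edge kind `"answer"` are all hypothetical and are not the classes defined in src/core/.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Edge:
    """A typed, directed edge (hypothetical shape, not the repository's class)."""
    src: str
    dst: str
    kind: str  # edge type label; the values used below are made up


@dataclass
class Graph:
    nodes: set[str] = field(default_factory=set)
    edges: list[Edge] = field(default_factory=list)

    def add_edge(self, src: str, dst: str, kind: str) -> None:
        self.nodes.update((src, dst))
        self.edges.append(Edge(src, dst, kind))

    def execution_order(self) -> list[str]:
        """Return nodes so that each node comes after all of its predecessors.

        graphlib raises CycleError if the edges do not form a DAG.
        """
        sorter = TopologicalSorter({name: set() for name in self.nodes})
        for edge in self.edges:
            sorter.add(edge.dst, edge.src)  # dst depends on src
        return list(sorter.static_order())


graph = Graph()
graph.add_edge("solve", "verify", "answer")
graph.add_edge("verify", "format", "answer")
print(graph.execution_order())  # ['solve', 'verify', 'format']
```

Using the standard-library `graphlib` keeps the acyclicity check and the ordering in one place; a real executor would also need to route each edge's payload according to its type.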

Short notes:

This repository started from an AFlow-derived codebase and has since been substantially refactored.
The optimizer prompt in src/prompts/ strongly influences how the MCTS pipeline explores the search space.
Nodes accept an optional count_towards_cost boolean (default True). Set it to False in the node constructor in the generated graph.py to exclude that node's LLM usage from the final cost summary.
This repository went through a large refactor before open-sourcing, and not every code path has been extensively tested yet. If you hit issues, please open an issue or send a PR.
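The effect of count_towards_cost on a cost summary can be pictured with the minimal sketch below. Only the flag name and its default come from the note above; the `NodeUsage` record, the `total_cost` helper, and the per-1,000-token pricing convention are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Token usage recorded for one node (hypothetical record)."""
    name: str
    input_tokens: int
    output_tokens: int
    count_towards_cost: bool = True  # flag name and default from the note above


def total_cost(usages, input_price_per_1k, output_price_per_1k):
    """Sum cost over nodes, skipping those with count_towards_cost=False."""
    total = 0.0
    for usage in usages:
        if not usage.count_towards_cost:
            continue  # excluded from the summary, although the call still ran
        total += usage.input_tokens / 1000 * input_price_per_1k
        total += usage.output_tokens / 1000 * output_price_per_1k
    return total


usages = [
    NodeUsage("solver", input_tokens=2000, output_tokens=1000),
    NodeUsage("judge", input_tokens=4000, output_tokens=1000, count_towards_cost=False),
]
print(total_cost(usages, input_price_per_1k=0.5, output_price_per_1k=1.5))  # 2.5
```

Here the hypothetical judge node still consumes tokens, but only the solver contributes to the reported total.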

Figure: AgentSlimming framework overview

Terminology in this repository:

mcts: graph/prompt search over executable workflows.
prune, quantize, and finetune are inspired by neural-network optimization terminology, but here they operate on agent workflows rather than model weights.
prune: removes workflow nodes or paths to simplify the execution graph, rather than pruning neural network parameters.
quantize: replaces selected workflow nodes with a cheaper model to reduce execution cost, rather than quantizing numeric weights or activations.
finetune: runs another MCTS-style workflow optimization stage starting from quantized workflows, rather than updating a base model with gradient-based finetuning.
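To make the workflow-level meaning of prune and quantize concrete, here is a minimal sketch over a toy workflow. The data layout (a node-to-model dict plus a list of edges), the function names, and the choice to bridge predecessors to successors when a node is removed are assumptions for illustration; the repository's own pruning and quantization logic may differ.

```python
def quantize(node_models, targets, cheap_model):
    """Return a copy in which each target node runs on cheap_model."""
    return {
        node: (cheap_model if node in targets else model)
        for node, model in node_models.items()
    }


def prune(edges, node):
    """Remove node and connect each of its predecessors to each successor.

    Bridging is one possible policy, chosen here for illustration only.
    """
    preds = [src for src, dst in edges if dst == node]
    succs = [dst for src, dst in edges if src == node]
    kept = [(src, dst) for src, dst in edges if node not in (src, dst)]
    bridges = [(p, s) for p in preds for s in succs if (p, s) not in kept]
    return kept + bridges


models = {"plan": "gpt-4.1", "solve": "gpt-4.1"}
edges = [("plan", "solve"), ("solve", "review"), ("review", "answer")]

print(quantize(models, {"solve"}, "gpt-4.1-mini"))
# {'plan': 'gpt-4.1', 'solve': 'gpt-4.1-mini'}
print(prune(edges, "review"))
# [('plan', 'solve'), ('solve', 'answer')]
```

Both functions return new structures instead of mutating their inputs, which makes it easy to evaluate the original and the slimmed workflow side by side.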

Supported Tasks

QA: DROP, HotpotQA, MusiqueAns
Math: GSM8K, MATH, AIME
Code generation: HumanEval, MBPP, LiveCode

Task registration is in src/catalog/datasets.py and benchmark implementations live under benchmarks/. The dataset catalog is the single source of truth for CLI support, evaluator selection, and dataset file resolution.
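The "single source of truth" pattern described above can be sketched as one registry from which CLI choices, evaluator selection, and file resolution are all derived. The `DatasetEntry` fields, evaluator identifiers, and file names below are hypothetical and do not reflect the actual contents of src/catalog/datasets.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    task: str       # "qa", "math", or "code"
    evaluator: str  # hypothetical evaluator identifier
    filename: str   # hypothetical file name under data/datasets/


CATALOG = {
    entry.name: entry
    for entry in (
        DatasetEntry("GSM8K", "math", "math_evaluator", "gsm8k.jsonl"),
        DatasetEntry("HotpotQA", "qa", "qa_evaluator", "hotpotqa.jsonl"),
        DatasetEntry("HumanEval", "code", "code_evaluator", "humaneval.jsonl"),
    )
}


def resolve(name, root=Path("data/datasets")):
    """Look up a dataset once and derive evaluator and file path from its entry."""
    try:
        entry = CATALOG[name]
    except KeyError:
        raise ValueError(f"unsupported dataset: {name!r}") from None
    return entry.evaluator, root / entry.filename


print(sorted(CATALOG))   # could feed a CLI `choices=` list
print(resolve("GSM8K"))
```

Because every consumer reads from the same mapping, adding a dataset means adding one entry; nothing else needs to be kept in sync by hand.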

Quick Start

Create a Python environment and install dependencies:

python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

Create a local model config:

cp config/config.example.yaml config/config.yaml

Fill in base_url, api_key, and token prices for the models you plan to use.

Run one pipeline:

python -m src.cli.run \
  --config config/config.yaml \
  --dataset MATH \
  --pipelines mcts \
  --workspace workspace \
  --max_rounds 10 \
  --sample 4 \
  --opt_model_name gpt-4.1 \
  --exec_model_name gpt-4.1-mini
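If you want to launch the same run for several datasets from a script, one option is to assemble the argument list in Python. The sketch below only reuses the flags shown in the command above; the `build_command` helper is hypothetical, and the exact spellings accepted by --dataset should be checked against the dataset catalog.

```python
import shlex
import sys


def build_command(dataset, pipelines="mcts", max_rounds=10, sample=4):
    """Assemble the argv list for one run; flags mirror the command above."""
    return [
        sys.executable, "-m", "src.cli.run",
        "--config", "config/config.yaml",
        "--dataset", dataset,
        "--pipelines", pipelines,
        "--workspace", "workspace",
        "--max_rounds", str(max_rounds),
        "--sample", str(sample),
        "--opt_model_name", "gpt-4.1",
        "--exec_model_name", "gpt-4.1-mini",
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(build_command("MATH")))
```

Passing the resulting list to `subprocess.run(..., check=True)` from the repository root would execute the run; using `sys.executable` keeps the call inside the active virtual environment.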

Data And Reporting

Evaluation reads local JSONL files under data/datasets/. Small fixtures are tracked in git; the larger integrations are prepared locally by data/download.py:

python -m data.download
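For inspecting a prepared dataset file by hand, a small JSONL reader is usually enough. The sketch below is a generic loader, not the repository's own; the record fields depend on the dataset and are not assumed here.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

Because `read_jsonl` is a generator, `next(read_jsonl(some_path))` peeks at the first record without loading the whole file.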

Some integrations intentionally use repository-local splits or reporting protocols rather than the official benchmark setup, especially AIME, LiveCode, and the MusiqueAns / MuSiQue-Ans path. Treat those numbers as local experiment results unless you rerun the official benchmark evaluation. Exact file policy, downloader behavior, and reporting caveats live in docs/data.md.

Documentation

docs/setup.md: install, config resolution, repository layout
docs/running.md: CLI entrypoints, pipeline flags, wrapper scripts
docs/workflows.md: generated workflow files, node registration, workspace layout
docs/data.md: dataset files, download policy, split/reporting caveats
docs/demo-and-eval.md: tracked MATHDEMO seed and standalone workflow evaluation

Citation

@inproceedings{agentslimming2026,
  title = {AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems},
  author = {Yulang Chen and Haoxuan Peng and Jinyan Liu and Zichen Wen and Dongrui Liu and Linfeng Zhang},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year = {2026}
}
