AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems
May 12, 2026
AgentSlimming is a research framework for optimizing graph-structured LLM agent workflows. It searches over executable workflow graphs, evaluates them on task benchmarks, and then reduces cost with pruning, quantization, and optional finetuning.
The implementation is centered on GraphFlow: a directed acyclic graph of nodes connected by typed edges. Generated workflows are stored as compact Python artifacts under workspace/<dataset>/..., while common execution and node behavior live in src/core/.
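To make the graph idea concrete, here is a minimal sketch of a directed acyclic graph with typed edges and a topological execution order. It is not the repository's GraphFlow class: the names `Edge`, `Graph`, `add_edge`, `topological_order`, and the `kind` label are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Edge:
    src: str
    dst: str
    kind: str  # hypothetical edge type label, e.g. "data"


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, src, dst, kind="data"):
        self.nodes.update((src, dst))
        self.edges.append(Edge(src, dst, kind))

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indegree = {name: 0 for name in self.nodes}
        for edge in self.edges:
            indegree[edge.dst] += 1
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for edge in self.edges:
                if edge.src == node:
                    indegree[edge.dst] -= 1
                    if indegree[edge.dst] == 0:
                        ready.append(edge.dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        return order


g = Graph()
g.add_edge("solve", "verify")
g.add_edge("verify", "answer")
print(g.topological_order())  # ['solve', 'verify', 'answer']
```

Rejecting cycles when computing the order is what lets a workflow run every node exactly once, after all of its inputs are available.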
Short notes:
- This repository started from an AFlow-derived codebase and has since been substantially refactored.
- The exploration style of the MCTS pipeline can be steered significantly by the optimizer prompt in src/prompts/.
- Nodes support an optional count_towards_cost boolean (default True) that lets you exclude a node's LLM usage from the final cost summary; set it in the generated graph.py node constructor.
- This repository went through a large refactor before open-sourcing. We cannot guarantee that every path has been extensively tested yet. If you hit issues, please open an issue or send a PR.
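The effect of count_towards_cost on the final summary can be sketched as follows. Only the flag name and its default of True come from this README; the `NodeUsage` record, the `summarize_cost` helper, and the dollar amounts are hypothetical, and the real node constructor in graph.py may look different.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    name: str
    cost_usd: float                  # hypothetical per-node LLM spend
    count_towards_cost: bool = True  # flag name and default from the README


def summarize_cost(usages):
    """Sum the spend of nodes that opted in to the cost summary."""
    return sum(u.cost_usd for u in usages if u.count_towards_cost)


usages = [
    NodeUsage("solver", 0.012),
    # Assumed example: a judge node whose usage is excluded from the report.
    NodeUsage("judge", 0.030, count_towards_cost=False),
]
total = summarize_cost(usages)
print(f"reported cost: ${total:.3f}")
```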
Terminology in this repository:
- mcts: graph/prompt search over executable workflows.
- prune, quantize, and finetune are inspired by neural-network optimization terminology, but here they operate on agent workflows rather than model weights.
- prune: removes workflow nodes or paths to simplify the execution graph, rather than pruning neural network parameters.
- quantize: replaces selected workflow nodes with a cheaper model to reduce execution cost, rather than quantizing numeric weights or activations.
- finetune: runs another MCTS-style workflow optimization stage starting from quantized workflows, rather than updating a base model with gradient-based finetuning.
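As a rough illustration of the workflow-level meaning of prune and quantize, the sketch below treats a workflow as a mapping from node name to model name plus a list of edges. This representation and both helper functions are assumptions for the example, not the repository's pipeline code; in particular, this prune simply drops a node and its incident edges without reconnecting neighbours.

```python
def quantize(node_models, targets, cheaper_model):
    """Return a copy where the selected nodes run on a cheaper model."""
    return {
        name: (cheaper_model if name in targets else model)
        for name, model in node_models.items()
    }


def prune(node_models, edges, removed):
    """Return copies without the removed nodes and their incident edges."""
    kept_nodes = {n: m for n, m in node_models.items() if n not in removed}
    kept_edges = [(s, d) for s, d in edges if s not in removed and d not in removed]
    return kept_nodes, kept_edges


workflow = {"plan": "gpt-4.1", "solve": "gpt-4.1", "check": "gpt-4.1"}
edges = [("plan", "solve"), ("solve", "check")]

cheaper = quantize(workflow, {"check"}, "gpt-4.1-mini")
smaller, smaller_edges = prune(workflow, edges, {"check"})
print(cheaper)
print(smaller, smaller_edges)
```

Both helpers return new objects rather than mutating their inputs, so a search procedure can compare the original and the slimmed workflow side by side.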
Supported Tasks
- QA: DROP, HotpotQA, MusiqueAns
- Math: GSM8K, MATH, AIME
- Code generation: HumanEval, MBPP, LiveCode
Task registration is in src/catalog/datasets.py and benchmark implementations live under benchmarks/.
The dataset catalog is the single source of truth for CLI support, evaluator selection, and dataset file resolution.
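One way to picture a catalog acting as a single source of truth is a registry that every consumer looks up by dataset name. The sketch below is an assumption, not the contents of src/catalog/datasets.py: `DatasetSpec`, `CATALOG`, `resolve`, and the file names are hypothetical, and only three of the supported datasets are listed.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    task_type: str  # "qa", "math", or "code"
    data_file: str  # hypothetical file name under data/datasets/


CATALOG = {
    spec.name: spec
    for spec in (
        DatasetSpec("HotpotQA", "qa", "hotpotqa.jsonl"),
        DatasetSpec("GSM8K", "math", "gsm8k.jsonl"),
        DatasetSpec("HumanEval", "code", "humaneval.jsonl"),
    )
}


def resolve(name, root="data/datasets"):
    """Look up a dataset once; the CLI, evaluator, and loader share the result."""
    try:
        spec = CATALOG[name]
    except KeyError:
        known = ", ".join(sorted(CATALOG))
        raise ValueError(f"unknown dataset {name!r}; known: {known}") from None
    return spec, Path(root) / spec.data_file


spec, path = resolve("GSM8K")
print(spec.task_type, path.as_posix())
```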
Quick Start
- Create a Python environment and install dependencies:
python3.11 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
- Create a local model config:
cp config/config.example.yaml config/config.yaml
- Fill in base_url, api_key, and token prices for the models you plan to use.
- Run one pipeline:
python -m src.cli.run \
--config config/config.yaml \
--dataset MATH \
--pipelines mcts \
--workspace workspace \
--max_rounds 10 \
--sample 4 \
--opt_model_name gpt-4.1 \
--exec_model_name gpt-4.1-mini
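The token prices in the config are what turn token counts into a dollar cost. A minimal sketch of that arithmetic follows, assuming prices are quoted per million tokens with separate input and output rates; the config schema is not shown in this README, so the function name, its parameters, and the placeholder prices are all hypothetical.

```python
def call_cost_usd(prompt_tokens, completion_tokens, input_price, output_price):
    """Cost of one LLM call, with prices in USD per million tokens (assumed unit)."""
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Placeholder prices for illustration only, not real model pricing.
cost = call_cost_usd(
    prompt_tokens=1200,
    completion_tokens=300,
    input_price=2.0,
    output_price=8.0,
)
print(f"${cost:.4f}")
```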
Data And Reporting
Evaluation reads local JSONL files under data/datasets/. Small fixtures are tracked in git; the larger integrations are prepared locally by data/download.py:
python -m data.download
Some integrations intentionally use repository-local splits or reporting protocols rather than the official benchmark setup, especially AIME, LiveCode, and the MusiqueAns / MuSiQue-Ans path. Treat those numbers as local experiment results unless you rerun the official benchmark evaluation. Exact file policy, downloader behavior, and reporting caveats live in docs/data.md.
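For reference, reading a JSONL file means parsing one JSON object per line. The loader below is a generic sketch, not the repository's reader; the `load_jsonl` name and the `question` / `answer` fields in the demo records are assumptions, and the actual schemas are described in docs/data.md.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON object per non-empty line, reporting the bad line on failure."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


# Demo with a temporary file so the sketch runs without any dataset present.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(
        '{"question": "2 + 2?", "answer": "4"}\n\n'
        '{"question": "3 * 3?", "answer": "9"}\n',
        encoding="utf-8",
    )
    records = load_jsonl(sample)

print(len(records), records[0]["answer"])
```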
Documentation
- docs/setup.md: install, config resolution, repository layout
- docs/running.md: CLI entrypoints, pipeline flags, wrapper scripts
- docs/workflows.md: generated workflow files, node registration, workspace layout
- docs/data.md: dataset files, download policy, split/reporting caveats
- docs/demo-and-eval.md: tracked MATHDEMO seed and standalone workflow evaluation
License
This repository is released under the MIT License. For upstream attribution details, see THIRD_PARTY_NOTICES.md.
Safety
Code-generation benchmarks execute model-produced Python code through the local runner. Do not run untrusted workflows outside a restricted environment.
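As a baseline precaution, generated code can at least be run in a separate interpreter process with a timeout and a throwaway working directory. The sketch below shows that pattern with the standard library. It is not the repository's local runner and it is not a sandbox: the child process still has the caller's file and network permissions, so a container or VM is still needed for untrusted workflows. The `run_untrusted` name is hypothetical.

```python
import subprocess
import sys
import tempfile


def run_untrusted(code, timeout_s=5.0):
    """Run code in a child interpreter; returns (returncode, stdout, stderr).

    returncode is None when the child exceeded the timeout and was killed.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return None, "", "timeout"
        return proc.returncode, proc.stdout, proc.stderr


returncode, stdout, _ = run_untrusted("print(1 + 1)")
print(returncode, stdout.strip())
```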
Citation
If you use AgentSlimming in your research, please cite the ACL paper.
@inproceedings{agentslimming2026,
title = {AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems},
author = {Yulang Chen and Haoxuan Peng and Jinyan Liu and Zichen Wen and Dongrui Liu and Linfeng Zhang},
booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
year = {2026}
}