July 29, 2025
LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation
Third-place solution to the XLLM@ACL2025 Shared Task-III: LLM for Structural Reasoning
Contact: jamse_yuan@163.com
If you find this project helpful, please consider giving us a star to support the latest updates.
News
- 2025.06.15: We're thrilled to announce that our technical report Less is More, which earned 3rd place, has been officially accepted to the LLMSR@XLLM ACL 2025 Workshop!
- 2025.05.16: Excited to share that our earlier work Reversal of Thought has been accepted to ACL2025 Main!
- 2025.05.01: Honored to announce that our ECNU-Passion team won 3rd place in the XLLM@ACL 2025 Shared Task III: LLM-SR!
- 2025.04.23: Released all source code to the public to support transparency and reproducibility.
- 2025.04.23: Published our ECNU-Passion Team technical report Less is More, based on our submission to the XLLM@ACL 2025 Shared Task III.
Citation
If you find our work useful for your research, please kindly cite our paper as follows:
@inproceedings{yuan2025reversal,
title = "Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up",
author={Yuan, Jiahao and Du, Dehui and Zhang, Hao and Di, Zixiang and Naseem, Usman},
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
pages = "19442--19459",
year = "2025"
}
@inproceedings{yuan2025llmsr,
title = "LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation",
author={Yuan, Jiahao and Sun, Xingzhe and Yu, Xing and Wang, Jingwen and Du, Dehui and Cui, Zhiqing and Di, Zixiang},
booktitle = "Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)",
pages = "274--282",
year = "2025"
}
Overview
This repository provides the official full implementation of our "Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation" framework, which distills high-quality structured reasoning data into multi-agent LLaMA-3 modules. It addresses low-resource structured reasoning by combining:
- Reverse-prompted task induction
- Retrieval-augmented CoT generation
- Reward-guided filtering for faithful and interpretable supervision
Highlights
- Modular Agents: Specialized models for question parsing, CoT decomposition, and verification
- Semantic ICL Retrieval: Top-k demonstrations retrieved via BGE-M3 embeddings
- Reward Filtering: A LLaMA-3.2 reward model scores reasoning quality, and low-reward samples are filtered out
- LoRA+ Fine-tuning: Efficient SFT for each role using ms-swift
- Structured Output: JSON-compatible format for downstream use
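The semantic retrieval step listed above can be illustrated with a minimal sketch. It ranks demonstrations by cosine similarity to a query and keeps the top k. The function names and the toy 3-dimensional vectors are assumptions for illustration only; in the actual pipeline the vectors would come from BGE-M3, and the repository's own retrieval code may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_demos(query_vec, demo_vecs, k=5):
    """Return the indices of the k demos most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(demo_vecs)]
    # Sort by descending similarity; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

# Toy vectors stand in for real sentence embeddings (hypothetical data).
demo_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
selected = top_k_demos([1.0, 0.0, 0.0], demo_vecs, k=2)
print(selected)
```

The selected indices would then be used to pull the matching demonstrations from the demo pool and place them in the prompt.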
Installation
git clone https://github.com/JhCircle/Less-is-More.git
cd Less-is-More
pip install -r requirements.txt
Project Structure
.
├── data/                                      # Raw and processed data
│   ├── train.txt                              # Raw LogiQA-style questions
│   ├── All_Train_With_Scores.jsonl            # CoT scoring results
│   ├── train/{strategy}_filtered.jsonl        # Filtered by reward
│   ├── test/test_question_parsing_role.jsonl
│   ├── test/test_cot_parsing_role.jsonl
│   └── test/test_cot_verify_role_role.jsonl
│
├── utils/
│   ├── prompt.py                              # Prompt templates
│   └── llm_utils.py                           # Inference / pipeline tools
│
├── data_synthesize.py                         # Generate CoT + parsing
├── reward_filter.py                           # Score CoT quality using reward model
├── extract_train_role.py                      # Extract instruction-role data for training
├── extract_test_role.py                       # Extract data for evaluation
├── train_qp.sh                                # Shell script for LoRA+ training on Question Parsing
├── train_cp.sh                                # Shell script for LoRA+ training on CoT Parsing
├── train_cv.sh                                # Shell script for LoRA+ training on CoT Verify (Statement+Verification)
├── infer.sh                                   # Full structured inference pipeline
│
└── README.md
How to Run
Step 1: Data Synthesis
Generate high-quality Question Parsing (QP), Chain-of-Thought Parsing (CP), and CoT Verification (CV, covering both statement extraction and logical validation) annotations from raw LogiQA questions using GPT-4o with retrieval-augmented in-context learning.
python data_synthesize.py \
--demo_pool demo_pool.json \
--logiqa_file data/train.txt \
--output_file data/Train_LogicQA.jsonl \
--embedding_model BAAI/bge-m3 \
--tokenizer_name BAAI/bge-m3 \
--model_id gpt-4o-2024-08-06 \
--api_key YOUR_API_KEY \
--base_url YOUR_OPENAI_API
Step 2: Reward Filtering
Use a reward model to evaluate CoT quality and retain only samples with reward > 0.
python reward_filter.py
Strategy Options
| Strategy | Description |
|---|---|
| with_few_shot | Select samples with high reward under few-shot prompting (reward > 0) |
| without_few_shot | Select samples with high reward under zero-shot prompting (reward > 0) |
| average (default) | Select samples with highest average reward across both settings (reward > 0) |
Generates:
data/All_Train_With_Scores.jsonl
data/with_few_shot_filtered.jsonl
data/without_few_shot_filtered.jsonl
data/average_filtered.jsonl
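As a rough illustration of the three strategies, the sketch below keeps a sample only when the score chosen by the strategy is above zero. The field names `reward_with_few_shot` and `reward_without_few_shot` and the toy samples are assumptions made for this example; they are not taken from `reward_filter.py`, which may select samples differently (for instance by ranking rather than thresholding alone).

```python
def strategy_score(sample, strategy="average"):
    """Pick the reward used for filtering under a given strategy."""
    few = sample["reward_with_few_shot"]      # hypothetical field name
    zero = sample["reward_without_few_shot"]  # hypothetical field name
    if strategy == "with_few_shot":
        return few
    if strategy == "without_few_shot":
        return zero
    if strategy == "average":
        return (few + zero) / 2
    raise ValueError(f"unknown strategy: {strategy}")

def filter_by_reward(samples, strategy="average", threshold=0.0):
    """Keep samples whose strategy score is strictly above the threshold."""
    return [s for s in samples if strategy_score(s, strategy) > threshold]

# Toy scored samples (hypothetical data).
samples = [
    {"id": "a", "reward_with_few_shot": 1.5, "reward_without_few_shot": -0.5},
    {"id": "b", "reward_with_few_shot": -1.0, "reward_without_few_shot": 0.4},
    {"id": "c", "reward_with_few_shot": 0.2, "reward_without_few_shot": 0.6},
]
for name in ("with_few_shot", "without_few_shot", "average"):
    kept = [s["id"] for s in filter_by_reward(samples, name)]
    print(name, kept)
```

Under this reading, a sample can pass one strategy and fail another, which is why each strategy writes its own filtered file.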
Step 3: Extract Role Data
Convert filtered CoT data into structured instruction formats for each role. Each file is used to train a different role agent (QP / CP / CV).
python scripts/extract_train_role.py
python scripts/extract_test_role.py
Outputs:
data/train/{strategy}/training_question_parsing_role.jsonl
data/train/{strategy}/training_cot_parsing_role.jsonl
data/train/{strategy}/training_cot_verify_role.jsonl
Step 4: Fine-Tune Role Agents (QP / CP / CV)
Train each role agent (Question Parsing / CoT Parsing / CoT Verify) using reward-filtered data.
bash train_qp.sh
bash train_cp.sh
bash train_cv.sh
To switch filtering strategy (with_few_shot, without_few_shot, average, all), change this line in the .sh file:
strategy="average"
Summary
| Role Agent | Input File | Task |
|---|---|---|
| QP (Parser) | training_question_parsing_role.jsonl | Extract constraints/facts |
| CP (Parser) | training_cot_parsing_role.jsonl | Break CoT into statements |
| CV (Verifier) | training_cot_verify_role.jsonl | Find evidence + verify logic |
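To make the division of labour in the table concrete, here is a minimal sketch of how the three roles could be chained for one question. The `run_pipeline` function and the stub agents are hypothetical stand-ins for the fine-tuned models; only the output keys are taken from the result format documented in this README, and the real orchestration in the repository may differ.

```python
def run_pipeline(question, cot, qp_agent, cp_agent, cv_agent):
    """Chain the QP, CP, and CV roles into one structured record."""
    # QP: extract the constraints/facts stated in the question.
    conditions = qp_agent(question)
    # CP: break the chain of thought into individual statements.
    statements = cp_agent(question, cot)
    cot_parsing = []
    for statement in statements:
        # CV: find supporting evidence and judge whether the step is valid.
        evidence, is_valid = cv_agent(question, cot, statement)
        cot_parsing.append({
            "statement": statement,
            "evidence": evidence,
            "Verification": "true" if is_valid else "false",
        })
    return {
        "question": question,
        "question_parsing": conditions,
        "cot": cot,
        "cot_parsing": cot_parsing,
    }

# Stub agents (assumptions for illustration; real agents are LLM calls).
def stub_qp(question):
    return [part.strip() for part in question.split(".") if part.strip()]

def stub_cp(question, cot):
    return [cot]

def stub_cv(question, cot, statement):
    return statement, True

record = run_pipeline(
    "All cats are mammals. Tom is a cat.",
    "Tom is a mammal",
    stub_qp, stub_cp, stub_cv,
)
print(record["question_parsing"])
```

Keeping each role behind a plain callable makes it easy to swap a single agent without touching the others.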
Step 5: Multi-Agent Structured Inference
Use the trained role agents to perform structured reasoning on new questions.
bash infer.sh
#!/bin/bash
TEST_FILE="test.jsonl"
QP_MODEL_PATH="./Question_Parsing"
CP_MODEL_PATH="./CoT_Parsing"
CV_MODEL_PATH="./CoT_Verify"
EMBEDDING_MODEL="BAAI/bge-m3"
python inference_pipeline.py \
--test_file "$TEST_FILE" \
--qp_model_id_or_path "$QP_MODEL_PATH" \
--cp_model_id_or_path "$CP_MODEL_PATH" \
--cv_model_id_or_path "$CV_MODEL_PATH" \
--icl_embedding "$EMBEDDING_MODEL"
Produces results.json in the following structure:
[
{
"question": "Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law.The \"cases specified in the law\" mainly include: (1) Personal study, research or appreciation, using published works of others; (2) performing published works for free; (3) copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places; (4) Translate published works created in Chinese and written into minority languages and publish works for publication.\nAccording to the above provisions, Which of the following are fair use:\nA.A sang an unpublished song at the class party\nB.B translates an English work into Mongolian work and publishes it\nC.Company C took the sculptures in the public square and made them into pictures.\nD.Ding Wei wrote a paper and copied a paper published by Geng in a journal for reference",
"question_parsing": [
"Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law",
"The cases specified in the law mainly include: (1) Personal study, research or appreciation, using published works of others",
"Performing published works for free",
"Copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places",
"Translate published works created in Chinese and written into minority languages and publish works for publication"
],
"answer": "d",
"id": 2021,
"cot": "e definition of fair use mentions that it is the non-commercial use of works published by others without permission, and the main cases specified in the law include personal study, research, or appreciation, performing published works for free, copying artistic works displayed in outdoor public places, and translating published works into minority languages. Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes. Option C involves copying artistic works displayed in public places, which is also a specified case of fair use. Option A, however, involves singing an unpublished song, which is not a specified case of fair use.",
"cot_parsing": [
{
"statement": "Options B and D fit into the category of fair use.",
"evidence": "Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes.",
"Verification": "true"
},
{
"statement": "Option C involves fair use.",
"evidence": "Option C involves copying artistic works displayed in public places, which is a specified case of fair use.",
"Verification": "true"
},
{
"statement": "Option A does not involve fair use.",
"evidence": "Singing an unpublished song is not a specified case of fair use.",
"Verification": "false"
}
]
}
]
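For downstream use, a short sketch like the following can summarize the output per question. It relies only on the keys shown in the example above (`question_parsing`, `cot_parsing`, `Verification`); the `summarize` helper and the shortened inline sample are illustrative assumptions, not part of the repository.

```python
import json

def summarize(results):
    """Count parsed conditions, statements, and verified statements per item."""
    summary = []
    for item in results:
        steps = item.get("cot_parsing", [])
        verified = sum(
            1 for step in steps
            if str(step.get("Verification", "")).lower() == "true"
        )
        summary.append({
            "id": item.get("id"),
            "conditions": len(item.get("question_parsing", [])),
            "statements": len(steps),
            "verified": verified,
        })
    return summary

# A shortened inline sample (hypothetical) with the same keys as results.json.
sample = json.loads("""
[
  {
    "id": 2021,
    "question_parsing": ["condition one", "condition two"],
    "cot_parsing": [
      {"statement": "s1", "evidence": "e1", "Verification": "true"},
      {"statement": "s2", "evidence": "e2", "Verification": "true"},
      {"statement": "s3", "evidence": "e3", "Verification": "false"}
    ]
  }
]
""")
report = summarize(sample)
print(report)
```

To run it on real output, the inline sample can be replaced with `json.load(open("results.json", encoding="utf-8"))`.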
Evaluation
| Setting | Question_F1 | Statement_F1 | Evidence_F1 | Reasoning_F1 |
|---|---|---|---|---|
| Structure Filtered | 56.87 | 36.72 | 10.80 | 5.20 |
| 0-shot Reward | 62.76 | 38.05 | 12.79 | 7.15 |
| 5-shot Reward | 65.89 | 38.26 | 14.45 | 7.70 |
| Avg. Reward (Ours) | 66.71 | 39.21 | 14.92 | 8.98 |
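To read the table at a glance, the small sketch below computes the absolute gain of the Avg. Reward setting over the Structure Filtered baseline for each metric. The numbers are copied from the table; the variable names are chosen for this example only.

```python
# Scores copied from the evaluation table above.
baseline = {"Question_F1": 56.87, "Statement_F1": 36.72,
            "Evidence_F1": 10.80, "Reasoning_F1": 5.20}
avg_reward = {"Question_F1": 66.71, "Statement_F1": 39.21,
              "Evidence_F1": 14.92, "Reasoning_F1": 8.98}

# Absolute improvement per metric, rounded to two decimals.
gains = {metric: round(avg_reward[metric] - baseline[metric], 2)
         for metric in baseline}

for metric, delta in gains.items():
    print(f"{metric}: {delta:+.2f}")
```

The gains work out to 9.84 (Question_F1), 2.49 (Statement_F1), 4.12 (Evidence_F1), and 3.78 (Reasoning_F1) points.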
Contact
For any questions, suggestions, or collaborations, feel free to open an issue or start a discussion in the community.
We'd love to hear from you and are always open to feedback or collaboration ideas!
Contact me: Jiahao Yuan
Acknowledgement
We sincerely thank the organizers of the XLLM@ACL2025 Shared Task for providing an open and challenging platform on LLM for Structural Reasoning.
This work has greatly benefited from the generous contributions of the open-source community. In particular, we acknowledge the following resources:
LogiQA: A dataset for evaluating logical reasoning in QA tasks
BAAI/bge-m3: A powerful multilingual embedding model
Ray2333/GRM-Llama3.2-3B-rewardmodel-ft: A high-performing LLaMA3-based reward model
modelscope/ms-swift: A Scalable lightWeight Infrastructure for Fine-Tuning
We are truly grateful to the community for making such impactful resources openly available.