README.md

July 29, 2025

LLMSR@XLLM25: 🧠 Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

🎉 Third-place solution to the XLLM@ACL2025 Shared Task-III: LLM for Structural Reasoning 🏆

💌 Contact: jamse_yuan@163.com

Less is More: Structured Reasoning Framework


⭐ If you find this project helpful, please consider giving us a star to support the latest updates.


🔥 News

  • 2025.06.15 🎉🎉🎉 We're thrilled to announce that our technical report Less is More, which earned 3rd place, has been officially accepted to the LLMSR@XLLM ACL 2025 Workshop!

  • 2025.05.16 🎉🎉🎉 Excited to share that our earlier work Reversal of Thought has been accepted to ACL2025 Main!

  • 2025.05.01 🎉🎉🎉 Honored to announce that our ECNU-Passion team won 🏆 3rd place in the XLLM@ACL 2025 Shared Task III: LLM-SR!

  • 2025.04.23 🎉🎉🎉 Released all source code 🔓 to the public to support transparency and reproducibility.

  • 2025.04.23 🎉🎉🎉 Published our ECNU-Passion Team technical report 📄 Less is More based on our submission to the XLLM@ACL 2025 Shared Task III.


📖 Citation

If you find our work useful for your research, please cite our papers as follows:

@inproceedings{yuan2025reversal,
    title = "Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up",
    author={Yuan, Jiahao and Du, Dehui and Zhang, Hao and Di, Zixiang and Naseem, Usman},
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    pages = "19442--19459",
    year = "2025"
}

@inproceedings{yuan2025llmsr,
    title = "LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation",
    author={Yuan, Jiahao and Sun, Xingzhe and Yu, Xing and Wang, Jingwen and Du, Dehui and Cui, Zhiqing and Di, Zixiang},
    booktitle = "Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)",
    pages = "274--282",
    year = "2025"
}


🔍 Overview

This repository provides the official implementation of "Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation", a framework that distills high-quality structured reasoning data into multi-agent LLaMA-3 modules. It addresses low-resource structured reasoning by combining the components listed below.


🚀 Highlights

  • 🧩 Modular Agents: Specialized models for question parsing, CoT decomposition, and verification

  • 🔍 Semantic ICL Retrieval: Top-k demos fetched via BGE-M3 embeddings

  • 🎯 Reward Filtering: A LLaMA3.2-based reward model scores reasoning quality and filters out low-quality samples

  • ⚡ LoRA+ Fine-tuning: Efficient SFT on each role using ms-swift

  • 📊 Structured Output: JSON-compatible format for downstream use


📦 Installation

git clone https://github.com/JhCircle/Less-is-More.git
cd Less-is-More
pip install -r requirements.txt

🗂️ Project Structure

.
├── data/                               # Raw and processed data
│   ├── train.txt                       # Raw LogiQA-style questions
│   ├── All_Train_With_Scores.jsonl     # CoT scoring results
│   ├── train/{strategy}_filtered.jsonl # Filtered by reward
│   ├── test/test_question_parsing_role.jsonl
│   ├── test/test_cot_parsing_role.jsonl
│   └── test/test_cot_verify_role_role.jsonl
│
├── utils/
│   ├── prompt.py                      # Prompt templates
│   └── llm_utils.py                   # Inference / pipeline tools
│
├── data_synthesize.py             # Generate CoT + parsing
├── reward_filter.py               # Score CoT quality using reward model
├── extract_train_role.py          # Extract instruction-role data for training
├── extract_test_role.py           # Extract data for evaluation
├── train_qp.sh                    # Shell script for LoRA+ training on Question Parsing
├── train_cp.sh                    # Shell script for LoRA+ training on CoT Parsing
├── train_cv.sh                    # Shell script for LoRA+ training on CoT Verify (Statement+Verification)
├── infer.sh                       # Full structured inference pipeline
│
└── README.md

🛠️ How to Run

1️⃣ Step 1: 🧠 Data Synthesis

Generate high-quality Question Parsing (QP), Chain-of-Thought Parsing (CP), and CoT Verification (CV, covering both statement extraction and logical validation) annotations from raw LogiQA questions using GPT-4o with Retrieval-Augmented In-Context Learning.

python data_synthesize.py \
  --demo_pool demo_pool.json \
  --logiqa_file data/train.txt \
  --output_file data/Train_LogicQA.jsonl \
  --embedding_model BAAI/bge-m3 \
  --tokenizer_name BAAI/bge-m3 \
  --model_id gpt-4o-2024-08-06 \
  --api_key YOUR_API_KEY \
  --base_url YOUR_OPENAI_API
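
The retrieval step can be pictured with a minimal sketch. This is not the code in data_synthesize.py: the function names, the demo fields (`question`, `parsing`, `embedding`), and the prompt layout are all hypothetical, and toy 3-dimensional vectors stand in for real BGE-M3 embeddings. It only illustrates the general idea of ranking demos by cosine similarity and prepending the top-k to the new question.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_demos(query_vec, demo_pool, k=2):
    """Return the k demos whose embeddings are most similar to the query."""
    ranked = sorted(
        demo_pool,
        key=lambda demo: cosine(query_vec, demo["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, demos):
    """Prepend the retrieved demos to the new question as in-context examples."""
    shots = "\n\n".join(
        f"Question: {d['question']}\nParsing: {d['parsing']}" for d in demos
    )
    return f"{shots}\n\nQuestion: {question}\nParsing:"

# Toy vectors stand in for BGE-M3 embeddings (assumption for illustration).
demo_pool = [
    {"question": "Q about fair use", "parsing": "[conditions]", "embedding": [1.0, 0.0, 0.0]},
    {"question": "Q about syllogisms", "parsing": "[conditions]", "embedding": [0.0, 1.0, 0.0]},
    {"question": "Q about copyright", "parsing": "[conditions]", "embedding": [0.9, 0.1, 0.0]},
]
query_vec = [1.0, 0.05, 0.0]

demos = top_k_demos(query_vec, demo_pool, k=2)
prompt = build_prompt("Is translating a published work fair use?", demos)
print(prompt)
```

In a real run the vectors would come from the embedding model passed via --embedding_model, and the demos from the file passed via --demo_pool.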

2️⃣ Step 2: 🏆 Reward Filtering

Use a reward model to evaluate CoT quality and retain only samples with reward > 0.

python reward_filter.py

🎯 Strategy Options

| Strategy | Description |
|---|---|
| with_few_shot | Select samples with high reward under few-shot prompting (reward > 0) |
| without_few_shot | Select samples with high reward under zero-shot prompting (reward > 0) |
| average (default) | Select samples with highest average reward across both settings (reward > 0) |

Generates:

  • data/All_Train_With_Scores.jsonl
  • data/with_few_shot_filtered.jsonl
  • data/without_few_shot_filtered.jsonl
  • data/average_filtered.jsonl
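
As a rough illustration of how the three strategies differ, the sketch below applies each one to a few toy records. It is not the logic of reward_filter.py: the field names `reward_with_few_shot` and `reward_without_few_shot` are hypothetical, and the `average` strategy is assumed here to mean "mean of the two rewards is above the threshold".

```python
def select_reward(record, strategy="average"):
    """Pick the reward value that the given strategy filters on."""
    few = record["reward_with_few_shot"]       # hypothetical field name
    zero = record["reward_without_few_shot"]   # hypothetical field name
    if strategy == "with_few_shot":
        return few
    if strategy == "without_few_shot":
        return zero
    if strategy == "average":
        return (few + zero) / 2
    raise ValueError(f"unknown strategy: {strategy}")

def filter_by_reward(records, strategy="average", threshold=0.0):
    """Keep only records whose selected reward is strictly above the threshold."""
    return [r for r in records if select_reward(r, strategy) > threshold]

# Toy scores, not real reward-model outputs.
records = [
    {"id": 1, "reward_with_few_shot": 1.2, "reward_without_few_shot": -0.4},
    {"id": 2, "reward_with_few_shot": -0.3, "reward_without_few_shot": 0.1},
    {"id": 3, "reward_with_few_shot": -2.0, "reward_without_few_shot": -1.0},
]

for strategy in ("with_few_shot", "without_few_shot", "average"):
    kept = [r["id"] for r in filter_by_reward(records, strategy)]
    print(strategy, kept)
```

With these toy scores, record 1 survives `with_few_shot` and `average`, record 2 survives only `without_few_shot`, and record 3 is always dropped.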

3️⃣ Step 3: 📊 Extract Role Data

Convert filtered CoT data into structured instruction formats for each role. Each file is used to train a different role agent (QP / CP / CV).

python extract_train_role.py
python extract_test_role.py

Outputs:

data/train/{strategy}/training_question_parsing_role.jsonl
data/train/{strategy}/training_cot_parsing_role.jsonl
data/train/{strategy}/training_cot_verify_role.jsonl
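
The sketch below shows one way a single structured sample could be split into one training example per role. The input keys (`question`, `question_parsing`, `cot`, `cot_parsing`, `statement`, `evidence`, `Verification`) follow the results format shown later in this README; the `instruction` / `input` / `output` layout, the instruction wording, and the function name are assumptions, not the format produced by extract_train_role.py.

```python
import json

def to_role_examples(sample):
    """Split one structured sample into one training example per role."""
    statements = [step["statement"] for step in sample["cot_parsing"]]
    qp = {
        "instruction": "Extract the constraints and facts stated in the question.",
        "input": sample["question"],
        "output": json.dumps(sample["question_parsing"]),
    }
    cp = {
        "instruction": "Break the chain of thought into individual statements.",
        "input": json.dumps({"question": sample["question"], "cot": sample["cot"]}),
        "output": json.dumps(statements),
    }
    cv = {
        "instruction": "For each statement, find its evidence and verify the logic.",
        "input": json.dumps({"cot": sample["cot"], "statements": statements}),
        "output": json.dumps(sample["cot_parsing"]),
    }
    return {"question_parsing": qp, "cot_parsing": cp, "cot_verify": cv}

# Toy sample for illustration only.
sample = {
    "question": "Which of the following are fair use?",
    "question_parsing": ["Fair use is non-commercial use of published works"],
    "cot": "Option B translates a published work, so it may be fair use.",
    "cot_parsing": [
        {
            "statement": "Option B may be fair use.",
            "evidence": "Option B translates a published work.",
            "Verification": "true",
        }
    ],
}

examples = to_role_examples(sample)
for role, example in examples.items():
    print(role, "->", example["instruction"])
```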

4️⃣ Step 4: 🧬 Fine-Tune Role Agents (QP / CP / CV)

Train each role agent (Question Parsing / CoT Parsing / CoT Verify) using reward-filtered data.

bash train_qp.sh
bash train_cp.sh
bash train_cv.sh

To switch filtering strategy (with_few_shot, without_few_shot, average, all), change this line in the .sh file:

strategy="average"

✅ Summary

| Role Agent | Input File | Task |
|---|---|---|
| QP (Parser) | training_question_parsing_role.jsonl | Extract constraints/facts |
| CP (Parser) | training_cot_parsing_role.jsonl | Break CoT into statements |
| CV (Verifier) | training_cot_verify_role.jsonl | Find evidence + verify logic |

5️⃣ Step 5: Multi-Agent Structured Inference

Use the trained role agents to perform structured reasoning on new questions.

bash infer.sh

#!/bin/bash

TEST_FILE="test.jsonl"
QP_MODEL_PATH="./Question_Parsing"
CP_MODEL_PATH="./CoT_Parsing"
CV_MODEL_PATH="./CoT_Verify"
EMBEDDING_MODEL="BAAI/bge-m3"

python inference_pipeline.py \
  --test_file "$TEST_FILE" \
  --qp_model_id_or_path "$QP_MODEL_PATH" \
  --cp_model_id_or_path "$CP_MODEL_PATH" \
  --cv_model_id_or_path "$CV_MODEL_PATH" \
  --icl_embedding "$EMBEDDING_MODEL"

Produces results.json in the following structure:

[
    {
        "question": "Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law.The \"cases specified in the law\" mainly include: (1) Personal study, research or appreciation, using published works of others; (2) performing published works for free; (3) copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places; (4) Translate published works created in Chinese and written into minority languages and publish works for publication.\nAccording to the above provisions, Which of the following are fair use:\nA.A sang an unpublished song at the class party\nB.B translates an English work into Mongolian work and publishes it\nC.Company C took the sculptures in the public square and made them into pictures.\nD.Ding Wei wrote a paper and copied a paper published by Geng in a journal for reference",
        "question_parsing": [
            "Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law",
            "The cases specified in the law mainly include: (1) Personal study, research or appreciation, using published works of others",
            "Performing published works for free",
            "Copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places",
            "Translate published works created in Chinese and written into minority languages and publish works for publication"
        ],
        "answer": "d",
        "id": 2021,
        "cot": "The definition of fair use mentions that it is the non-commercial use of works published by others without permission, and the main cases specified in the law include personal study, research, or appreciation, performing published works for free, copying artistic works displayed in outdoor public places, and translating published works into minority languages. Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes. Option C involves copying artistic works displayed in public places, which is also a specified case of fair use. Option A, however, involves singing an unpublished song, which is not a specified case of fair use.",
        "cot_parsing": [
            {
                "statement": "Options B and D fit into the category of fair use.",
                "evidence": "Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes.",
                "Verification": "true"
            },
            {
                "statement": "Option C involves fair use.",
                "evidence": "Option C involves copying artistic works displayed in public places, which is a specified case of fair use.",
                "Verification": "true"
            },
            {
                "statement": "Option A does not involve fair use.",
                "evidence": "Singing an unpublished song is not a specified case of fair use.",
                "Verification": "false"
            }
        ]
    }
]
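
For downstream use, the output can be read with the standard json module. The sketch below is only an illustration: it parses an abbreviated, inline copy of the structure above rather than a real results.json, and the helper name `summarize` and the `REQUIRED_KEYS` check are hypothetical. Note that `Verification` is stored as the string "true" or "false", not as a JSON boolean.

```python
import json

# Abbreviated stand-in for results.json, with the same keys as the example above.
results_json = """
[
  {
    "question": "Which of the following are fair use?",
    "question_parsing": ["Performing published works for free"],
    "answer": "d",
    "id": 2021,
    "cot": "Option C involves copying artistic works displayed in public places.",
    "cot_parsing": [
      {"statement": "Option C involves fair use.",
       "evidence": "Option C involves copying artistic works displayed in public places.",
       "Verification": "true"},
      {"statement": "Option A does not involve fair use.",
       "evidence": "Singing an unpublished song is not a specified case of fair use.",
       "Verification": "false"}
    ]
  }
]
"""

REQUIRED_KEYS = {"question", "question_parsing", "answer", "id", "cot", "cot_parsing"}

def summarize(entry):
    """Check that an entry has the expected keys and count its verified statements."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise KeyError(f"entry {entry.get('id')} is missing {sorted(missing)}")
    verdicts = [step["Verification"] for step in entry["cot_parsing"]]
    return {
        "id": entry["id"],
        "conditions": len(entry["question_parsing"]),
        "statements": len(verdicts),
        "verified_true": sum(v == "true" for v in verdicts),
    }

summaries = [summarize(entry) for entry in json.loads(results_json)]
print(summaries)
```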

🏁 Evaluation

| Setting | Question_F1 | Statement_F1 | Evidence_F1 | Reasoning_F1 |
|---|---|---|---|---|
| Structure Filtered | 56.87 | 36.72 | 10.80 | 5.20 |
| 0-shot Reward | 62.76 | 38.05 | 12.79 | 7.15 |
| 5-shot Reward | 65.89 | 38.26 | 14.45 | 7.70 |
| 🥇 Avg. Reward (Ours) | 66.71 | 39.21 | 14.92 | 8.98 |

📬 Contact

For any questions, suggestions, or collaborations, feel free to open an issue or start a discussion in the community.
We'd 💖 to hear from you and are always open to feedback or collaboration ideas!

📬 Contact me: Jiahao Yuan


🙏 Acknowledgement

We sincerely thank the organizers of the XLLM@ACL2025 Shared Task for providing an open and challenging platform on LLM for Structural Reasoning.
This work has greatly benefited from the generous contributions of the open-source community. In particular, we acknowledge the following resources:

📘 LogiQA – A dataset for evaluating logical reasoning in QA tasks
🧠 BAAI/bge-m3 – A powerful multilingual embedding model
🏆 Ray2333/GRM-Llama3.2-3B-rewardmodel-ft – A high-performing LLaMA3-based reward model
🧰 modelscope/ms-swift – A Scalable lightWeight Infrastructure for Fine-Tuning

We are truly grateful to the community for making such impactful resources openly available.