The Next Step in Multimodal Academic Generation
April 13, 2026
Focus on Multimodal Academic Presentation Generation: from paper PDFs to PPTs, Posters, and PRs
| Unified Framework | Low-Cost Efficiency | Modern & Trendy Design | Multi-Modal Support |
News
[2026/03/11] PaperX Demo is now officially live on Hugging Face! Click here to see our demo.
[2026/02/05] PaperX is now available on arXiv.
Overview
Demo
Experience PaperX, a multimodal academic generation and analysis platform, now live on Hugging Face Spaces as an interactive demo.
Quick Start
Installation
Clone this repository and navigate to the folder:
git clone https://github.com/yutao1024/PaperX.git
cd PaperX
Create New Environment
To ensure reproducibility and avoid dependency conflicts, we strongly recommend creating a dedicated Python environment for this project before installation.
conda create -n PaperX python=3.10
conda activate PaperX
Install MinerU
This project relies on MinerU for document parsing and structured content extraction.
Make sure MinerU is correctly installed before running the pipeline.
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
If you have any questions about using MinerU, please visit MinerU Wiki.
Install dependencies
After setting up the environment and installing MinerU, install the required Python dependencies for this project.
All dependencies are listed in requirements.txt. You can install them using the following command:
pip install -r requirements.txt
Alternatively, you can install the dependencies individually using the following commands.
pip install openai
pip install pillow
pip install -U google-genai
python -m pip install -U beautifulsoup4
pip install playwright
playwright install
Place the PDF papers to be processed in the papers directory:
papers/
-paper1.pdf
-paper2.pdf
-...
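To confirm the inputs are in place before running the pipeline, a small check like the one below may help. It is a minimal sketch and not part of PaperX: the function name `list_input_papers` is hypothetical, and it assumes the PDFs sit directly inside `papers/` as in the layout above.

```python
from pathlib import Path


def list_input_papers(papers_dir: str = "papers") -> list[Path]:
    """Return the PDF files found directly inside papers_dir, sorted by name."""
    root = Path(papers_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {root}")
    # Compare the extension case-insensitively so 'Paper.PDF' is also picked up.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_input_papers()` from the repository root returns the papers that would be handed to the parsing step; an empty list means nothing has been placed in `papers/` yet.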
Launch Application
Set up environment variables
Before running the MinerU extraction step, configure the following environment variables:
export MINERU_FORMULA_ENABLE=false
export MINERU_TABLE_ENABLE=false
More configurations should be set in the config.yaml file:
# Output Index Path
path:
  root_folder: "./mineru_outputs"
# Generation Model Settings
model_settings:
  generation_model: "gemini-3-pro-preview"
# API Settings
api_keys:
  gemini_api_key: "YOUR_GEMINI_API_KEY"
  openai_api_key: "YOUR_OPENAI_API_KEY"
Note: The DAG generation stage always uses the gemini-3-pro model. For generating PPTs, posters, and PR content, you may choose either gemini-3-pro or gpt-4o; specify the selected model name in the generation_model field of config.yaml. If you choose gpt-4o, make sure the OpenAI API key is provided.
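The rules in this note can be checked before launching a run. The sketch below works on an already-parsed config dictionary and reports obvious problems. The function name `check_generation_config` is hypothetical, and two points are assumptions rather than documented behavior: that the Gemini key is always needed because the DAG stage is fixed to Gemini, and that values beginning with `YOUR_` are unfilled placeholders.

```python
def check_generation_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed config.yaml."""
    problems = []
    model = config.get("model_settings", {}).get("generation_model", "")
    keys = config.get("api_keys", {})

    def is_set(name: str) -> bool:
        value = keys.get(name, "")
        # Treat empty values and the "YOUR_..." placeholders as missing.
        return bool(value) and not value.startswith("YOUR_")

    if not model:
        problems.append("model_settings.generation_model is not set")
    # Assumption: the DAG stage always calls Gemini, so its key is always needed.
    if not is_set("gemini_api_key"):
        problems.append("gemini_api_key is required for the DAG generation stage")
    # The OpenAI key only matters when a GPT model is selected for generation.
    if model.startswith("gpt") and not is_set("openai_api_key"):
        problems.append(
            f"openai_api_key is required when generation_model is {model!r}"
        )
    return problems
```

The dictionary could come from `yaml.safe_load` in PyYAML, assuming that package is installed; an empty result means none of these checks failed.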
Parse PDF files
Use the following command to parse the PDF files with MinerU and convert them into structured outputs for DAG construction.
mineru -p papers -o mineru_outputs --source local -b pipeline
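When the parsing step is driven from Python rather than typed into a shell, the command and the two environment variables from the setup step can be prepared together. The following is a sketch under assumptions: `build_mineru_invocation` is a hypothetical helper, and it assumes the model source is passed as `--source local`. It only builds the arguments and does not run MinerU.

```python
import os


def build_mineru_invocation(
    papers_dir: str = "papers", output_dir: str = "mineru_outputs"
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for the MinerU parsing step without executing it."""
    argv = [
        "mineru",
        "-p", papers_dir,      # input folder containing the PDFs
        "-o", output_dir,      # output folder, matching root_folder in config.yaml
        "--source", "local",   # assumption: model source flag and value
        "-b", "pipeline",      # backend used in this README
    ]
    # Copy the current environment and disable formula and table parsing.
    env = dict(os.environ)
    env["MINERU_FORMULA_ENABLE"] = "false"
    env["MINERU_TABLE_ENABLE"] = "false"
    return argv, env
```

The pair can then be passed to `subprocess.run(argv, env=env, check=True)`, which raises an error if MinerU exits with a non-zero status.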
Run the program
python main.py
Results
The generated PPT slides are saved at:
auto/final/<ppt_number>_ppt_final.png
The generated poster is saved at:
auto/final/poster_final.png
The generated PR is saved at:
auto/markdown_refinement.md
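To gather these files programmatically, something like the sketch below could be used. It is illustrative only: `collect_outputs` and its `paper_dir` argument are hypothetical, `paper_dir` stands for whichever folder contains the `auto/` directory, and `<ppt_number>` is assumed to be an integer so that slides can be ordered numerically.

```python
import re
from pathlib import Path

# Matches file names such as "3_ppt_final.png" and captures the slide number.
_SLIDE_NAME = re.compile(r"^(\d+)_ppt_final\.png$")


def collect_outputs(paper_dir: str) -> dict:
    """Locate the slide images, poster, and PR file below paper_dir/auto."""
    root = Path(paper_dir) / "auto"
    final = root / "final"
    slides = []
    if final.is_dir():
        for path in final.iterdir():
            match = _SLIDE_NAME.match(path.name)
            if match:
                slides.append((int(match.group(1)), path))
    # Sort by slide number so that slide 10 comes after slide 2.
    slides.sort(key=lambda item: item[0])
    poster = final / "poster_final.png"
    pr = root / "markdown_refinement.md"
    return {
        "slides": [path for _, path in slides],
        "poster": poster if poster.is_file() else None,
        "pr": pr if pr.is_file() else None,
    }
```

Missing outputs are reported as `None` (or an empty slide list) instead of raising, which makes it easy to see which stage did not produce a file.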
Evaluation
PPT
Move the final slides for evaluation:
cd PaperX/evaluation/PPTAgent/
python move_ppt.py
Set the API_KEY and BASE_URL environment variables:
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export OPENAI_BASE_URL="YOUR_OPENAI_BASE_URL" # (optional)
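A variable that was not exported correctly (for example, because of spaces around `=`) will not be visible to the evaluation scripts. A quick pre-flight check along these lines can catch that early. This is a hedged sketch; `missing_env_vars` is a hypothetical helper and not part of the evaluation code.

```python
import os


def missing_env_vars(required=("OPENAI_API_KEY",), environ=None) -> list[str]:
    """Return the names in `required` that are unset or blank."""
    environ = os.environ if environ is None else environ
    # A value consisting only of whitespace is treated as missing.
    return [name for name in required if not environ.get(name, "").strip()]
```

`OPENAI_BASE_URL` is left out of the default list because it is marked optional above; it can be added to `required` when a custom endpoint is mandatory.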
Run the PPTAgent-eval metrics:
# For single paper evaluation
python run_benchmark.py /path/to/specific_paper_folder
# For batch papers evaluation
python run_benchmark.py /path/to/root_folder --batch
Run the existing metrics:
# For single paper evaluation
python run_exist_metrix.py /path/to/specific_paper_folder --token "Your HuggingFace Token"
# For batch papers evaluation
python run_exist_metrix.py /path/to/root_folder --batch --token "Your HuggingFace Token"
Aggregate all evaluation results:
python calculate_avg.py
For reference to the original Evaluation of PPT, please click here.
Poster
Clone the Paper2Poster repository to the desired location
cd evaluation
git clone https://github.com/Paper2Poster/Paper2Poster.git
cd Paper2Poster
Download Paper2Poster evaluation dataset via:
python -m PosterAgent.create_dataset
Create a folder named PaperX_generated_posters and copy the Paper2Poster-data directory into it. For each subdirectory inside Paper2Poster-data, keep only the paper.pdf file and delete all other files.
# Create the target folder
mkdir -p PaperX_generated_posters
# Copy Paper2Poster-data into it
cp -r Paper2Poster-data PaperX_generated_posters/
# For each subfolder, keep only paper.pdf and remove all other files
find PaperX_generated_posters/Paper2Poster-data -type f ! -name "paper.pdf" -delete
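Because `find ... -delete` is irreversible, it can be useful to preview what would be removed first. The sketch below mirrors the command in Python with a dry-run default; `keep_only_paper_pdf` is a hypothetical helper, not a script shipped with PaperX or Paper2Poster.

```python
from pathlib import Path


def keep_only_paper_pdf(data_root: str, dry_run: bool = True) -> list[Path]:
    """List, and optionally delete, every regular file not named paper.pdf."""
    targets = sorted(
        p
        for p in Path(data_root).rglob("*")
        # Like `find -type f`, consider regular files only and skip symlinks.
        if p.is_file() and not p.is_symlink() and p.name != "paper.pdf"
    )
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `keep_only_paper_pdf("PaperX_generated_posters/Paper2Poster-data")` returns the files that would be deleted; passing `dry_run=False` performs the deletion.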
Use src/transfer_poster.py to move the generated poster results to Paper2Poster/PaperX_generated_posters/Paper2Poster-data
python transfer_poster.py
Start evaluation
# Terminal 1:
python -m vllm.entrypoints.openai.api_server \
--host 0.0.0.0 \
--port 7000 \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--trust-remote-code \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.95
# Terminal 2:
# Set the API_KEY and BASE_URL environment variables:
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export OPENAI_BASE_URL="YOUR_OPENAI_BASE_URL" # (optional)
# evaluation:
python src/run_paper2poster_benchmark.py --dir_a Paper2Poster/PaperX_generated_posters/Paper2Poster-data --dir_b Paper2Poster/eval_results
For reference to the original Evaluation of Posters, please click here.
PR
Move the final PRs for evaluation:
cd PaperX/evaluation/AutoPR/
python move_pr.py
Edit the .env file with your API credentials:
# Main API Base URL for text and vision models (e.g., OpenAI, Qwen, etc.)
OPENAI_API_BASE="https://api.openai.com/v1"
# Your API Key
OPENAI_API_KEY="sk-..."
Download the PRBench dataset from Hugging Face Hub.
# --subset accepts "core" or "full"
python download_and_reconstruct_prbench.py \
--repo-id yzweak/PRBench \
--subset core \
--output-dir eval
Run PR evaluation:
chmod +x scripts/run_eval.sh
./scripts/run_eval.sh
Calculate and View Metrics:
chmod +x scripts/calc_results.sh
./scripts/calc_results.sh
For reference to the original Evaluation of PR, please click here.
Showcase
PPTs
Posters
PRs
PaperX with Nano Banana
PaperX supports integration with Nano Banana to achieve improved visual quality:
After integrating Nano Banana, PaperX also demonstrates stronger generalization ability, as illustrated by the generated mind map, overview, and web examples shown below:
Citation
Please kindly cite our paper if you find this project helpful.
@misc{yu2026paperxunifiedframeworkmultimodal,
title={PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG},
author={Tao Yu and Minghui Zhang and Zhiqing Cui and Hao Wang and Zhongtian Luo and Shenghua Chai and Junhao Gong and Yuzhao Peng and Yuxuan Zhou and Yujia Yang and Zhenghao Zhang and Haopeng Jin and Xinming Wang and Yufei Xiong and Jiabing Yang and Jiahao Yuan and Hanqing Wang and Hongzhu Yi and Yan Huang and Liang Wang},
year={2026},
eprint={2602.03866},
archivePrefix={arXiv},
primaryClass={cs.DL},
url={https://arxiv.org/abs/2602.03866},
}