The Next Step in Multimodal Academic Generation

April 13, 2026

✨ Focus on Multimodal Academic Presentation Generation: from paper PDFs to PPTs, Posters, and PRs ✨

📑 Table of Contents

[🔥 News] [🌈 Overview] [🧩 Demo] [🚀 Quick Start] [📸 Showcase] [📖 Citation]

🔥 News

[2026/03/11] 🚀 The PaperX demo is now officially live on Hugging Face!

[2026/02/05] 📄 PaperX is now available on arXiv.

🧩 Demo

Experience PaperX, a multimodal academic generation and analysis platform, now live on Hugging Face Spaces as an interactive demo.

🚀 Quick Start

Installation

Clone this repository and navigate to the folder:

git clone https://github.com/yutao1024/PaperX.git
cd PaperX

Create New Environment

To ensure reproducibility and avoid dependency conflicts, we strongly recommend creating a dedicated Python environment for this project before installation.

conda create -n PaperX python=3.10
conda activate PaperX

Install MinerU

This project relies on MinerU for document parsing and structured content extraction.
Make sure MinerU is correctly installed before running the pipeline.

pip install --upgrade pip  
pip install uv  
uv pip install -U "mineru[all]"

If you have any questions about using MinerU, please visit the MinerU Wiki.

After setting up the environment and installing MinerU, install the Python dependencies for this project.
All dependencies are listed in requirements.txt and can be installed with the following command:

pip install -r requirements.txt

Alternatively, you can install the dependencies individually using the following commands.

pip install openai  
pip install pillow  
pip install -U google-genai  
python -m pip install -U beautifulsoup4  
pip install playwright  
playwright install

Place the PDF papers to be processed in the papers directory:

papers/  
 -paper1.pdf  
 -paper2.pdf  
 -...
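
Before parsing, it can help to confirm which files will be picked up. The following is a minimal sketch, not part of PaperX: `list_papers` is a hypothetical helper that assumes the PDFs sit directly inside the papers directory, as in the layout above.

```python
from pathlib import Path


def list_papers(papers_dir: str = "papers") -> list[Path]:
    """Return the PDF files found directly inside papers_dir, sorted by name."""
    root = Path(papers_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"papers directory not found: {root}")
    # Compare the extension case-insensitively so "Paper.PDF" is not skipped.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_papers()` from the repository root returns the paths that would be handed to the parsing step; an empty list means no PDFs were found.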

Launch Application

Set up environment variables

Before running MinerU, configure the required environment variables as shown below:

export MINERU_FORMULA_ENABLE=false  
export MINERU_TABLE_ENABLE=false

More configurations should be set in the config.yaml file:

# Output Index Path
path:
  root_folder: "./mineru_outputs"

# Generation Model Settings
model_settings:
  generation_model: "gemini-3-pro-preview"

# API Settings
api_keys:
  gemini_api_key: "YOUR_GEMINI_API_KEY"
  openai_api_key: "YOUR_OPENAI_API_KEY"

Note: The DAG generation stage always uses the gemini-3-pro model. For generating PPTs, posters, and PR content, you may choose either gemini-3-pro or gpt-4o; specify the selected model name in the generation_model field of config.yaml. If you choose gpt-4o, make sure the OpenAI API key is provided.
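
The key requirements in the note can be checked before a run. This is a sketch only: `check_config` is a hypothetical helper, it expects the configuration already parsed into a dict (for example with PyYAML's `yaml.safe_load`, which is an assumption about how the file is read), and it recognizes OpenAI models by a "gpt" prefix, which is also an assumption.

```python
def check_config(cfg: dict) -> str:
    """Return the name of the API-key field used by the selected generation model.

    Raises ValueError if a required key is missing or still a placeholder.
    """
    model = cfg.get("model_settings", {}).get("generation_model", "")
    keys = cfg.get("api_keys", {})

    def is_set(name: str) -> bool:
        value = keys.get(name)
        # Treat the "YOUR_..." template values as not configured.
        return isinstance(value, str) and bool(value) and not value.startswith("YOUR_")

    # The DAG stage is fixed to Gemini, so the Gemini key is needed in every case.
    if not is_set("gemini_api_key"):
        raise ValueError("gemini_api_key is required for the DAG generation stage")

    # Assumption: OpenAI models are identified by a "gpt" prefix.
    if model.startswith("gpt"):
        if not is_set("openai_api_key"):
            raise ValueError(
                f"openai_api_key is required when generation_model is {model!r}"
            )
        return "openai_api_key"
    return "gemini_api_key"
```

Running this check first surfaces a missing key immediately instead of partway through generation.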

Parse PDF files

Use the following command to parse the PDF files with MinerU and convert them into structured outputs for DAG construction.

mineru -p papers -o mineru_outputs --source local -b pipeline

Run the program

python main.py

Results

The generated PPT slides are saved at the following path, one image per slide:

auto/final/<ppt_number>_ppt_final.png

The generated poster is saved at:

auto/final/poster_final.png

The generated PR is saved at:

auto/markdown_refinement.md
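
The output paths above can be gathered programmatically. This is a minimal sketch under two assumptions: the `auto` folder sits inside a per-paper output folder (passed in as `paper_dir`), and `<ppt_number>` is an integer. `collect_outputs` is a hypothetical helper, not part of PaperX.

```python
import re
from pathlib import Path


def collect_outputs(paper_dir: str) -> dict:
    """Locate generated slides, poster, and PR under <paper_dir>/auto."""
    auto = Path(paper_dir) / "auto"
    slide_pattern = re.compile(r"^(\d+)_ppt_final\.png$")

    slides = []
    for path in (auto / "final").glob("*_ppt_final.png"):
        match = slide_pattern.match(path.name)
        if match:
            slides.append((int(match.group(1)), path))
    slides.sort()  # numeric order, so slide 10 comes after slide 9

    poster = auto / "final" / "poster_final.png"
    pr = auto / "markdown_refinement.md"
    return {
        "slides": [path for _, path in slides],
        "poster": poster if poster.is_file() else None,
        "pr": pr if pr.is_file() else None,
    }
```

Missing artifacts are reported as `None` (or an empty slide list), which makes incomplete runs easy to spot.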

Evaluation

PPT

Move the final slides for evaluation:

cd PaperX/evaluation/PPTAgent/  
python move_ppt.py

Set the API_KEY and BASE_URL environment variables:

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export OPENAI_BASE_URL="YOUR_OPENAI_BASE_URL" # (optional)

Run the PPTAgent-eval metrics:

# For single paper evaluation
python run_benchmark.py /path/to/specific_paper_folder
# For batch papers evaluation
python run_benchmark.py /path/to/root_folder --batch

Run the existing metrics:

# For single paper evaluation
python run_exist_metrix.py /path/to/specific_paper_folder --token "Your HuggingFace Token"
# For batch papers evaluation
python run_exist_metrix.py /path/to/root_folder --batch --token "Your HuggingFace Token"

Aggregate all evaluation results:

python calculate_avg.py

For reference to the original Evaluation of PPT, please click here.

Poster

Clone the Paper2Poster repository to the desired location:

cd evaluation
git clone https://github.com/Paper2Poster/Paper2Poster.git
cd Paper2Poster

Download the Paper2Poster evaluation dataset:

python -m PosterAgent.create_dataset

Create a folder named PaperX_generated_posters and copy the Paper2Poster-data directory into it. For each subdirectory inside Paper2Poster-data, keep only the paper.pdf file and delete all other files.

# Create the target folder
mkdir -p PaperX_generated_posters

# Copy Paper2Poster-data into it
cp -r Paper2Poster-data PaperX_generated_posters/

# For each subfolder, keep only paper.pdf and remove all other files
find PaperX_generated_posters/Paper2Poster-data -type f ! -name "paper.pdf" -delete
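
On systems without `find`, the same cleanup can be done in Python. This is a sketch of an equivalent, with `keep_only_paper_pdf` as a hypothetical helper; like the command above it deletes files irreversibly and leaves directories in place, so point it only at the copied data.

```python
from pathlib import Path


def keep_only_paper_pdf(root: str) -> int:
    """Delete every regular file under root not named paper.pdf; return the count removed."""
    removed = 0
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(root).rglob("*")):
        # Skip symlinks to mirror `find -type f`, which matches regular files only.
        if path.is_file() and not path.is_symlink() and path.name != "paper.pdf":
            path.unlink()
            removed += 1
    return removed
```

For example, `keep_only_paper_pdf("PaperX_generated_posters/Paper2Poster-data")` returns the number of files it removed.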

Use src/transfer_poster.py to move the generated poster results to Paper2Poster/PaperX_generated_posters/Paper2Poster-data

python transfer_poster.py

Start evaluation

# Terminal 1:
python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 7000 \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.95

# Terminal 2:
# Set the API_KEY and BASE_URL environment variables:
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export OPENAI_BASE_URL="YOUR_OPENAI_BASE_URL" # (optional)
# evaluation:
python src/run_paper2poster_benchmark.py --dir_a Paper2Poster/PaperX_generated_posters/Paper2Poster-data --dir_b Paper2Poster/eval_results
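
Because the evaluation depends on the server from Terminal 1, it can be useful to confirm the server is up before starting the benchmark. The sketch below assumes the server is reachable at `http://localhost:7000/v1` and answers the OpenAI-style `/models` route; `served_models` is a hypothetical helper, not part of PaperX or Paper2Poster.

```python
import json
import urllib.request


def served_models(
    base_url: str = "http://localhost:7000/v1", timeout: float = 5.0
) -> list[str]:
    """Return model ids reported by an OpenAI-compatible server, or [] if unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return []
    return [entry["id"] for entry in payload.get("data", [])]
```

An empty list means the server is not ready yet; once the model has loaded, the list should include the name passed to `--model`.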

For reference to the original Evaluation of Posters, please click here.

PR

Move the final PRs for evaluation:

cd PaperX/evaluation/AutoPR/  
python move_pr.py

Edit the .env file with your API credentials:

# Main API Base URL for text and vision models (e.g., OpenAI, Qwen, etc.)
OPENAI_API_BASE="https://api.openai.com/v1"
# Your API Key
OPENAI_API_KEY="sk-..."

Download the PRBench dataset from Hugging Face Hub.

# Use --subset core or --subset full
python download_and_reconstruct_prbench.py \
    --repo-id yzweak/PRBench \
    --subset core \
    --output-dir eval

Run PR evaluation:

chmod +x scripts/run_eval.sh  
./scripts/run_eval.sh

Calculate and View Metrics:

chmod +x scripts/calc_results.sh  
./scripts/calc_results.sh

For reference to the original Evaluation of PR, please click here.

📸 Showcase

PPTs

Posters

PRs

PaperX with Nano Banana

PaperX supports integration with Nano Banana to achieve improved visual quality.

After integrating Nano Banana, PaperX also demonstrates stronger generalization ability, as illustrated by generated mind maps, overviews, and web examples.

📖 Citation

Please kindly cite our paper if you find this project helpful.

@misc{yu2026paperxunifiedframeworkmultimodal,
      title={PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG}, 
      author={Tao Yu and Minghui Zhang and Zhiqing Cui and Hao Wang and Zhongtian Luo and Shenghua Chai and Junhao Gong and Yuzhao Peng and Yuxuan Zhou and Yujia Yang and Zhenghao Zhang and Haopeng Jin and Xinming Wang and Yufei Xiong and Jiabing Yang and Jiahao Yuan and Hanqing Wang and Hongzhu Yi and Yan Huang and Liang Wang},
      year={2026},
      eprint={2602.03866},
      archivePrefix={arXiv},
      primaryClass={cs.DL},
      url={https://arxiv.org/abs/2602.03866}, 
}