October 23, 2025

DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism

What is DynaPipe?

We investigate a critical yet underexplored issue in pipeline-parallel LLM serving: the inter-stage pipeline bubbles introduced by sampling operations. To address it, we propose DynaPipe, a runtime dynamic layer redistribution scheme. By adaptively adjusting the computational load across pipeline stages, DynaPipe balances work between stages and reduces the imbalance that causes these bubbles. Compared with state-of-the-art pipeline inference frameworks, DynaPipe achieves notable performance gains.
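To make the idea concrete, here is a minimal toy sketch of layer redistribution. It is not DynaPipe's actual algorithm or code. It assumes every layer costs the same time, that sampling adds a fixed extra cost to the last stage only, and that only the number of layers per stage matters. The names `stage_times`, `rebalance`, `layer_ms`, and `sampling_ms` are hypothetical and chosen for illustration.

```python
# Toy model (assumption, not DynaPipe's implementation): each stage holds a
# contiguous run of layers, so a partition is just a list of layer counts.

def stage_times(layers_per_stage, layer_ms, sampling_ms):
    """Per-stage latency: layer count times per-layer cost,
    plus the sampling cost on the last stage."""
    times = [n * layer_ms for n in layers_per_stage]
    times[-1] += sampling_ms
    return times


def rebalance(layers_per_stage, layer_ms, sampling_ms):
    """Greedily move one layer at a time from the slowest stage to the
    fastest stage, as long as that lowers the bottleneck stage time."""
    layers = list(layers_per_stage)
    while True:
        times = stage_times(layers, layer_ms, sampling_ms)
        slow = times.index(max(times))
        fast = times.index(min(times))
        if layers[slow] <= 1:
            break  # never leave a stage without layers
        trial = list(layers)
        trial[slow] -= 1
        trial[fast] += 1
        if max(stage_times(trial, layer_ms, sampling_ms)) >= max(times):
            break  # no further improvement to the bottleneck
        layers = trial
    return layers


if __name__ == "__main__":
    even = [8, 8, 8, 8]  # 32 layers split evenly over 4 stages
    print(stage_times(even, 1.0, 4.0))      # last stage is the bottleneck
    balanced = rebalance(even, 1.0, 4.0)
    print(balanced, stage_times(balanced, 1.0, 4.0))
```

With these illustrative numbers, the even split has stage times of 8, 8, 8, and 12 ms, so the first three stages idle while the last one samples. The greedy pass settles on a 9/9/9/5 split in which every stage takes 9 ms. A real system would also have to account for the cost of moving weights and KV cache between devices, which this sketch ignores.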

Install DynaPipe

pip install --verbose -e .

Launch online serving

# To enable prefix caching, add "--enable-prefix-caching"
# To enable pipeline parallelism, add "--pp $PP_DEGREE"
python -m gllm.entrypoints.api_server --port $PORT --model-path $MODEL_PATH --enable-adjust-layers

Online benchmark with gLLM or vLLM

python benchmarks/benchmark_serving.py --backend $BACKEND --model $MODEL \
        --dataset-name $DATASET_NAME --dataset-path $DATASET_PATH \
        --num-prompts $NUM_PROMPTS --port $PORT --trust-remote-code \
        --request-rate $REQUEST_RATE

Acknowledgements

This project builds upon the foundational work of gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling.