October 23, 2025
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
What is DynaPipe?
DynaPipe targets a critical yet underexplored issue in pipeline-parallel LLM serving: the inter-stage pipeline bubbles introduced by sampling operations. To address it, DynaPipe redistributes model layers across pipeline stages dynamically at runtime. By adapting each stage's computational load, it keeps the stages balanced and aligned, which mitigates inter-stage imbalance. Compared with state-of-the-art pipeline inference frameworks, DynaPipe achieves notable performance gains.
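To make the imbalance concrete, here is a minimal sketch of how moving layers between stages can even out per-stage latency. It is not DynaPipe's actual algorithm: it assumes every layer costs the same, that sampling adds a fixed overhead to the last stage only, and that a simple greedy assignment is good enough. The function names and all numbers are hypothetical.

```python
def stage_times(layers_per_stage, layer_time, extra_time):
    """Per-stage latency: layer compute plus any stage-specific overhead."""
    return [n * layer_time + e for n, e in zip(layers_per_stage, extra_time)]


def rebalance(num_layers, layer_time, extra_time):
    """Greedily hand each layer to the stage with the smallest current load.

    The returned counts describe a contiguous split: stage 0 takes the
    first counts[0] layers, stage 1 the next counts[1], and so on.
    """
    loads = list(extra_time)          # start from the fixed overheads
    counts = [0] * len(extra_time)
    for _ in range(num_layers):
        i = loads.index(min(loads))   # least-loaded stage, ties go to the earliest
        counts[i] += 1
        loads[i] += layer_time
    return counts


# Hypothetical setup: 32 layers, 4 stages, and sampling on the last
# stage costs as much as 4 layers (assumed, not measured).
extra = [0, 0, 0, 4]
even = [8, 8, 8, 8]
balanced = rebalance(32, 1, extra)

print(stage_times(even, 1, extra))                # [8, 8, 8, 12]
print(balanced, stage_times(balanced, 1, extra))  # [9, 9, 9, 5] [9, 9, 9, 9]
```

With the even split, the last stage takes 12 time units while the others take 8, so the earlier stages idle. Shifting three layers off the last stage brings every stage to 9 units. A runtime scheme would repeat this kind of adjustment as the measured costs change.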
Install DynaPipe
pip install --verbose -e .
Launch online serving
# To enable prefix caching, add "--enable-prefix-caching"
# To enable pipeline parallelism, add "--pp $PP_DEGREE"
python -m gllm.entrypoints.api_server --port $PORT --model-path $MODEL_PATH --enable-adjust-layers
Online benchmark with gllm or vllm
python benchmarks/benchmark_serving.py --backend $BACKEND --model $MODEL \
--dataset-name $DATASET_NAME --dataset-path $DATASET_PATH \
--num-prompts $NUM_PROMPTS --port $PORT --trust-remote-code \
--request-rate $REQUEST_RATE
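To compare backends across several load levels, the command above can be generated once per request rate. The sketch below only builds and prints the command lines, using the flags shown above. The helper name `benchmark_command` and every value passed to it (model path, dataset name, dataset path, port, rates) are placeholders to replace with your own.

```python
import shlex


def benchmark_command(backend, model, dataset_name, dataset_path,
                      num_prompts, port, request_rate):
    """Build the benchmark_serving.py argument list for one run."""
    return [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", backend,
        "--model", model,
        "--dataset-name", dataset_name,
        "--dataset-path", dataset_path,
        "--num-prompts", str(num_prompts),
        "--port", str(port),
        "--trust-remote-code",
        "--request-rate", str(request_rate),
    ]


# Placeholder values, not taken from the project.
commands = [
    benchmark_command(
        backend="gllm",
        model="/path/to/model",
        dataset_name="my_dataset",
        dataset_path="/path/to/dataset.json",
        num_prompts=100,
        port=8000,
        request_rate=rate,
    )
    for rate in (1, 2, 4)
]

for cmd in commands:
    print(shlex.join(cmd))  # shell-quoted line, ready to paste or log
```

To run the sweep instead of printing it, pass each list to `subprocess.run(cmd, check=True)` while the server is up on the same port.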
Acknowledgements
This project builds upon the foundational work of gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling.