SpInfer Artifact for EuroSys'25.
February 28, 2025 ยท View on GitHub
1. Clone this project.
git clone https://github.com/xxyux/SpInfer.git
cd SpInfer
git submodule update --init --recursive
source Init_SpInfer.sh
cd $SpInfer_HOME/third_party/FasterTransformer && git apply ../ft_spinfer.patch
cd $SpInfer_HOME/third_party/sputnik && git apply ../sputnik.patch
- Requirements:
Ubuntu 16.04+gcc >= 7.3cmake >= 3.30.3CUDA >= 12.2andnvcc >= 12.0- NVIDIA GPU with
sm >= 80(i.e., Ampere-A6000 and Ada -RTX4090).
2. Environment Setup. (Install via Conda)
- 2.1 Install
condaon system Toturial. - 2.2 Create a
condaenvironment:
cd $SpInfer_HOME
conda env create -f spinfer.yml
conda activate spinfer
3. Install SpInfer.
The libSpMM_API.so and SpMM_API.cuh will be available for easy integration after:
cd $SpInfer_HOME/build && make -j
4. Running SpInfer in kernel benchmark (Figure 10).
- Build Sputnik.
cd $SpInfer_HOME/third_party/
source build_sputnik.sh
- Build SparTA.
cd $SpInfer_HOME/third_party/
source preparse_cusparselt.sh
- Reproduce Figure 10.
cd $SpInfer_HOME/kernel_benchmark
source test_env
make -j
source benchmark.sh
Check the results in raw csv files and the reproduced Figure10.png (Fig. 10).
5. Running End-to-end model.
5.1 Building
Follow the steps in SpInfer/docs/LLMInferenceExample
- Building Faster-Transformer with (SpInfer, Flash-llm or Standard) integration
- Downloading & Converting OPT models
- Configuration Note: Model_dir is different for SpInfer, Flash-llm and Faster-Transformer.
5.2 Running SpInfer Inference
cd $SpInfer_HOME/third_party/bash run_1gpu_loop.sh- Check the results (Fig.13/14) in
$SpInfer_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/- Test tensor_para_size=2 using
bash run_2gpu_loop.sh- Test tensor_para_size=4 using
bash run_4gpu_loop.sh
5.3 Running Flash-llm Inference
cd $FlashLLM_HOME/third_party/bash run_1gpu_loop.sh- Check the results in
$FlashLLM_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/- Test tensor_para_size=1 using
bash run_1gpu_loop.sh
5.4 Running Faster-transformer Inference
cd $FT_HOME/third_party/bash run_2gpu_loop.sh- Check the results in
$FT_HOME/FasterTransformer/OutputFile_2gpu_our_60_inlen64/
5.5 Runing DeepSpeed Inference
cd $SpInfer_HOME/end2end_inference/ds_scriptspip install -r requirements.txtbash run_ds_loop.sh- Check the results in
$SpInfer_HOME/end2end_inference/ds_scripts/ds_result/