CUDA 9.0

January 23, 2019 ยท View on GitHub

We need

  • pytorch
  • cuda
  • ninja

CUDA 9.0

You can install CUDA9.0, cuDNN, NCCL to local directory.

First install CUDA by running the downloaded executable.

Then download and extract cuDNN.

cudnn-9.0-linux-x64-v7.4.2.24
|-- include
|   `-- cudnn.h
|-- lib64
|   |-- libcudnn.so -> libcudnn.so.7
|   |-- libcudnn.so.7 -> libcudnn.so.7.4.2
|   |-- libcudnn.so.7.4.2
|   `-- libcudnn_static.a
`-- NVIDIA_SLA_cuDNN_Support.txt

Move cuDNN files to CUDA directory by

cp cudnn-9.0-linux-x64-v7.4.2.24/include/* cuda-9.0/include/
cp cudnn-9.0-linux-x64-v7.4.2.24/lib64/* cuda-9.0/lib64/

Then download and extract NCCL.

nccl_2.3.7-1+cuda9.0_x86_64
|-- include
|   `-- nccl.h
|-- lib
|   |-- libnccl.so -> libnccl.so.2
|   |-- libnccl.so.2 -> libnccl.so.2.3.7
|   |-- libnccl.so.2.3.7
|   `-- libnccl_static.a
`-- LICENSE.txt

Move NCCL files to CUDA directory by

cp nccl_2.3.7-1+cuda9.0_x86_64/include/* cuda-9.0/include/
cp nccl_2.3.7-1+cuda9.0_x86_64/lib/* cuda-9.0/lib64/

Pytorch 1.0

It's required to install pytorch from source.

Setup your python environment before installing pytorch. Anaconda is required in the following example. Change WORKING_DIR, CUDA_HOME to your paths and run the following commands.

WORKING_DIR=<your-directory-to-save-intermediate-results-of-installing-pytorch>
TORCH_DIR_NAME=pytorch_v1.0.0
mkdir -p ${WORKING_DIR}

pip uninstall -y torch

# Install basic dependencies
conda install --yes numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install --yes -c mingfeima mkldnn
# Add LAPACK support for the GPU
conda install --yes -c pytorch magma-cuda90

cd ${WORKING_DIR}
rm -rf ${TORCH_DIR_NAME}
# Tested with commit db5d313
git clone --recursive --single-branch --branch v1.0.0  https://github.com/pytorch/pytorch.git ${TORCH_DIR_NAME}
cd ${TORCH_DIR_NAME}

rm -rf build
rm -rf torch.egg-info
export CUDA_HOME=<your-cuda-directory>
export USE_SYSTEM_NCCL=1
export NCCL_LIB_DIR=${CUDA_HOME}/lib64  # For CUDA version > 8.0, you have to download NCCL lib independently
export NCCL_INCLUDE_DIR=${CUDA_HOME}/include
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]

python setup.py install 2>&1 | tee ${WORKING_DIR}/install_pytorch.log
cd ${WORKING_DIR}
python test_data_parallel.py 2>&1 | tee test_pytorch_data_parallel.log

The contents of test_data_parallel.py in above commands is

import torch
import torch.nn as nn
from torch.nn.parallel import DataParallel

model = nn.Linear(10, 20).cuda()
x = torch.ones(100, 10).float().cuda()
model_w = DataParallel(model, device_ids=[0,1,2,3])
x = model_w(x)
# x = model(x)
print(x.size())  # It should be (100, 20)

ninja

Download and extract ninja 1.8.2 from https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip. Add ninja to environment variable PATH.

Environment Variables

After installing pytorch, cuda and ninja, modify and add following lines to your .bashrc file.

export anaconda_home=<your-anaconda-directory>
export PATH=${anaconda_home}/bin:${PATH}
export LD_LIBRARY_PATH=${anaconda_home}/lib:${LD_LIBRARY_PATH}

export CUDA_HOME=<your-cuda-directory>
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}

export PATH=<your-ninja-directory>:${PATH}