1. Install NVIDIA driver, CUDA 9.0, cuDNN 7.0, and NCCL
#1
# Install driver
sudo apt-get install nvidia-384 nvidia-modprobe
# then you will be prompted to disable Secure Boot. Select Disable.
#2
# Reboot your machine but enter BIOS to disable Secure Boot. Typically you can enter BIOS by hitting F12 rapidly as soon as the system restarts.
#3
# Check the installation with the nvidia-smi command
cd
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
# extract the runfile into 3 components (3 runfile installers): NVIDIA-Linux-x86_64-384.81.run (the NVIDIA driver, which we ignore), cuda-linux.9.0.176-22781540.run (the CUDA 9.0 installer), and cuda-samples.9.0.176-22781540-linux.run (the CUDA 9.0 samples)
chmod +x cuda_9.0.176_384.81_linux-run
./cuda_9.0.176_384.81_linux-run --extract=$HOME
# Install cuda toolkit
sudo ./cuda-linux.9.0.176-22781540.run
# Install sample tests
sudo ./cuda-samples.9.0.176-22781540-linux.run
# Configure runtime library
sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
# Add :/usr/local/cuda/bin (including the ":") at the end of the PATH="/blah:/blah/blah" string
sudo vim /etc/environment
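For reference, after the edit the PATH line in /etc/environment might look like the following (the leading entries are Ubuntu 16.04's defaults and may differ on your machine; only the trailing :/usr/local/cuda/bin is added):

```shell
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda/bin"
```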
# Test installation
cd /usr/local/cuda-9.0/samples
sudo make
cd /usr/local/cuda/samples/bin/x86_64/linux/release
./deviceQuery
The result of running deviceQuery should look something like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6073 MBytes (6367739904 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1671 MHz (1.67 GHz)
Memory Clock rate: 4004 MHz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
Install cuDNN 7.0
The recommended way to install cuDNN 7.0 is to download all 3 .deb files. I had previously recommended using the .tgz installation approach, but found out that it didn’t allow verification by running code samples (no way to install the code samples .deb after .tgz installation).
The following steps are pretty much the same as the installation guide using .deb files (strange that the cuDNN guide is better than the CUDA one).
- Go to the cuDNN download page (need registration) and select the latest cuDNN 7.0.* version made for CUDA 9.0.
- Download all 3 .deb files: the runtime library, the developer library, and the code samples library for Ubuntu 16.04.
- In your download folder, install them in the same order:
$ sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb (the runtime library),
$ sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb (the developer library), and
$ sudo dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb (the code samples).
Now we can verify the cuDNN installation (below is just the official guide, which surprisingly works out of the box):
- Copy the code samples somewhere you have write access:
cp -r /usr/src/cudnn_samples_v7/ ~
- Go to the MNIST example code:
cd ~/cudnn_samples_v7/mnistCUDNN
- Compile the MNIST example:
make clean && make
- Run the MNIST example:
./mnistCUDNN
If your installation is successful, you should see Test passed! at the end of the output.
Do NOT Install cuda-command-line-tools
Contrary to the official TensorFlow installation docs, you don't need to install cuda-command-line-tools, because it is already included in this version of CUDA and cuDNN. If you try to apt-get it, you won't find the package anyway.
Configure the CUDA and cuDNN library paths
What you do need to do, however, is export the LD_LIBRARY_PATH environment variable in your .bashrc file:
# put the following line at the end of your .bashrc file
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"
Then apply it with source ~/.bashrc.
NCCL
Download the Debian packages from https://developer.nvidia.com/nccl/nccl-download and follow the instructions at https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html .
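With the local-repository installer, the steps boil down to registering the downloaded repository package and then installing the NCCL runtime and development packages via apt. The repository filename below is illustrative; use the exact name of the .deb you downloaded:

```shell
# Register the local NCCL repository (filename is illustrative)
sudo dpkg -i nccl-repo-ubuntu1604-2.x.x-ga-cuda9.0_1-1_amd64.deb
sudo apt-get update
# Install the NCCL runtime and development packages
sudo apt-get install libnccl2 libnccl-dev
```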
2. Install Caffe2 source
Step 1. Update apt package index
$ sudo apt-get update
Step 2. Install apt package dependencies
sudo apt-get install -y --no-install-recommends build-essential cmake git libgoogle-glog-dev libgtest-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libopenmpi-dev libsnappy-dev libprotobuf-dev openmpi-bin openmpi-doc protobuf-compiler python-dev python-pip
Step 3. Install pip dependencies
$ sudo pip install --upgrade pip
$ sudo pip install setuptools future numpy protobuf
$ sudo apt-get install -y --no-install-recommends libgflags-dev
Step 4. Clone Caffe2 into a local directory
Note: We create a directory named caffe2-pytorch and clone the PyTorch git repository into it.
$ mkdir caffe2-pytorch && cd caffe2-pytorch
$ git clone --recursive https://github.com/pytorch/pytorch.git ./
$ git submodule update --init
Step 5. Build Caffe2
$ mkdir build && cd build
$ cmake ..
$ sudo make -j"$(nproc)" install
Note: when building the source, we supply the -j flag to make. The flag sets the number of parallel jobs (compiler processes, e.g. GCC) that make may run at once. The nproc command prints the number of available CPUs. In short, we speed up compilation by running one compile job per CPU.
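To see the mechanism in isolation: nproc simply prints a CPU count, and command substitution splices that count into make's -j flag:

```shell
# nproc prints the number of available CPUs (a positive integer)
nproc
# Command substitution turns that count into make's parallel-jobs flag
echo "make -j$(nproc)"
```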
Step 6. Create the symbolic link for the Caffe2 shared library
$ sudo ldconfig
Step 7. Verify that Caffe 2 library and the headers have been installed in the proper directory
– Update the locate database
$ sudo updatedb
– Ensure that libcaffe2.so is located in /usr/local/lib
$ locate libcaffe2.so
– Ensure that the header files have been copied into /usr/local/include
$ locate caffe2 | grep /usr/local/include/caffe2
Step 8. Add Caffe2 to the Python and library paths so that it can be immediately discovered and linked by other applications.
$ vim ~/.profile
…
# set python path
if [ -z "$PYTHONPATH" ]; then
PYTHONPATH=/usr/local
else
PYTHONPATH=/usr/local:$PYTHONPATH
fi
# set library path
if [ -z "$LD_LIBRARY_PATH" ]; then
LD_LIBRARY_PATH=/usr/local/lib
else
LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
fi
…
We can also set the library paths using parameter expansion as follows:
...
PYTHONPATH=/usr/local${PYTHONPATH:+:${PYTHONPATH}}
LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
...
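The ${VAR:+...} form expands to the alternate text only when VAR is set and non-empty, which is exactly what guards against a dangling colon. A quick demonstration with a scratch variable:

```shell
# SCRATCH is empty: the :+ branch is skipped, so no colon is appended
SCRATCH=""
echo "/usr/local${SCRATCH:+:${SCRATCH}}"      # prints /usr/local

# SCRATCH is set: the :+ branch expands to ":<old value>"
SCRATCH="/old/path"
echo "/usr/local${SCRATCH:+:${SCRATCH}}"      # prints /usr/local:/old/path
```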
We then load the new paths into the current shell by invoking the source command as follows.
$ source ~/.profile
Alternatively, we can use the dot (.) command to execute the ~/.profile file and enable the new environment variables in the current shell.
$ . ~/.profile
Step 9. Verify that the Caffe2 python module can be properly invoked
$ python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Step 10. Verify that Caffe2 can run with GPU support
$ python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
Troubleshooting
* caffe2-pytorch/third_party/ideep/mkl-dnn/external/mklml_lnx_2019.0.3.20190220/lib/libmklml_intel.so: file not recognized: File truncated
Checked the cmake output and found "MKL library not found"; fixed this by installing the mkl-include package via conda install mkl-include or pip install mkl-include.
* No module named 'google.protobuf'
Simply install it with pip: pip install protobuf