NVIDIA DIGITS is used to interactively train neural network models on annotated datasets, either in the cloud or on a local PC.
DIGITS is a training platform that works with the NVCaffe, Torch, and TensorFlow deep learning frameworks. With any of these frameworks, DIGITS trains your deep learning models on your dataset.
1、INSTALL
Installing DIGITS with NVIDIA-Docker
- 1 Install a recent NVIDIA graphics driver
- 2 Install nvidia-docker
$ sudo apt-get install -y ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo usermod -aG docker $USER
$ sudo reboot
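After the reboot, nvidia-docker only runs without sudo if the usermod step above took effect. A minimal sketch for checking that; the function name user_in_group is hypothetical, and it relies only on the standard id, tr and grep utilities.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER (default: current user) belongs to GROUP (default: docker).
user_in_group() {
  local user="${1:-$(id -un)}" group="${2:-docker}"
  # id -nG prints all group names of the user, space-separated; match one exactly
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

if user_in_group "$(id -un)" docker; then
  echo "ok: $(id -un) is in the docker group"
else
  echo "missing: re-run 'sudo usermod -aG docker \$USER' and log in again"
fi
```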
- 3 Pull and start the DIGITS image
$ nvidia-docker run --name digits -d -p 8888:5000 \
-v /home/username/data:/data:ro \
-v /home/username/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18.05
(Use the latest available version tag instead of 18.05.)
- 4 Try it out: open localhost:8888 in a browser
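Native installation (local environment)
- Install the graphics driver
- Install CUDA
- Install cuDNN (its version must match the installed CUDA version)
https://developer.nvidia.com/cudnn
sudo dpkg -i libcudnn<version>_amd64.deb
sudo dpkg -i libcudnn-dev_<version>_amd64.deb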
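Because cuDNN has to match the CUDA version, it helps to confirm which cuDNN version the .deb packages actually installed. A minimal sketch, assuming the version macros live in /usr/include/cudnn.h as they do for cuDNN 7 (cuDNN 8 and later moved them to cudnn_version.h, so pass that path instead); cudnn_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL parsed from a cuDNN header.
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"
  [[ -r "$header" ]] || { echo "cannot read $header" >&2; return 1; }
  local name parts=()
  for name in CUDNN_MAJOR CUDNN_MINOR CUDNN_PATCHLEVEL; do
    # Take the value from the line "#define CUDNN_<NAME> <value>"
    parts+=("$(awk -v n="$name" '$1 == "#define" && $2 == n { print $3; exit }' "$header")")
  done
  local IFS=.          # join the three parts with dots
  echo "${parts[*]}"
}
```

Running cudnn_version with no argument reads the default header and prints the three numbers joined by dots, which can then be compared with the CUDA version the cuDNN package was built for.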
- Install NVCaffe
git clone -b caffe-0.15 https://github.com/NVIDIA/caffe
cd caffe
cp Makefile.config.example Makefile.config
make all
make test
make runtest
make all may fail with some of the following compile errors; the fix is listed under each one:
@fatal error: gflags/gflags.h: No such file or directory
sudo apt-get install libgflags-dev
@fatal error: cblas.h: No such file or directory
sudo apt-get install libblas-dev
@fatal error: glog.h: No such file or directory
git clone https://github.com/google/glog
cd glog && ./autogen.sh && ./configure && make && sudo make install
@fatal error: hdf5.h: No such file or directory
Find the following lines in Makefile.config and append the HDF5 include and library paths:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
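The same Makefile.config change can be scripted, which is handy when rebuilding from a fresh copy of the example config. A minimal sketch using GNU sed -i; it assumes the INCLUDE_DIRS := and LIBRARY_DIRS := entries each sit on a single line, as in the example config, and it skips a line that already mentions hdf5/serial so running it twice is harmless. add_hdf5_paths is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append Ubuntu's HDF5 "serial" paths to a Caffe Makefile.config.
add_hdf5_paths() {
  local config="${1:-Makefile.config}"
  # On each matching line, append the path unless hdf5/serial is already present
  sed -i \
    -e '/^INCLUDE_DIRS :=/{/hdf5\/serial/!s|$| /usr/include/hdf5/serial|;}' \
    -e '/^LIBRARY_DIRS :=/{/hdf5\/serial/!s|$| /usr/lib/x86_64-linux-gnu/hdf5/serial|;}' \
    "$config"
}
```

Run it from the caffe directory as add_hdf5_paths Makefile.config before make all.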
- Mind the GPU's compute capability: the build must target the compute capability of your card
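Caffe-style Makefiles pass the target GPU architectures to nvcc as -gencode flags; in BVLC Caffe's Makefile.config this is the CUDA_ARCH variable, and it is assumed here that NVCaffe's config works the same way. A minimal sketch that turns compute capabilities (look yours up on NVIDIA's CUDA GPUs page, e.g. 6.1 for a GeForce GTX 1080) into those flags; cuda_arch_flags is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "6.1 7.0" -> one nvcc -gencode flag per compute capability.
cuda_arch_flags() {
  local cc num flags=()
  for cc in "$@"; do
    num="${cc/./}"   # drop the dot: 6.1 -> 61
    flags+=("-gencode arch=compute_${num},code=sm_${num}")
  done
  printf '%s\n' "${flags[*]}"
}
```

The printed flags go after CUDA_ARCH := in Makefile.config; listing only the architectures you actually own keeps compile time down.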
2、Using DIGITS
Start/stop
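$ nvidia-docker run --name digits -d -p 8888:5000 \
-v /home/username/data:/data:ro \
-v /home/username/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18.05
Then open localhost:8888 in a browser. To stop the container and start it again later:
$ docker stop digits
$ docker start digits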
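What the command does:
- -d runs the container as a background (detached) daemon; -p 8888:5000 maps container port 5000 to host port 8888
- the first -v mounts a local folder holding the datasets, read-only (ro)
- the second -v mounts a writable folder for the DIGITS jobs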
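The flags described above can be kept in one place with a small wrapper. A minimal sketch: start_digits and the DRY_RUN variable are hypothetical names, the defaults (tag 18.05, host port 8888) simply mirror the launch command above, and with DRY_RUN set the function only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the DIGITS launch command.
# Usage: start_digits DATA_DIR JOBS_DIR [TAG] [HOST_PORT]
start_digits() {
  local data_dir="$1" jobs_dir="$2" tag="${3:-18.05}" host_port="${4:-8888}"
  # -d: detached; -p: host port -> DIGITS port 5000 in the container;
  # /data: datasets, read-only; /workspace/jobs: writable DIGITS job folder
  local cmd=(nvidia-docker run --name digits -d
             -p "${host_port}:5000"
             -v "${data_dir}:/data:ro"
             -v "${jobs_dir}:/workspace/jobs"
             "nvcr.io/nvidia/digits:${tag}")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"   # only show what would be run
  else
    mkdir -p "$jobs_dir"        # create it as the current user; otherwise Docker creates it as root
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 start_digits /home/username/data /home/username/digits-jobs prints the full command so it can be checked before the container is actually started.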
To use the NCCL library (NVIDIA Collective Communications Library), give the container more shared memory and raise its memory-lock and stack limits:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864………………
To open another terminal inside the running container:
docker exec -it digits bash
Basic workflow
Create dataset: import the data into DIGITS, which then tells NVCaffe how to load it
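- Image size is the input size of the network
- If the real images have a different size they must be resized; the usual options are cropping, padding (fill) and stretching (squash)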
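The resize options differ in how they treat the aspect ratio. A sketch of the geometry only, not DIGITS's own code, assuming simple rounding down: it prints the size an image is scaled to before the last step, so crop scales until the target is fully covered and then center-crops the overflow, fill scales until the whole image fits and then pads the rest, and squash stretches straight to the target. resize_plan is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size an image is scaled to before cropping/padding.
# Usage: resize_plan MODE SRC_W SRC_H DST_W DST_H   (MODE: squash | crop | fill)
resize_plan() {
  local mode="$1" sw="$2" sh="$3" dw="$4" dh="$5"
  # dw*sh >= dh*sw means the width scale factor (dw/sw) is the larger one
  case "$mode" in
    squash)
      # Stretch straight to the target; the aspect ratio is not preserved
      echo "${dw}x${dh}" ;;
    crop)
      # Larger scale factor: the target is covered, the overflow is cropped
      if (( dw * sh >= dh * sw )); then echo "${dw}x$(( sh * dw / sw ))"
      else echo "$(( sw * dh / sh ))x${dh}"; fi ;;
    fill)
      # Smaller scale factor: the whole image fits, the remainder is padded
      if (( dw * sh >= dh * sw )); then echo "$(( sw * dh / sh ))x${dh}"
      else echo "${dw}x$(( sh * dw / sw ))"; fi ;;
    *)
      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For a 640x480 image and a 256x256 network input, resize_plan crop 640 480 256 256 gives 341x256 (85 columns are then cut away), while fill gives 256x192 (64 rows are then padded).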
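Ways to load a dataset into DIGITS
- Use image folder: if the dataset has a clean folder structure, with the training and test sets clearly separated and each sub-folder named after its label, the dataset can be loaded directly as an image folder.
DIGITS then generates train.txt, val.txt and labels.txt from the folders; for example, with folders named 0 to 9, labels.txt simply lists 0 to 9.
- Use text files: provide the train/val list files and the labels file yourself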
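With the text-files option the lists are prepared by hand. A minimal sketch that builds them from the same one-folder-per-label layout; it assumes one "path label_index" pair per line, with the index counting the lines of labels.txt from 0, and it puts every fifth image of each class into the validation list. make_digits_lists is a hypothetical name, and paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ROOT/<label>/<images> -> OUT/labels.txt, OUT/train.txt, OUT/val.txt
# Usage: make_digits_lists ROOT OUT [VAL_EVERY]   (keep OUT outside ROOT)
make_digits_lists() {
  local root="$1" out="$2" every="${3:-5}"
  mkdir -p "$out"
  : > "$out/train.txt"
  : > "$out/val.txt"
  # One label per line: the sub-folder names, sorted so the indices are stable
  find "$root" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | LC_ALL=C sort > "$out/labels.txt"
  local idx=0 label n f
  while IFS= read -r label; do
    n=0
    while IFS= read -r f; do
      # Every VAL_EVERY-th image of a class goes to the validation list
      if (( n % every == every - 1 )); then
        printf '%s %d\n' "$f" "$idx" >> "$out/val.txt"
      else
        printf '%s %d\n' "$f" "$idx" >> "$out/train.txt"
      fi
      n=$((n + 1))
    done < <(find "$root/$label" -maxdepth 1 -type f | LC_ALL=C sort)
    idx=$((idx + 1))
  done < "$out/labels.txt"
}
```

The resulting train.txt, val.txt and labels.txt can then be selected in the dataset form when using text files.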
Visualization
Tuning