NVIDIA RetinaNet usage notes


Article link: https://blog.csdn.net/daniaokuye/article/details/98748835


Goal

Build NVIDIA's official RetinaNet project, which supports quantization, on a Tesla T4.

deb & run conflicting

2.7. Handle Conflicting Installation Methods

Unable to determine the device handle for GPU 0000:B3:00.0:

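This means the device has lost its connection to the host. After a long investigation, the cause turned out to be overheating. Tesla cards are entirely passively cooled, and there are some videos on YouTube showing ways to cool them. I simply attached a laptop exhaust fan, which in hindsight does not cool the card enough: the temperature settles at 84 °C.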
https://blog.csdn.net/junmuzi/article/details/80707343
update-grub
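
Since the disconnect came from overheating, it can help to watch the temperature while the card is under load. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the 80 °C threshold and the helper names are illustrative choices of this sketch, not NVIDIA limits.

```python
import subprocess

# Illustrative warning threshold chosen for this sketch, not an NVIDIA limit.
WARN_CELSIUS = 80


def parse_temperatures(csv_text):
    """Parse 'index, temperature' lines into {gpu_index: degrees_celsius}."""
    temps = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temperatures():
    """Query every GPU through nvidia-smi (requires a working driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    )
    return parse_temperatures(result.stdout)


def hot_gpus(temps, limit=WARN_CELSIUS):
    """Return the indices of GPUs at or above the limit, ascending."""
    return sorted(index for index, temp in temps.items() if temp >= limit)
```

Calling `hot_gpus(read_temperatures())` in a loop during a training run would show whether the added fan keeps the card below the chosen threshold.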

uninstall

11. Removing CUDA Toolkit and Driver

Installing apex

Following the official installation steps is enough.

Installing pycuda

Follow the installation tutorial.
Set the configure parameters. My environment is Ubuntu 18, which installs Python 3.6 by default; I use Python 3.5 and CUDA 10.1.

./configure.py --python-exe=/usr/bin/python --cuda-root=/usr/local/cuda --cudadrv-lib-dir=/usr/lib --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib/x86_64-linux-gnu --boost-python-libname=boost_python-py36  --no-use-shipped-boost

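Note: building from source here means running python setup.py install and then pip install as well, whereas building PyTorch from source only needs setup.py install. Building and installing from source still has plenty of twists.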
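
The `--boost-python-libname` value has to name a library that actually exists in `--boost-lib-dir`. As a rough sketch, the helper below lists the candidates in a directory so they can be compared with the interpreter in use. The function names are hypothetical, and the `pyXY` suffix is only inferred from the `boost_python-py36` value in the command above.

```python
import glob
import os
import sys


def boost_python_libnames(lib_dir):
    """List candidate --boost-python-libname values found in lib_dir."""
    names = []
    pattern = os.path.join(lib_dir, "libboost_python*.so")
    for path in sorted(glob.glob(pattern)):
        filename = os.path.basename(path)
        # Strip the "lib" prefix and ".so" suffix, as the linker flag expects.
        names.append(filename[len("lib"):-len(".so")])
    return names


def interpreter_tag():
    """Return a tag such as 'py35' for the interpreter running this script."""
    return "py{}{}".format(sys.version_info.major, sys.version_info.minor)
```

Running `boost_python_libnames("/usr/lib/x86_64-linux-gnu")` with the interpreter passed to `--python-exe` and checking whether any name ends in `interpreter_tag()` gives a quick sanity check before configuring.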

docker

unable to evaluate symlinks in Dockerfile path: just follow the steps in the repository, https://github.com/NVIDIA/retinanet-examples
sudo docker build -t retinanet .

Installing DALI

installation

Installing cocoapi

cocoapi pycocotools/_mask.c: No such file or directory
- sudo pip install cython
fatal error: pybind11/pybind11.h: No such file or directory
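- Install pybind11, or add its path as the documentation describes, following the installation tutorial step by step. Note that pybind11's CMake build searches for the Python version. Its CMake setup is somewhat involved, with the search logic kept in separate CMake files under tools. For Python, changing ${PythonLibsNew_FIND_VERSION} in FindPythonLibsNew.cmake to 3.5 (or another version) is enough.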

# Use the Python interpreter to find the libs.
if(PythonLibsNew_FIND_REQUIRED)
    find_package(PythonInterp 3.5 REQUIRED )
else()
    find_package(PythonInterp ${PythonLibsNew_FIND_VERSION})
endif()

sudo make install
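
Before editing the CMake file, it is worth confirming which interpreter and headers the build should match. A small sketch using only the standard library; `python_build_info` is a name made up for this example.

```python
import sys
import sysconfig


def python_build_info():
    """Collect the interpreter facts a CMake Python search has to match."""
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Directory that contains Python.h for this interpreter.
        "include_dir": sysconfig.get_paths()["include"],
        # Directory that holds libpython (may be None on some platforms).
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }
```

Run it with the interpreter you are building for, for example `python3.5 -c "import check; print(check.python_build_info())"` if the sketch is saved as a hypothetical `check.py`. The pybind11 documentation also describes `-DPYTHON_EXECUTABLE` and `-DPYBIND11_PYTHON_VERSION` as cmake options, which may be an alternative to patching the file.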

run

git clone https://github.com/nvidia/retinanet-examples
docker build -t retinanet:latest retinanet-examples/
sudo docker run --gpus '"device=1"' --name=nv_retian --ipc=host -it retinanet:latest
retinanet infer retinanet_rn50fpn.pth --images /home/user/datasets/coco/val2017/ --annotations /home/user/datasets/coco/annotations/instances_val2017.json

undefined symbol: _ZN2cv8fastFreeEPv (cv::fastFree(void*))

I suspect OpenCV is to blame.
https://github.com/NVIDIA/retinanet-examples/issues/38
After installing python-opencv, the error changes.

E: unable to locate package

See the linked URL.
Alternatively, write a corresponding source file under /etc/apt/sources.list.d, modeled on the files already there.

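glog gflags gcc

Ubuntu 18 defaults to gcc-7; install gcc-5 with sudo apt install gcc-5.
gflags.cc.o: `stderr@@GLIBC_2.2.5': gflags has to be built as a shared library. How options are passed on the cmake command line is a useful reference here; see the linked pages for the principle and for the procedure. Full installation walkthrough: https://blog.csdn.net/Amazingren/article/details/81873514
Following on from the above, the error 'gflags' has not been declared appears because gflags was built with the namespace google.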

cannot find -lopencv_xfeatures2d

how to install opencv
This requires opencv_contrib; see how to install opencv & opencv_contrib, and this CSDN tutorial.
No package 'gtk+-3.0' found: sudo apt-get install build-essential libgtk-3-dev
No package 'gstreamer-base-1.0' found: searching for the error showed that the missing package on my system was libgstreamer-plugins-base1.0-dev.
missing: JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH: install Java, see https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
Duplicated modules NAMES has been found: the opencv_contrib checkout had not been switched to the matching branch.
Additional note: the contrib build downloads some neural network models, such as vgg and face-landmark, as well as the machine learning library boost.
The linear algebra library Eigen has installation instructions, but it does not need to be installed.

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

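docker and nvidia-docker differ slightly. docker now supports much of what nvidia-docker provided, so the plain docker command is enough. However, nvidia-docker does make a few adjustments of its own (especially since Docker 19, where the two are deeply integrated), so you still need to follow the official setup steps.
Also, once you start using docker, you soon want to move its storage location. The method is easy to find online: a symlink. After creating the link, remember to run service docker stop and then start docker again, otherwise other problems appear. NVIDIA users also need to go through the setup steps above once more.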
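
After relinking, a quick check that the link resolves to a real directory can save a confusing daemon restart. A minimal sketch, assuming Docker's default data directory `/var/lib/docker`; `describe_data_dir` is a hypothetical helper name.

```python
import os


def describe_data_dir(path="/var/lib/docker"):
    """Report whether path is a symlink and where it finally resolves."""
    resolved = os.path.realpath(path)
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        "resolved": resolved,
        # False means the link is dangling or the target is not a directory.
        "target_is_dir": os.path.isdir(resolved),
    }
```

Reading `/var/lib/docker` usually requires root, so run the check with sudo if `target_is_dir` is unexpectedly false.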

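PyTorch issues

When installing the PyTorch binary with pip, the version available at the time of writing (21 August 2019) is PyTorch 1.1.0 for CUDA 10.0 (I did not plan to install 1.2.0 :-)). In practice, building from source lets PyTorch support CUDA 10.1.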

1. Install Dependencies

pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing

2. Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install scikit-build --user
pip install ninja --user
git submodule update --init
pip install -U setuptools
pip install -r requirements.txt

3. Install PyTorch

export USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1
cd ~/pytorch
python setup.py build
python setup.py install

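4. import torch fails with an error at from torch._C import *

In my case the build directory also contains a folder named torch, and the installation presumably left the import path pointing at it. I guess a restart might also help. Renaming that folder fixed it for me.
The current installation is:
torch.version.cuda '10.1.243'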
torch 1.1.0
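
The shadowing problem and the version check can both be scripted. This is a sketch under the assumption that the failure comes from a local `torch` folder being imported instead of the installed package; the helper names are invented for this example.

```python
import os


def shadowing_dirs(package, search_dirs):
    """Return the dirs in search_dirs that hold a folder named like package."""
    return [
        d for d in search_dirs
        # An empty entry (as in sys.path[0]) means the current directory.
        if os.path.isdir(os.path.join(d or os.curdir, package))
    ]


def torch_summary():
    """Return version details of the importable torch, or the import error."""
    try:
        import torch
    except ImportError as exc:
        return {"error": str(exc)}
    return {
        "version": torch.__version__,
        "cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # Where the imported package lives; a source checkout here is suspect.
        "location": os.path.dirname(torch.__file__),
    }
```

`shadowing_dirs("torch", [os.getcwd()])` returning a non-empty list inside the PyTorch source tree matches the situation described above; `torch_summary()` then reports which copy was imported and which CUDA version it was built against.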

test

retinanet infer retinanet_rn50fpn.pth --images  /datasets/coco/val2017/ --annotations /datasets/coco/annotations/instances_val2017.json

train

retinanet train retinanet_rn50fpn.pth --backbone ResNet50FPN \
    --images /datasets/coco/train2017/ --annotations /datasets/coco/annotations/instances_train2017.json \
    --val-images /datasets/coco/val2017/ --val-annotations /datasets/coco/annotations/instances_val2017.json
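
Both commands fail late if the annotation file and the image directory do not match, so a quick pre-check can be useful. A minimal sketch that relies only on the standard COCO JSON keys (`images`, `annotations`, `categories`, and `file_name` per image); the function names are hypothetical and not part of the retinanet CLI.

```python
import json
import os


def summarize_coco(annotation_path):
    """Count the images, annotations and categories in a COCO JSON file."""
    with open(annotation_path) as f:
        data = json.load(f)
    return {key: len(data.get(key, []))
            for key in ("images", "annotations", "categories")}


def missing_images(annotation_path, image_dir):
    """Return file names listed in the annotations but absent from image_dir."""
    with open(annotation_path) as f:
        data = json.load(f)
    return [
        image["file_name"] for image in data.get("images", [])
        if not os.path.isfile(os.path.join(image_dir, image["file_name"]))
    ]
```

For example, `missing_images("/datasets/coco/annotations/instances_val2017.json", "/datasets/coco/val2017/")` should return an empty list when the paths used above are mounted correctly inside the container.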