1. Setting up the environment and training
Check your CUDA version; mine is 10.0.
lyl@lyl:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Check your cuDNN version by reading the version macros in cudnn.h:
```shell
lyl@lyl:~$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
```
From the output (major 7, minor 5, patch level 0), my cuDNN version is 7.5.0.
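The CUDNN_VERSION macro packs the three numbers into a single integer. As a small sketch of that arithmetic (the helper name `parse_cudnn_version` is my own and is not part of cuDNN), the same values can be read from the grep output in Python:

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch, packed) parsed from cudnn.h-style #define lines."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match e.g. "#define CUDNN_MAJOR 7" and capture the integer
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError("missing " + name)
        values[name] = int(match.group(1))
    major = values["CUDNN_MAJOR"]
    minor = values["CUDNN_MINOR"]
    patch = values["CUDNN_PATCHLEVEL"]
    # Same formula as the CUDNN_VERSION macro in the header
    packed = major * 1000 + minor * 100 + patch
    return major, minor, patch, packed


sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))  # (7, 5, 0, 7500)
```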
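Look up the GPU's compute capability in NVIDIA's table (link below). I noted 5.2 for my card, but GM107-based parts, which is what lspci reports further down, are generally listed as 5.0, so confirm the value against the table.
GeForce GTX 950M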
https://developer.nvidia.com/cuda-gpus#compute
lyl@lyl:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #6 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1e.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO UART #0 (rev 31)
00:1f.0 ISA bridge: Intel Corporation HM170 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)
02:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3165 Plus Bluetooth (rev 99)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet C
Create the environment
conda create -n tensorflow-gpu python=3.6
source activate tensorflow-gpu
pip install tensorflow-gpu==1.13.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
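Installing with pip still timed out, so I downloaded the wheel with Xunlei (a download manager) and installed it from the local file: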
https://pypi.tuna.tsinghua.edu.cn/packages/63/e5/6f47e0e3b8e9215efb3f41692ab47991d96cb3ccc172cb578435cbcb4959/tensorflow_gpu-1.13.2-cp36-cp36m-manylinux1_x86_64.whl
pip install /home/lyl/anaconda3/envs/tensorflow_gpu-1.13.2-cp36-cp36m-manylinux1_x86_64.whl
pip install keras==2.1.5
After installing the environment, the blogger said to reboot, so I rebooted.
https://www.bilibili.com/video/BV1U7411T72r?p=11
The next step is to split the dataset.
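The notes do not say how the split was made. One possible way, sketched here under my own assumptions (an 8:1:1 train/val/test ratio, a fixed seed, and a hypothetical helper `split_ids` working on image IDs), is a seeded shuffle followed by slicing:

```python
import random


def split_ids(image_ids, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle image IDs reproducibly and cut them into train/val/test lists."""
    ids = sorted(image_ids)            # start from a stable order
    random.Random(seed).shuffle(ids)   # seeded shuffle so the split is repeatable
    n_train = int(len(ids) * train_ratio)
    n_val = int(len(ids) * val_ratio)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # everything that remains
    return train, val, test


# Hypothetical image IDs "000001" ... "000100"
ids = ["%06d" % i for i in range(1, 101)]
train, val, test = split_ids(ids)
print(len(train), len(val), len(test))
```

The ratios are only an illustration; pick whatever fits the size of your dataset.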
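After splitting the dataset yourself, run voc_annotation.py to generate 2007_test.txt, 2007_train.txt and 2007_val.txt. (Before running it, change the classes to the ones you need; the background class is not included here.)
train.py also has to be changed to your own classes, but this time the background class must be counted as well.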
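The difference between the two settings is just one extra class for the background. A tiny sketch, using a hypothetical helper `count_classes` and made-up class names (which variable in train.py holds this number is not stated in these notes):

```python
def count_classes(class_names, include_background):
    """Number of classes, optionally counting one extra background class."""
    return len(class_names) + (1 if include_background else 0)


# Hypothetical class names; replace them with the classes of your own dataset
classes = ["cat", "dog"]
print(count_classes(classes, include_background=False))  # voc_annotation.py case: 2
print(count_classes(classes, include_background=True))   # train.py case: 3
```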
Then I ran into errors.
Error 1
Fix for ImportError: libcublas.so.10.0:
https://blog.csdn.net/qq_34374211/article/details/81018320
sudo ldconfig /usr/local/cuda-10.0/lib64
Error 2
ModuleNotFoundError: No module named 'PIL'
https://blog.csdn.net/xiemanR/article/details/53929367
pip install Pillow
Error 3
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 97: ordinal not in range(128)
https://blog.csdn.net/qq_35393693/article/details/85307809
import sys
import importlib
importlib.reload(sys)
sys.setdefaultencoding('utf8')
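This did not solve it (sys.setdefaultencoding does not exist in Python 3, so the snippet cannot work in this Python 3.6 environment); in the end I used a replacement workaround instead.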
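Byte 0xe5 is a common lead byte of UTF-8 encoded Chinese characters, so one plausible cause (an assumption; the notes do not say which file triggered the error) is a text file with Chinese content being read with the ASCII default encoding. Under that assumption, passing the encoding explicitly to open() is the usual Python 3 approach. The helper name `read_text_utf8` is my own:

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstration with a throwaway file that contains non-ASCII text
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("图片 1\n")
    content = read_text_utf8(sample_path)
    print(len(content))
```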
Error 4
TensorFlow shows … np_resource = np.dtype([("resource", np.ubyte, 1)]); fix:
https://blog.csdn.net/qq_19707521
pip install -U numpy==1.16.4
https://blog.csdn.net/shuiyixin/article/details/88928354
https://blog.csdn.net/qq_33440324/article/details/90137596
1/2000 [..............................] - ETA: 20:00:24 - rpn_cls: 7.4168 - rpn_regr: 2.5138 - detector_cls: 1.0986 - detector_regr: 0.0000e+00
2/2000 [..............................] - ETA: 10:36:20 - rpn_cls: 6.4317 - rpn_regr: 2.0820 - detector_cls: 1.0970 - detector_regr: 0.4305
3/2000 [..............................] - ETA: 7:24:26 - rpn_cls: 5.7413 - rpn_regr: 2.6409 - detector_cls: 1.0943 - detector_regr: 0.4783
4/2000 [..............................] - ETA: 5:48:37 - rpn_cls: 5.2946 - rpn_regr: 3.1420 - detector_cls: 1.0917 - detector_regr: 0.5622
5/2000 [..............................] - ETA: 4:51:10 - rpn_cls: 4.9444 - rpn_regr: 3.3616 - detector_cls: 1.0886 - detector_regr: 0.5800
6/2000 [..............................] - ETA: 4:12:37 - rpn_cls: 4.7091 - rpn_regr: 3.3938 - detector_cls: 1.0865 - detector_regr: 0.6232
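Over the course of training, both the classification and the regression losses keep decreasing (in the first steps shown above, rpn_cls and detector_cls are already falling, while the regression terms are still noisy).
The blogger said the classification loss may end up around 5, with detector_cls around 3.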
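To compare losses between steps without reading the progress bar by eye, the fields can be pulled out with a regular expression. This is only a sketch: `parse_losses` is a hypothetical helper, and the two sample lines are the first and sixth log lines from above with the progress bar abbreviated.

```python
import re


def parse_losses(log_line):
    """Extract {name: value} for each 'name: number' loss field in a progress line."""
    pattern = r"(rpn_cls|rpn_regr|detector_cls|detector_regr): ([0-9.]+(?:e[+-]?\d+)?)"
    return {name: float(value) for name, value in re.findall(pattern, log_line)}


first = parse_losses(
    "1/2000 [....] - ETA: 20:00:24 - rpn_cls: 7.4168 - rpn_regr: 2.5138 "
    "- detector_cls: 1.0986 - detector_regr: 0.0000e+00"
)
sixth = parse_losses(
    "6/2000 [....] - ETA: 4:12:37 - rpn_cls: 4.7091 - rpn_regr: 3.3938 "
    "- detector_cls: 1.0865 - detector_regr: 0.6232"
)
# Print each loss at step 1 and step 6 side by side
for name in first:
    print(name, first[name], "->", sixth[name])
```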
The log files are saved here; they all have the .h5 extension.
Error 5
Traceback (most recent call last):
File "/home/lyl/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/PIL/JpegImagePlugin.py", line 612, in _save
rawmode = RAWMODE[im.mode]
KeyError: 'RGBA'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lyl/000_Code/faster-rcnn-keras/get_dr_txt.py", line 162, in <module>
image.save("./input/images-optional/"+image_id+".jpg")
File "/home/lyl/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/PIL/Image.py", line 2134, in save
save_handler(self, fp, filename)
File "/home/lyl/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/PIL/JpegImagePlugin.py", line 614, in _save
raise OSError("cannot write mode %s as JPEG" % im.mode)
OSError: cannot write mode RGBA as JPEG
https://github.com/python-pillow/Pillow/issues/2609
I first tried pip install pillow==4.2.0, which did not help, and then changed the code to image = image.convert('RGB').
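JPEG has no alpha channel, which is why an RGBA image cannot be written directly. A minimal sketch of the conversion step using Pillow (the wrapper `save_as_jpeg` and the in-memory test image are my own illustration, not code from get_dr_txt.py):

```python
import io

from PIL import Image


def save_as_jpeg(image, target):
    """Save a PIL image as JPEG, converting to RGB first when needed."""
    if image.mode != "RGB":
        # convert("RGB") discards the alpha channel; it does not blend with a background
        image = image.convert("RGB")
    image.save(target, format="JPEG")


# Small semi-transparent RGBA image standing in for a real input picture
rgba = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
buffer = io.BytesIO()
save_as_jpeg(rgba, buffer)
print(len(buffer.getvalue()) > 0)
```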
2. predict.py
For frcnn.py, change voc_classes.txt to the classes of your own dataset.
pip install opencv-python
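Test a single image
./ refers to the current directory.
../ refers to the parent directory, one level up.
For example, take the relative path ../001/002/cp.exe: seen from cp.exe, ./ is the 002 directory and ../ is the 001 directory.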
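The same rule can be checked in Python. This sketch uses posixpath so the result does not depend on the operating system, and the absolute prefix /root is a made-up location for the 001/002 directories from the example above:

```python
import posixpath

# Hypothetical absolute location of the 002 directory
current_dir = "/root/001/002"

# "./" resolves to the directory itself, "../" to its parent
here = posixpath.normpath(posixpath.join(current_dir, "./"))
parent = posixpath.normpath(posixpath.join(current_dir, "../"))
print(here)    # /root/001/002
print(parent)  # /root/001
```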
Following the method from
https://github.com/bubbliiiing/faster-rcnn-keras
https://www.bilibili.com/video/BV1U7411T72r?p=13
first create demo.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
@Author:Linda Li
@Time:2020/5/7 4:56 PM
"""
import os  # operating-system utilities: directory listing and path handling

# Directory that holds the original images
data_base_dir = "img/"
# Text file that receives one image name (without extension) per line
write_file_name = 'img/demo.txt'

# Collect the names of all .jpg files in data_base_dir
file_list = []
for file in os.listdir(data_base_dir):
    if file.endswith(".jpg"):
        file_list.append(file)
file_list.sort()  # sort in place; a bare sorted(file_list) would discard its result

# Write the image names to the txt file, one per line
with open(write_file_name, "w") as write_file:
    for name in file_list:
        # splitext removes only the extension; strip('.jpg') would also eat
        # any leading or trailing '.', 'j', 'p' or 'g' characters of the name
        write_file.write(os.path.splitext(name)[0] + '\n')
Computing mAP
1. First generate a ground-truth txt file for each image
python get_gt_txt.py
2. Generate the detection-result txt files
python get_dr_txt.py
This step is also fairly slow and may fail on the first run; running it a second time worked.
3. Compute the mAP
python get_mAP.py
https://blog.csdn.net/weixin_44791964/article/details/104695264
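It is really very slow.
Detection start time
The results are very complete: every value is saved, which makes this very considerate code.
If you want to reuse this code, the key step is turning the test results into txt files that contain the confidence score and the box coordinates.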
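As a rough sketch of what such a file could contain, the snippet below formats one detection per line as "class confidence left top right bottom". That field order is an assumption based on common mAP tooling; the notes only state that the score and coordinates are included. The helper `format_detection` and the sample detections are hypothetical.

```python
def format_detection(class_name, confidence, left, top, right, bottom):
    """Format one detection as 'class confidence left top right bottom'."""
    return "%s %.4f %d %d %d %d" % (class_name, confidence, left, top, right, bottom)


# Hypothetical detections for a single image
detections = [
    ("cat", 0.9731, 12, 30, 200, 180),
    ("dog", 0.4102, 50, 60, 120, 140),
]
lines = [format_detection(*detection) for detection in detections]
print("\n".join(lines))
```

Check the format against the files that get_dr_txt.py actually writes before feeding your own results to get_mAP.py.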