Training SSD on your own data under Linux

babyzbb636

Article link: https://blog.csdn.net/babyzbb636/article/details/100174697

I. Preparation

1 Code repository

2 Building the VOC2007-format dataset

3 Unzipping ssd_300_vgg.ckpt.zip

4 Downloading VGG16

II. Testing

1 Creating ssdtest.py

2 Modifying visualization.py

III. Training

1 pascalvoc_common.py

2 pascalvoc_to_tfrecords.py

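I. Preparation

1 Code repository

Download the code from https://github.com/balancap/SSD-Tensorflow to your local machine.

2 Building the VOC2007-format dataset

See these blog posts:

https://blog.csdn.net/babyzbb636/article/details/100031102

https://blog.csdn.net/babyzbb636/article/details/100123433

Links to my self-made object detection datasets: dataset 1, dataset 2, dataset 3

3 Unzipping ssd_300_vgg.ckpt.zip

Unzip it into the checkpoints folder.

4 Downloading VGG16

Get it from the author's README or from https://pan.baidu.com/s/1diWbdJdjVbB3AWN99406nA (password: ge3x), and put it in the checkpoints folder as well.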

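II. Testing

Run ssdtest.py on Linux:

cd '/home/zbb/SSD/notebooks'

python ssdtest.py

1 Creating ssdtest.py

Create ssdtest.py in the notebooks folder, copy the code from ssd_notebook.ipynb into it, and modify it so that it runs detection on multiple images, saves the results, and displays the object class names.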


# coding: utf-8
import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf
slim = tf.contrib.slim
import matplotlib.image as mpimg
import sys
sys.path.append('../')
from nets import ssd_vgg_300, ssd_common, np_methods
from preprocessing import ssd_vgg_preprocessing
from notebooks import visualization

# TensorFlow session: grow memory when needed. TF, DO NOT USE ALL MY GPU MEMORY!!!
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(log_device_placement=False, gpu_options=gpu_options)
isess = tf.InteractiveSession(config=config)

# ## SSD 300 Model
# The SSD 300 network takes 300x300 image inputs. In order to feed any image, the latter is resized to this input shape (i.e. `Resize.WARP_RESIZE`). Note that even though it may change the width / height ratio, the SSD model performs well on resized images (and it is the default behaviour in the original Caffe implementation).
# SSD anchors correspond to the default bounding boxes encoded in the network. The SSD net output provides offset on the coordinates and dimensions of these anchors.

# Input placeholder.
net_shape = (300, 300)
data_format = 'NHWC'
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
# Evaluation pre-processing: resize to SSD net shape.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)

# Define the SSD model.
reuse = True if 'ssd_net' in locals() else None
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)

# Restore SSD model.
#ckpt_filename = '../checkpoints/ssd_300_vgg.ckpt'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
ckpt_filename = '../train_model/model.ckpt-20000'

isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)

# SSD default anchor boxes.
ssd_anchors = ssd_net.anchors(net_shape)

# ## Post-processing pipeline
# The SSD outputs need to be post-processed to provide proper detections. Namely, we follow these common steps:
# * Select boxes above a classification threshold;
# * Clip boxes to the image shape;
# * Apply Non-Maximum Suppression: among same-class boxes whose Jaccard overlap > threshold, keep only the highest-scoring one;
# * If necessary, resize bounding boxes to original image shape.

# Main image processing routine.
def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
    # Run SSD network.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions, localisations, bbox_img],
                                                              feed_dict={img_input: img})
    # Get classes and bboxes from the net outputs.
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold=select_threshold, img_shape=net_shape, num_classes=21, decode=True)

    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Resize bboxes to original image shape. Note: useless for Resize.WARP!
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes

# Test on some demo image and visualize output.
path = '../demo/'
save_path = '../demo_result/'
# Create the output folder if it is missing, otherwise saving the figures fails.
os.makedirs(save_path, exist_ok=True)
image_names = sorted(os.listdir(path))

for image_path in image_names:
    img = mpimg.imread(path + image_path)
    rclasses, rscores, rbboxes = process_image(img)
    #visualization.bboxes_draw_on_img(path,image_path, rclasses, rscores, rbboxes, visualization.colors_plasma)
    visualization.plt_bboxes(save_path, image_path, img, rclasses, rscores, rbboxes)
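
The post-processing comments in the script mention the Jaccard score and non-maximum suppression. The following is a minimal pure-Python sketch of that general technique, not the code in np_methods: the names jaccard and simple_nms are hypothetical, and the boxes are assumed to use the same normalised (ymin, xmin, ymax, xmax) layout as the script.

```python
def jaccard(box_a, box_b):
    """Intersection-over-union of two (ymin, xmin, ymax, xmax) boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(ymax - ymin, 0.0) * max(xmax - xmin, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(classes, scores, boxes, nms_threshold=0.45):
    """Greedy NMS: visit boxes by descending score and drop any box that
    overlaps an already kept box of the same class by more than the threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            classes[i] == classes[j] and jaccard(boxes[i], boxes[j]) > nms_threshold
            for j in keep)
        if not suppressed:
            keep.append(i)
    return keep


# Hypothetical detections: two near-identical boxes of class 1, one box of class 2.
classes = [1, 1, 2]
scores = [0.9, 0.8, 0.7]
boxes = [(0.10, 0.10, 0.50, 0.50),
         (0.12, 0.12, 0.52, 0.52),
         (0.60, 0.60, 0.90, 0.90)]
kept = simple_nms(classes, scores, boxes)
print(kept)
```

The second box is dropped because it overlaps the higher-scoring first box of the same class; the third box belongs to another class and does not overlap, so it is kept.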

2 Modifying visualization.py

# Copyright 2017 Paul Balanca. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import cv2
import random
import matplotlib.pyplot as plt
import matplotlib.cm as mpcm

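#########################add######################################
def num2class(n):
    """Return the class name whose numeric id in VOC_LABELS equals n."""
    from datasets import pascalvoc_2007 as pas
    for name, item in pas.pascalvoc_common.VOC_LABELS.items():
        # item is an (id, category) tuple.
        if item[0] == n:
            return name
    # Fall back to the numeric id so string formatting in the callers still works.
    return str(n)
#########################add######################################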

# =========================================================================== #
# Some colormaps.
# =========================================================================== #
def colors_subselect(colors, num_classes=21):
    dt = len(colors) // num_classes
    sub_colors = []
    for i in range(num_classes):
        color = colors[i*dt]
        if isinstance(color[0], float):
            sub_colors.append([int(c * 255) for c in color])
        else:
            sub_colors.append([c for c in color])
    return sub_colors

colors_plasma = colors_subselect(mpcm.plasma.colors, num_classes=21)
colors_tableau = [(255, 255, 255), (31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
                  (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
                  (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
                  (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
                  (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]


# =========================================================================== #
# OpenCV drawing.
# =========================================================================== #
def draw_lines(img, lines, color=[255, 0, 0], thickness=2):
    """Draw a collection of lines on an image.
    """
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(img, (x1, y1), (x2, y2), color, thickness)


def draw_rectangle(img, p1, p2, color=[255, 0, 0], thickness=2):
    cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)


def draw_bbox(img, bbox, shape, label, color=[255, 0, 0], thickness=2):
    p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))
    p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))
    cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)
    p1 = (p1[0]+15, p1[1])
    cv2.putText(img, str(label), p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 0.5, color, 1)

########################pass path and image_path in place of img################################
def bboxes_draw_on_img(path,image_path, classes, scores, bboxes, colors, thickness=2):
    img = cv2.imread(path+image_path)
    shape = img.shape
    for i in range(bboxes.shape[0]):
        bbox = bboxes[i]
        color = colors[classes[i]]
        # Draw bounding box...
        p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))
        p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))
        cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)
        # Draw text...
        #########################add######################################
        cls_id = int(classes[i])
        class_name = num2class(cls_id)
        s = '%s/%.3f' % (class_name, scores[i])
        #########################add######################################

        #s = '%s/%.3f' % (classes[i], scores[i])
        p1 = (p1[0]-5, p1[1])
        cv2.putText(img, s, p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 0.6,color,thickness=1, lineType=cv2.LINE_AA)
        
    #########################add######################################
    # Save once, after all boxes have been drawn (also saves images without detections).
    cv2.imwrite('../demo_result/'+image_path, img)
    #########################add######################################

# =========================================================================== #
# Matplotlib show...
# =========================================================================== #

##############################pass save_path and image_path in along with img################################
def plt_bboxes(save_path, image_path, img, classes, scores, bboxes, figsize=(10,10), linewidth=1.5):
    """Visualize bounding boxes. Largely inspired by SSD-MXNET!
    """

    fig = plt.figure(figsize=figsize)
    plt.imshow(img)
    ###############################hide axes and ticks########################################
    plt.axis('off')
    plt.xticks([])
    plt.yticks([])
    height = img.shape[0]
    width = img.shape[1]
    colors = dict()
    for i in range(classes.shape[0]):
        cls_id = int(classes[i])
        if cls_id >= 0:
            score = scores[i]
            if cls_id not in colors:
                colors[cls_id] = (random.random(), random.random(), random.random())
            ymin = int(bboxes[i, 0] * height)
            xmin = int(bboxes[i, 1] * width)
            ymax = int(bboxes[i, 2] * height)
            xmax = int(bboxes[i, 3] * width)
            rect = plt.Rectangle((xmin, ymin), xmax - xmin,
                                 ymax - ymin, fill=False,
                                 edgecolor=colors[cls_id],
                                 linewidth=linewidth)
            plt.gca().add_patch(rect)

            #class_name = str(cls_id)
            #########################add######################################
            class_name = num2class(cls_id)
            #########################add######################################

            plt.gca().text(xmin, ymin - 2,
                           '{:s} | {:.3f}'.format(class_name, score),
                           bbox=dict(facecolor=colors[cls_id], alpha=0.5),
                           fontsize=12, color='white')
    plt.savefig(save_path + image_path, format='jpg', transparent=True, pad_inches=0, dpi=300,
                bbox_inches='tight')
    plt.show()
    # Release the figure so memory does not grow when many images are processed.
    plt.close(fig)
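
Both drawing functions turn normalised (ymin, xmin, ymax, xmax) boxes into pixel coordinates, and the y-before-x order is easy to get wrong. As a small sketch of that conversion (the helper name bbox_to_pixels is hypothetical and not part of visualization.py):

```python
def bbox_to_pixels(bbox, height, width):
    """Convert a normalised (ymin, xmin, ymax, xmax) box to integer pixel corners.

    Returns ((xmin, ymin), (xmax, ymax)), i.e. corners in (x, y) order.
    """
    ymin, xmin, ymax, xmax = bbox
    top_left = (int(xmin * width), int(ymin * height))
    bottom_right = (int(xmax * width), int(ymax * height))
    return top_left, bottom_right


# Hypothetical box on an image that is 300 pixels high and 400 pixels wide.
corners = bbox_to_pixels((0.25, 0.5, 0.75, 1.0), height=300, width=400)
print(corners)
```

The y values are scaled by the image height and the x values by the width, which is what the p1[::-1] reversal and the xmin/ymin computations in the file do.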

III. Training

1 pascalvoc_common.py
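
num2class in visualization.py reads VOC_LABELS from datasets/pascalvoc_common.py, so this step presumably replaces that dictionary with your own classes. A minimal sketch for the two-class cat/plane dataset used later in this post follows; the category strings and the ID_TO_NAME helper are illustrative assumptions, and the (id, category) tuple layout is assumed to match the repository's.

```python
# Hypothetical replacement for VOC_LABELS in datasets/pascalvoc_common.py.
# Each entry maps a class name to (integer id, coarse category); id 0 is background.
VOC_LABELS = {
    'none': (0, 'Background'),
    'cat': (1, 'Animal'),      # category strings are illustrative assumptions
    'plane': (2, 'Vehicle'),
}

# Number of classes the network predicts, background included.
num_classes = len(VOC_LABELS)

# Reverse lookup (id -> name), handy when labelling detections.
ID_TO_NAME = {label_id: name for name, (label_id, _) in VOC_LABELS.items()}

print(num_classes)
print(ID_TO_NAME[2])
```

The class names must match the name fields in the XML annotations exactly, otherwise those objects cannot be mapped to an id.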

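2 pascalvoc_to_tfrecords.py

To convert the image data to tfrecords format, modify pascalvoc_to_tfrecords.py in the datasets folder.

Change the file read mode on line 83 to 'rb'. If your images are not in .jpg format, you can also change the image file extension there.

Line 67 controls how many images are written into each tfrecords file; change it if needed.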
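
The reason for the 'rb' change is probably that, under Python 3, reading image bytes in text mode tries to decode them as text. The sketch below illustrates this with Python's built-in open as a stand-in for the file reader used in the repository; the file content is a made-up byte string, not a real image.

```python
import os
import tempfile

# Write a few bytes that look like the start of a JPEG file (not a real image).
fake_jpeg = b'\xff\xd8\xff\xe0' + b'\x00' * 12
with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    tmp.write(fake_jpeg)
    image_path = tmp.name

# Binary mode returns the raw bytes unchanged.
with open(image_path, 'rb') as f:
    image_data = f.read()

# Text mode tries to decode the bytes as UTF-8 and fails on image data.
try:
    with open(image_path, 'r', encoding='utf-8') as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False

os.remove(image_path)
print(len(image_data), text_mode_ok)
```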

To run on Linux, create tf_convert_data.sh in the SSD project directory with the following content:

DATASET_DIR=./VOC2007/
OUTPUT_DIR=./tfrecords_
python tf_convert_data.py \
    --dataset_name=pascalvoc \
    --dataset_dir=${DATASET_DIR} \
    --output_name=voc_2007_train \
    --output_dir=${OUTPUT_DIR}

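This produced an error.

According to the blog post https://www.e-learn.cn/content/qita/934421, the cause is probably the difference in line endings between Windows and Linux. After I copied the script content from the author's README at

https://github.com/balancap/SSD-Tensorflow, the script ran.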
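
If the script was written on Windows, another option is to convert its line endings instead of re-copying it. A minimal sketch, assuming the problem really is CRLF line endings; the function name to_unix_line_endings and the demo file are hypothetical:

```python
import os
import tempfile


def to_unix_line_endings(path):
    """Rewrite the file at `path` in place with LF line endings; return True if changed."""
    with open(path, 'rb') as f:
        data = f.read()
    fixed = data.replace(b'\r\n', b'\n')
    if fixed != data:
        with open(path, 'wb') as f:
            f.write(fixed)
    return fixed != data


# Demonstration on a throwaway file with Windows line endings.
with tempfile.NamedTemporaryFile(suffix='.sh', delete=False) as tmp:
    tmp.write(b'DATASET_DIR=./VOC2007/\r\necho ${DATASET_DIR}\r\n')
    script_path = tmp.name

changed = to_unix_line_endings(script_path)
with open(script_path, 'rb') as f:
    content = f.read()
os.remove(script_path)
print(changed, content)
```

For the real script, call to_unix_line_endings with the path of the .sh file before running it.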

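3 Modifying the training .py files

Required: change the number of classes in each of these files:

train_ssd_network.py
eval_ssd_network.py
nets/ssd_vgg_300.py

In train_ssd_network.py you can also change the GPU memory usage, learning rate, batch_size, number of iterations, checkpoint save interval and so on (or set them through the training arguments).

4 pascalvoc_2007.py

Update the whole of datasets/pascalvoc_2007.py to match your own training data.

Here "train" means trainval (the training set plus the validation set; I worked this out from the four .txt files).

Code for counting annotated boxes (adapted from the blog post https://blog.csdn.net/memories_sunset/article/details/83309417; its counts do not match the VOC2007 dataset, but they do match my own dataset, which I checked by opening labelImg and counting):

A side complaint about Python indentation: I did not want to install PyCharm on Linux and kept getting errors, so I ended up rebooting back and forth between the two systems of my dual-boot machine.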

import os
import xml.etree.ElementTree as ET


def ReadTxtName(rootdir):
    """Return the non-empty lines (image ids) of a split file such as trainval.txt."""
    with open(rootdir, 'r') as file_to_read:
        return [line.strip() for line in file_to_read if line.strip()]


def compute(class_name, NUM, annotation_folder, train_path):
    """Print, for each class, the number of images containing it and its number of boxes."""
    total_number = [0] * NUM   # boxes per class
    pic_num = [0] * NUM        # images per class
    total = 0                  # boxes over all classes
    total_pic = 0              # images processed
    trainlist = ReadTxtName(train_path)

    for image_id in trainlist:
        xml_path = os.path.join(annotation_folder, image_id + '.xml')
        print(xml_path)
        root = ET.parse(xml_path).getroot()

        total_pic += 1
        seen = set()  # indices of the classes present in this image
        for obj in root.findall('object'):
            label = obj.find('name').text
            if label in class_name:
                idx = class_name.index(label)
                total_number[idx] += 1
                total += 1
                seen.add(idx)
        for idx in seen:
            pic_num[idx] += 1

    for i in range(len(class_name)):
        print(class_name[i], pic_num[i], total_number[i])
    print("total", total_pic, total)

if __name__ == '__main__':
    '''
    class_name = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
                'dog','horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
    annotation_folder = 'F:/TensorFlow/scene/Faster-RCNN/data/VOCDevkit2007/VOC2007/Annotations/'  # change to the path of your own annotation folder
    train_path = 'F:/TensorFlow/scene/Faster-RCNN/data/VOCDevkit2007/VOC2007/ImageSets/Main/trainval.txt'
    NUM = 20
    '''
    class_name = ['cat','plane']
    annotation_folder = '/home/zbb/data/cat+plane/VOC2007/Annotations/'  # change to the path of your own annotation folder
    train_path = '/home/zbb/data/cat+plane/VOC2007/ImageSets/Main/trainval.txt'
    NUM = 2
    
    compute(class_name, NUM, annotation_folder, train_path)
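
The per-class image and box counts printed by this script are the kind of numbers that go into the statistics in datasets/pascalvoc_2007.py. As a sketch, assuming that file stores one (image count, box count) pair per class plus 'none' and 'total' entries, the counts could be arranged like this; build_statistics is a hypothetical helper and the numbers are made up:

```python
def build_statistics(class_names, pic_num, total_number, total_pic, total):
    """Arrange per-class (image count, box count) pairs in a statistics-style dict."""
    stats = {'none': (0, 0)}
    for name, n_images, n_boxes in zip(class_names, pic_num, total_number):
        stats[name] = (n_images, n_boxes)
    stats['total'] = (total_pic, total)
    return stats


# Made-up counts for illustration only; use the numbers printed by the counting script.
stats = build_statistics(['cat', 'plane'], [120, 80], [150, 95], 200, 245)
for name, pair in stats.items():
    print(name, pair)
```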

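5 Creating the training script

As in step 2, create a file named train_ssd_network.py.sh in the SSD project directory with the following content (do not break the fourth line, or copy the script from the README):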

DATASET_DIR=./tfrecords/
TRAIN_DIR=./train_model/
CHECKPOINT_PATH=./checkpoints/vgg_16.ckpt
python train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --checkpoint_model_scope=vgg_16 \
    --checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --learning_rate_decay_factor=0.94 \
    --batch_size=64

Training could finally start (it kept failing because the GPU ran out of memory, so I changed batch_size back to 32).
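
The learning_rate and learning_rate_decay_factor flags in the script interact as follows, assuming the schedule is a staircase exponential decay. This is only a sketch: the decay_steps value of 1000 is a hypothetical placeholder, since the real interval depends on the dataset size, the batch size and the training script's defaults.

```python
def decayed_learning_rate(step, base_lr=0.001, decay_factor=0.94, decay_steps=1000):
    """Staircase exponential decay: multiply by decay_factor once every decay_steps steps."""
    return base_lr * decay_factor ** (step // decay_steps)


for step in (0, 1000, 10000):
    print(step, decayed_learning_rate(step))
```

Under this assumption, halving the batch size doubles the number of steps per epoch, so a schedule defined in epochs decays more slowly when measured in steps.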

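6 Testing and evaluation

Testing is the same as in Part II.

Run ssdtest.py on Linux:

cd '/home/zbb/SSD/notebooks'

python ssdtest.py

I kept running into problems here. At first nothing was detected or objects were misclassified; later, boxes were drawn at random all over the image. I thought my dataset was too small, but switching to VOC did not help either and the results were still poor. The author's material mentions 120,000 iterations, so I will try that. In the end I gave up on SSD: the loss stayed too high and I could not get it to work.

I found the following in a related blog post (see the reference blogs at the end for its address).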

Evaluation

Create eval_ssd_network.sh with the following content:

DATASET_DIR=./tfrecords/
EVAL_DIR=./logs/
CHECKPOINT_PATH=./train_model/model.ckpt-120000
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1

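error1: TypeError: _variable_v2_call() got an unexpected keyword argument 'collections'

Solution: this is caused by a TensorFlow version that is too new. I created a new Anaconda environment with Python 3.5 and installed tensorflow-gpu, which automatically pulled in the matching tensorflow-gpu, cudnn and cuda packages. The resulting version, 1.10.0, works.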

error2: TypeError: Can not convert a tuple into a Tensor or Operation

Solution: define the following helper function in eval_ssd_network.py:

def flatten(x):
    result = []
    for el in x:
        if isinstance(el, tuple):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

Then, around lines 320 and 340, change eval_op=list(names_to_updates.values()) to eval_op=flatten(list(names_to_updates.values())).
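
To see what the helper does, here is the same function applied to a hypothetical dictionary in which plain strings stand in for the update ops. Some values are nested tuples, which is the situation the error message complains about.

```python
def flatten(x):
    """Recursively expand nested tuples into a flat list."""
    result = []
    for el in x:
        if isinstance(el, tuple):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result


# Strings stand in for the update ops held in names_to_updates.
names_to_updates = {
    'metric_a': 'op_a',
    'metric_b': ('op_b1', ('op_b2', 'op_b3')),
}
eval_op = flatten(list(names_to_updates.values()))
print(sorted(eval_op))
```

After flattening, every element of eval_op is a single item rather than a tuple.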

Result: