MindSpore Deep Learning Framework Tutorial (1): Detecting and Localizing QR Codes with MindYOLO

Preface

Goal: on Ubuntu 20.04 with ROS Noetic, use MindYOLO to detect QR codes of known physical size and localize them.
Environment: mindspore == 2.0.0, cuda-nvcc == 11.6, cuda-version == 11.6, cudatoolkit == 11.6, cudnn == 8.8.0.

1 Installing MindSpore and the CUDA environment with conda

First install the NVIDIA driver (check it with nvidia-smi), then install the following environment.

conda create --name mindspore python=3.7
source activate mindspore 
conda install mindspore=2.0.0 -c mindspore -c conda-forge
conda install -c nvidia cuda-nvcc=11.6
conda install cudatoolkit=11.6 -c https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
conda install cudnn=8.8.0 -c https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
export PATH=/home/robot/anaconda3/envs/mindspore/bin:$PATH
export LD_LIBRARY_PATH=/home/robot/anaconda3/pkgs/cudatoolkit-11.6.2-hfc3e2af_12/lib:$LD_LIBRARY_PATH

Once the environment above is installed, a few other libraries may still be missing; errors will pop up at runtime, and installing the missing packages with pip or conda is enough.
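
Before moving on, it is worth confirming that MindSpore can actually use the GPU. A quick check I use (my own addition, run inside the activated conda env):

python -c "import mindspore; mindspore.set_context(device_target='GPU'); mindspore.run_check()"

If everything is wired up correctly, run_check prints the installed MindSpore version and finishes a small test computation without errors.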

2 Installing the MindYOLO object detection toolkit

I recommend installing from source, because it gives you the latest version; I am currently on 0.3.0.

git clone https://github.com/mindspore-lab/mindyolo.git
cd mindyolo
pip install -r requirements.txt # install the dependencies; mind the versions
pip install -e .
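
A quick way to confirm the editable install worked (my own addition) is to import one of the entry points this tutorial uses later:

python -c "from mindyolo.models import create_model; print('mindyolo OK')"

If this prints mindyolo OK without an ImportError, the package and its dependencies are in place.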

3 Preparing the data

3.1 Collecting images

Create a mindyolo package in the ROS workspace, create a capture_image.cpp file under its src folder, and paste in the following code:

#include <ros/ros.h>
#include <stdlib.h>
#include <image_transport/image_transport.h>
#include <opencv2/highgui.hpp>
#include <cv_bridge/cv_bridge.h>
#include <string>
#include <fstream>

// Global variable storing the integer index parsed from the last saved image file name
int last_saved_index = 0;

// Recover the integer index of the last saved image from the bookkeeping file
void get_last_saved_index(const std::string& filename) {
    std::ifstream file(filename);
    std::string line;
    if (file.is_open()) {
        while (std::getline(file, line)) {
            // Parse the file name and extract the integer part
            // The name is assumed to look like xxxxxxx.jpg, where xxxxxxx is the integer part
            std::size_t found = line.find_last_of("/");
            std::size_t found2 = line.find_last_of(".");
            std::string index_str = line.substr(found + 1, found2 - found - 1);
            int index = std::stoi(index_str);
            if (index > last_saved_index) {
                last_saved_index = index;
            }
        }
        file.close();
    }
}

void pic_callback(const sensor_msgs::ImageConstPtr &img_msg) {
    ros::Time time = img_msg->header.stamp;
    std::string filename_prefix = "/home/robot/catkin_ws/src/mindyolo/VOC2007/JPEGImages/";
    
    // Recover the index of the last saved image
    get_last_saved_index(filename_prefix + "last_saved.txt");
    
    // Build the new file name
    std::string filename_suffix = std::to_string(last_saved_index + 1);
    std::string filename = filename_prefix + filename_suffix + ".jpg";
    
    cv_bridge::CvImageConstPtr ptr;
    ptr = cv_bridge::toCvCopy(img_msg, "bgr8");
    printf("%s\n", filename.c_str());
    cv::imwrite(filename, ptr->image);
    
    // Record the index saved this time so it can be reused next time
    std::ofstream last_saved_file(filename_prefix + "last_saved.txt");
    last_saved_file << filename_suffix;
    last_saved_file.close();
}

int main(int argc, char **argv) {
    ros::init(argc, argv, "img_save");
    ros::NodeHandle n;
    ros::Subscriber sub_image = n.subscribe(argv[1], 1, pic_callback);   
    ros::spin();
    return 0;
}

Remember to add the build rules to CMakeLists.txt. Since the target links against ${catkin_LIBRARIES} and ${OpenCV_LIBS}, the find_package calls also need to pull in the catkin image packages (roscpp, cv_bridge, image_transport, sensor_msgs) and OpenCV.

add_executable(capture_image src/capture_image.cpp)
target_link_libraries(capture_image ${catkin_LIBRARIES} ${OpenCV_LIBS})

Go back to the catkin_ws workspace and build with catkin_make.

catkin_make
source ~/catkin_ws/devel/setup.sh 
rosrun mindyolo capture_image /camera/left/image_raw # /camera/left/image_raw is your camera topic

Once it is running, the images are saved under "/home/robot/catkin_ws/src/mindyolo/VOC2007/JPEGImages/".
To change the output directory, edit filename_prefix in capture_image.cpp.

3.2 Annotating the data with labelImg

Install labelImg:

pip install labelimg -i https://pypi.tuna.tsinghua.edu.cn/simple

Then prepare the dataset to be annotated. I suggest creating a folder named VOC2007 containing a JPEGImages folder for the images to be labeled, an Annotations folder for the annotation files, and a predefined_classes.txt file listing the class names you want to annotate.
The VOC2007 directory structure is:

├── VOC2007
│   ├── JPEGImages              images to be labeled
│   ├── Annotations             annotation (label) files
│   ├── predefined_classes.txt  all the classes you want to annotate

For detailed annotation steps, see: https://blog.csdn.net/didiaopao/article/details/119808973?spm=1001.2014.3001.5506
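
Once the folders are ready, labelImg can be launched pointing straight at the image folder and the class file (my own note; the arguments follow the upstream labelImg usage of image path followed by the predefined-class file), and the save directory should then be set to the Annotations folder:

cd VOC2007
labelImg JPEGImages predefined_classes.txt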

3.3 Generating the VOC dataset splits

Create a getvoc.py file under /home/robot/mindyolo/examples/code/VOC2007:

#getvoc.py
#-*- coding:utf-8 -*-
import os
import random
import numpy as np

xml_file_path = "/home/robot/mindyolo/examples/code/VOC2007/Annotations"
target_path = "/home/robot/mindyolo/examples/code/VOC2007/ImageSets/Main"

# split ratios
trainval = 0.9     # (num of trainval set) / (num of all samples)
train = 0.8        # (num of train set) / (num of trainval set)

def listdir(path, list_name):
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if os.path.isdir(file_path):
            listdir(file_path, list_name)
        elif os.path.splitext(file_path)[1] == '.xml':
            tmp = os.path.splitext(file_path)[0]
            # Extract the numeric part of the file name
            file_name = os.path.basename(tmp)
            numeric_part = ''.join(filter(str.isdigit, file_name))
            list_name.append(int(numeric_part))

file_list = []
listdir(xml_file_path, file_list)

num_of_files = len(file_list)
num_of_trainval = int(trainval * num_of_files)
num_of_train = int(train * num_of_trainval)

# divide samples in trainval and test
trainval_list = np.sort(random.sample(file_list, num_of_trainval))
test_list = np.sort(np.setdiff1d(file_list, trainval_list))

# divide trainval in train and value
train_list = np.sort(random.sample(list(trainval_list), num_of_train))
value_list = np.sort(np.setdiff1d(trainval_list, train_list))

print(trainval_list)
print(test_list)
print(train_list)
print(value_list)

# Save lists to text files
file = open(target_path + '/trainval.txt', 'w')
for var in trainval_list:
    file.write(str(var) + '\n')
file.close()

file = open(target_path + '/test.txt', 'w')
for var in test_list:
    file.write(str(var) + '\n')
file.close()

file = open(target_path + '/train.txt', 'w')
for var in train_list:
    file.write(str(var) + '\n')
file.close()

file = open(target_path + '/val.txt', 'w')
for var in value_list:
    file.write(str(var) + '\n')
file.close()

Note:
The captured images live in the ROS workspace; here I copied the dataset into the mindyolo tree, so mind the file paths when you use this. For example, xml_file_path and target_path need to be changed.
Also, the ImageSets and ImageSets/Main folders must be created under VOC2007 beforehand, because the script does not create them itself.
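
If you would rather not create them by hand, a one-line addition to getvoc.py (my own suggestion), placed right after target_path is defined, does the same thing:

os.makedirs(target_path, exist_ok=True)  # create ImageSets/Main if it does not exist yet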

Run the script to generate the VOC split files in that directory:

python3 getvoc.py

3.4 Converting to YOLO format

Work in the /home/robot/mindyolo/examples/code directory (the same one that holds the VOC2007 folder above). This code directory is one I created myself; it does not exist in the official git repo, it just keeps things organized.
This is my code directory:

├── convert_shwd2yolo.py
├── SHWD
│   ├── annotations
│   ├── images
│   ├── labels
│   ├── train.cache.npy
│   ├── train.txt
│   └── val.txt
├── VOC2007
│   ├── Annotations
│   ├── getvoc.py
│   ├── ImageSets
│   ├── JPEGImages
│   └── predefined_classes.txt
├── yolov7-tiny_300e_mAP375-d8972c94.ckpt
└── yolov7-tiny_shwd.yaml

Create convert_shwd2yolo.py, the script that converts the VOC data into YOLO format:

#convert_shwd2yolo.py
import os
from pathlib import Path
import argparse
import shutil
import xml.etree.ElementTree as ET
import collections
import json
from tqdm import tqdm


category_set = ['tb0', 'tb1', 'tb2', 'tb3', 'tb4']
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_item_id = 0
annotation_id = 0
image_index = 0


def addCatItem(name):
    global category_item_id
    category_item = collections.OrderedDict()
    category_item['supercategory'] = 'none'
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_item_id += 1


def addImgItem(image_id, size):
    file_name = str(image_id).zfill(8) + '.jpg'
    if not size['width']:
        raise Exception('Could not find width tag in xml file.')
    if not size['height']:
        raise Exception('Could not find height tag in xml file.')

    image_item = collections.OrderedDict()

    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    image_item['id'] = image_id
    coco['images'].append(image_item)


def addAnnoItem(image_id, category_id, bbox):
    global annotation_id

    annotation_item = collections.OrderedDict()
    annotation_item['segmentation'] = []

    # segmentation
    seg = []
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])

    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_item['id'] = annotation_id
    annotation_item['ignore'] = 0
    annotation_id += 1
    coco['annotations'].append(annotation_item)


def xxyy2xywhn(size, box):
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    x = round(x, 6)
    w = round(w, 6)
    y = round(y, 6)
    h = round(h, 6)
    return x, y, w, h


def xml2txt(xml_path, txt_path):
    in_file = open(xml_path, encoding='utf-8')
    out_file = open(txt_path, 'w', encoding='utf-8')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in category_set or int(difficult) == 1:
            continue
        cls_id = category_set.index(cls)
        xmlbox = obj.find('bndbox')
        x1, x2, y1, y2 = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                          float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))

        # clip x2, y2 to normal range
        if x2 > w:
            x2 = w
        if y2 > h:
            y2 = h

        # xyxy2xywhn
        bbox = (x1, x2, y1, y2)
        bbox = xxyy2xywhn((w, h), bbox)
        out_file.write(str(cls_id) + " " +
                       " ".join([str(a) for a in bbox]) + '\n')


def xml2json(image_index, xml_path):
    bndbox = dict()
    size = dict()
    size['width'] = None
    size['height'] = None

    tree = ET.parse(xml_path)
    root = tree.getroot()

    if root.tag != 'annotation':
        raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

    # elem format: <folder>, <filename>, <size>, <object>
    for elem in root:
        if elem.tag == 'folder' or elem.tag == 'filename' or elem.tag == 'path' or elem.tag == 'source':
            continue

        elif elem.tag == 'size':
            # add image information, like file_name, size, image_id
            for subelem in elem:
                size[subelem.tag] = int(subelem.text)
            addImgItem(image_index, size)

        elif elem.tag == 'object':
            for subelem in elem:
                if subelem.tag == 'name':
                    object_name = subelem.text
                    current_category_id = category_set.index(object_name)

                elif subelem.tag == 'bndbox':
                    for option in subelem:
                        bndbox[option.tag] = int(option.text)

                    bbox = []
                    bbox.append(bndbox['xmin'])
                    bbox.append(bndbox['ymin'])
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])

                    # add bound box information, include area,image_id, bbox, category_id, id and so on
                    addAnnoItem(image_index, current_category_id, bbox)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--root_dir', default='', type=str, help='root directory of data set')
    opt = parser.parse_args()

    # generate directory structure
    root_dir = opt.root_dir
    #print(root_dir)
    new_dir = os.path.join(root_dir, '..', 'SHWD')
    os.makedirs(os.path.join(new_dir, 'images', 'train'))
    os.makedirs(os.path.join(new_dir, 'images', 'val'))
    os.makedirs(os.path.join(new_dir, 'labels', 'train'))
    os.makedirs(os.path.join(new_dir, 'labels', 'val'))
    os.makedirs(os.path.join(new_dir, 'annotations'))

    train_txt_yolo = open(os.path.join(new_dir, 'train.txt'), 'w')
    val_txt_yolo = open(os.path.join(new_dir, 'val.txt'), 'w')

    images_path = os.path.join(root_dir, 'JPEGImages')
    labels_path = os.path.join(root_dir, 'Annotations')
    #print(images_path)
    #print(labels_path)
    train_set_txt = os.path.join(root_dir, 'ImageSets', 'Main', 'trainval.txt')
    with open(train_set_txt, 'r', encoding='utf-8') as f:
        for line in tqdm(f.readlines(), desc='train_set'):
            stem = line.strip('\n')
            old_path = os.path.join(images_path, stem + '.jpg')
            print(os.path.exists(old_path))
            if not os.path.exists(old_path):
                old_path = os.path.join(images_path, stem + '.JPG')

            # copy train_set image to new path
            new_images_path = os.path.join(new_dir, 'images', 'train')
            shutil.copy(old_path, new_images_path)

            # rename image_file to continuous number
            old_name = Path(old_path).name
            new_stem = str(image_index).zfill(8)
            os.rename(os.path.join(new_images_path, old_name), os.path.join(new_images_path, new_stem + '.jpg'))

            # write the relative path of image to train.txt
            train_txt_yolo.write('./images/train/' + new_stem + '.jpg' + '\n')

            # convert xml file to txt file
            xml_path = os.path.join(labels_path, stem + '.xml')
            txt_path = os.path.join(new_dir, 'labels', 'train', new_stem + '.txt')
            xml2txt(xml_path, txt_path)

            image_index += 1

    val_set_txt = os.path.join(root_dir, 'ImageSets', 'Main', 'test.txt')
    with open(val_set_txt, 'r', encoding='utf-8') as f:
        for line in tqdm(f.readlines(), desc='val_set'):
            stem = line.strip('\n')
            old_path = os.path.join(images_path, stem + '.jpg')
            #print(old_path)
            if not os.path.exists(old_path):
                old_path = os.path.join(images_path, stem + '.JPG')

            # copy val_set image to new path
            new_images_path = os.path.join(new_dir, 'images', 'val')
            shutil.copy(old_path, new_images_path)

            # rename image_file to continuous number
            old_name = Path(old_path).name
            new_stem = str(image_index).zfill(8)
            os.rename(os.path.join(new_images_path, old_name), os.path.join(new_images_path, new_stem + '.jpg'))

            # write the relative path of image to val.txt
            val_txt_yolo.write('./images/val/' + new_stem + '.jpg' + '\n')

            # convert xml file to txt file
            xml_path = os.path.join(labels_path, stem + '.xml')
            txt_path = os.path.join(new_dir, 'labels', 'val', new_stem + '.txt')
            xml2txt(xml_path, txt_path)

            # convert xml file to json file
            xml2json(image_index, xml_path)

            image_index += 1

    for categoryname in category_set:
        addCatItem(categoryname)

    train_txt_yolo.close()
    val_txt_yolo.close()

    # save ground truth json file
    json_file = os.path.join(new_dir, 'annotations', 'instances_val2017.json')
    json.dump(coco, open(json_file, 'w'))

Note:
category_set = ['tb0', 'tb1', 'tb2', 'tb3', 'tb4'] must be replaced with your own labels.
If the script fails, fix the problem and delete the files it already generated before rerunning it, otherwise it will complain that the directories and files already exist.

Run convert_shwd2yolo.py. I use absolute paths here; be sure to change them for your own setup.

python /home/robot/mindyolo/examples/code/convert_shwd2yolo.py --root_dir /home/robot/mindyolo/examples/code/VOC2007
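
After the conversion finishes, a small check I like to run (my own addition; the paths assume the layout above) confirms that every copied image got a matching label file:

# check_split.py (optional sanity check)
import os
root = '/home/robot/mindyolo/examples/code/SHWD'
for split in ('train', 'val'):
    imgs = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(root, 'images', split))}
    lbls = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(root, 'labels', split))}
    print(split, len(imgs), 'images,', len(lbls), 'labels, missing labels:', sorted(imgs - lbls))

If "missing labels" is not empty, re-check the Annotations folder and the ImageSets split files before training.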

4 Training the model

Adjust the config file yolov7-tiny_shwd.yaml to your own needs:

__BASE__: [
  '../../configs/yolov7/yolov7-tiny.yaml',
]

per_batch_size: 1 # 16 * 8 = 128 upstream; I train on a single 4060 Ti, so I keep this small. It is the number of images processed at once.
img_size: 2048 # image size; my camera resolution is 2048*1536
#weight: ./yolov7-tiny_pretrain.ckpt
weight: ./examples/code/yolov7-tiny_300e_mAP375-d8972c94.ckpt # pretrained weights, download them yourself
strict_load: False
conf_thres: 0.6 # only keep detections with confidence above 0.6
iou_thres: 0.5 # detections overlapping with IoU above 0.5 are treated as the same object during NMS

data:
  dataset_name: shwd
  train_set: ./examples/code/SHWD/train.txt
  val_set: ./examples/code/SHWD/val.txt
  test_set: ./examples/code/SHWD/val.txt
  
  nc: 5 # be sure to change this to your own number of classes
  # class names
  names: [ 'tb0',  'tb1', 'tb2', 'tb3', 'tb4'] # be sure to change these to your own labels

optimizer:
  lr_init: 0.001  # initial learning rate

Download the pretrained weights you need from https://github.com/mindspore-lab/mindyolo/blob/master/MODEL_ZOO.md; I use yolov7-tiny_300e_mAP375-d8972c94.ckpt.
Train with MindYOLO. The generated weights end up under /home/robot/mindyolo/runs by default. The command is:

# run from the mindyolo directory
python train.py --config  ./examples/code/yolov7-tiny_shwd.yaml --device_target GPU

Test the result with the inference demo. Mind the paths: --config is the config file, --weight is the checkpoint, and --image_path is the image to run on.

python demo/predict.py --config ./examples/code/yolov7-tiny_shwd.yaml --weight=./weight/yolov7-tiny_shwd-300_27.ckpt --image_path ./examples/code/SHWD/images/train/00000000.jpg --device_target GPU

The result looks like this:
[detection result image]

5 Real-time detection via ROS (work in progress)

Create a yolo_ros.py file in the ROS mindyolo package:

# in the /home/robot/catkin_ws/src/mindyolo directory:
mkdir node
touch yolo_ros.py

The yolo_ros.py code is as follows:

import argparse
import ast
import math
import os
import sys
import time
import cv2
import numpy as np
import yaml
from datetime import datetime

import mindspore as ms
from mindspore import Tensor, context, nn


from mindyolo.data import COCO80_TO_COCO91_CLASS
from mindyolo.models import create_model
from mindyolo.utils import logger
from mindyolo.utils.config import parse_args
from mindyolo.utils.metrics import non_max_suppression, scale_coords, xyxy2xywh, process_mask_upsample, scale_image
from mindyolo.utils.utils import draw_result, set_seed



import rospy
from sensor_msgs.msg import Image
from ros_numpy.image import image_to_numpy, numpy_to_image

sys.path.append('/home/robot/mindyolo')  # sys is imported above; make the mindyolo source tree importable

def get_parser_infer(parents=None):
    parser = argparse.ArgumentParser(description="Infer", parents=[parents] if parents else [])
    parser.add_argument("--task", type=str, default="detect", choices=["detect", "segment"])
    parser.add_argument("--device_target", type=str, default="Ascend", help="device target, Ascend/GPU/CPU")
    parser.add_argument("--ms_mode", type=int, default=0, help="train mode, graph/pynative")
    parser.add_argument("--ms_amp_level", type=str, default="O0", help="amp level, O0/O1/O2")
    parser.add_argument(
        "--ms_enable_graph_kernel", type=ast.literal_eval, default=False, help="use enable_graph_kernel or not"
    )
    parser.add_argument("--weight", type=str, default="yolov7_300.ckpt", help="model.ckpt path(s)")
    parser.add_argument("--img_size", type=int, default=640, help="inference size (pixels)")
    parser.add_argument(
        "--single_cls", type=ast.literal_eval, default=False, help="train multi-class data as single-class"
    )
    parser.add_argument("--nms_time_limit", type=float, default=60.0, help="time limit for NMS")
    parser.add_argument("--conf_thres", type=float, default=0.25, help="object confidence threshold")
    parser.add_argument("--iou_thres", type=float, default=0.65, help="IOU threshold for NMS")
    parser.add_argument(
        "--conf_free", type=ast.literal_eval, default=False, help="Whether the prediction result include conf"
    )
    parser.add_argument("--seed", type=int, default=2, help="set global seed")
    parser.add_argument("--log_level", type=str, default="INFO", help="save dir")
    parser.add_argument("--save_dir", type=str, default="./runs_infer", help="save dir")

    parser.add_argument("--image_path", type=str, help="path to image")
    parser.add_argument("--save_result", type=ast.literal_eval, default=True, help="whether save the inference result")

    return parser


def set_default_infer(args):
    # Set Context
    context.set_context(mode=args.ms_mode, device_target=args.device_target, max_call_depth=2000)
    if args.device_target == "Ascend":
        context.set_context(device_id=int(os.getenv("DEVICE_ID", 0)))
    elif args.device_target == "GPU" and args.ms_enable_graph_kernel:
        context.set_context(enable_graph_kernel=True)
    args.rank, args.rank_size = 0, 1
    # Set Data
    args.data.nc = 1 if args.single_cls else int(args.data.nc)  # number of classes
    args.data.names = ["item"] if args.single_cls and len(args.names) != 1 else args.data.names  # class names
    assert len(args.data.names) == args.data.nc, "%g names found for nc=%g dataset in %s" % (
        len(args.data.names),
        args.data.nc,
        args.config,
    )
    # Directories and Save run settings
    platform = sys.platform
    if platform == "win32":
        args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H.%M.%S"))
    else:
        args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))
    os.makedirs(args.save_dir, exist_ok=True)
    if args.rank % args.rank_size == 0:
        with open(os.path.join(args.save_dir, "cfg.yaml"), "w") as f:
            yaml.dump(vars(args), f, sort_keys=False)
    # Set Logger
    logger.setup_logging(logger_name="MindYOLO", log_level="INFO", rank_id=args.rank, device_per_servers=args.rank_size)
    logger.setup_logging_file(log_dir=os.path.join(args.save_dir, "logs"))


def detect(
    network: nn.Cell,
    img: np.ndarray,
    conf_thres: float = 0.25,
    iou_thres: float = 0.65,
    conf_free: bool = False,
    nms_time_limit: float = 60.0,
    img_size: int = 640,
    stride: int = 32,
    num_class: int = 80,
    is_coco_dataset: bool = True,
):
    # Resize
    h_ori, w_ori = img.shape[:2]  # orig hw
    r = img_size / max(h_ori, w_ori)  # resize image to img_size
    if r != 1:  # always resize down, only resize up if training with augmentation
        interp = cv2.INTER_AREA if r < 1 else cv2.INTER_LINEAR
        img = cv2.resize(img, (int(w_ori * r), int(h_ori * r)), interpolation=interp)
    h, w = img.shape[:2]
    if h < img_size or w < img_size:
        new_h, new_w = math.ceil(h / stride) * stride, math.ceil(w / stride) * stride
        dh, dw = (new_h - h) / 2, (new_w - w) / 2
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        img = cv2.copyMakeBorder(
            img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114)
        )  # add border

    # Transpose Norm
    img = img[:, :, ::-1].transpose(2, 0, 1) / 255.0
    imgs_tensor = Tensor(img[None], ms.float32)

    # Run infer
    _t = time.time()
    out = network(imgs_tensor)  # inference and training outputs
    out = out[0] if isinstance(out, (tuple, list)) else out
    infer_times = time.time() - _t

    # Run NMS
    t = time.time()
    out = out.asnumpy()
    
    out = non_max_suppression(
        out,
        conf_thres=conf_thres,
        iou_thres=iou_thres,
        conf_free=conf_free,
        multi_label=True,
        time_limit=nms_time_limit,
    )
    nms_times = time.time() - t
    # logger.info(f"out is: {out}")
    result_dict = {"category_id": [], "bbox": [], "score": []}
    total_category_ids, total_bboxes, total_scores = [], [], []
    for si, pred in enumerate(out):
        if len(pred) == 0:
            continue

        # Predictions
        predn = np.copy(pred)
        scale_coords(img.shape[1:], predn[:, :4], (h_ori, w_ori))  # native-space pred

        box = xyxy2xywh(predn[:, :4])  # xywh
        box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner
        category_ids, bboxes, scores = [], [], []
        for p, b in zip(pred.tolist(), box.tolist()):
            category_ids.append(COCO80_TO_COCO91_CLASS[int(p[5])] if is_coco_dataset else int(p[5]))
            bboxes.append([round(x, 3) for x in b])
            scores.append(round(p[4], 5))

        total_category_ids.extend(category_ids)
        total_bboxes.extend(bboxes)
        total_scores.extend(scores)

    result_dict["category_id"].extend(total_category_ids)
    result_dict["bbox"].extend(total_bboxes)
    result_dict["score"].extend(total_scores)

    t = tuple(x * 1e3 for x in (infer_times, nms_times, infer_times + nms_times)) + (img_size, img_size, 1)  # tuple
    logger.info(f"Predict result is: {result_dict}")
    logger.info(f"Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g;" % t)
    logger.info(f"Detect a image success.")

    return result_dict


def infer(args, img):
    # Init
    set_seed(args.seed)
    set_default_infer(args)

    # Create Network
    network = create_model(
        model_name=args.network.model_name,
        model_cfg=args.network,
        num_classes=args.data.nc,
        sync_bn=False,
        checkpoint_path=args.weight,
    )
    network.set_train(False)
    ms.amp.auto_mixed_precision(network, amp_level=args.ms_amp_level)

    # Detect
    is_coco_dataset = "coco" in args.data.dataset_name
    if args.task == "detect":
        result_dict = detect(
            network=network,
            img=img,
            conf_thres=args.conf_thres,
            iou_thres=args.iou_thres,
            conf_free=args.conf_free,
            nms_time_limit=args.nms_time_limit,
            img_size=args.img_size,
            stride=max(max(args.network.stride), 32),
            num_class=args.data.nc,
            is_coco_dataset=is_coco_dataset,
        )

    logger.info("Infer completed.")


def draw_image(im, result_dict, data_names, is_coco_dataset=True):
    import random
    import cv2
    from mindyolo.data import COCO80_TO_COCO91_CLASS

    category_id, bbox, score = result_dict["category_id"], result_dict["bbox"], result_dict["score"]
    seg = result_dict.get("segmentation", None)
    mask = None if seg is None else np.zeros_like(im, dtype=np.float32)
    for i in range(len(bbox)):
        # draw box
        x_l, y_t, w, h = bbox[i][:]
        x_r, y_b = x_l + w, y_t + h
        x_l, y_t, x_r, y_b = int(x_l), int(y_t), int(x_r), int(y_b)
        _color = [random.randint(0, 255) for _ in range(3)]
        cv2.rectangle(im, (x_l, y_t), (x_r, y_b), tuple(_color), 2)
        if seg:
            _color_seg = np.array([random.randint(0, 255) for _ in range(3)], np.float32)
            mask += seg[i][:, :, None] * _color_seg[None, None, :]

        # draw label
        if is_coco_dataset:
            class_name_index = COCO80_TO_COCO91_CLASS.index(category_id[i])
        else:
            class_name_index = category_id[i]
        class_name = data_names[class_name_index]  # args.data.names[class_name_index]
        text = f"{class_name}: {score[i]}"
        (text_w, text_h), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)
        cv2.rectangle(im, (x_l, y_t - text_h - baseline), (x_l + text_w, y_t), tuple(_color), -1)
        cv2.putText(im, text, (x_l, y_t - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)

    # save results
    if seg:
        im = (0.7 * im + 0.3 * mask).astype(np.uint8)
    return im


class YoloxRos(object):
    def __init__(self, args):
        self.args = args
        set_seed(self.args.seed)
        set_default_infer(self.args)
        self.network = create_model(
            model_name=self.args.network.model_name,
            model_cfg=self.args.network,
            num_classes=self.args.data.nc,
            sync_bn=False,
            checkpoint_path=self.args.weight,
        )
        self.network.set_train(False)
        ms.amp.auto_mixed_precision(self.network, amp_level=self.args.ms_amp_level)

        self.image_subscriber = rospy.Subscriber('/camera/left/image_raw', Image, callback=self.image_callback, queue_size=1)
        self.image_publisher = rospy.Publisher('/image_publish', Image, queue_size=1)

    def image_callback(self, msg):
        image = image_to_numpy(msg)
        result_image = self.process_image(image)
        self.image_publisher.publish(numpy_to_image(result_image, encoding='bgr8'))

    def process_image(self, img):
        is_coco_dataset = "coco" in self.args.data.dataset_name
        if self.args.task == "detect":
            result_dict = detect(
                network=self.network,
                img=img,
                conf_thres=self.args.conf_thres,
                iou_thres=self.args.iou_thres,
                conf_free=self.args.conf_free,
                nms_time_limit=self.args.nms_time_limit,
                img_size=self.args.img_size,
                stride=max(max(self.args.network.stride), 32),
                num_class=self.args.data.nc,
                is_coco_dataset=is_coco_dataset,
            )
        # Draw the detection results onto the input image
        result_image = draw_image(img, result_dict, self.args.data.names, is_coco_dataset=is_coco_dataset)
        return result_image


if __name__ == "__main__":
    rospy.init_node('yolox_ros_node')
    parser = get_parser_infer()
    args = parse_args(parser)
    yolox_ros = YoloxRos(args)

    rospy.spin()

Run it with:

python yolo_ros.py --config /home/robot/mindyolo/examples/code/yolov7-tiny_shwd.yaml --weight=/home/robot/mindyolo/weight/yolov7-tiny_shwd-300_27.ckpt --device_target GPU

Experimental result: real-time detection and localization works, but the accuracy still needs improvement.
[real-time detection screenshot]
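
Since the QR codes have a known physical size, a simple way to turn a detection into a rough distance estimate is the pinhole camera model Z = fx * W / w. This is a minimal sketch of my own, not part of the code above; it assumes the camera's focal length in pixels is known from calibration and that the code faces the camera roughly head-on:

# hypothetical helper, not part of yolo_ros.py
def estimate_distance(bbox_w_px, marker_width_m, fx_px):
    # bbox_w_px: width of the detected box in the original image (pixels)
    # marker_width_m: real-world width of the QR code (meters)
    # fx_px: camera focal length in pixels, from calibration
    return fx_px * marker_width_m / bbox_w_px

# example: a 0.10 m code detected 200 px wide with fx = 1400 px is about 0.70 m away
print(estimate_distance(200, 0.10, 1400))

The bbox entries returned by detect() are [x, y, w, h] in the original image coordinates, so the third value can be fed straight into bbox_w_px.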

References

[1] MindYOLO fine-tuning tutorial: https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_SHWD/README.md
