
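💡💡💡 Summary: a SAR image object detection system based on YOLOv8, covering the full workflow of dataset creation, data visualization, and model training, evaluation, inference, and deployment, with a Gradio interface for the final demo.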

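0 Preface

This post walks you through using YOLOv8 and training a detector for your own task and scenario. It focuses on the details of the YOLOv8 training framework, such as the dataset format and the configuration files, so that beginners can avoid detours, follow along to train their own detector, and build a simple application on top of Gradio.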

YOLOv8 SAR object detection

1 Environment Setup

First, prepare a local Python environment. For how to set up virtual environments on Windows and Linux, see the author's earlier post:

conda create -n sar python=3.8
conda activate sar

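With the virtual environment created, you can install YOLOv8. The official project offers two ways to install it:

  • Option 1: install with pip
# Option 1: install with pip
pip install ultralytics
# To use the latest version instead:
pip install git+https://github.com/ultralytics/ultralytics.git@main
  • Option 2: install from source (recommended)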
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install -e .


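2 First Look at YOLOv8

This part mainly follows the official YOLOv8 documentation. The documentation is clearly structured, but it is in English, which can be a hurdle for beginners. Here I pick out the features used most often in development and put them in order; follow this flow and you can train your detector quickly.

2.1 Model Training

YOLOv8 is very well packaged: training, evaluation, inference, and export can each be done in fewer than 10 lines of code.
Take loading yolov8n, the smallest YOLOv8 variant, as an example: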

from ultralytics import YOLO
model = YOLO('yolov8n.yaml') # builds the model from ultralytics/cfg/models/v8/yolov8.yaml with scale='n'
model = YOLO('yolov8n.pt') # loads the pretrained weights, downloading them to the current directory if missing

Next, call model.train() to start training:

results = model.train(data='coco128.yaml', batch=4, epochs=1)

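The parameters of model.train() are as follows:

  • data='coco128.yaml': the dataset config file, located by default at ultralytics/cfg/datasets/coco128.yaml; the dataset itself is downloaded by default to ../datasets/coco128/
  • batch=4: the batch size
  • device=[0, 1]: the GPU devices to use
  • resume=True: resume training, loading the state automatically from the .pt file
  • For the defaults of the other training parameters, see the Train page of the official documentation

After training, the results, including the model weights, are saved under runs/detect/train in the current directory.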

2.2 Model Evaluation

Evaluation likewise takes a single line of code, a call to model.val():

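# Load the trained weights
model = YOLO('runs/detect/train/weights/best.pt')
# The evaluation dataset can be set explicitly, e.g. data='coco8.yaml'
results = model.val()

For more parameters of model.val(), see the Val page of the official documentation.

The evaluation results are saved under runs/detect/val in the current directory.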

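2.3 Model Inference

Inference also takes a single line of code, but the output is fairly rich because YOLOv8 handles more tasks than detection. Here we unpack the fields of results to get a better feel for what the output contains.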

from ultralytics import YOLO
model = YOLO('runs/detect/train/weights/best.pt')
# results = model('https://ultralytics.com/images/bus.jpg')
results = model('bus.jpg')
for result in results:
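    boxes = result.boxes  # detected bounding boxes
    masks = result.masks  # instance segmentation masks (None for a detection model)
    keypoints = result.keypoints  # pose keypoints (None for a detection model)
    probs = result.probs  # classification probabilities (None for a detection model); per-box confidence is in boxes.conf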
    result.show()  # display to screen
    result.save(filename='result.jpg')  # save to disk

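For more parameters of the inference call, see the Predict page of the official documentation.

2.4 Model Export

Export also takes a single line of code, a call to model.export(). Common target formats such as ONNX, TorchScript, TensorFlow, and PaddlePaddle are supported.

Before exporting, install the ONNX package with pip install onnx, then run the following script: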

from ultralytics import YOLO
model = YOLO('runs/detect/train/weights/best.pt')
# Export the model to ONNX format
success = model.export(format='onnx')

After export, the .onnx file is saved next to the weights file, for example runs/detect/train/weights/best.onnx
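As a small illustration of that naming rule, the sketch below derives the expected .onnx path from a .pt path. The helper name exported_onnx_path is hypothetical and not part of Ultralytics; it only encodes the "same folder, same stem" convention described above.

```python
from pathlib import PurePosixPath


def exported_onnx_path(weights_path: str) -> str:
    """Return the path where the .onnx file is expected, next to the .pt file."""
    # Hypothetical helper: swaps the suffix and keeps folder and stem unchanged.
    return str(PurePosixPath(weights_path).with_suffix(".onnx"))


print(exported_onnx_path("runs/detect/train/weights/best.pt"))
```

This can be handy in a deployment script that needs to locate the exported file without hard-coding both paths.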

For more parameters of the export call, see the Export page of the official documentation.


  • First, inference on the GPU:
from ultralytics.utils.benchmarks import benchmark
benchmark(model='runs/detect/train/weights/best.pt', data='coco8.yaml', imgsz=640, half=False, device=0)

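Any missing dependencies, such as 'onnxruntime-gpu', 'nvidia-tensorrt', or 'tensorflow', are downloaded and installed automatically during the run. The results on my NVIDIA GeForce RTX 2050 (4 GB) GPU are as follows:

As you can see, inference is considerably faster after converting to ONNX.

  • Next, inference on the CPU: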
from ultralytics.utils.benchmarks import benchmark
benchmark(model='runs/detect/train/weights/best.pt', data='coco8.yaml', imgsz=640, half=False)

Any missing dependencies, such as 'onnxruntime', are downloaded and installed automatically during the run. The results are as follows:

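2.5 More…

The other detection models supported by YOLOv8 are listed on the Models page of the official documentation. In the local project:

  • the config files are in ultralytics/cfg/models/
  • the code is in ultralytics/models/

Likewise, the other supported datasets are listed on the Datasets page of the official documentation. In the local project:

  • the config files are in ultralytics/cfg/datasets/
  • the code is in ultralytics/data/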

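3 Training Your Own Detector

From here on, let's train a YOLOv8 detector on our own dataset. I've put the project source code here for anyone who needs it.

3.1 Dataset Preparation

The dataset format and directory structure that YOLOv8 expects differ from the COCO and VOC layouts we have seen before. For example, the official coco8 dataset is laid out as follows: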

├── images
│   ├── train
│   │   ├── 000000000009.jpg
│   └── val
│       ├── 000000000036.jpg
└── labels
    ├── train
    │   ├── 000000000009.txt
    ├── val
    │   ├── 000000000036.txt
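The tree above pairs every image with a label file of the same name under labels/. The sketch below shows that pairing rule as a plain string transformation. It is a simplified illustration of my understanding of how the label path is derived from the image path, not the library's own code, and the function name image_to_label_path is hypothetical.

```python
def image_to_label_path(image_path: str) -> str:
    """Map .../images/<split>/<name>.<ext> to .../labels/<split>/<name>.txt."""
    # Split on the LAST '/images/' so parent folders named 'images' are untouched.
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no '/images/' directory in {image_path!r}")
    stem = tail.rsplit(".", 1)[0]  # drop the image extension
    return f"{head}/labels/{stem}.txt"


print(image_to_label_path("datasets/coco8/images/train/000000000009.jpg"))
```

If an image has no matching .txt, it is treated as containing no objects, which matches the labelling rule given in the steps that follow.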

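In short, creating a dataset for YOLOv8 takes three steps:

  • Create a .yaml config file, using coco128.yaml as a reference
  • Create the label files: one .txt per image, and no .txt is needed for an image without objects. Requirements:
    • one object per line
    • each line has the format class x_center y_center width height
    • class indices start at 0, and the coordinates are normalized (from 0 to 1)
  • Organize the dataset folders, following the same images/ and labels/ layout as the coco8 example above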
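To make the label line format from the list above concrete, here is a minimal sketch that turns one COCO-style box (top-left x, top-left y, width, height, all in pixels) into a YOLO label line. The function name and the sample numbers are made up for illustration.

```python
def coco_box_to_yolo_line(class_id, bbox, img_w, img_h):
    """Format one COCO [x, y, w, h] pixel box as 'class xc yc w h', normalized to 0..1."""
    x, y, w, h = bbox
    xc = (x + w / 2) / img_w  # box centre, as a fraction of image width
    yc = (y + h / 2) / img_h  # box centre, as a fraction of image height
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
line = coco_box_to_yolo_line(0, [100, 50, 200, 100], img_w=400, img_h=200)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```

Dividing x values by the image width and y values by the image height is what makes the labels independent of image resolution.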

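Below I use a SAR ship detection dataset as an example and walk through the whole dataset creation process.

If you don't have data locally, I've uploaded the dataset to the AI Studio platform; just download it to your machine.

3.2 Creating the Label Files & Organizing the Dataset Folders

Next we need to convert the data into the format YOLOv8 expects. The code is in convert_labels.py in the root of the project source, and the logic is as follows: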

import os
import json
import shutil
import cv2
import numpy as np
from collections import defaultdict
from ultralytics.utils import LOGGER, TQDM

# Create dataset directory
orig_dir = '../../datasets/ssdd'
save_dir = '../../datasets/ssdd_yolo'
for p in f'{save_dir}/labels', f'{save_dir}/images':
    os.makedirs(p, exist_ok=True)

for json_file in ['train.json', 'val.json']:
    lname = json_file.split('.')[0]
    img_dir = f'{save_dir}/images/{lname}'
    os.makedirs(img_dir, exist_ok=True)
    fn = f'{save_dir}/labels/{lname}'
    os.makedirs(fn, exist_ok=True)
    with open(f'{orig_dir}/{json_file}') as f:
        data = json.load(f)
    images = {f'{x["id"]:d}': x for x in data["images"]}
    imgToAnns = defaultdict(list)
    for ann in data["annotations"]:
        imgToAnns[ann["image_id"]].append(ann)  # group annotations by image id
    # Write labels file
    for img_id, anns in TQDM(imgToAnns.items(), desc=f"Annotations {json_file}"):
        img = images[f"{img_id:d}"]
        h, w = img["height"], img["width"]
        f = img["file_name"]
        shutil.copy(f'{orig_dir}/JPEGImages/{f}', f'{img_dir}/{f}')
        bboxes = []
        for ann in anns:
            box = np.array(ann["bbox"], dtype=np.float64)
            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y
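            if box[2] <= 0 or box[3] <= 0:  # skip boxes with non-positive width or height
                continue
            cls = ann["category_id"] - 1  # shift 1-based category ids to 0-based class ids
            box = [cls] + box.tolist()
            if box not in bboxes:
                bboxes.append(box)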
        with open(f'{fn}/{f[:-3]}txt', 'a') as file:
            for i in range(len(bboxes)):
                line = ' '.join([str(n) for n in bboxes[i]])
                file.write(line + "\n")

LOGGER.info(f"COCO data converted successfully.\nResults saved to {save_dir}")


# check converted annos
img_path = f'{save_dir}/images/train/000031.jpg'
txt_path = f'{save_dir}/labels/train/000031.txt'
lines = open(txt_path, 'r').read().splitlines()
img = cv2.imread(img_path)
ih, iw = img.shape[:2]
for line in lines:
    c, x, y, w, h = [float(i) for i in line.split(' ')]
    x1, y1 = int((x-w/2)*iw), int((y-h/2)*ih)
    x2, y2 = int((x+w/2)*iw), int((y+h/2)*ih)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0))
cv2.imwrite('0.jpg', img)
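Besides drawing one image, it can help to scan every converted label file for malformed lines before training. The following is a minimal sketch under the label rules stated earlier (five fields, integer class id starting at 0, coordinates within 0 to 1). The function name find_bad_label_lines is hypothetical, and the demo runs on throwaway files rather than the real dataset.

```python
import tempfile
from pathlib import Path


def find_bad_label_lines(label_dir):
    """Return (file name, line number, reason) for each malformed label line."""
    problems = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for n, line in enumerate(txt.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # ignore blank lines
            parts = line.split()
            if len(parts) != 5:
                problems.append((txt.name, n, "expected 5 fields"))
                continue
            try:
                class_id = int(parts[0])
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append((txt.name, n, "non-numeric field"))
                continue
            if class_id < 0:
                problems.append((txt.name, n, "negative class id"))
            elif not all(0.0 <= v <= 1.0 for v in coords):
                problems.append((txt.name, n, "coordinate outside [0, 1]"))
    return problems


# Tiny self-contained demo on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.txt").write_text("0 0.5 0.5 0.2 0.1\n")
    Path(tmp, "bad.txt").write_text("0 0.5 1.5 0.2 0.1\n0 0.5 0.5\n")
    demo_problems = find_bad_label_lines(tmp)

for name, line_no, reason in demo_problems:
    print(f"{name}:{line_no}: {reason}")
```

For the dataset converted above, the folder to check would be the labels/train and labels/val directories under save_dir.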


3.3 Creating the ssdd.yaml Dataset Config File


# ssdd.yaml
path: ../../datasets/ssdd_yolo # dataset root dir
train: images/train # train images (relative to 'path') 
val:  images/val # val images (relative to 'path')
test: images/val 

# Classes
names:
  0: ship

3.4 Uploading the Dataset to ultralytics.hub (Optional)

We can also upload our dataset to ultralytics.hub to share it with the community.


cd ~/datasets/
zip -r -o ssdd_yolo.zip ssdd_yolo/


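from ultralytics.hub import check_dataset
# Validate the zipped dataset before uploading (path assumes the zip created above)
check_dataset('ssdd_yolo.zip', task='detect')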


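After opening ultralytics.hub, register an account first, then click Upload in the top-right corner to start uploading.

Use the three-dot menu at the top right of the dataset -> Share to make the dataset public and generate a share link. Clicking the dataset shows its statistics; for example, the Train split contains 2009 ship instances in total.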
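As a local cross-check of such statistics, the instance count per class can be obtained by counting label lines, assuming each non-empty line is one labelled object. This is only a sketch: count_instances is a hypothetical helper, and the demo uses temporary files instead of the SSDD labels.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_instances(label_dir):
    """Count labelled objects per class id across all .txt files in a folder."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1  # first field is the class id
    return counts


# Demo: two label files holding three objects of class 0 in total.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("0 0.5 0.5 0.2 0.1\n0 0.3 0.3 0.1 0.1\n")
    Path(tmp, "b.txt").write_text("0 0.7 0.7 0.2 0.2\n")
    demo_counts = count_instances(tmp)

print(dict(demo_counts))
```

Pointing it at the labels/train folder of the converted dataset should give a number comparable to the one shown on the dataset page.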

3.5 Model Training


from ultralytics import YOLO
model = YOLO('yolov8n.pt')
results = model.train(data='ultralytics/cfg/datasets/ssdd.yaml', batch=4, epochs=10)

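I chose batch=4 to fit my GPU memory; adjust it to your own GPU memory to avoid running out of memory.

Training results: after 10 epochs, mAP50 = 0.956. That is good enough, so the next step is to deploy the model as an application.

4 Model Deployment and Application Building

4.1 ONNX Model Conversion

Because ONNX models are widely portable, we deploy with ONNX here. First convert the trained model to ONNX format: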

from ultralytics import YOLO
model = YOLO('runs/detect/train2/weights/best.pt')
success = model.export(format='onnx')

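4.2 Writing the Inference Code

The onnxruntime package must be installed first. If you already ran the benchmark above, onnxruntime is already installed. The inference code is in demo.py in the root of the project source, for reference.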

import cv2
import time
import numpy as np
import onnxruntime
import gradio as gr
from ultralytics.utils.ops import xywh2xyxy

class_names = ['ship']
colors = np.random.uniform(0, 255, size=(len(class_names), 3))

def compute_iou(box, boxes):
    # Compute xmin, ymin, xmax, ymax for both boxes
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])

    # Compute intersection area
    intersection_area = np.maximum(0, xmax - xmin) * np.maximum(0, ymax - ymin)

    # Compute union area
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    boxes_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union_area = box_area + boxes_area - intersection_area

    # Compute IoU
    iou = intersection_area / union_area

    return iou

def nms(boxes, scores, iou_threshold):
    # Sort by score
    sorted_indices = np.argsort(scores)[::-1]

    keep_boxes = []
    while sorted_indices.size > 0:
        # Pick the highest-scoring remaining box and keep it
        box_id = sorted_indices[0]
        keep_boxes.append(box_id)

        # Compute IoU of the picked box with the rest
        ious = compute_iou(boxes[box_id, :], boxes[sorted_indices[1:], :])

        # Remove boxes with IoU over the threshold
        keep_indices = np.where(ious < iou_threshold)[0]

        # print(keep_indices.shape, sorted_indices.shape)
        sorted_indices = sorted_indices[keep_indices + 1]

    return keep_boxes

def multiclass_nms(boxes, scores, class_ids, iou_threshold):

    unique_class_ids = np.unique(class_ids)

    keep_boxes = []
    for class_id in unique_class_ids:
        class_indices = np.where(class_ids == class_id)[0]
        class_boxes = boxes[class_indices,:]
        class_scores = scores[class_indices]

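        class_keep_boxes = nms(class_boxes, class_scores, iou_threshold)
        # Map the per-class indices back to indices into the full arrays
        keep_boxes.extend(class_indices[class_keep_boxes])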

    return keep_boxes

def draw_detections(image, boxes, scores, class_ids, mask_alpha=0.3):
    det_img = image.copy()

    img_height, img_width = image.shape[:2]
    font_size = min([img_height, img_width]) * 0.0006
    text_thickness = int(min([img_height, img_width]) * 0.001)

    det_img = draw_masks(det_img, boxes, class_ids, mask_alpha)

    # Draw bounding boxes and labels of detections
    for class_id, box, score in zip(class_ids, boxes, scores):
        color = colors[class_id]

        draw_box(det_img, box, color)

        label = class_names[class_id]
        caption = f'{label} {int(score * 100)}%'
        draw_text(det_img, caption, box, color, font_size, text_thickness)

    return det_img

def detections_dog(image, boxes, scores, class_ids, mask_alpha=0.3):
    det_img = image.copy()

    img_height, img_width = image.shape[:2]
    font_size = min([img_height, img_width]) * 0.0006
    text_thickness = int(min([img_height, img_width]) * 0.001)

    # det_img = draw_masks(det_img, boxes, class_ids, mask_alpha)

    # Draw bounding boxes and labels of detections

    for class_id, box, score in zip(class_ids, boxes, scores):

        color = colors[class_id]

        draw_box(det_img, box, color)
        label = class_names[class_id]
        caption = f'{label} {int(score * 100)}%'
        draw_text(det_img, caption, box, color, font_size, text_thickness)

    return det_img

def draw_box( image, box, color=(0, 0, 255), thickness=2):
    x1, y1, x2, y2 = box.astype(int)
    return cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)

def draw_text(image, text, box, color=(0, 0, 255), font_size=0.001, text_thickness=2):
    x1, y1, x2, y2 = box.astype(int)
    (tw, th), _ = cv2.getTextSize(text=text, fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                                  fontScale=font_size, thickness=text_thickness)
    th = int(th * 1.2)

    cv2.rectangle(image, (x1, y1),
                  (x1 + tw, y1 - th), color, -1)

    return cv2.putText(image, text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, font_size, (255, 255, 255), text_thickness, cv2.LINE_AA)

def draw_masks(image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3) -> np.ndarray:
    mask_img = image.copy()

    # Draw bounding boxes and labels of detections
    for box, class_id in zip(boxes, classes):
        color = colors[class_id]

        x1, y1, x2, y2 = box.astype(int)

        # Draw fill rectangle in mask image
        cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)

    return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)

class YOLOV8Det:
    def __init__(self, path, conf_thre=0.5, iou_thre=0.5):
        self.conf_threshold = conf_thre
        self.iou_threshold = iou_thre

        # Initialize model
        self.initialize_model(path)

    def __call__(self, image):
        return self.detect_objects(image)

    def initialize_model(self, path):
        self.session = onnxruntime.InferenceSession(path,providers=onnxruntime.get_available_providers())
        # Get model info
        self.get_input_details()
        self.get_output_details()

    def detect_objects(self, image):
        input_tensor = self.prepare_input(image)

        # Perform inference on the image
        outputs = self.inference(input_tensor)

        self.boxes, self.scores, self.class_ids = self.process_output(outputs)

        return self.boxes, self.scores, self.class_ids

    def prepare_input(self, image):
        self.img_height, self.img_width = image.shape[:2]

        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Resize input image
        input_img = cv2.resize(input_img, (self.input_width, self.input_height))

        # Scale input pixel values to 0 to 1
        input_img = input_img / 255.0
        input_img = input_img.transpose(2, 0, 1)
        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)

        return input_tensor

    def inference(self, input_tensor):
        start = time.perf_counter()
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})

        # print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
        return outputs

    def process_output(self, output):
        predictions = np.squeeze(output[0]).T

        # Filter out object confidence scores below threshold
        scores = np.max(predictions[:, 4:], axis=1)
        predictions = predictions[scores > self.conf_threshold, :]
        scores = scores[scores > self.conf_threshold]

        if len(scores) == 0:
            return [], [], []

        # Get the class with the highest confidence
        class_ids = np.argmax(predictions[:, 4:], axis=1)

        # Get bounding boxes for each object
        boxes = self.extract_boxes(predictions)

        # Apply non-maxima suppression to suppress weak, overlapping bounding boxes
        # indices = nms(boxes, scores, self.iou_threshold)
        indices = multiclass_nms(boxes, scores, class_ids, self.iou_threshold)

        return boxes[indices], scores[indices], class_ids[indices]

    def extract_boxes(self, predictions):
        # Extract boxes from predictions
        boxes = predictions[:, :4]

        # Scale boxes to original image dimensions
        boxes = self.rescale_boxes(boxes)

        # Convert boxes to xyxy format
        boxes = xywh2xyxy(boxes)

        return boxes

    def rescale_boxes(self, boxes):

        # Rescale boxes to original image dimensions
        input_shape = np.array([self.input_width, self.input_height, self.input_width, self.input_height])
        boxes = np.divide(boxes, input_shape, dtype=np.float32)
        boxes *= np.array([self.img_width, self.img_height, self.img_width, self.img_height])
        return boxes

    def draw_detections(self, image, draw_scores=True, mask_alpha=0.4):

        return detections_dog(image, self.boxes, self.scores,
                               self.class_ids, mask_alpha)

    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [model_inputs[i].name for i in range(len(model_inputs))]

        self.input_shape = model_inputs[0].shape
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]

    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [model_outputs[i].name for i in range(len(model_outputs))]
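To make the post-processing in the listing above easier to follow, here is a minimal pure-Python sketch of greedy non-maximum suppression on a toy example. It illustrates the same idea as nms() and compute_iou() in the listing but is not the listing's code: the boxes, scores, and function names are made up, and boxes are (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap and is kept. Raising the IoU threshold slider in the application makes suppression less aggressive.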

Next, let's write a front end with Gradio and put together a simple application.

Note that gradio must be installed first: pip install gradio

def predict_image(img, conf_thre, iou_thre):
    predictor = YOLOV8Det('runs/detect/train2/weights/best.onnx', conf_thre, iou_thre)
    predictor(img)  # run detection first so boxes, scores and class_ids are populated
    out = predictor.draw_detections(img)
    return out

demo = gr.Interface(
    fn=predict_image,
    inputs=[
        gr.Image(label="Upload Image"),
        gr.Slider(minimum=0, maximum=1, value=0.25, label="Confidence threshold"),
        gr.Slider(minimum=0, maximum=1, value=0.45, label="IoU threshold"),
    ],
    outputs=gr.Image(label="Detection result"),
    title="Ultralytics Gradio",
    description="Upload images for inference. The Ultralytics YOLOv8n model is used by default.",
    examples=[
        ["../../datasets/ssdd_yolo/images/train/000031.jpg", 0.25, 0.45],
    ],
)

demo.launch()



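5 Summary

With that, we have completed a hands-on YOLOv8 project: we covered how its datasets are organized and how its labels are formatted, trained a detection model for a specific scenario, and built a Gradio front-end application. If you're interested, go train on your own dataset.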





