First Steps with FiftyOne (Part 2)

Latest recommended article published 2024-09-30 14:16:32

阿冰眼里有光

Column: FiftyOne. Tags: python, deep learning, object detection, YOLO

Article link: https://blog.csdn.net/PopliIce_Drunk/article/details/141463663

Contents

Preface
I. What Is FiftyOne?
II. Code
III. Problems Encountered
- 1. Insufficient disk space
- 2. Name already in use
Summary

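Preface

I had just started with the YOLO family of models and tried YOLOv5 and YOLOv10, but ran into quite a few problems when handling the datasets. Looking for a better way to visualize and manage the data, I found FiftyOne, a data exploration and visualization platform that works well for object detection data. With FiftyOne you can view images together with their bounding boxes and quickly pull out the subsets you want to analyze, filtering by class, label, annotation, or other criteria. This made managing and analyzing datasets much more efficient and greatly simplified my work while training YOLO models.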

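I. What Is FiftyOne?

FiftyOne is an open-source data exploration and visualization tool designed for machine learning and computer vision tasks. It helps you:

- Visualize and explore images, videos, and their annotations.
- View and manage datasets interactively.
- Quickly filter, search, and analyze subsets of the data by class, label, annotation, and other criteria.
- Generate dataset statistics and reports that help you understand and improve model performance.

FiftyOne supports many data formats, including COCO, Pascal VOC, and custom formats, with the goal of simplifying data management and improving productivity.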

II. Code

Steps 1, 2 and 3: see the previous post, First Steps with FiftyOne (Part 1).

4. Load the image dataset into FiftyOne. This script displays the predicted boxes, the ground-truth boxes, and the image metadata, and lets you filter predictions by confidence in the App.

The code is as follows (example):

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path

import fiftyone as fo
from PIL import Image


def read_predictions_from_xml(xml_dir):
    """Read predicted boxes (with confidence) from VOC-style XML files, keyed by image filename."""
    predictions = {}
    for xml_file in os.listdir(xml_dir):
        if xml_file.endswith('.xml'):
            tree = ET.parse(os.path.join(xml_dir, xml_file))
            root = tree.getroot()
            filename = root.find('filename').text
            detections = []
            for obj in root.findall('object'):
                name = obj.find('name').text
                bbox = obj.find('bndbox')
                xmin = float(bbox.find('xmin').text)
                ymin = float(bbox.find('ymin').text)
                xmax = float(bbox.find('xmax').text)
                ymax = float(bbox.find('ymax').text)
                # Every prediction XML must contain a <confidence> field (see the step 6 example)
                confidence = float(obj.find('confidence').text)
                detections.append({
                    'label': name,
                    'bbox': [xmin, ymin, xmax, ymax],
                    'confidence': confidence,
                })
            predictions[filename] = detections
    return predictions


def read_labels_from_xml(xml_dir):
    """Read ground-truth boxes from VOC-style XML files, keyed by image filename."""
    labels = {}
    for xml_file in os.listdir(xml_dir):
        if xml_file.endswith('.xml'):
            tree = ET.parse(os.path.join(xml_dir, xml_file))
            root = tree.getroot()
            filename = root.find('filename').text
            labels[filename] = []
            for obj in root.findall('object'):
                name = obj.find('name').text
                bbox = obj.find('bndbox')
                xmin = float(bbox.find('xmin').text)
                ymin = float(bbox.find('ymin').text)
                xmax = float(bbox.find('xmax').text)
                ymax = float(bbox.find('ymax').text)
                labels[filename].append({
                    'label': name,
                    'bbox': [xmin, ymin, xmax, ymax],
                })
    return labels


def create_fiftyone_dataset(image_dir, predictions):
    """Create one sample per image that has predictions, stored in the "predictions" field."""
    samples = []
    for filename, detection_list in predictions.items():
        img_path = os.path.join(image_dir, filename)
        if not os.path.exists(img_path):
            continue  # skip XMLs whose image is missing
        with Image.open(img_path) as img:
            width, height = img.size
        detections = []
        for detection in detection_list:
            xmin, ymin, xmax, ymax = detection['bbox']
            detections.append(
                fo.Detection(
                    label=detection['label'],
                    # FiftyOne expects [top-left-x, top-left-y, width, height] relative to the image size
                    bounding_box=[xmin / width, ymin / height, (xmax - xmin) / width, (ymax - ymin) / height],
                    confidence=detection['confidence'],
                )
            )
        sample = fo.Sample(filepath=img_path)
        sample.metadata = fo.ImageMetadata(width=width, height=height)
        sample["predictions"] = fo.Detections(detections=detections)
        samples.append(sample)
    # Fails if a dataset with this name already exists (see problem 2 below)
    dataset = fo.Dataset(name="thyroid_dataset_v3")
    dataset.add_samples(samples)
    return dataset


def add_label_data_to_dataset(dataset, labels):
    """Attach ground-truth boxes to the existing samples in the "ground_truth" field."""
    for sample in dataset:
        filename = Path(sample.filepath).name
        if filename in labels:
            width = sample.metadata.width
            height = sample.metadata.height
            detections = []
            for label in labels[filename]:
                xmin, ymin, xmax, ymax = label['bbox']
                detections.append(
                    fo.Detection(
                        label=label['label'],
                        bounding_box=[xmin / width, ymin / height, (xmax - xmin) / width, (ymax - ymin) / height],
                    )
                )
            sample["ground_truth"] = fo.Detections(detections=detections)
            sample.save()


# Paths to the prediction XMLs, ground-truth XMLs, and images
prediction_dir = '/home/jovyan/work/xxx/yolov5-5.0/runs/thyroid_error_train/thyroid_images/error_img/3_error_pred_xml'
label_dir = '/home/jovyan/work/xxx/yolov5-5.0/runs/thyroid_error_train/thyroid_images/error_img/xml'
image_dir = '/home/jovyan/work/xxx/yolov5-5.0/runs/thyroid_error_train/thyroid_images/error_img/images'

# Read the prediction and label data
predictions = read_predictions_from_xml(prediction_dir)
labels = read_labels_from_xml(label_dir)

# Create the FiftyOne dataset (only images that have a prediction XML become samples)
dataset = create_fiftyone_dataset(image_dir, predictions)

# Add the ground-truth labels to the dataset
add_label_data_to_dataset(dataset, labels)

# Launch the FiftyOne App
session = fo.launch_app(dataset)
session.wait()  # The official example omits this line; without it the script exits immediately and the App shows nothing
```
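Both create_fiftyone_dataset and add_label_data_to_dataset convert VOC-style absolute corner coordinates (xmin, ymin, xmax, ymax, in pixels) into FiftyOne's bounding_box format: [top-left-x, top-left-y, width, height], each relative to the image size and lying in [0, 1]. The standalone sketch below isolates that conversion so it can be checked without FiftyOne installed. The function name voc_to_fiftyone_bbox is hypothetical, and the numbers are taken from the label XML example in step 5.

```python
def voc_to_fiftyone_bbox(xmin, ymin, xmax, ymax, width, height):
    """Convert absolute VOC corners (pixels) to FiftyOne's relative [x, y, w, h].

    Hypothetical helper that mirrors the inline formula used in the script.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    return [
        xmin / width,            # top-left x, relative to image width
        ymin / height,           # top-left y, relative to image height
        (xmax - xmin) / width,   # box width, relative
        (ymax - ymin) / height,  # box height, relative
    ]


# Ground-truth box from the step 5 example, on a 1218 x 583 image
box = voc_to_fiftyone_bbox(763, 222, 797, 250, width=1218, height=583)
print([round(v, 4) for v in box])
```

Keeping the conversion in one function would also remove the duplicated formula from the two dataset-building functions.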

5. Example ground-truth label XML file:

The code is as follows (example):

```xml
<annotation>
	<folder>1</folder>
	<filename>H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</filename>
	<path>D:\2024\H002_room10_WS80A_20230823\1\H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>1218</width>
		<height>583</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>thyroid_nodule</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>763</xmin>
			<ymin>222</ymin>
			<xmax>797</xmax>
			<ymax>250</ymax>
		</bndbox>
	</object>
</annotation>
```
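The label file already records the image size in `<size>`, so the width and height needed for normalization could be read from the XML instead of opening every image with PIL. Below is a minimal standard-library sketch, assuming `<size>` holds the true pixel dimensions of the image. The function name parse_voc_annotation is hypothetical. It treats `<confidence>` as optional, because ground-truth files like this one do not have it. By contrast, read_predictions_from_xml calls float(obj.find('confidence').text), which raises an AttributeError on such a file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the step 5 label file (path, pose, truncated, difficult omitted)
LABEL_XML = """<annotation>
  <filename>H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</filename>
  <size><width>1218</width><height>583</height><depth>3</depth></size>
  <object>
    <name>thyroid_nodule</name>
    <bndbox><xmin>763</xmin><ymin>222</ymin><xmax>797</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, width, height, objects) from one VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        conf_el = obj.find('confidence')  # present only in prediction files
        objects.append({
            'label': obj.find('name').text,
            'bbox': [float(bbox.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax')],
            'confidence': float(conf_el.text) if conf_el is not None else None,
        })
    return root.find('filename').text, width, height, objects


filename, width, height, objects = parse_voc_annotation(LABEL_XML)
print(filename, width, height, objects)
```

For files on disk, pass the text of the file, or replace ET.fromstring(xml_text) with ET.parse(path).getroot().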

6. Example prediction XML file produced by the model (note the extra `<confidence>` element in each object):

The code is as follows (example):

```xml
<?xml version="1.0" ?>
<annotation>
  <folder>images</folder>
  <filename>H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</filename>
  <path>/home/jovyan/work/data/thyroid/thyroid_images/images/H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1218</width>
    <height>583</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>thyroid_nodule</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <confidence>0.62646484375</confidence>
    <bndbox>
      <xmin>756</xmin>
      <ymin>220</ymin>
      <xmax>798</xmax>
      <ymax>252</ymax>
    </bndbox>
  </object>
  <object>
    <name>thyroid_nodule</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <confidence>0.537109375</confidence>
    <bndbox>
      <xmin>512</xmin>
      <ymin>155</ymin>
      <xmax>637</xmax>
      <ymax>260</ymax>
    </bndbox>
  </object>
</annotation>
```
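The `<confidence>` value is what read_predictions_from_xml stores on every fo.Detection, and that is what makes predictions filterable by confidence in the App. To check offline which boxes a given threshold would keep, a standard-library sketch is enough. The helper filter_by_confidence and the thresholds 0.6 and 0.5 are hypothetical choices for illustration. The XML is a trimmed copy of the two objects in this prediction file.

```python
import xml.etree.ElementTree as ET

# The two <object> entries from the step 6 prediction file, trimmed
PRED_XML = """<annotation>
  <filename>H002-room10-WS80A-20230330-137-94200-0_26_09-60.png</filename>
  <object>
    <name>thyroid_nodule</name>
    <confidence>0.62646484375</confidence>
    <bndbox><xmin>756</xmin><ymin>220</ymin><xmax>798</xmax><ymax>252</ymax></bndbox>
  </object>
  <object>
    <name>thyroid_nodule</name>
    <confidence>0.537109375</confidence>
    <bndbox><xmin>512</xmin><ymin>155</ymin><xmax>637</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def filter_by_confidence(xml_text, threshold):
    """Return (confidence, [xmin, ymin, xmax, ymax]) for boxes with confidence >= threshold."""
    root = ET.fromstring(xml_text)
    kept = []
    for obj in root.findall('object'):
        confidence = float(obj.find('confidence').text)
        if confidence >= threshold:
            bbox = obj.find('bndbox')
            coords = [int(bbox.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax')]
            kept.append((confidence, coords))
    return kept


print(filter_by_confidence(PRED_XML, 0.6))  # only the 0.626... box passes
print(filter_by_confidence(PRED_XML, 0.5))  # both boxes pass
```

Inside FiftyOne itself, the corresponding view can be built with `dataset.filter_labels("predictions", F("confidence") >= 0.6)`, where `F` is `fiftyone.ViewField`.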

III. Problems Encountered

1. Insufficient disk space

Solution: delete files you no longer need to free up disk space.
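Before launching the App, it can help to check how much space is left on the disk that holds FiftyOne's data. The following is a minimal sketch using the standard library's shutil.disk_usage. The checked location is an assumption: the home directory, on the assumption that FiftyOne keeps its data under ~/.fiftyone and that this default has not been changed. The 1 GiB threshold and the helper names are also hypothetical, so adjust them to your environment.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(path, min_free_gib=1.0):
    """Hypothetical pre-flight check: True if at least `min_free_gib` GiB are free."""
    return free_space_gib(path) >= min_free_gib


# Assumed location: the home directory (FiftyOne's data is assumed to live in ~/.fiftyone)
target = os.path.expanduser("~")
if not os.path.isdir(target):
    target = os.getcwd()  # fall back to the current directory

print(f"Free space on the disk holding {target}: {free_space_gib(target):.2f} GiB")
if not has_enough_space(target, min_free_gib=1.0):
    print("Low disk space: delete files you no longer need before launching FiftyOne")
```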

2. Name already in use

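Solution:

- Check the existing datasets: run the following code to list the names of the datasets that already exist:

```python
import fiftyone as fo

# List existing datasets
datasets = fo.list_datasets()
print(datasets)
```

- Delete the existing dataset: if you are sure you no longer need it, you can delete it:

```python
fo.delete_dataset("thyroid_dataset")
```

- Use a different name: if you don't want to delete the existing dataset, create the new one under a different name:

```python
dataset = fo.Dataset(name="thyroid_dataset_v2")
```

- Check whether the dataset is stuck: if a previous run exited abnormally, the dataset may have been left in a hung state. Try restarting the environment and running the code again.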
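Of these options, using a different name is the least destructive. Instead of bumping the suffix by hand (thyroid_dataset_v2, thyroid_dataset_v3, and so on), the next free name can be derived from the list that fo.list_datasets() returns. The sketch below is plain Python so it runs without FiftyOne. The helper next_free_name and its `_vN` suffix scheme are hypothetical; in a real script you would pass fo.list_datasets() as `existing`. Recent FiftyOne versions also accept `fo.Dataset(name, overwrite=True)`, which replaces any existing dataset of that name; check that your installed version supports it.

```python
def next_free_name(base, existing):
    """Return `base` if unused, else the first of base_v2, base_v3, ... not in `existing`."""
    taken = set(existing)
    if base not in taken:
        return base
    version = 2
    while f"{base}_v{version}" in taken:
        version += 1
    return f"{base}_v{version}"


# In a real script: existing = fo.list_datasets()
existing = ["thyroid_dataset", "thyroid_dataset_v2"]
print(next_free_name("thyroid_dataset", existing))
```

For example: `dataset = fo.Dataset(name=next_free_name("thyroid_dataset", fo.list_datasets()))`.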