AssertionError raised by assert (boxes[:, 2] >= boxes[:, 0]).all()


License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/whu-zeng/p/9066251.html

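This post walks through a piece of code used to track down abnormal bounding-box coordinates after an image is flipped, and explains what the assert checks and which data errors commonly trigger it.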

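Starting from the error message, wrap the assert and print the values involved so that the offending data can be identified. The code is as follows: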

    for i in xrange(num_images):  # Python 2; use range() on Python 3
        # print('in append_flipped==================', self.roidb)
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        # Mirror the x coordinates across the image width
        boxes[:, 0] = widths[i] - oldx2 - 1
        boxes[:, 2] = widths[i] - oldx1 - 1
        try:
            assert (boxes[:, 2] >= boxes[:, 0]).all()
        except AssertionError:
            # Dump the original boxes, the flipped x1/x2, the image width and the old x2
            print('in append_flipped==================', self.roidb[i]['boxes'], boxes[:, 0], boxes[:, 2], widths[i], oldx2)

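Here, self.roidb is built from the annotated boxes. The purpose of this code is to keep the boxes valid after the image is flipped horizontally, which is a data-augmentation technique.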
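The flip itself can be sketched in isolation as follows. This is a minimal illustration, not the project's own code: the function name `flip_boxes_horizontally` and the sample values are hypothetical, and boxes are assumed to be stored as `[x1, y1, x2, y2]` rows with 0-based pixel coordinates.

```python
import numpy as np


def flip_boxes_horizontally(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes across the vertical centre line of the image."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1  # new left edge comes from the old right edge
    flipped[:, 2] = width - boxes[:, 0] - 1  # new right edge comes from the old left edge
    return flipped


# One box in a 100-pixel-wide image (signed dtype, so no wrap-around can occur here)
boxes = np.array([[10, 20, 50, 80]], dtype=np.int64)
flipped = flip_boxes_horizontally(boxes, width=100)
print(flipped)  # [[49 20 89 80]]
```

For a well-formed box inside the image, the flipped box still satisfies x2 >= x1, which is exactly what the assert relies on.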

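Once the values are printed, you can see which abnormal data triggers the assert. The statement assert (boxes[:, 2] >= boxes[:, 0]).all() means that, for every box, the x coordinate of the bottom-right corner (x2) must be no smaller than the x coordinate of the top-left corner (x1). Only then is the box valid.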
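Rather than waiting for the assert to fire during training, the same condition can be checked over the annotations up front. The helper below is a hypothetical sketch (the name `find_bad_boxes` is not from the original code) and again assumes `[x1, y1, x2, y2]` rows; it also checks the y coordinates, which the original assert does not.

```python
import numpy as np


def find_bad_boxes(boxes):
    """Return the row indices of boxes whose x2 < x1 or y2 < y1."""
    boxes = np.asarray(boxes)
    bad = (boxes[:, 2] < boxes[:, 0]) | (boxes[:, 3] < boxes[:, 1])
    return np.flatnonzero(bad)


boxes = np.array([
    [10, 20, 50, 80],  # valid
    [60, 10, 40, 30],  # invalid: x2 (40) is smaller than x1 (60)
])
bad_rows = find_bad_boxes(boxes)
print(bad_rows)  # [1]
```

Running a check like this over every image's boxes points directly at the annotation files that need cleaning.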

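Common data errors include:

1) The coordinates are out of bounds. This is a data-cleaning issue that can be introduced during annotation. For example, the image is only 1000 pixels wide, yet a box is labeled with its top-left corner at x = 990 and a width of 20. The right edge is then 990 + 20 = 1010, which lies outside the image.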
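The sketch below shows one way such an out-of-bounds box can break the check. It assumes that the boxes are stored in an unsigned 16-bit array (the original post does not state the dtype, so this is an assumption), in which case a negative flipped coordinate wraps around to a large positive value. The helper names `flip_x` and `clip_boxes_to_image` are hypothetical, and clipping is only one possible remedy; fixing the annotation itself is usually preferable.

```python
import numpy as np


def flip_x(boxes, width):
    """Flip the x coordinates of [x1, y1, x2, y2] boxes, keeping the input dtype."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped


def clip_boxes_to_image(boxes, width, height):
    """Clamp box coordinates to the valid pixel range of the image."""
    clipped = boxes.astype(np.int64)
    clipped[:, [0, 2]] = np.clip(clipped[:, [0, 2]], 0, width - 1)
    clipped[:, [1, 3]] = np.clip(clipped[:, [1, 3]], 0, height - 1)
    return clipped


width, height = 1000, 600

# Assumption: unsigned 16-bit storage. x2 = 1010 lies outside the 1000-pixel-wide image.
raw = np.array([[990, 5, 1010, 50]], dtype=np.uint16)

bad = flip_x(raw, width)
# 1000 - 1010 - 1 is negative, so in uint16 it wraps to a huge value and x1 > x2
print(bad[:, 0], bad[:, 2])
print((bad[:, 2] >= bad[:, 0]).all())  # False: this is the failing assert condition

good = flip_x(clip_boxes_to_image(raw, width, height), width)
print((good[:, 2] >= good[:, 0]).all())  # True once the box is clipped to the image
```

With a signed dtype the flipped x1 would simply be negative and the assert would pass silently, so an explicit bounds check on the annotations is still worthwhile in that case.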



weixin_30619101


09-25

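**Step 2: Modify the data-loading logic (for index out of bounds)** In `ultralytics/data/base.py`, the `__getitem__` method of the `BaseDataset` class fetches a single sample. We could add an index check there, but what matters more is ensuring, when the dataset is built, that all indices are...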

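Python Study Notes (2): What assert and __all__ Do

南风's blog

03-19

870

1. What __all__ does: """ __init__.py """ __all__ = ['AClass','bmethod','cvariable'] def bmethod(): pass It sets a whitelist of exposed names: when using from XXX import *, only the members listed in __all__ are imported ...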

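Linux: using the assert() function

zz460833359's blog

07-17

2223

The assert function had been nagging at me for a long time because I had never sorted it out properly. Today I had time to understand it a bit better, so here are my notes. (Note: the example was found online and tested after my own modifications.) /* Description: if the expression passed to assert evaluates to false (i.e. 0), it first prints an error message to stderr and then terminates the program by calling abort. Compiling with -DNDEBUG effectively removes the assert calls. */ #include #include #include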

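DETR-style model training error: assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

qq_20793791's blog

10-18

1367

Fix: disable mixed-precision training, i.e. set amp=False during training. To fit a larger batch size I had hacked the author's code, and this turned out to be where things went wrong; FP16 probably lacked the precision and overflowed. A separate issue: I wanted to print the box values when the model raised the error, but nothing appeared in the terminal, possibly because of multiprocessing; the workaround is to write to a log file on error instead of calling print(). Background: I hit this problem during multi-GPU training. The fixes suggested on GitHub are all over the place, from lowering the learning rate to people who had set num_classes wrong, but none of them worked. Cause: the assertion fails because the boxes predicted by the model are NaN.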
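The "write to a log file instead of print()" workaround can be sketched roughly as follows. This is a minimal illustration, not the post author's code: the logger name `box_debug`, the helper names, and the log path are all hypothetical, and plain Python lists stand in for tensors. Note that a NaN coordinate never satisfies `>=`, which is why the check tests for finiteness explicitly.

```python
import logging
import math
import os
import tempfile


def make_file_logger(path):
    """Create a logger that writes to a file (hypothetical name 'box_debug')."""
    logger = logging.getLogger("box_debug")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(logging.FileHandler(path))
    logger.propagate = False  # keep messages out of the console
    return logger


def check_boxes(boxes, logger):
    """Return True if every [x1, y1, x2, y2] box is finite and well ordered.

    Offending boxes are written to the log file instead of being printed.
    """
    ok = True
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box
        finite = all(math.isfinite(v) for v in box)
        if not finite or x2 < x1 or y2 < y1:
            logger.error("bad box %d: %s", i, box)
            ok = False
    return ok


# Assumed location; in multi-process training each worker would want its own file.
log_path = os.path.join(tempfile.mkdtemp(), "boxes.log")
logger = make_file_logger(log_path)
result = check_boxes([[0.1, 0.1, 0.5, 0.5], [float("nan"), 0.0, 1.0, 1.0]], logger)
```

Calling a check like this just before the assertion leaves a record of the offending values even when worker output never reaches the terminal.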

Running train.py from Faster-RCNN-TensorFlow-Python3.5 raises the assert (boxes[:, 2] >= boxes[:, 0]).all() error

kelly若's blog

06-25

802

See this article: https://blog.csdn.net/10km/article/details/64641322 I don't think the boxes in my XML files need 1 subtracted from them, so I put a # in front of each -1 to comment them all out. Error: File "/home/drl/new/Faster-RCNN-TensorFlow-Python3.5/lib/datasets/imdb.py", line 119, in appen...

Errors encountered when training Faster R-CNN

RZJMPB's blog

09-02

1254

1. assert (boxes[:, 2] >= boxes[:, 0]).all() AssertionError. Current temporary workaround, in imdb.py: for b in range(len(boxes)): if boxes[b][2] < boxes[b][0]: boxes[b][0] = 0asser
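Overwriting x1 with 0 silences the assertion but leaves a wrong box in the training data. An alternative, sketched here under assumptions, is to find the offending annotations before flipping and drop or fix them at the source. The function names are hypothetical, boxes are plain `[x1, y1, x2, y2]` lists with 0-based pixel coordinates, and the flip formula is the one from the code quoted at the top of this page.

```python
def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally for an image of the given width."""
    return [[width - x2 - 1, y1, width - x1 - 1, y2] for x1, y1, x2, y2 in boxes]


def find_invalid_boxes(boxes, width):
    """Return indices of boxes that are inverted or fall outside [0, width - 1]."""
    bad = []
    for i, (x1, _y1, x2, _y2) in enumerate(boxes):
        if x1 < 0 or x2 > width - 1 or x2 < x1:
            bad.append(i)
    return bad


# Illustrative annotations: the second box runs past a 1000-pixel-wide image
# (x1 = 990 plus a box width of 20), and the third has x1 and x2 swapped.
boxes = [[10, 20, 110, 220], [990, 5, 1010, 50], [300, 40, 250, 90]]
bad = find_invalid_boxes(boxes, width=1000)
clean = [b for i, b in enumerate(boxes) if i not in bad]
flipped = flip_boxes(clean, width=1000)
```

Here `bad` lists the annotations to inspect, and only the remaining boxes are flipped, so the x2 >= x1 check holds without altering any coordinates.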

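Faster RCNN: training your own detection model

marshb11's column

01-03

10,000+

1. Prepare your own training data. Following the basic layout of the Pascal VOC 2007 training dataset, the first step is of course to prepare your own set of training images; here I simply dropped my prepared images (.jpg) into the following folder: $(py-faster-rcnn)/data/VOCdevkit2007/VOC2007/JPEGImages. The second step is to annotate the corresponding .xml files for the images of the objects to be detected (I wrote a simple rectangle annotation tool myself, which generates the corresponding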