FCOS code (1) (demo process): a detailed look at the backbone, mask-rcnn ResNet+FPN
FCOS code (2) (demo process): the RPN network structure
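First comes fcos_demo.py. Its most important step is this line:
coco_demo = COCODemo( cfg, confidence_thresholds_for_classes=thresholds_for_classes, min_image_size=args.min_image_size )
COCODemo wraps the entire demo pipeline. It also embodies FCOS's inference procedure, i.e., how the method is actually carried out at test time.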
demo_im_names = os.listdir(args.images_dir)  # directory holding the test images
# prepare object that handles inference plus adds predictions on top of image
coco_demo = COCODemo(
    cfg,
    confidence_thresholds_for_classes=thresholds_for_classes,
    min_image_size=args.min_image_size
)  # all cfg from imported
i = 0
for im_name in demo_im_names:
    img = cv2.imread(os.path.join(args.images_dir, im_name))  # read the image
    if img is None:
        continue
    start_time = time.time()
    composite = coco_demo.run_on_opencv_image(img)  # returns the final annotated image
    print("{}\tinference time: {:.2f}s".format(im_name, time.time() - start_time))
The loop calls composite = coco_demo.run_on_opencv_image(img), which takes us into predictor.py. The method is defined as follows:
1) run_on_opencv_image
def run_on_opencv_image(self, image):
    """
    Arguments:
        image (np.ndarray): an image as returned by OpenCV
    Returns:
        prediction (BoxList): the detected objects. Additional information
            of the detection properties can be found in the fields of
            the BoxList via `prediction.fields()`
    """
    predictions = self.compute_prediction(image)  # predictions, already mapped back onto the original image
    top_predictions = self.select_top_predictions(predictions)  # keep only predictions whose score exceeds their class threshold
    result = image.copy()  # copy of the original image
    if self.show_mask_heatmaps:  # False: mask heatmap visualization
        return self.create_mask_montage(result, top_predictions)
    result = self.overlay_boxes(result, top_predictions)  # draw the predicted boxes on the image
    if self.cfg.MODEL.MASK_ON:  # False: draw masks
        result = self.overlay_mask(result, top_predictions)
    if self.cfg.MODEL.KEYPOINT_ON:  # False: keypoints
        result = self.overlay_keypoints(result, top_predictions)
    result = self.overlay_class_names(result, top_predictions)
    return result  # note: this is the annotated image (np.ndarray), not a BoxList
It contains the call predictions = self.compute_prediction(image). Its definition is shown below, followed by a step-by-step description.
(I) compute_prediction
def compute_prediction(self, original_image):
    """
    Arguments:
        original_image (np.ndarray): an image as returned by OpenCV
    Returns:
        prediction (BoxList): the detected objects. Additional information
            of the detection properties can be found in the fields of
            the BoxList via `prediction.fields()`
    """
    # apply pre-processing to image
    image = self.transforms(original_image)  # resize the shorter side to 800 and normalize the pixel values
    # convert to an ImageList, padded so that it is divisible by
    # cfg.DATALOADER.SIZE_DIVISIBILITY
    image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)  # SIZE_DIVISIBILITY = 32; returns an ImageList holding image_sizes and tensors
    image_list = image_list.to(self.device)  # move to CUDA
    # compute predictions
    with torch.no_grad():
        predictions = self.model(image_list)  # e.g. a BoxList with 100 boxes
    predictions = [o.to(self.cpu_device) for o in predictions]  # back to CPU
    # always single image is passed at a time
    prediction = predictions[0]
    # reshape prediction (a BoxList) into the original image size
    height, width = original_image.shape[:-1]  # height and width of the original image
    prediction = prediction.resize((width, height))  # map the predicted box coordinates back onto the original image
    if prediction.has_field("mask"):  # False
        # if we have masks, paste the masks in the right position
        # in the image, as defined by the bounding boxes
        masks = prediction.get_field("mask")
        # always single image is passed at a time
        masks = self.masker([masks], [prediction])[0]
        prediction.add_field("mask", masks)
    return prediction
i. image = self.transforms(original_image)
self.transforms is created by build_transform(), which is defined as follows:
def build_transform(self):
    """
    Creates a basic transformation that was used to train the models
    """
    cfg = self.cfg
    # we are loading images with OpenCV, so we don't need to convert them
    # to BGR, they are already! So all we need to do is to normalize
    # by 255 if we want to convert to BGR255 format, or flip the channels
    # if we want it to be in RGB in [0-1] range.
    if cfg.INPUT.TO_BGR255:  # True
        to_bgr_transform = T.Lambda(lambda x: x * 255)  # multiply every element by 255; used in the Compose below
    else:
        to_bgr_transform = T.Lambda(lambda x: x[[2, 1, 0]])
    normalize_transform = T.Normalize(
        mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD
    )  # (x - mean) / std per channel; PIXEL_MEAN = [102.9801, 115.9465, 122.7717], PIXEL_STD = [1.0, 1.0, 1.0], so here it only subtracts the mean
    transform = T.Compose(
        [
            T.ToPILImage(),  # convert to a PIL image so that Resize can be applied next
            T.Resize(self.min_image_size),  # resize the shorter side to 800
            T.ToTensor(),  # to a (C, H, W) tensor, pixel values divided by 255 into [0, 1]
            to_bgr_transform,  # multiply by 255 again (BGR255 scale)
            normalize_transform,  # normalize
        ]
    )
    return transform
------ to_bgr_transform = T.Lambda(lambda x: x * 255)
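Here T is torchvision's transforms module, imported as:
from torchvision import transforms as T
T.Lambda (see "Using PyTorch transforms.Lambda") lets you define your own operation and wrap it as a transform; the wrapped transforms are chained together by T.Compose further down. Here the lambda multiplies every element of x by 255, undoing the division by 255 that ToTensor performs.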
------T.Normalize(mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD)
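T.Normalize (see "PyTorch data normalization: transforms.Normalize and computing a dataset's mean and std") applies (x - mean) / std to each channel. With a dataset's true per-channel mean and standard deviation, this gives every channel zero mean and unit standard deviation, which helps training converge faster. The mean and std are computed in advance (according to the reference, by sampling the dataset). Note that in this config PIXEL_STD is [1.0, 1.0, 1.0], so the step only subtracts the per-channel BGR mean: the result is zero-centered but stays on the 0-255 scale instead of having unit variance.
For the difference between normalization and standardization, see "What is normalization, and how does it differ from standardization?".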
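To make the chain concrete, here is a minimal pure-Python sketch that pushes one BGR pixel through the same steps as build_transform: ToTensor's division by 255, the ×255 Lambda, then Normalize's (x - mean) / std. The helper names (`compose`, `to_tensor`, `to_bgr255`, `normalize`, `preprocess_pixel`) are hypothetical stand-ins for T.Compose and the real tensor ops, not torchvision APIs; the mean/std values are the ones quoted from the config above.

```python
from typing import Callable, List, Sequence

Pixel = List[float]

# Values quoted from cfg.INPUT above (BGR order, 0-255 scale).
PIXEL_MEAN = [102.9801, 115.9465, 122.7717]
PIXEL_STD = [1.0, 1.0, 1.0]


def compose(*steps: Callable[[Pixel], Pixel]) -> Callable[[Pixel], Pixel]:
    """Apply steps left to right, like T.Compose (hypothetical helper)."""
    def run(x: Pixel) -> Pixel:
        for step in steps:
            x = step(x)
        return x
    return run


def to_tensor(x: Pixel) -> Pixel:
    # Like ToTensor's value scaling: [0, 255] -> [0, 1]
    return [v / 255.0 for v in x]


def to_bgr255(x: Pixel) -> Pixel:
    # The T.Lambda(lambda x: x * 255) step: back to the [0, 255] scale
    return [v * 255.0 for v in x]


def normalize(mean: Sequence[float], std: Sequence[float]) -> Callable[[Pixel], Pixel]:
    # Like T.Normalize: (x - mean) / std, channel by channel
    return lambda x: [(v - m) / s for v, m, s in zip(x, mean, std)]


preprocess_pixel = compose(to_tensor, to_bgr255, normalize(PIXEL_MEAN, PIXEL_STD))

if __name__ == "__main__":
    print(preprocess_pixel([255.0, 128.0, 0.0]))  # ≈ [152.0199, 12.0535, -122.7717]
```

A pixel equal to the mean maps to 0, and every other value is only shifted: because std is 1.0, the /255 and ×255 steps cancel and the output keeps the 0-255 range.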
------T.ToPILImage() 和 T.ToTensor()
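See "torchvision.transforms.ToTensor() and torchvision.transforms.ToPILImage() explained".
According to that reference, ToPILImage() behaves differently depending on its input, but it always outputs a PIL image. When the input is a numpy array (e.g. from cv2.imread), it is already laid out as (h, w, c) with values in [0, 255], so no axes need to be moved. ToPILImage does not reorder the color channels either: the BGR data from OpenCV is simply wrapped as if it were RGB, which is why the comment in build_transform says the image is already BGR and needs no conversion (see also "[PyTorch] torchvision.transforms.ToPILImage and image resolution"). When the input is a tensor, it is laid out as (c, h, w) with floating-point values; if ToTensor had previously divided the values by 255 into [0, 1], ToPILImage moves the axes back to (h, w, c) and multiplies the values by 255. ToTensor does the reverse: it turns an (h, w, c) image with values in [0, 255] into a (c, h, w) float tensor scaled into [0, 1].
(See also "A light introduction to torchvision.transforms.ToTensor and ToPILImage in PyTorch".)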
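The layout change that ToTensor performs on an OpenCV image can be sketched with nested lists: an (H, W, C) uint8 image becomes a (C, H, W) float array scaled into [0, 1]. `to_chw_float` is a hypothetical helper used only for illustration; note that it, like the real pipeline, never reorders channels, so a BGR image stays BGR.

```python
from typing import List

HWC = List[List[List[int]]]
CHW = List[List[List[float]]]


def to_chw_float(img: HWC) -> CHW:
    """(H, W, C) values in [0, 255] -> (C, H, W) floats in [0, 1], channel order kept."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][k] / 255.0 for x in range(w)] for y in range(h)]
            for k in range(c)]


if __name__ == "__main__":
    bgr = [[[255, 0, 0], [0, 255, 0]]]  # 1x2 BGR image: a blue pixel, then a green pixel
    chw = to_chw_float(bgr)
    print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 1 2
    print(chw[0])  # channel 0 (B): [[1.0, 0.0]]
```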
------ T.Resize
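T.Resize resizes a PIL Image object; in the torchvision version used here it cannot take the numpy arrays returned by io.imread or cv2.imread (see "Simple usage of PyTorch transforms.Resize()"), which is why ToPILImage() is applied first. Given a single int such as min_image_size = 800, T.Resize scales the shorter side to that value and the longer side proportionally, keeping the aspect ratio.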
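The size arithmetic of the shorter-side rule can be checked by hand. The sketch below reproduces it as I understand torchvision's behavior for an int size (the longer side is truncated with int()); `resized_size` is a hypothetical helper, not a torchvision function. For the 640×457 image used in the resize example later in this post it yields 1120×800.

```python
from typing import Tuple


def resized_size(width: int, height: int, size: int) -> Tuple[int, int]:
    """(new_width, new_height) for T.Resize(size) with an int size.

    Shorter side becomes `size`; the longer side keeps the aspect
    ratio and is truncated to an int.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


if __name__ == "__main__":
    print(resized_size(640, 457, 800))  # (1120, 800), landscape
    print(resized_size(457, 640, 800))  # (800, 1120), portrait
```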
ii. image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)
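The code is shown below. The line max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors])) changes nothing when only one image is passed in. For a batch (during training), it finds a common shape for all images: every image has 3 channels, and the maximum height and the maximum width in the batch are taken separately, so that every image can be zero-padded to the same size. max_size is then rounded up so that both spatial dimensions are integer multiples of the stride (math.ceil rounds up; see "Math.ceil()").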
def to_image_list(tensors, size_divisible=0):  # size_divisible = 32 here
    """
    tensors can be an ImageList, a torch.Tensor or
    an iterable of Tensors. It can't be a numpy array.
    When tensors is an iterable of Tensors, it pads
    the Tensors with zeros so that they have the same
    shape
    """
    if isinstance(tensors, torch.Tensor) and size_divisible > 0:  # True
        tensors = [tensors]  # wrap the single tensor in a list
    if isinstance(tensors, ImageList):
        return tensors
    elif isinstance(tensors, torch.Tensor):
        # single tensor shape can be inferred
        if tensors.dim() == 3:
            tensors = tensors[None]
        assert tensors.dim() == 4
        image_sizes = [tensor.shape[-2:] for tensor in tensors]
        return ImageList(tensors, image_sizes)
    elif isinstance(tensors, (tuple, list)):  # True, because the tensor was wrapped in a list above
        max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors]))  # per-dimension maximum over the batch; unchanged for a single demo image
        # TODO Ideally, just remove this and let the model handle arbitrary
        # input sizes
        if size_divisible > 0:
            import math
            stride = size_divisible  # 32
            max_size = list(max_size)  # e.g. [3, 800, 1120]
            max_size[1] = int(math.ceil(max_size[1] / stride) * stride)  # divide by the stride, round up, multiply back: the height becomes a multiple of the stride
            max_size[2] = int(math.ceil(max_size[2] / stride) * stride)  # same for the width
            max_size = tuple(max_size)  # back to a tuple, e.g. (3, 800, 1120)
        batch_shape = (len(tensors),) + max_size  # prepend the batch size, e.g. (1, 3, 800, 1120)
        batched_imgs = tensors[0].new(*batch_shape).zero_()  # all-zero tensor of the batch shape, e.g. (1, 3, 800, 1120)
        for img, pad_img in zip(tensors, batched_imgs):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)  # copy into the top-left corner; the rest stays zero padding
        image_sizes = [im.shape[-2:] for im in tensors]  # unpadded sizes, e.g. [torch.Size([800, 1120])]
        return ImageList(batched_imgs, image_sizes)  # an ImageList instance
    else:
        raise TypeError("Unsupported type for to_image_list: {}".format(type(tensors)))
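The shape arithmetic of to_image_list can be checked without PyTorch. This sketch (hypothetical helpers `padded_batch_shape` and `pad_2d`, plain tuples and lists instead of tensors) takes the maximum of each (C, H, W) dimension over the batch, rounds H and W up to a multiple of size_divisible, and copies an image into the top-left corner of a zero canvas, so the padding ends up on the bottom and right.

```python
import math
from typing import List, Sequence, Tuple


def padded_batch_shape(shapes: Sequence[Tuple[int, int, int]],
                       size_divisible: int = 32) -> Tuple[int, int, int, int]:
    """(N, C, H, W) of the zero-padded batch built by to_image_list."""
    # Maximum over each dimension, like zip(*[img.shape for img in tensors])
    c, h, w = (max(dim) for dim in zip(*shapes))
    if size_divisible > 0:
        h = int(math.ceil(h / size_divisible) * size_divisible)
        w = int(math.ceil(w / size_divisible) * size_divisible)
    return (len(shapes), c, h, w)


def pad_2d(img: List[List[float]], h: int, w: int) -> List[List[float]]:
    """Copy img into the top-left corner of an h x w zero canvas."""
    canvas = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(img):
        canvas[i][: len(row)] = row
    return canvas


if __name__ == "__main__":
    print(padded_batch_shape([(3, 800, 1120)]))                  # (1, 3, 800, 1120)
    print(padded_batch_shape([(3, 800, 1067)]))                  # (1, 3, 800, 1088)
    print(padded_batch_shape([(3, 800, 1120), (3, 1067, 800)]))  # (2, 3, 1088, 1120)
    print(pad_2d([[1.0, 2.0], [3.0, 4.0]], 3, 4))
```

With a single 800×1120 demo image nothing changes, matching the description above; only a batch of differently shaped images, or a side that is not a multiple of 32, produces padding.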
iii. with torch.no_grad(): predictions = self.model(image_list)
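The image batch is fed into the network to obtain its predictions. For the network itself, see FCOS code (1) (demo process): a detailed look at the backbone, mask-rcnn ResNet+FPN and
FCOS code (2) (demo process): the RPN network structure.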
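iv. height, width = original_image.shape[:-1]
This gets the height and width of the original image (an OpenCV image has shape (h, w, c), so shape[:-1] is (h, w)). The earlier transform resized the input: its shorter side became 800 and the other side was scaled proportionally.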
v. prediction = prediction.resize((width, height))
prediction is the BoxList instance returned by the model. Its resize method, shown below, maps the predicted box coordinates back onto the original image.
def resize(self, size, *args, **kwargs):
    """
    Returns a resized copy of this bounding box
    :param size: The requested size in pixels, as a 2-tuple:
        (width, height).
    """
    ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))  # (width, height) of the original image / (width, height) of the transformed image, e.g. size = (640, 457), self.size = (1120, 800), ratios = (640/1120, 457/800) ≈ (0.5714, 0.57125)
    if ratios[0] == ratios[1]:  # same ratio on both axes
        ratio = ratios[0]  # a single ratio is enough
        scaled_box = self.bbox * ratio  # scale all predicted coordinates
        bbox = BoxList(scaled_box, size, mode=self.mode)  # BoxList on the original image
        # bbox._copy_extra_fields(self)
        for k, v in self.extra_fields.items():  # labels and scores
            if not isinstance(v, torch.Tensor):
                v = v.resize(size, *args, **kwargs)
            bbox.add_field(k, v)  # add to the new BoxList's fields
        return bbox
    ratio_width, ratio_height = ratios  # width and height ratios
    xmin, ymin, xmax, ymax = self._split_into_xyxy()  # coordinates
    scaled_xmin = xmin * ratio_width  # x coordinates use the width ratio
    scaled_xmax = xmax * ratio_width
    scaled_ymin = ymin * ratio_height  # y coordinates use the height ratio
    scaled_ymax = ymax * ratio_height
    scaled_box = torch.cat(
        (scaled_xmin, scaled_ymin, scaled_xmax, scaled_ymax), dim=-1
    )  # concatenate back into (N, 4)
    bbox = BoxList(scaled_box, size, mode="xyxy")  # new BoxList instance
    # bbox._copy_extra_fields(self)
    for k, v in self.extra_fields.items():
        if not isinstance(v, torch.Tensor):
            v = v.resize(size, *args, **kwargs)
        bbox.add_field(k, v)  # copy labels and scores into the fields
    return bbox.convert(self.mode)  # convert back to the original box mode and return
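The core of BoxList.resize is a per-axis scale of the xyxy coordinates. Below is a pure-Python sketch (hypothetical `rescale_xyxy`, plain tuples instead of BoxList) using the example sizes from the comment above: a box predicted on the 1120×800 network input is mapped back onto the 640×457 original.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rescale_xyxy(boxes: Sequence[Box], old_size: Tuple[int, int],
                 new_size: Tuple[int, int]) -> List[Box]:
    """Scale xyxy boxes from old_size to new_size; sizes are (width, height)."""
    ratio_w = new_size[0] / old_size[0]
    ratio_h = new_size[1] / old_size[1]
    return [(x0 * ratio_w, y0 * ratio_h, x1 * ratio_w, y1 * ratio_h)
            for x0, y0, x1, y1 in boxes]


if __name__ == "__main__":
    print(rescale_xyxy([(112.0, 80.0, 560.0, 400.0)], (1120, 800), (640, 457)))
    # ≈ [(64.0, 45.7, 320.0, 228.5)]
```

When both ratios are equal this reduces to multiplying every coordinate by one number, which is the shortcut taken in the first branch of resize.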
This completes compute_prediction.
(II) top_predictions = self.select_top_predictions(predictions)
The code is as follows:
def select_top_predictions(self, predictions):
    """
    Select only predictions which have a `score` > self.confidence_threshold,
    and returns the predictions in descending order of score
    Arguments:
        predictions (BoxList): the result of the computation by the model.
            It should contain the field `scores`.
    Returns:
        prediction (BoxList): the detected objects. Additional information
            of the detection properties can be found in the fields of
            the BoxList via `prediction.fields()`
    """
    scores = predictions.get_field("scores")  # classification scores
    labels = predictions.get_field("labels")  # corresponding labels
    thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]  # threshold of each prediction's class, e.g. a Tensor of shape (100,)
    keep = torch.nonzero(scores > thresholds).squeeze(1)  # indices of predictions above their class threshold, e.g. tensor([2, 20, 38, 99])
    predictions = predictions[keep]  # filtered predictions, e.g. a BoxList with 4 boxes
    scores = predictions.get_field("scores")  # scores of the filtered predictions
    _, idx = scores.sort(0, descending=True)  # sort by descending score; idx holds the new order
    return predictions[idx]  # final predictions as a BoxList, e.g. 4 boxes
i. thresholds = self.confidence_thresholds_for_classes[(labels - 1).long()]
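.long() casts the label tensor to int64 so that it can be used as an integer index tensor (see "PyTorch torch.uint8 vs torch.long / torch.float"). Labels start at 1 (0 is background), so labels - 1 gives 0-based indices, and this line looks up, for every prediction in order, the threshold of its predicted class. self.confidence_thresholds_for_classes comes from the thresholds_for_classes list defined in fcos_demo.py.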
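The filtering and sorting in select_top_predictions can be sketched with plain lists. The helper `select_top` and the numbers are made up for illustration; as in the code above, labels are assumed to start at 1, so `label - 1` indexes the per-class threshold list. Unlike the original, which sorts the already filtered BoxList, this sketch returns indices into the unfiltered input.

```python
from typing import List, Sequence


def select_top(scores: Sequence[float], labels: Sequence[int],
               thresholds: Sequence[float]) -> List[int]:
    """Indices of predictions whose score beats its class threshold,
    ordered by descending score."""
    keep = [i for i, (s, l) in enumerate(zip(scores, labels))
            if s > thresholds[l - 1]]
    return sorted(keep, key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    scores = [0.60, 0.30, 0.90, 0.55]
    labels = [1, 2, 1, 3]
    thresholds = [0.5, 0.4, 0.6]  # one threshold per class, class 1 first
    print(select_top(scores, labels, thresholds))  # [2, 0]
```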
(III) result = self.overlay_boxes(result, top_predictions)
The code is below; it draws the predicted bounding boxes onto the original image.
def overlay_boxes(self, image, predictions):  # the original image and the predictions
    """
    Adds the predicted boxes on top of the image
    Arguments:
        image (np.ndarray): an image as returned by OpenCV
        predictions (BoxList): the result of the computation by the model.
            It should contain the field `labels`.
    """
    labels = predictions.get_field("labels")  # labels
    boxes = predictions.bbox  # bounding boxes
    colors = self.compute_colors_for_labels(labels).tolist()  # one 3-channel color per label as a list (OpenCV reads it as BGR), e.g. [[1, 127, 31], [33, 111, 3], [35, 110, 65], [40, 235, 220]]
    for box, color in zip(boxes, colors):
        box = box.to(torch.int64)  # integer pixel coordinates
        top_left, bottom_right = box[:2].tolist(), box[2:].tolist()  # top-left and bottom-right corners
        image = cv2.rectangle(
            image, tuple(top_left), tuple(bottom_right), tuple(color), 2
        )  # draw the rectangle on the image
    return image
i. colors = self.compute_colors_for_labels(labels).tolist()
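It is defined as follows. For what astype("uint8") does, see "What happens after np.astype uint8": the fractional part is dropped and the integer part kept. Here every value is already an integer in [0, 254] after the modulo, so the cast loses nothing.
def compute_colors_for_labels(self, labels):
    """
    Simple function that adds fixed colors depending on the class
    """
    colors = labels[:, None] * self.palette  # label column times self.palette (a Tensor (3,) = [33554431, 32767, 2097151]), e.g. a Tensor of shape (4, 3)
    colors = (colors % 255).numpy().astype("uint8")  # remainder modulo 255, then to a numpy uint8 array
    return colors  # one color per label, values in 0-254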
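The palette trick gives each class a fixed, deterministic color: multiply the label by three large constants and take the remainder modulo 255. A pure-Python sketch (hypothetical `color_for_label`; palette values copied from the comment above) reproduces the example colors listed in overlay_boxes, e.g. label 1 → [1, 127, 31] and label 33 → [33, 111, 3].

```python
from typing import List

# self.palette values quoted above: 2**25 - 1, 2**15 - 1, 2**21 - 1
PALETTE = [2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1]


def color_for_label(label: int) -> List[int]:
    """Deterministic 3-channel color in [0, 254] for a class label."""
    return [(label * p) % 255 for p in PALETTE]


if __name__ == "__main__":
    for label in (1, 33, 40):
        print(label, color_for_label(label))
```

Because the same label always yields the same color, every box of a given class is drawn in the same color across images.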
(IV) result = self.overlay_class_names(result, top_predictions)
The code is below; it labels each box with its class name and score.
def overlay_class_names(self, image, predictions):
    """
    Adds detected class names and scores in the positions defined by the
    top-left corner of the predicted bounding box
    Arguments:
        image (np.ndarray): an image as returned by OpenCV
        predictions (BoxList): the result of the computation by the model.
            It should contain the field `scores` and `labels`.
    """
    scores = predictions.get_field("scores").tolist()  # list of scores
    labels = predictions.get_field("labels").tolist()  # list of labels
    labels = [self.CATEGORIES[i] for i in labels]  # class names for the labels
    boxes = predictions.bbox  # predicted boxes
    template = "{}: {:.2f}"
    for box, score, label in zip(boxes, scores, labels):
        x, y = box[:2]  # top-left corner
        s = template.format(label, score)  # class name and score
        cv2.putText(
            image, s, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, .5, (255, 255, 255), 1
        )  # write the text
    return image
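What overlay_class_names writes can be previewed without OpenCV. This sketch (hypothetical `caption_items`, a toy three-entry category list) builds the same "{}: {:.2f}" string and the integer anchor point that would be passed to cv2.putText; like the demo, it assumes index 0 of the category list is the background entry.

```python
from typing import List, Sequence, Tuple


def caption_items(boxes: Sequence[Sequence[float]], scores: Sequence[float],
                  labels: Sequence[int], categories: Sequence[str]
                  ) -> List[Tuple[str, Tuple[int, int]]]:
    """(text, top-left anchor) pairs, one per detection."""
    template = "{}: {:.2f}"
    items = []
    for box, score, label in zip(boxes, scores, labels):
        x, y = box[:2]  # top-left corner of the box
        items.append((template.format(categories[label], score), (int(x), int(y))))
    return items


if __name__ == "__main__":
    cats = ["__background", "person", "bicycle"]  # toy subset of a category list
    print(caption_items([[10.7, 20.2, 50.0, 60.0]], [0.876], [1], cats))
    # [('person: 0.88', (10, 20))]
```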
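At this point run_on_opencv_image is finished. It returns the final result: the original image with the predicted boxes drawn on it and each box labeled with its class name and score.
That concludes the whole demo process!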