Object Detection with Faster R-CNN (PyTorch)

Reference: Faster R-CNN Object Detection with PyTorch | LearnOpenCV

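In R-CNN, each bounding box is classified independently by an image classifier. With 2,000 region proposals, the classifier computes a feature map for every proposal, which is very time-consuming. In follow-up work, Ross Girshick proposed a method called Fast R-CNN, which speeds up object detection significantly. The idea is to compute a single feature map for the whole image instead of 2,000 feature maps for 2,000 region proposals. For each region proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is then used for two purposes: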

1. Classify the region into one of the classes (e.g., dog, cat, background).

2. Refine the original bounding box with a bounding-box regressor.
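The RoI pooling step described above can be sketched in plain Python. This is a simplified illustration, not the layer used by any particular library: it assumes a single-channel feature map stored as nested lists, integer RoI coordinates in feature-map cells, and a hypothetical helper name `roi_max_pool`. The point it shows is that regions of different sizes all come out as the same fixed-size grid.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one region of a 2-D feature map to output_size x output_size.

    feature_map: list of rows (H x W), a single channel for simplicity.
    roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive.
    """
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1
    pooled = []
    for by in range(output_size):
        # Split the region into roughly equal bins along the y axis.
        ys = y1 + (by * height) // output_size
        ye = max(ys + 1, y1 + ((by + 1) * height) // output_size)
        row = []
        for bx in range(output_size):
            # Same split along the x axis.
            xs = x1 + (bx * width) // output_size
            xe = max(xs + 1, x1 + ((bx + 1) * width) // output_size)
            # Each bin contributes its maximum activation.
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared 6x6 feature map, computed once for the whole image.
feature_map = [[y * 6 + x for x in range(6)] for y in range(6)]

# Two proposals of different sizes on the same feature map.
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 4))  # 4x4 region
pooled_b = roi_max_pool(feature_map, (1, 2, 6, 5))  # 5x3 region

# Flatten each pooled grid into a fixed-length feature vector.
vector_a = [v for row in pooled_a for v in row]
vector_b = [v for row in pooled_b for v in row]
print(vector_a)  # [7, 9, 19, 21]
print(vector_b)  # [14, 17, 26, 29]
```

Both vectors have four entries even though the regions differ in size, which is what lets a single classifier and box regressor consume every proposal.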

Faster R-CNN object detector

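In Fast R-CNN, the computation for classifying the 2,000 region proposals is shared, but the part of the algorithm that generates the region proposals shares no computation with the part that performs image classification. In the follow-up work known as Faster R-CNN, the main insight is that these two parts, computing region proposals and classifying images, can use the same feature map and thus share the computational load. A convolutional neural network produces a feature map of the image, and that map is used to train both the region proposal network and the image classifier. Because of this shared computation, object detection becomes significantly faster.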
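The saving from sharing one feature map can be made concrete with a toy sketch. Everything here is a stand-in invented for illustration: `CountingBackbone` is not a real network, and `toy_propose` and `toy_classify` only mimic the roles of the region proposal network and the classifier. The sketch only counts how many times the expensive backbone runs under each design.

```python
class CountingBackbone:
    """Stand-in for a CNN backbone that counts how often it is run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        # Pretend "feature map": the sum of each image row.
        return [sum(row) for row in image]


def crop(image, box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def rcnn_style(image, proposals, backbone):
    # R-CNN: one backbone pass per cropped proposal.
    return [backbone(crop(image, box)) for box in proposals]


def shared_style(image, backbone, propose, classify):
    # Faster R-CNN idea: one backbone pass, and both the proposal stage
    # and the classifier read the same features.
    features = backbone(image)
    proposals = propose(features)
    return [classify(features, span) for span in proposals]


def toy_propose(features):
    # Hypothetical proposal stage: fixed two-row windows over the features.
    return [(i, i + 2) for i in range(0, len(features), 2)]


def toy_classify(features, span):
    # Hypothetical classifier: label a window by its total activation.
    start, end = span
    return "large" if sum(features[start:end]) > 10 else "small"


image = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [5, 5, 5, 5],
         [5, 5, 5, 5]]
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 2, 4, 4)]

per_proposal = CountingBackbone()
rcnn_style(image, boxes, per_proposal)

shared = CountingBackbone()
labels = shared_style(image, shared, toy_propose, toy_classify)

print(per_proposal.calls)  # 3
print(shared.calls)        # 1
print(labels)              # ['small', 'large']
```

With 2,000 proposals instead of three, the first counter would read 2,000 while the second would still read 1.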
 

Download the files: wget https://cdn.pixabay.com/photo/2013/07/05/01/08/traffic-143391_960_720.jpg -O traffic_scene.jpg

wget https://images.unsplash.com/photo-1458169495136-854e4c39548a -O girl_cars.jpg
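Where wget is not available, the same two files could be fetched from Python with the standard library. This is a sketch under assumptions: the helper name `download_if_missing` and the `IMAGE_URLS` mapping are made up for illustration, and the browser-like User-Agent header is a guess at what an image host might require, not something the hosts document.

```python
import os
import shutil
import urllib.request

# URLs and target filenames taken from the wget commands above.
IMAGE_URLS = {
    "traffic_scene.jpg": "https://cdn.pixabay.com/photo/2013/07/05/01/08/traffic-143391_960_720.jpg",
    "girl_cars.jpg": "https://images.unsplash.com/photo-1458169495136-854e4c39548a",
}


def download_if_missing(url, filename):
    """Download url to filename unless the file already exists."""
    if os.path.exists(filename):
        return filename
    # Assumption: some hosts reject requests without a User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

To fetch both images, call `download_if_missing(url, name)` for each `name, url` pair in `IMAGE_URLS.items()`.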

Full code

# coding: UTF-8
# language: python
# import packages
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import cv2


# Get the pretrained model from torchvision.models.
# Note: pretrained=True loads the pretrained weights for the model
# (recent torchvision releases prefer the weights= argument and warn on pretrained=True).
# model.eval() puts the model in inference mode.
model=torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
# Call model.cuda() (or model.to(device)) to run the model on the GPU; the prediction code below keeps it on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

# Define a function that takes an image path and returns the model's predictions for that image

def get_prediction(img_path, threshold):
  img = Image.open(img_path).convert('RGB') # Load the image as 3-channel RGB
  transform = transforms.Compose([transforms.ToTensor()]) # Define the PyTorch transform
  img = transform(img) # Apply the transform to the image
  with torch.no_grad(): # Inference only, no gradients needed
    pred = model([img]) # Pass the image to the model
  print('pred')
  print(pred)
  pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in pred[0]['labels'].tolist()] # Predicted class names
  print("original pred_class")
  print(pred_class)
  pred_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in pred[0]['boxes'].tolist()] # Bounding boxes as [(x1, y1), (x2, y2)]
  print("original pred_boxes")
  print(pred_boxes)
  pred_score = pred[0]['scores'].tolist()
  print("original score")
  print(pred_score)
  keep = [idx for idx, score in enumerate(pred_score) if score > threshold] # Indices with score above the threshold
  pred_boxes = [pred_boxes[idx] for idx in keep] # Empty list if nothing passes the threshold
  pred_class = [pred_class[idx] for idx in keep]
  print(keep)
  print(pred_boxes)
  print(pred_class)
  return pred_boxes, pred_class


# Define a function that takes an image path and displays the annotated image
def object_detection_api(img_path, threshold=0.5, rect_th=3, text_size=3, text_th=3):
 
  boxes, pred_cls = get_prediction(img_path, threshold) # Get predictions
  img = cv2.imread(img_path) # Read image with cv2
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert to RGB
  for i in range(len(boxes)):
    cv2.rectangle(img, (int(boxes[i][0][0]),int(boxes[i][0][1])), (int(boxes[i][1][0]),int(boxes[i][1][1])),color=(0, 255, 0), thickness=rect_th)
    cv2.putText(img,pred_cls[i], (int(boxes[i][0][0]),int(boxes[i][0][1])), cv2.FONT_HERSHEY_SIMPLEX, text_size, (0,255,0),thickness=text_th)
    # cv2.rectangle(img, boxes[i][0], boxes[i][1],color=(0, 255, 0), thickness=rect_th) # Draw Rectangle with the coordinates
    # cv2.putText(img,pred_cls[i], boxes[i][0],  cv2.FONT_HERSHEY_SIMPLEX, text_size, (0,255,0),thickness=text_th) # Write the prediction class
  plt.figure(figsize=(20,30)) # display the output image
  plt.imshow(img)
  plt.xticks([])
  plt.yticks([])
  plt.show()


# object_detection_api('./people.jpg',threshold=0.8)
# object_detection_api('./car.jpg', rect_th=6, text_th=5, text_size=5)
object_detection_api("./traffic_scene.jpg", rect_th=2, text_th=1, text_size=1, threshold=0.5)

# Compare CPU and GPU inference time
import time

def check_inference_time(image_path, gpu=False):
  model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
  model.eval()
  img = Image.open(image_path)
  transform = transforms.Compose([transforms.ToTensor()])
  img = transform(img)
  if gpu:
    model.cuda()
    img = img.cuda()
  else:
    model.cpu()
    img = img.cpu()
  with torch.no_grad():
    if gpu:
      torch.cuda.synchronize() # Finish pending GPU work before starting the timer
    start_time = time.time()
    pred = model([img])
    if gpu:
      torch.cuda.synchronize() # CUDA calls are asynchronous; wait for the result
    end_time = time.time()
  return end_time-start_time

cpu_time = sum([check_inference_time('./girl_cars.jpg', gpu=False) for _ in range(10)])/10.0
print('\n\nAverage time taken by the model with CPU = {}s'.format(cpu_time))

# Only time the GPU when CUDA is available
if torch.cuda.is_available():
  gpu_time = sum([check_inference_time('./girl_cars.jpg', gpu=True) for _ in range(10)])/10.0
  print('Average time taken by the model with GPU = {}s'.format(gpu_time))
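The score-thresholding step inside get_prediction can also be looked at in isolation. The sketch below uses made-up boxes, labels, and scores in plain Python lists rather than real model output, and the helper name `filter_by_score` is hypothetical. It shows one way to keep only confident detections while still returning cleanly when nothing passes the threshold.

```python
def filter_by_score(boxes, labels, scores, class_names, threshold=0.5):
    """Keep detections whose score exceeds threshold.

    Returns (boxes, names); both lists are empty when nothing passes.
    """
    kept_boxes, kept_names = [], []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            kept_boxes.append(box)
            kept_names.append(class_names[label])
    return kept_boxes, kept_names


# Made-up values shaped like one entry of the detector output.
class_names = ['__background__', 'person', 'bicycle', 'car']
boxes = [(10.0, 20.0, 110.0, 220.0), (50.0, 60.0, 150.0, 160.0), (5.0, 5.0, 30.0, 30.0)]
labels = [3, 1, 2]
scores = [0.98, 0.76, 0.12]

kept_boxes, kept_names = filter_by_score(boxes, labels, scores, class_names, threshold=0.5)
print(kept_names)  # ['car', 'person']

# A threshold above every score yields empty lists instead of an error.
none_boxes, none_names = filter_by_score(boxes, labels, scores, class_names, threshold=0.99)
print(none_names)  # []
```

Filtering each detection on its own score does not depend on the scores being sorted, so the same helper works on any ordering of the detections.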

