5 AI/ML Research Papers on Object Detection You Must Read

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

By Siyuan Qiao, Liang-Chieh Chen, Alan Yuille

Abstract —

Many modern object detectors demonstrate outstanding performances by using the mechanism of looking and thinking twice. In this paper, we explore this mechanism in the backbone design for object detection. At the macro level, we propose a Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers. At the micro-level, we propose Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions. Combining them results in DetectoRS, which significantly improves the performances of object detection. On COCO test-dev, DetectoRS achieves state-of-the-art 54.7% box AP for object detection, 47.1% mask AP for instance segmentation, and 49.6% PQ for panoptic segmentation.

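The core of Switchable Atrous Convolution can be illustrated with a toy 1-D sketch (an illustrative simplification, not the authors' implementation: real SAC operates on 2-D feature maps, shares weights between the two atrous branches, and learns the switch from average-pooled features):

```python
def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution with zero padding, 'same' output length."""
    k = len(w)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + rate * (j - (k - 1) // 2)  # dilated tap position
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out

def switchable_atrous_conv(x, w, switch):
    """Blend a rate-1 and a rate-3 atrous conv of the same weights using a
    per-position soft switch: S * conv(r=1) + (1 - S) * conv(r=3)."""
    y1 = atrous_conv1d(x, w, rate=1)
    y3 = atrous_conv1d(x, w, rate=3)
    return [s * a + (1.0 - s) * b for s, a, b in zip(switch, y1, y3)]
```

The switch lets the network choose, position by position, between a small and a large receptive field; with the identity kernel `[0, 1, 0]` both branches return the input unchanged, so any switch values leave it unchanged.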
Paper can be found here:

https://arxiv.org/pdf/2006.02334v1.pdf

Code can be found here:

IterDet: Iterative Scheme for Object Detection in Crowded Environments

By Danila Rukhovich, Konstantin Sofiiuk, Danil Galeev, Olga Barinova, Anton Konushin

Abstract —

Deep learning-based detectors usually produce a redundant set of object bounding boxes including many duplicate detections of the same object. These boxes are then filtered using non-maximum suppression (NMS) in order to select exactly one bounding box per object of interest. This greedy scheme is simple and provides sufficient accuracy for isolated objects but often fails in crowded environments, since one needs to both preserve boxes for different objects and suppress duplicate detections. In this work we develop an alternative iterative scheme, where a new subset of objects is detected at each iteration. Detected boxes from the previous iterations are passed to the network at the following iterations to ensure that the same object would not be detected twice. This iterative scheme can be applied to both one-stage and two-stage object detectors with just minor modifications of the training and inference procedures. We perform extensive experiments with two different baseline detectors on four datasets and show significant improvement over the baseline, leading to state-of-the-art performance on CrowdHuman and WiderPerson datasets.

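The iterative scheme can be sketched in a few lines of plain Python (a schematic sketch, not the authors' code: in the real method the history of detected boxes is rendered as an extra input the network is trained to condition on, and the stand-in detector below is purely hypothetical):

```python
def iterative_detect(detect_fn, image, max_iterations=5):
    """IterDet-style inference loop: each pass detects a new subset of
    objects, conditioned on everything found in the previous passes."""
    history = []
    for _ in range(max_iterations):
        new_boxes = detect_fn(image, history)  # detector sees prior detections
        if not new_boxes:  # nothing new found: stop early
            break
        history.extend(new_boxes)
    return history

# Hypothetical stand-in detector: "finds" one not-yet-reported box per pass,
# mimicking a network trained not to re-detect objects already in the history.
GROUND_TRUTH = [(0, 0, 10, 10), (5, 5, 15, 15), (40, 40, 50, 50)]

def toy_detector(image, history):
    remaining = [b for b in GROUND_TRUTH if b not in history]
    return remaining[:1]
```

Here `iterative_detect(toy_detector, None)` recovers all three boxes in three passes, including the heavily overlapping pair that a single greedy NMS pass might collapse into one detection.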
Paper can be found here:

https://arxiv.org/pdf/2005.05708v1.pdf

Code can be found here:

Single-Shot Refinement Neural Network for Object Detection

By Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li

Abstract —

For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains comparable efficiency of one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to 1. filter out negative anchors to reduce search space for the classifier, and 2. coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as input from the former to further improve the regression and predict a multi-class label. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes, and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency.

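The anchor refinement module's two jobs can be sketched as follows (a hypothetical illustration: the real ARM is a convolutional head over feature maps; the 0.99 negative-confidence threshold follows the paper's filtering heuristic, and the box parameterisation is the standard R-CNN one):

```python
import math

def refine_anchors(anchors, neg_scores, offsets, theta=0.99):
    """Anchor Refinement Module sketch: discard well-classified negative
    anchors, then coarsely adjust the survivors' centres and sizes so the
    object detection module starts from better-initialised boxes."""
    refined = []
    for (cx, cy, w, h), p_neg, (dx, dy, dw, dh) in zip(anchors, neg_scores, offsets):
        if p_neg > theta:  # easy negative: remove from the classifier's search space
            continue
        refined.append((cx + dx * w,       # shift centre, scaled by anchor size
                        cy + dy * h,
                        w * math.exp(dw),  # resize via log-space offsets
                        h * math.exp(dh)))
    return refined
```

The object detection module then classifies and regresses from these refined boxes instead of the raw anchor grid, which is what lets a single-shot detector approach two-stage accuracy.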
Paper can be found here:

https://arxiv.org/pdf/1711.06897v3.pdf

Code can be found here:

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

By Yin Zhou, Oncel Tuzel

Abstract —

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird’s eye view projection. In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on only LiDAR.

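The first step, partitioning the cloud into equally spaced voxels with a per-voxel point cap, is easy to sketch (an illustrative simplification: the VFE layers that actually encode each voxel's points are learned networks, and the paper caps voxel occupancy by random sampling rather than this first-come-first-kept rule):

```python
from collections import defaultdict

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points=35):
    """Group 3-D points into equally spaced voxels keyed by grid index,
    keeping at most max_points points in each voxel."""
    vx, vy, vz = voxel_size
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // vx), int(y // vy), int(z // vz))
        if len(voxels[key]) < max_points:  # cap occupancy per voxel
            voxels[key].append((x, y, z))
    return dict(voxels)
```

Only the non-empty voxels are carried forward, which is what makes the representation efficient on a sparse LiDAR scan where most of the grid is empty.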
Paper can be found here:

https://arxiv.org/pdf/1711.06396v1.pdf

Code can be found here:

Detecting People in Artwork with CNNs

By Nicholas Westlake, Hongping Cai, Peter Hall

Abstract —

CNNs have massively improved performance in object detection in photographs. However, research into object detection in artwork remains limited. We show state-of-the-art performance on a challenging dataset, People-Art, which contains people from photos, cartoons, and 41 different artwork movements. We achieve this high performance by fine-tuning a CNN for this task, thus also demonstrating that training CNNs on photos results in overfitting for photos: only the first three or four layers transfer from photos to the artwork. Although the CNN’s performance is the highest yet, it remains less than 60% AP, suggesting further work is needed for the cross-depiction problem.

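The transfer finding, early layers carry over while later layers must be retrained, corresponds to the familiar freeze-and-fine-tune recipe, sketched here on toy scalar "layers" (purely illustrative: the paper fine-tunes an actual detection CNN, not scalars, and the layer count is a placeholder):

```python
def finetune_step(weights, grads, lr=0.01, n_frozen=4):
    """One gradient step that leaves the first n_frozen layers untouched
    (the generic early features that transfer from photos to artwork) and
    updates only the later, task-specific layers."""
    return [w if i < n_frozen else w - lr * g
            for i, (w, g) in enumerate(zip(weights, grads))]
```

Freezing the early layers preserves the generic edge and texture features learned from photos while the artwork-specific layers adapt to the new domain.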
Paper can be found here:

https://arxiv.org/pdf/1610.08871v1.pdf

Code can be found here:

References and credits —

Translated from: https://medium.com/datadriveninvestor/5-ai-ml-research-papers-on-object-detection-you-must-read-1ad636b66697
