[Charles Series] 1. Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D


Matt今年18岁


Category: 3D Object Detection | Tags: cnn, image, render

Article link: https://blog.csdn.net/weixin_44698281/article/details/120727817


Charles's papers in order of publication

What does "viewpoint" mean in 3D detection?
In the Intro: "features learned by task-specific supervision leads to much better task performance." This means that features a CNN learns under supervision for a specific task give much better results on that task. For a discussion of this point, see two classic papers: "Rich feature hierarchies for accurate object detection and semantic segmentation" and "Imagenet classification with deep convolutional neural networks".
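In the Intro: "We believe that 3D models have the potential to generate large number of images of high variation, which can be well exploited by deep CNN with a high learning capacity." Does "images of high variation" correspond to the images at different resolutions produced by the various convolutional layers of a CNN? (Judging from the rendering pipeline, no: "high variation" refers to the diversity of the rendered training images themselves, such as varied shapes, viewpoints, lighting, backgrounds, and crops, not to feature maps inside the network.)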
In Section 4, Render for CNN System: "To generate a deformed model from a seed model, we draw i.i.d samples from a Gaussian distribution for the translation vector of each control point." I could not follow this part; the answer probably lies in references [30] and [31] of the paper.
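Loss function: "By substituting an exponential decay weight w.r.t viewpoint distance for the mis-classification indicator weight in the original soft-max loss, we explicitly encourage correlation among the viewpoint predictions of nearby views." My reading: a weight that decays exponentially with the viewpoint distance d(V, Vs) replaces the ordinary 0/1 mis-classification indicator, so the loss takes into account how far apart two viewpoints are and encourages the predictions for neighboring views to be correlated.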
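For reference, the loss described in the quote can be written out as below. This is my reading of the paper's formulation, so treat the exact notation as an assumption: V is the set of discretized viewpoint labels, v_s is the ground-truth viewpoint of training sample s, c_s is its object category, P_v(s; c_s) is the softmax probability of view v from the head for category c_s, d is the distance between two viewpoints, and σ controls how fast the weight decays.

```latex
L_{\mathrm{vp}}(\{s\}) = -\sum_{\{s\}} \sum_{v \in V} e^{-d(v, v_s)/\sigma} \log P_v(s; c_s)
```

If the weight e^{-d(v, v_s)/σ} is replaced by 1 for v = v_s and 0 otherwise, this collapses back to the standard softmax loss, -Σ_s log P_{v_s}(s; c_s).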

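First, the paper is mainly about viewpoint estimation. Using a CNN as the backbone looks routine today, but the paper was published in 2015, when 3D work still relied mostly on hand-crafted features such as SIFT, so it was arguably also riding the wave of interest in CNNs.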

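Turning to the main text, the paper addresses two problems: ① there are too few training images with accurate viewpoint annotations; ② there is a lack of powerful features suited to viewpoint estimation. The paper proposes a solution to each.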

I. Addressing the shortage of training data

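The paper uses image synthesis. In the related work section, the authors note that by 2015 synthetic images were already fairly common as training data, but they covered only a few categories and were not used to train CNNs; this is where the paper's novelty lies.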

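For the synthesis itself, the authors use a symmetry-preserving deformation method to generate a large number of images of high variation; the point of generating highly varied images is to keep the CNN from overfitting.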
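To make the deformation step concrete: as far as I can tell, the seed models are deformed with a symmetry-preserving free-form deformation (FFD), in which a lattice of control points surrounds the model and each control point's translation is drawn i.i.d. from a Gaussian. The sketch below is a minimal NumPy version under explicit assumptions, not the authors' code: the model is left-right symmetric about the x = 0 plane, the lattice spans the model's bounding box, the deformation uses the classic Bernstein-polynomial FFD, and symmetry is enforced by mirroring the sampled offsets. The lattice size, sigma, and all function names are hypothetical.

```python
import numpy as np
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t); t may be a NumPy array."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def symmetric_control_offsets(l, m, n, sigma, rng):
    """Draw i.i.d. Gaussian translations for an (l+1) x (m+1) x (n+1) lattice,
    then mirror them across the x = 0 plane (assumed symmetry plane)."""
    d = rng.normal(0.0, sigma, size=(l + 1, m + 1, n + 1, 3))
    flip_x = np.array([-1.0, 1.0, 1.0])
    for i in range(l + 1):
        mirror = l - i
        if i < mirror:
            d[mirror] = d[i] * flip_x  # mirrored control point gets mirrored offset
        elif i == mirror:
            d[i, ..., 0] = 0.0  # control points on the symmetry plane keep x fixed
    return d


def ffd(points, offsets):
    """Free-form deformation: P' = P + sum_ijk B_i(s) B_j(t) B_k(u) * d_ijk."""
    l, m, n = (size - 1 for size in offsets.shape[:3])
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    stu = (points - lo) / extent  # local lattice coordinates in [0, 1]
    bs = np.stack([bernstein(l, i, stu[:, 0]) for i in range(l + 1)], axis=1)
    bt = np.stack([bernstein(m, j, stu[:, 1]) for j in range(m + 1)], axis=1)
    bu = np.stack([bernstein(n, k, stu[:, 2]) for k in range(n + 1)], axis=1)
    disp = np.einsum("pi,pj,pk,ijkc->pc", bs, bt, bu, offsets)
    return points + disp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "model": a few vertices placed symmetrically about x = 0.
    verts = np.array([[1.0, 0.5, 0.2], [-1.0, 0.5, 0.2],
                      [0.3, -0.4, 0.9], [-0.3, -0.4, 0.9]])
    out = ffd(verts, symmetric_control_offsets(3, 3, 3, sigma=0.05, rng=rng))
    print(np.allclose(out[1], out[0] * [-1.0, 1.0, 1.0]))  # mirrored pair stays mirrored
```

Mirroring the sampled offsets keeps the deformed model symmetric because the Bernstein basis satisfies B_i(1 - s) = B_{l-i}(s), so mirrored vertices receive mirrored displacements. Each call with a fresh random draw yields a new deformed variant of the same seed model.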

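II. Addressing the lack of powerful features

On the CNN side, the paper builds on AlexNet, with some modifications to the implementation.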

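The network first applies several convolutional layers, then adds category-specific fully connected layers, one set per object category, which perform the viewpoint classification for that category. The loss function is shown in the figure.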
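A small NumPy sketch of how the category-specific heads and the distance-weighted loss fit together. Assumptions, all hypothetical rather than taken from the paper's code: a single angle discretized into 360 one-degree bins, the category-specific heads modeled as one row of logits per category, d taken as the circular distance between bins, and σ = 5 chosen arbitrarily.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def circular_bin_distance(num_bins, gt_bin):
    """Distance from every bin to gt_bin, wrapping around (bin 359 is next to bin 0)."""
    diff = np.abs(np.arange(num_bins) - gt_bin)
    return np.minimum(diff, num_bins - diff)


def geometry_aware_loss(logits_per_category, category, gt_bin, sigma=5.0):
    """Soft-max loss where the 0/1 target is replaced by exp(-d(v, v_s) / sigma).

    logits_per_category: array of shape (num_categories, num_bins), one row per
    category-specific head; only the row of the sample's own category is used.
    """
    logits = logits_per_category[category]  # pick this sample's category head
    num_bins = logits.shape[0]
    weights = np.exp(-circular_bin_distance(num_bins, gt_bin) / sigma)
    return -np.sum(weights * log_softmax(logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(12, 360))  # hypothetical: 12 categories x 360 bins
    print(geometry_aware_loss(logits, category=3, gt_bin=45))
```

With this weighting, a prediction that peaks 2 bins away from the ground truth is penalized much less than one that peaks 180 bins away, which is exactly the "nearby views are correlated" effect the paper describes. The weights are left unnormalized here; as σ shrinks toward 0 the loss approaches ordinary cross-entropy.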

Experimental results:

1. Joint detection and viewpoint estimation: R-CNN (with bounding box regression) provides the detected boxes

2. Viewpoint Estimation
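As I understand it, the joint detection results are reported on PASCAL 3D+ as Average Viewpoint Precision (AVP), where a detection only counts as correct if both its box and its discretized viewpoint match the ground truth. The sketch below shows just that true-positive test, under stated assumptions: boxes are (x1, y1, x2, y2) in continuous coordinates (no +1 pixel convention), the IoU must exceed 0.5, and azimuth is split into num_views equal bins centered on 0°. The function names are hypothetical, and the full AVP computation (ranking detections by score and integrating a precision-recall curve) is omitted.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; empty boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(inter_box)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def azimuth_bin(azimuth_deg, num_views):
    """Map an azimuth in degrees to one of num_views bins centered on 0, 360/num_views, ..."""
    width = 360.0 / num_views
    return int(((azimuth_deg + width / 2.0) % 360.0) // width)


def is_true_positive(pred_box, pred_az, gt_box, gt_az, num_views, iou_thresh=0.5):
    """A detection counts only if the box overlaps enough AND the viewpoint bin matches."""
    return (iou(pred_box, gt_box) > iou_thresh
            and azimuth_bin(pred_az, num_views) == azimuth_bin(gt_az, num_views))


if __name__ == "__main__":
    # Same box; 10 deg and 350 deg both fall in the bin centered on 0 when num_views = 4.
    print(is_true_positive((0, 0, 10, 10), 10.0, (0, 0, 10, 10), 350.0, num_views=4))
```

Because the viewpoint check uses discrete bins, the same detection can be a true positive under a coarse setting (e.g. 4 views) and a false positive under a finer one (e.g. 24 views).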
