CS231n
Lecture 11: Detection and Segmentation
Semantic Segmentation
Label each pixel in the image with a category label
Does not differentiate instances; only cares about pixels
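Here is a minimal sketch of the per-pixel labeling step. It assumes the network has already produced a score for every class at every pixel, stored as a nested list `scores[c][i][j]` (the layout and the function name `segment` are hypothetical). Each pixel takes its highest-scoring class. Two touching objects of the same class therefore get the same label, which is why semantic segmentation cannot separate instances.

```python
def segment(scores):
    """Return a label map holding the argmax class at every pixel.

    scores[c][i][j] is the (hypothetical) score of class c at pixel (i, j).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # pick the class with the highest score at this pixel
            labels[i][j] = max(range(num_classes), key=lambda c: scores[c][i][j])
    return labels


scores = [
    [[0.9, 0.1], [0.2, 0.8]],  # class 0
    [[0.1, 0.9], [0.8, 0.2]],  # class 1
]
print(segment(scores))  # [[0, 1], [1, 0]]
```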
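Simplest idea: sliding window, i.e. classify the center pixel of every image patch with a CNN. This is very inefficient because overlapping patches do not reuse shared features.
Better approach: a fully convolutional network that predicts labels for all pixels at once. It has its own problem: convolutions at the original image resolution are very expensive.
Solution: downsampling and upsampling inside the network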
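How the resolution shrinks and grows can be checked with the usual output-size arithmetic. The sketch below assumes square inputs, stride-2 3×3 convolutions for downsampling, and stride-2 4×4 transpose convolutions (padding 1) for upsampling. These layer choices are illustrative assumptions. The transpose-convolution formula is the one common frameworks use when there is no output padding and no dilation.

```python
def conv_out(size, kernel, stride, pad):
    # output size of a convolution or pooling layer
    return (size + 2 * pad - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, pad):
    # output size of a transpose convolution (no output padding, no dilation)
    return (size - 1) * stride - 2 * pad + kernel


sizes = [224]
for _ in range(2):  # downsample twice with stride-2 3x3 convs
    sizes.append(conv_out(sizes[-1], kernel=3, stride=2, pad=1))
for _ in range(2):  # upsample twice with stride-2 4x4 transpose convs
    sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, pad=1))
print(sizes)  # [224, 112, 56, 112, 224]
```

The middle of the network works at 56×56 instead of 224×224. The output still returns to full resolution, so every pixel gets a label.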
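Upsampling: transpose convolution. The name comes from the matrix view of convolution. If a convolution is written as multiplication by a matrix built from the kernel, this operation multiplies by the transpose of that matrix.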
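The naming can be checked numerically. The sketch below makes three assumptions: a 1D input, no padding, and "convolution" in the deep-learning (cross-correlation) sense. It builds the matrix X for a stride-2 convolution and multiplies a small input by Xᵀ. It then compares that result with a direct transpose convolution, in which each input value scales a copy of the kernel and the copies are added into the output. All helper names are hypothetical.

```python
def conv_matrix(kernel, in_len, stride):
    # row r of X holds the kernel at offset r * stride, so y = X x is the convolution
    k = len(kernel)
    out_len = (in_len - k) // stride + 1
    X = [[0.0] * in_len for _ in range(out_len)]
    for r in range(out_len):
        for t in range(k):
            X[r][r * stride + t] = kernel[t]
    return X


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def conv_transpose_direct(kernel, y, stride):
    # each input value scales a copy of the kernel; overlapping copies are summed
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        for t in range(k):
            out[i * stride + t] += v * kernel[t]
    return out


kernel = [1.0, 2.0, 3.0]
X = conv_matrix(kernel, in_len=5, stride=2)  # 2x5 matrix: 5 inputs -> 2 outputs
y = [1.0, 1.0]
print(matvec(transpose(X), y))                     # [1.0, 2.0, 4.0, 2.0, 3.0]
print(conv_transpose_direct(kernel, y, stride=2))  # [1.0, 2.0, 4.0, 2.0, 3.0]
```

Xᵀ maps 2 values to 5, so it upsamples. The two kernel copies overlap at index 2, where their contributions add up to 4.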
Classification + Localization
Classify and localize a single object in the image (the image is assumed to contain one main object, which does not fill the whole image)
Feature extraction ⇒ {classification, bounding box regression}
Then the two losses are added together
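A minimal sketch of the summed loss follows. It assumes a softmax cross-entropy loss for the class scores and an L2 (squared-error) loss for the box coordinates. The box is written as (x, y, w, h), which is also an assumption. The weight `box_weight` is a hypothetical hyperparameter balancing the two terms; with the default of 1.0 the losses are simply added.

```python
import math


def softmax_cross_entropy(scores, label):
    # -log softmax(scores)[label], computed stably by subtracting the max
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def l2_box_loss(pred_box, true_box):
    # squared error over the box coordinates, e.g. (x, y, w, h)
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def localization_loss(scores, label, pred_box, true_box, box_weight=1.0):
    # multitask loss: classification loss + (weighted) box regression loss
    return softmax_cross_entropy(scores, label) + box_weight * l2_box_loss(pred_box, true_box)


loss = localization_loss([0.0, 0.0], 0, (10, 10, 50, 50), (12, 10, 50, 48))
print(loss)  # log(2) + 8, about 8.693
```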
Human Pose Estimation
Object Detection
Sliding window
Very computationally expensive: the CNN must be run on a huge number of locations, scales, and aspect ratios
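The cost can be made concrete by counting crops. The sketch below assumes a 640×480 image, four box shapes, and a stride of 8 pixels. These numbers are illustrative assumptions, not values from the lecture. Every window would need its own CNN forward pass.

```python
def count_windows(img_w, img_h, sizes, stride):
    # number of (position, box shape) pairs a sliding-window detector must classify
    total = 0
    for w, h in sizes:
        if w > img_w or h > img_h:
            continue  # window does not fit in the image
        total += ((img_w - w) // stride + 1) * ((img_h - h) // stride + 1)
    return total


shapes = [(64, 64), (128, 128), (128, 64), (64, 128)]
print(count_windows(640, 480, shapes, stride=8))  # 13524
```

Even with only four shapes, that is over 13,000 forward passes for one image. Real objects come in many more scales and aspect ratios.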
Region Proposals
Find “blobby” image regions that are likely to contain objects
R-CNN
Fast R-CNN
Sharing features: run the CNN once on the whole image, then crop each region's features from the shared feature map
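Fast R-CNN turns each cropped region into a fixed-size input for the per-region head with RoI pooling, which max-pools the region over a fixed grid of bins. The sketch makes several simplifying assumptions. It uses a single-channel feature map stored as nested lists. The region is given directly in integer feature-map coordinates `(x0, y0, x1, y1)` with exclusive ends; the real method first projects image-space proposals onto the feature map. Bin edges come from integer division. The function name `roi_max_pool` is hypothetical.

```python
def roi_max_pool(fmap, roi, out_h, out_w):
    # roi = (x0, y0, x1, y1) in feature-map cells; x1 and y1 are exclusive
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    out = []
    for i in range(out_h):
        r0 = y0 + (i * h) // out_h
        r1 = max(r0 + 1, y0 + ((i + 1) * h) // out_h)  # every bin has >= 1 row
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = max(c0 + 1, x0 + ((j + 1) * w) // out_w)  # >= 1 column
            # max over all feature cells that fall in this bin
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 map with values 0..15
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
print(roi_max_pool(fmap, (1, 1, 4, 3), 2, 2))  # [[5, 7], [9, 11]]
```

Regions of different sizes (4×4 and 3×2 cells here) both come out as 2×2, so one shared head can process all proposals.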
Faster R-CNN
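RPN (Region Proposal Network): predicts the region proposals from the shared features instead of using a separate proposal method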
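The RPN places a fixed set of anchor boxes at every feature-map position. For each anchor, it predicts an objectness score and box offsets. The sketch below only generates the anchors. It follows the common setup of 3 scales × 3 aspect ratios (k = 9 per position). The stride of 16, the half-cell centering, and the (cx, cy, w, h) layout are assumptions for illustration, and `make_anchors` is a hypothetical name.

```python
import math


def make_anchors(feat_h, feat_w, stride, scales, ratios):
    # one anchor per (position, scale, ratio), as (cx, cy, w, h) in image pixels
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) * stride  # center of feature cell (i, j) in the image
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # keep area s * s while setting height / width = r
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors(2, 3, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 54 = 2 * 3 positions * 9 anchors
print(anchors[1])    # (8.0, 8.0, 128.0, 128.0)
```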
Detection without Proposals
YOLO / SSD
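These detectors skip the proposal stage and predict boxes directly from a grid in a single pass. The sketch follows the original YOLO formulation. The image is divided into an S×S grid, and the cell containing an object's center is responsible for it. Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores, giving an S×S×(5B + C) output. SSD differs in the details: it uses default boxes on several feature maps. The function names are hypothetical.

```python
def yolo_output_shape(S, B, C):
    # each cell: B boxes x (x, y, w, h, confidence) + C class scores
    return (S, S, 5 * B + C)


def responsible_cell(cx, cy, img_w, img_h, S):
    # the grid cell that contains the object's center predicts that object
    col = min(int(cx / img_w * S), S - 1)  # clamp a center on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp a center on the bottom edge
    return row, col


print(yolo_output_shape(7, 2, 20))                # (7, 7, 30)
print(responsible_cell(224, 100, 448, 448, S=7))  # (1, 3)
```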
Dense Captioning
Object Detection + Captioning
Instance Segmentation
Mask R-CNN