1. Computer Vision Task
2. Semantic Segmentation
2.1 Characteristics:
a. Label each pixel in the image with a category label
b. Don’t differentiate instances, only care about pixels
2.2 Approaches:
a. Semantic Segmentation Idea: Sliding Window
b. Semantic Segmentation Idea: Fully Convolutional
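A fully convolutional network produces one score map per category at the input resolution, and each pixel takes the category with the highest score. The snippet below is a minimal sketch of that last step only, written with plain Python lists to stay dependency-free. The function name `labels_from_scores` and the toy scores are hypothetical, not taken from the lecture.

```python
def labels_from_scores(scores):
    """Turn class scores scores[c][y][x] into a label map labels[y][x].

    Each pixel is assigned the index of its highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Toy example (assumed values): 3 classes on a 2x2 image.
scores = [
    [[0.90, 0.10], [0.20, 0.30]],  # class 0
    [[0.05, 0.70], [0.10, 0.30]],  # class 1
    [[0.05, 0.20], [0.70, 0.40]],  # class 2
]
label_map = labels_from_scores(scores)
print(label_map)
```

The label map carries no notion of instances: two adjacent objects of the same category receive the same label, matching point b in 2.1.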
2.3 Upsampling:
Max Unpooling
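This kind of upsampling works because the goal is not to produce a visually pleasing super-resolution image, but to preserve the spatial structure of the pixels as faithfully as possible.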
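Max unpooling can be sketched as follows: the pooling step remembers where each maximum came from, and the unpooling step writes each value back to that position and fills the rest with zeros. This is an illustrative sketch under simple assumptions (2x2 windows, stride 2, even input size, plain Python lists). The function names are hypothetical.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the argmax positions."""
    pooled, indices = [], []
    for i in range(0, len(x), 2):
        row_vals, row_idx = [], []
        for j in range(0, len(x[0]), 2):
            # The four positions covered by the current window.
            window = [(i + di, j + dj) for di in (0, 1) for dj in (0, 1)]
            best = max(window, key=lambda p: x[p[0]][p[1]])
            row_vals.append(x[best[0]][best[1]])
            row_idx.append(best)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(values, indices, out_h, out_w):
    """Place each value at its remembered position; everything else is 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for row_vals, row_idx in zip(values, indices):
        for value, (r, c) in zip(row_vals, row_idx):
            out[r][c] = value
    return out


x = [[1, 2, 6, 3],
     [3, 5, 2, 1],
     [1, 2, 2, 1],
     [7, 3, 4, 8]]
pooled, idx = max_pool_2x2(x)                  # [[5, 6], [7, 8]]
restored = max_unpool_2x2(pooled, idx, 4, 4)
for row in restored:
    print(row)
```

In a segmentation network the values being unpooled come from a later layer, not directly from the pooling output. Only the remembered positions are reused, which is what keeps the spatial layout intact.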
Transpose Convolution
Algorithm diagram:
1D Example:
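A 1D transpose convolution can be sketched in a few lines: each input value scales a copy of the filter, the copies are placed `stride` positions apart in the output, and overlapping positions are summed. This is a minimal sketch assuming no padding. The function name `conv_transpose_1d` and the numeric values are illustrative only.

```python
def conv_transpose_1d(inputs, kernel, stride=2):
    """1D transpose convolution without padding.

    Output length is (len(inputs) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(inputs) - 1) * stride + len(kernel))
    for i, value in enumerate(inputs):
        for k, weight in enumerate(kernel):
            # Copy of the kernel scaled by the input value; overlaps add up.
            out[i * stride + k] += value * weight
    return out


# Input [a, b] = [1, 2], filter [x, y, z] = [1, 10, 100], stride 2.
# Expected pattern: [a*x, a*y, a*z + b*x, b*y, b*z].
result = conv_transpose_1d([1.0, 2.0], [1.0, 10.0, 100.0], stride=2)
print(result)
```

The middle output element receives contributions from both inputs, which is the overlap-and-sum behaviour that distinguishes transpose convolution from unpooling.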
3. Classification + Localization
Diagram:
Human Pose Estimation
Goal:
Diagram:
4. Object Detection as Classification
Problems with sliding-window search:
Region Proposals:
How R-CNN works:
R-CNN: Problems
- Ad hoc training objectives
• Fine-tune network with softmax classifier (log loss)
• Train post-hoc linear SVMs (hinge loss)
• Train post-hoc bounding-box regressions (least squares)
- Training is slow (84h), takes a lot of disk space
- Inference (detection) is slow
• 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
• Fixed by SPP-net [He et al. ECCV14]
Fast R-CNN:
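Regions of interest (RoIs) are cropped from the feature map after it has been computed once for the whole image, which avoids a large amount of repeated feature computation.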
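The idea of cropping RoIs from a shared feature map can be sketched as RoI max pooling: a region of any size is divided into a fixed grid of bins, and each bin is reduced to its maximum, so every region yields an output of the same shape. This is a rough single-channel sketch using plain Python lists. The bin rounding (floor for the start, ceiling for the end) is one possible choice and is an assumption here, as are the function names and the toy feature map.

```python
def _ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the half-open box [r0, r1) x [c0, c1) into out_h x out_w bins."""
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; neighbouring bins may share a row.
        rs, re = r0 + (i * h) // out_h, r0 + _ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            cs, ce = c0 + (j * w) // out_w, c0 + _ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map, computed once and shared by all regions.
feature_map = [[1, 2, 3, 4, 5, 6],
               [7, 8, 9, 10, 11, 12],
               [13, 14, 15, 16, 17, 18],
               [19, 20, 21, 22, 23, 24]]

big = roi_max_pool(feature_map, (1, 1, 4, 6))    # 3x5 region
small = roi_max_pool(feature_map, (0, 0, 2, 2))  # 2x2 region
print(big, small)
```

Both regions produce a 2x2 output despite their different sizes, which is what lets the later fully connected layers accept any proposal.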
Faster R-CNN: RoI Pooling
A Region Proposal Network (RPN) is added on top of the convolutional layers to propose RoIs:
Detection without Proposals: YOLO / SSD
Region localization and object classification are performed together in a single pass over the image:
Object Detection: Lots of variables …
Aside: Object Detection + Captioning = Dense Captioning
Architecture:
Mask R-CNN
A mask prediction is added:
Mask R-CNN Also does pose:
Results: