Segmentation, Localization, Detection
Semantic Segmentation
- label each pixel in the image with a category label
- classes are known in advance (fixed set of categories)
- idea: sliding window
- inefficient: shared features between overlapping patches are not reused
- idea: fully convolutional network (HxWxC)
- convolving at the full image resolution is computationally heavy
- downsampling and upsampling inside the network
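A minimal sketch of the per-pixel labeling idea above, assuming the network's final output is a score map of shape H x W x C (one score per class per pixel). The function names and the random scores are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def pixel_labels(scores):
    """Pick the highest-scoring class for every pixel of an (H, W, C) score map."""
    return scores.argmax(axis=-1)

def pixel_cross_entropy(scores, target):
    """Mean softmax cross-entropy over all pixels; target is an (H, W) integer label map."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = target.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return -log_probs[rows, cols, target].mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5, 3))   # hypothetical 4x5 image, 3 classes
labels = pixel_labels(scores)         # (4, 5) map of class indices
loss = pixel_cross_entropy(scores, labels)
```

The loss is simply a classification loss averaged over every pixel, which is why the whole network can be trained end to end like a classifier.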
sampling
downsampling
- pooling
- strided conv
upsampling
- unpooling
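- nearest neighbor (duplicate each value)
- Bed of Nails (keep the value, fill the rest with 0s)
- max unpooling: place values at the positions remembered from the corresponding max pooling layer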
- transpose convolution
- sum where output overlaps
- eg: input: [a b c d], filter(weight): [x y z]
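The upsampling options listed above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: 2D single-channel inputs, a 2x2 pooling window, Bed of Nails placing the value in the top-left corner of each block, and a 1D transpose convolution with stride 2. All function names are hypothetical.

```python
import numpy as np

def nn_unpool(x, k=2):
    """Nearest-neighbor unpooling: duplicate each value into a k x k block."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def bed_of_nails(x, k=2):
    """Put each value in the top-left of a k x k block (assumed corner), zeros elsewhere."""
    out = np.zeros((x.shape[0] * k, x.shape[1] * k), dtype=x.dtype)
    out[::k, ::k] = x
    return out

def max_pool_with_idx(x, k=2):
    """k x k max pooling that also returns the argmax position inside each block."""
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(h // k, w // k, k * k)
    return blocks.max(axis=-1), blocks.argmax(axis=-1)

def max_unpool(pooled, idx, k=2):
    """Max unpooling: put each value back at the position remembered by pooling."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    r, c = np.meshgrid(np.arange(ph), np.arange(pw), indexing="ij")
    blocks[r, c, idx] = pooled
    return blocks.reshape(ph, pw, k, k).transpose(0, 2, 1, 3).reshape(ph * k, pw * k)

def transpose_conv1d(x, w, stride=2):
    """1D transpose convolution: each input value scales a copy of the filter;
    copies are placed `stride` apart and summed where they overlap."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(w))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(w)] += v * w
    return out

x = np.array([[1, 2, 6, 3], [3, 5, 2, 1], [1, 2, 2, 1], [7, 3, 4, 8]])
pooled, idx = max_pool_with_idx(x)
restored = max_unpool(pooled, idx)
upsampled = transpose_conv1d([1, 2], [1, 2, 3], stride=2)
```

In `transpose_conv1d`, the filter weights are learned in a real network; the unpooling variants have no learnable parameters.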
conclusion
FCN + dilated conv + CRF
Classification + Localization
- single object
- model as regression problem
- train separately; once converged, train jointly
- Aside
- Human Pose Estimation
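Treating localization as regression alongside classification gives a multi-task loss. The sketch below assumes a softmax loss for the class, an L2 loss for a box given as (x, y, w, h), and a weighting hyperparameter `box_weight`; these names and the box encoding are assumptions for illustration.

```python
import numpy as np

def softmax_cross_entropy(class_scores, label):
    """Softmax classification loss for a single image."""
    scores = np.asarray(class_scores, dtype=float)
    shifted = scores - scores.max()  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def l2_box_loss(pred_box, true_box):
    """Squared L2 distance between predicted and true box coordinates."""
    diff = np.asarray(pred_box, dtype=float) - np.asarray(true_box, dtype=float)
    return float(np.sum(diff ** 2))

def multitask_loss(class_scores, label, pred_box, true_box, box_weight=1.0):
    """Weighted sum of the classification and box regression losses."""
    return (softmax_cross_entropy(class_scores, label)
            + box_weight * l2_box_loss(pred_box, true_box))

loss = multitask_loss([2.0, 0.5, -1.0], 0, [10, 12, 50, 40], [11, 12, 48, 40])
```

The same pattern applies to human pose estimation, where the regression target is a fixed set of joint coordinates instead of one box.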
Object Detection
- varying number of objects per image
- PASCAL VOC(too easy now)
- as localization
- fails: the number of objects is not fixed
- as classification:
- sliding window
- add class: background
- need to apply CNN to huge number of locations and scales
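To get a feel for why the sliding-window approach is expensive, the sketch below counts how many crops a CNN would have to classify. The image size, window sizes, and stride are made-up values for illustration.

```python
def count_windows(img_h, img_w, win_h, win_w, stride=1):
    """Number of positions a win_h x win_w window can take in an image."""
    if win_h > img_h or win_w > img_w:
        return 0
    rows = (img_h - win_h) // stride + 1
    cols = (img_w - win_w) // stride + 1
    return rows * cols

def count_all_windows(img_h, img_w, window_sizes, stride=1):
    """Total crops over several window sizes (scales and aspect ratios)."""
    return sum(count_windows(img_h, img_w, h, w, stride) for h, w in window_sizes)

# Hypothetical setup: 224x224 image, two square window sizes, stride 8.
total = count_all_windows(224, 224, [(64, 64), (128, 128)], stride=8)
```

Even this coarse setup requires hundreds of CNN forward passes per image; with stride 1 and more scales the count grows to many thousands.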
Region Based
- Region Proposals
- find “blobby” image regions that are likely to contain objects
- Selective Search, fast
- RCNN
- Region Proposal + CNN
- multi-task: also predict a correction to the proposed bounding box
- supervised
- Fast R-CNN
- project region proposals onto the feature map (after the conv layers)
- RoI pooling to crop the features of each region to a fixed size
- Faster R-CNN
- Region Proposal Network(RPN) to predict proposals(no ground truth)
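The RoI pooling step used by Fast R-CNN can be sketched as below. This is a simplified illustration, assuming a single-channel feature map, a region already given in feature-map coordinates as (x0, y0, x1, y1) with exclusive upper bounds, and max pooling over a roughly even grid of bins. The function name and bin-splitting rule are assumptions, not the reference implementation.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Crop a region from the feature map and max-pool it to a fixed out_size."""
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out_h, out_w = out_size
    out = np.empty(out_size, dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so no bin is empty.
        ya = int(np.floor(i * h / out_h))
        yb = int(np.ceil((i + 1) * h / out_h))
        for j in range(out_w):
            xa = int(np.floor(j * w / out_w))
            xb = int(np.ceil((j + 1) * w / out_w))
            out[i, j] = region[ya:yb, xa:xb].max()
    return out

feature_map = np.arange(36).reshape(6, 6)
pooled = roi_max_pool(feature_map, roi=(1, 1, 5, 5), out_size=(2, 2))
```

Whatever the size of the proposal, the output has a fixed shape, so it can be fed to the same fully connected layers for classification and box regression.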
YOLO/ SSD
- “You Only Look Once”/ “Single-shot MultiBox Detector”
- treat as a regression problem
- diff: the base bounding boxes are fixed (no region proposals). R-CNN = region proposal + classification as separate stages; SSD combines them in a single pass.
Takeaways
- Faster R-CNN is slower but more accurate
- SSD is much faster but not as accurate
Instance Segmentation
Mask R-CNN
- trained on COCO
- 200,000 training images
- 80 categories
To Read
- region proposal
- RoI Align / RoI Pool
Thoughts
- multi task
- end to end