A computer vision recap: Recommended | A summary of Andrew Ng's computer vision course


Computer Vision by Andrew Ng: 11 Lessons Learned


I recently completed Andrew Ng’s computer vision course on Coursera. Ng does an excellent job of explaining many of the complex ideas required to optimize any computer vision task. My favourite component of the course was the neural style transfer section (see lesson 11), which allows you to create artwork that combines the style of Claude Monet with the content of whichever image you would like. This is an example of what you can do:

[Image: Created in week 4 of the course. Combines Ng’s face with the style of Rain Princess by Leonid Afremov.]

In this article, I will discuss 11 key lessons that I learned in the course. Note that this is the fourth course in the Deep Learning specialization released by deeplearning.ai. If you would like to learn about the previous 3 courses, I recommend you check out this blog.

Lesson 1: Why computer vision is taking off

Big data and algorithmic developments will cause the testing error of intelligent systems to converge to Bayes optimal error. This will lead to AI surpassing human-level performance in all areas, including natural perception tasks. Open source software from TensorFlow allows you to use transfer learning to implement an object detection system for any object very rapidly. With transfer learning, you only need about 100–500 examples for the system to work relatively well. Manually labeling 100 examples isn’t too much work, so you’ll have a minimum viable product very quickly.

Lesson 2: How convolution works

Ng explains how to implement the convolution operator and shows how it can detect edges in an image. He also explains other filters, such as the Sobel filter, which puts more weight on the central pixels of the edge. Ng then explains that the weights of the filter should not be hand-designed but rather should be learned using a hill-climbing algorithm such as gradient descent.

Lesson 3: Why convolutions?

Ng gives several philosophical reasons for why convolutions work so well in image recognition tasks. He outlines 2 concrete reasons. The first is known as parameter sharing. It is the idea that a feature detector that’s useful in one part of an image is probably useful in another part of the image. For example, an edge detector is probably useful in many parts of the image. The sharing of parameters allows the number of parameters to be small and also allows for robust translation invariance. Translation invariance is the notion that a cat shifted and rotated is still a picture of a cat.
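As a sketch of the edge-detection idea from Lesson 2, a "valid", stride-1 convolution (technically cross-correlation, as used in deep learning) with a vertical Sobel filter can be written directly; the toy image below is made up for illustration:

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list image with a 2D list kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # sum of element-wise products over the kh x kw window
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# 5x5 image with a sharp vertical edge between columns 2 and 3
image = [[10, 10, 10, 0, 0]] * 5

# Sobel filter for vertical edges: the central row gets more weight
sobel_x = [[1, 0, -1],
           [2, 0, -2],
           [1, 0, -1]]

edges = conv2d(image, sobel_x)
# the response is large only where the window straddles the edge
```

Running this gives a 3×3 output whose middle columns fire strongly, exactly where the edge sits.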

The second idea he outlines is known as sparsity of connections. This is the idea that each output value is only a function of a small number of inputs (specifically, the filter size squared). This greatly reduces the number of parameters in the network and allows for faster training.
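The parameter-sharing point can be made concrete with the kind of back-of-the-envelope comparison Ng uses in lecture: connecting a 32×32×3 input to a 28×28×6 output with a fully connected layer versus a layer of six 5×5×3 filters (the exact layer sizes here are illustrative):

```python
# fully connected: every input unit connects to every output unit
n_in = 32 * 32 * 3          # 3,072 input values
n_out = 28 * 28 * 6         # 4,704 output values
fc_params = n_in * n_out    # weights only, ignoring biases

# convolutional: six 5x5x3 filters shared across all positions, plus one bias each
conv_params = (5 * 5 * 3) * 6 + 6

print(fc_params, conv_params)   # roughly 14.5 million vs 456
```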

Lesson 3: Why Padding?

Padding is usually used to preserve the input size (i.e. the dimensions of the input and output are the same). It is also used so that pixels near the edges of the image contribute as much to the output as pixels near the centre.

Lesson 4: Why Max Pooling?

Through empirical research, max pooling has proven to be extremely effective in CNNs. By downsampling the image, we reduce the number of parameters, which makes the features invariant to scale or orientation changes.
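Max pooling itself is only a few lines; a minimal sketch with a 2×2 window and stride 2 on a made-up 4×4 feature map:

```python
def max_pool(image, size=2, stride=2):
    out = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            # keep only the strongest activation in each window
            row.append(max(image[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

image = [[1, 3, 2, 1],
         [4, 6, 5, 2],
         [7, 8, 9, 4],
         [3, 2, 1, 0]]

pooled = max_pool(image)   # 4x4 -> 2x2, keeping the max of each 2x2 block
```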

Lesson 5: Classical network architectures

Ng shows 3 classical network architectures: LeNet-5, AlexNet and VGG-16. The main idea he presents is that effective networks often have layers with an increasing channel size and decreasing width and height.

Lesson 6: Why ResNets work

For a plain network, the training error does not monotonically decrease as the number of layers increases, due to vanishing and exploding gradients. ResNets have feed-forward skip connections which allow you to train extremely deep networks without a drop in performance.
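A minimal sketch of the intuition, assuming a toy two-layer main path: because the skip connection adds the input back before the final non-linearity, a block whose weights are pushed toward zero simply computes the identity, so adding more blocks can't easily hurt training.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(a, W1, W2):
    z = relu(W1 @ a)     # main path, layer 1
    z = W2 @ z           # main path, layer 2 (pre-activation)
    return relu(z + a)   # shortcut: add the input back, then apply ReLU

a = np.array([1.0, 2.0, 3.0])
out = residual_block(a, np.zeros((3, 3)), np.zeros((3, 3)))
# with the main-path weights at zero, the block passes its (non-negative)
# input straight through: the identity function is easy to learn
```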


Lesson 7: Use Transfer Learning!

Training large networks, such as Inception, from scratch can take weeks on a GPU. You should download the weights from a pretrained network and just retrain the last softmax layer (or the last few layers). This will greatly reduce training time. The reason this works is that earlier layers tend to be associated with concepts common to all images, such as edges and curvy lines.
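The "retrain only the last softmax layer" idea can be sketched with random numbers standing in for the frozen network's activations (no real pretrained weights are downloaded here; the feature dimension and class count are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend these are fixed activations from a frozen, pretrained network
features = rng.normal(size=(100, 64))
labels = rng.integers(0, 3, size=100)   # 3 made-up classes

# only the new softmax layer is trained
W = np.zeros((64, 3))
b = np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy():
    p = softmax(features @ W + b)
    return -np.log(p[np.arange(100), labels]).mean()

loss_before = cross_entropy()           # log(3) with zero weights
for _ in range(50):                     # plain gradient descent on the last layer
    p = softmax(features @ W + b)
    p[np.arange(100), labels] -= 1      # dJ/dz for softmax + cross-entropy
    W -= 0.1 * features.T @ p / 100
    b -= 0.1 * p.mean(axis=0)
loss_after = cross_entropy()
```

In practice the frozen activations would come from a real network's penultimate layer rather than a random matrix, but the training loop for the new head is the same.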

Lesson 8: How to win computer vision competitions

Ng explains that you should train several networks independently and average their outputs to get better performance. Data augmentation techniques such as randomly cropping images and flipping images about the horizontal and vertical axes may also help with performance. Finally, you should use an open source implementation and pretrained model to start, and then fine-tune the parameters for your particular application.
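Two of the augmentations mentioned can be sketched in a few lines (single-channel image as nested lists, purely illustrative):

```python
import random

def horizontal_flip(image):
    # mirror each row left-to-right
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng=random):
    # pick a random top-left corner and cut out a crop_h x crop_w patch
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

image = [[1, 2, 3],
         [4, 5, 6]]

flipped = horizontal_flip(image)
patch = random_crop(image, 2, 2)
```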

Lesson 9: How to implement object detection

Ng starts by explaining the idea of landmark detection in an image. Basically, these landmarks become a part of your training output examples. With some clever convolution manipulations, you get an output volume that tells you the probability that the object is in a certain region and the location of the object. He also explains how to evaluate the effectiveness of your object detection algorithm using the intersection over union formula. Finally, Ng puts all these components together to explain the famous YOLO algorithm.
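The intersection-over-union check is easy to state in code; boxes here are (x1, y1, x2, y2) corner coordinates, which is an assumed convention:

```python
def iou(box_a, box_b):
    # corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A detection is typically counted as correct when its IoU with the ground-truth box exceeds a threshold such as 0.5.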

Lesson 10: How to implement Face Recognition

Facial recognition is a one-shot learning problem since you may only have one example image to identify the person. The solution is to learn a similarity function which gives the degree of difference between two images. So if the images are of the same person, you want the function to output a small number, and vice versa for different people.

The first solution Ng gives is known as a siamese network. The idea is to feed the images of two people through the same network separately and then compare their outputs. If the outputs are similar, then the people are probably the same. The network is trained so that if two input images are of the same person, then the distance between their encodings is relatively small.

The second solution he gives uses a triplet loss method. The idea is thatyou have a triplet of images (Anchor (A), Positive (P) and Negative(N)) and you train the network so that the output distance between Aand P is much smaller than the distance between A and N.
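The triplet loss can be sketched directly on toy encodings; the 0.2 margin below is an arbitrary illustrative value:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_ap = np.sum((anchor - positive) ** 2)   # squared distance to same person
    d_an = np.sum((anchor - negative) ** 2)   # squared distance to different person
    # zero once the negative is at least `margin` farther away than the positive
    return max(d_ap - d_an + margin, 0.0)

A = np.array([0.0, 0.0])
P = np.array([1.0, 0.0])   # same person: nearby encoding
N = np.array([3.0, 0.0])   # different person: far-away encoding

loss = triplet_loss(A, P, N)   # well separated, so the loss is zero
```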


Lesson 11: How to create artwork using Neural Style Transfer

Ng explains how to generate an image that combines the content of one image with the style of another. See the examples below.

[Image: examples of neural style transfer]

The key to Neural Style Transfer is to understand the visual representations of what each layer in a convolutional network is learning. It turns out that earlier layers learn simple features like edges and later layers learn complex objects like faces, feet and cars.

To build a neural style transfer image, you simply define a cost function which is a weighted combination of the similarity in content and style. In particular, the cost function would be:

J(G) = alpha * J_content(C,G) + beta * J_style(S,G)

where G is the generated image, C is the content image and S is thestyle image. The learning algorithm simply uses gradient descent tominimize the cost function with respect to the generated image, G.

The steps are as follows:

Generate G randomly.

Use gradient descent to minimize J(G), i.e. update G := G − α · ∂J(G)/∂G, where α is the learning rate.

Repeat step 2.
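The steps above can be sketched with a deliberately simplified stand-in for the real costs: plain squared distances to C and S instead of comparisons between network activations. The point is only to show the optimization loop over the image G itself:

```python
import numpy as np

rng = np.random.default_rng(0)

C = np.array([[1.0, 2.0], [3.0, 4.0]])   # stand-in "content image"
S = np.array([[0.0, 1.0], [1.0, 0.0]])   # stand-in "style image"
alpha, beta = 0.6, 0.4

def cost_grad(G):
    # gradient of J(G) = alpha * ||G - C||^2 + beta * ||G - S||^2  (toy costs)
    return 2 * alpha * (G - C) + 2 * beta * (G - S)

G = rng.normal(size=(2, 2))              # step 1: initialise G randomly
lr = 0.1
for _ in range(200):                     # steps 2-3: gradient descent on G itself
    G -= lr * cost_grad(G)

# for this toy cost the minimiser is just the weighted average of C and S;
# the real J_content and J_style make the result far more interesting
```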

Conclusion

By completing this course, you will gain an intuitive understanding of a large chunk of the computer vision literature. The homework assignments also give you practice implementing these ideas yourself. You will not become an expert in computer vision after completing this course, but it may kickstart a potential idea or career you may have in computer vision.

If you have any interesting applications of computer vision you would like to share, let me know in the comments below. I would be happy to discuss potential collaboration on new projects.

That’s all, folks. If you’ve made it this far, please comment below and add me on LinkedIn: https://www.linkedin.com/in/ryanshrott/

Github: https://github.com/ryanshrott
