Application example: Photo OCR - Ceiling analysis: What part of the pipeline to work on next

王彩旗 edwardwangcq.com

Category columns: Artificial Intelligence, Machine Learning

Article link: https://blog.csdn.net/edward_wang1/article/details/116649064

When developing a machine learning system, one of your most valuable resources is your time as the developer, so choosing what to work on next matters. What you really want to avoid is spending a lot of time on some component only to realize, after weeks or months, that all that work makes little difference to the performance of the final system. In this class, we'll talk about something called ceiling analysis. When you and your team are working on a machine learning pipeline, it can give you a strong signal about which parts of the pipeline are the best use of your time.

To talk about ceiling analysis, I'm going to keep using the example of the photo OCR pipeline. As shown in figure-1, each of the boxes (components) can have a small engineering team working on it, or maybe the entire system is built by you alone. The question is: which of these boxes is most worth your effort to improve?

To decide what to work on, it helps to have a single real-number evaluation metric for the learning system. Let's say we pick character-level accuracy: given a test set image, what fraction of the characters in it do we recognize correctly? Let's say the overall system currently has 72% accuracy. The idea behind ceiling analysis is:

We go to the first component of the pipeline, text detection, and simulate what happens if it has 100% accuracy. To do this, we manually label where the text is in each test example and feed these ground-truth labels into the next stage in place of the text detector's output. We then run the rest of the pipeline on this data and measure the overall accuracy of the entire system. Hopefully the performance goes up, say to 89%.
Next, we go to the next stage of the pipeline, character segmentation. Now we give the system the correct text detection output as well as the correct character segmentation output, by manually labeling how the text splits into individual characters. Let's say the overall system accuracy goes up to 90%.
Finally, we go to character recognition and give it the correct labels as well. Unsurprisingly, we get 100% accuracy.
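The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three toy stage functions, their deliberate flaws, the test strings, and the position-by-position reading of character-level accuracy are all made up here and are not the real photo OCR components. The point is the loop: replace the first k stages with ground-truth outputs, run the remaining stages for real, and measure the end-to-end metric.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable]  # (stage name, function applied to one example)


def character_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of ground-truth characters matched, compared position by position."""
    total = sum(len(truth) for truth in expected)
    correct = sum(
        guess_char == truth_char
        for guess, truth in zip(predicted, expected)
        for guess_char, truth_char in zip(guess, truth)
    )
    return correct / total


def ceiling_analysis(
    stages: List[Stage],
    inputs: Sequence,
    oracle: Dict[str, Sequence],
    metric: Callable[[Sequence, Sequence], float],
) -> List[Tuple[str, float]]:
    """Measure end-to-end accuracy with the first k stages replaced by ground truth."""
    names = [name for name, _ in stages]
    results = []
    for k in range(len(stages) + 1):
        # k == 0 is the unmodified system; otherwise start from the
        # hand-labeled output of stage k-1 and run only the later stages.
        data = list(inputs) if k == 0 else list(oracle[names[k - 1]])
        for _, run_stage in stages[k:]:
            data = [run_stage(item) for item in data]
        label = "baseline" if k == 0 else f"perfect through {names[k - 1]}"
        results.append((label, metric(data, oracle[names[-1]])))
    return results


# Hypothetical, deliberately flawed toy stages (not real OCR code).
def detect(image: str) -> str:
    return image.strip(" ")  # flaw: does not remove tab padding


def segment(text: str) -> List[str]:
    return list(text[:3])  # flaw: drops every character after the third


def recognize(chars: List[str]) -> str:
    return "".join(chars).upper().replace("O", "0")  # flaw: reads O as 0


stages = [
    ("text detection", detect),
    ("character segmentation", segment),
    ("character recognition", recognize),
]
inputs = [" cat ", "\tsun", " owl ", " hens "]
# Hand-labeled "perfect" output of every stage for the same four examples.
oracle = {
    "text detection": ["cat", "sun", "owl", "hens"],
    "character segmentation": [list("cat"), list("sun"), list("owl"), list("hens")],
    "character recognition": ["CAT", "SUN", "OWL", "HENS"],
}

results = ceiling_analysis(stages, inputs, oracle, character_accuracy)
for label, accuracy in results:
    print(f"{label}: {accuracy:.1%}")
```

On a real system the expensive part is producing the `oracle` entries, since each one requires hand-labeling the test set at that stage; the loop itself stays this simple.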

Having done this analysis, we can now see the upside potential of improving each of these components.

With perfect text detection, performance went up from 72% to 89%, a gain of 17 percentage points.
In contrast, adding perfect character segmentation raised performance by only 1 point, from 89% to 90%.
With perfect character recognition, performance went up by 10 points, from 90% to 100%.
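The arithmetic behind these gains is just the difference between consecutive rows of the ceiling-analysis table. A minimal sketch, using only the accuracy figures quoted above (the function and variable names are made up for illustration):

```python
# Accuracy figures quoted in the text, in pipeline order (percent).
accuracies = [
    ("overall system (baseline)", 72),
    ("text detection", 89),
    ("character segmentation", 90),
    ("character recognition", 100),
]


def stage_gains(accuracies):
    """Return (stage, gain) pairs: points of accuracy added by perfecting each stage."""
    gains = []
    for (_, before), (stage, after) in zip(accuracies, accuracies[1:]):
        gains.append((stage, after - before))
    return gains


gains = stage_gains(accuracies)
# Largest upside first: these are the components most worth working on.
for stage, gain in sorted(gains, key=lambda pair: pair[1], reverse=True):
    print(f"{stage}: +{gain} points")
```

Sorting by gain puts text detection first and character segmentation last, which matches the ordering discussed in the text.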

Based on this, you may want to spend more time on text detection and character recognition to improve overall system performance. No matter how much time you spend on character segmentation, the upside potential is pretty small, so you probably don't want a large team of engineers working on it.

Let's look at another, more complex example. Say we want to do face recognition from images: we want to look at the picture in figure-2 and recognize whether or not the person in it is a particular friend of yours. Note that this is a slightly artificial example; it isn't actually how face recognition is done in practice. We'll just step through what a pipeline might look like to give you another example of how ceiling analysis works. Figure-3 shows a possible pipeline for this.

So, we have a camera image, and:

The first step is to pre-process the image to remove the background, as in figure-4.

Next, we want to detect the person's face. This is usually done with a learning algorithm: we run a sliding window classifier to draw a box around the face.

Having detected the face, if you want to recognize people, it turns out that the eyes are a highly useful cue. So we run another classifier to detect the person's eyes and segment them out. Other parts that may help in recognizing people, such as the nose and the mouth, can be segmented out as well. All of these give us useful features to feed into, say, a logistic regression classifier, which then outputs the overall label for who we think this person is.

This pipeline is probably more complicated than what you would use if you actually wanted to recognize people. It is just an illustrative example for ceiling analysis.

So how do you go through ceiling analysis for this pipeline?

We'll step through these pieces one at a time. Let's say your system has 85% accuracy.

The first thing I'll do is go to my test set and manually give it the ground-truth foreground/background segmentation, then see how much the accuracy changes. You can do this with Photoshop or a similar tool. In this example, let's say the accuracy goes up by 0.1%. This is a strong sign that even with perfect background segmentation, the performance of your system won't go up much, so background removal is probably not worth a huge effort.
Next, we go through the system and give more and more components their correct labels on the test set. That is, we supply the correct face detection boxes, then the correct locations of the eyes, nose and mouth, and finally the correct overall label. This gives the overall system accuracy at each step, as in figure-7, and you can then read off how much the performance went up at each step.

With perfect face detection, the overall performance of the system went up by 5.9%, so it may be worth quite a big effort to improve face detection. Performance went up by 4% with perfect eyes segmentation, by 1% with perfect nose segmentation, by 1% with perfect mouth segmentation, and by 3% with a perfect logistic regression classifier. So the components most worth our while are face detection (5.9%), eyes segmentation (4%) and the logistic regression classifier (3%).
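As a cross-check, the per-step increments can be added back onto the 85% baseline to rebuild the cumulative accuracy after each step. The sketch below does that and ranks the components by gain. The cumulative values are derived here from the increments stated in the text, not read from the figure, and the variable names are made up for illustration.

```python
from itertools import accumulate

baseline = 85.0  # overall accuracy of the unmodified system (percent)

# Increments quoted in the text, in pipeline order (percentage points).
gains = [
    ("background removal", 0.1),
    ("face detection", 5.9),
    ("eyes segmentation", 4.0),
    ("nose segmentation", 1.0),
    ("mouth segmentation", 1.0),
    ("logistic regression", 3.0),
]

# Running total after each component is made perfect; rounding to one decimal
# hides floating-point noise from adding values such as 0.1.
cumulative = [
    round(value, 1)
    for value in accumulate((gain for _, gain in gains), initial=baseline)
]

# Components ordered by how much perfecting them would help.
ranked = sorted(gains, key=lambda pair: pair[1], reverse=True)

print(cumulative)
for name, gain in ranked:
    print(f"{name}: +{gain} points")
```

The increments sum to 15 points, which takes the 85% baseline exactly to 100% once every component, including the final classifier, is given its correct output.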

A true cautionary story: there was a research team of literally two people who spent a year and a half working on better background removal for a computer vision application. They worked out really complicated algorithms and ended up publishing a research paper. But after all that work, they found that it made little difference to the overall performance of the actual application. If only someone had done a ceiling analysis beforehand, they might have realized this.

Pipelines are pervasive in complex machine learning applications. Don't work on something that ultimately isn't going to matter. Ceiling analysis is a very good tool for identifying the components that are worth your while and that would make a big difference to the performance of your final system. Don't just trust your gut feeling about which components to work on!

<end>