Why Logistic Regression Can Recognize Digits

https://stats.stackexchange.com/questions/426873/how-does-a-simple-logistic-regression-model-achieve-a-92-classification-accurac

Question:

Even though all the images in the MNIST dataset are centered, with a similar scale, and face up with no rotations, they still show significant handwriting variation, and it puzzles me how a linear model achieves such high classification accuracy.

As far as I am able to visualize, given the significant handwriting variation, the digits should be linearly inseparable in a 784-dimensional space, i.e., there should be a somewhat complex (though not very complex) non-linear boundary that separates the different digits, similar to the well-cited XOR example where positive and negative classes cannot be separated by any linear classifier. It seems baffling to me how multi-class logistic regression produces such a high accuracy with entirely linear features (no polynomial features).
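(For reference, the XOR example mentioned above is easy to reproduce. A minimal sketch, assuming scikit-learn is available; the fitted accuracy can never reach 1.0 because no line separates the two classes:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR: four corner points whose labels no straight line can separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # never 1.0: any linear boundary misclassifies a point
```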

As an example, given any pixel in the image, different handwritten variations of the digits 2 and 3 can make that pixel illuminated or not. Therefore, with a set of learned weights, each pixel can make a digit look like a 2 as well as a 3. Only with a combination of pixel values should it be possible to say whether a digit is a 2 or a 3. This is true for most of the digit pairs. So how is logistic regression, which blindly bases its decision independently on all pixel values (without considering any inter-pixel dependencies at all), able to achieve such high accuracy?

I know that I am wrong somewhere or am just over-estimating the variation in the images. However, it would be great if someone could help me with an intuition on how the digits are 'almost' linearly separable.


Answer:

This is a very interesting question, and thanks to the simplicity of logistic regression you can actually find out the answer.

What logistic regression does is, for each image, accept 784 inputs and multiply them with weights to generate its prediction. The interesting thing is that due to the direct mapping between input and output (i.e. no hidden layer), the value of each weight corresponds to how much each one of the 784 inputs is taken into account when computing the probability of each class. Now, by taking the weights for each class and reshaping them into 28×28 (i.e. the image resolution), we can tell which pixels are most important for the computation of each class.
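Here is a minimal sketch of this experiment, assuming scikit-learn and matplotlib are available; the solver settings, the colormap, and the train/test split are my choices, not necessarily the answerer's:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

# MNIST: 70,000 images, each 28x28 = 784 pixels.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel intensities to [0, 1]

# Plain multinomial logistic regression: one weight per (class, pixel) pair,
# no hidden layers.
clf = LogisticRegression(max_iter=1000)
clf.fit(X[:60000], y[:60000])
print("test accuracy:", clf.score(X[60000:], y[60000:]))  # roughly 0.92

# Reshape each class's 784 weights back into a 28x28 image.
lim = abs(clf.coef_).max()  # symmetric color scale so zero maps to white
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for digit, ax in enumerate(axes.ravel()):
    ax.imshow(clf.coef_[digit].reshape(28, 28), cmap="RdBu", vmin=-lim, vmax=lim)
    ax.set_title(str(digit))
    ax.axis("off")
plt.show()
```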

[Image: the ten learned weight vectors, one per digit class, reshaped to 28×28]

Note, again, that these are the weights.

Now take a look at the above image and focus on the first two digits (i.e. zero and one). Blue weights mean that this pixel's intensity contributes a lot to that class, and red values mean that it contributes negatively.

Now imagine, how does a person draw a 0? They draw a circular shape that's empty in the middle. That's exactly what the weights picked up on. In fact, if someone draws in the middle of the image, it counts negatively as a zero. So to recognize zeros you don't need sophisticated filters or high-level features. You can just look at the drawn pixel locations and judge accordingly.

Same thing for the 1. It always has a straight vertical line in the middle of the image. Everything else counts negatively.
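To make "counts negatively" concrete: the score a class assigns to an image is just a dot product between the pixel values and that class's weights. A tiny self-contained sketch with a made-up weight (the -2.0 is illustrative, not a learned value):

```python
import numpy as np

# Toy illustration, not real MNIST weights: a class score is a plain dot
# product, so one lit pixel shifts the score by exactly that pixel's weight.
w_zero = np.zeros(784)
w_zero[14 * 28 + 14] = -2.0        # pretend the center pixel has weight -2 for "0"

blank = np.zeros(784)
center_stroke = blank.copy()
center_stroke[14 * 28 + 14] = 1.0  # "draw" one stroke in the middle of the canvas

print(w_zero @ blank)              # 0.0: nothing drawn, no evidence either way
print(w_zero @ center_stroke)      # -2.0: center ink counts against "0"
```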

The rest of the digits are a bit more complicated, but with a little imagination you can see the 2, the 3, the 7 and the 8. The remaining numbers are a bit more difficult, which is what actually keeps logistic regression from reaching the high 90s.

Through this you can see that logistic regression has a very good chance of getting a lot of images right, and that's why it scores so high.
