Why Logistic Regression Can Recognize Digits

https://stats.stackexchange.com/questions/426873/how-does-a-simple-logistic-regression-model-achieve-a-92-classification-accurac

Question:

Even though all the images in the MNIST dataset are centered, with a similar scale, and face up with no rotations, they still show significant handwriting variation, and it puzzles me how a linear model achieves such high classification accuracy.

As far as I am able to visualize, given the significant handwriting variation, the digits should be linearly inseparable in a 784-dimensional space, i.e., there should be a somewhat complex (though not very complex) non-linear boundary that separates the different digits, similar to the well-cited XOR example where positive and negative classes cannot be separated by any linear classifier. It seems baffling to me how multi-class logistic regression produces such a high accuracy with entirely linear features (no polynomial features).
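(For reference, the XOR example mentioned above is easy to reproduce. A minimal sketch, assuming scikit-learn is available; the fitted accuracy can never reach 1.0 because no line separates the two classes:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR: four corner points whose labels no straight line can separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # never 1.0: any linear boundary misclassifies a point
```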

As an example, given any pixel in the image, different handwritten variations of the digits 2 and 3 can make that pixel illuminated or not. Therefore, with a set of learned weights, each pixel can make a digit look like a 2 as well as a 3. Only with a combination of pixel values should it be possible to say whether a digit is a 2 or a 3. This is true for most of the digit pairs. So how is logistic regression, which blindly bases its decision independently on all pixel values (without considering any inter-pixel dependencies at all), able to achieve such high accuracy?

I know that I am wrong somewhere or am just over-estimating the variation in the images. However, it would be great if someone could help me with an intuition on how the digits are 'almost' linearly separable.


Answer:

This is a very interesting question, and thanks to the simplicity of logistic regression you can actually find out the answer.

What logistic regression does is, for each image, accept 784 inputs and multiply them with weights to generate its prediction. The interesting thing is that due to the direct mapping between input and output (i.e. no hidden layer), the value of each weight corresponds to how much each one of the 784 inputs is taken into account when computing the probability of each class. Now, by taking the weights for each class and reshaping them into 28×28 (i.e. the image resolution), we can tell which pixels are most important for the computation of each class.
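Here is a minimal sketch of this experiment, assuming scikit-learn and matplotlib are available; the solver settings, the colormap, and the train/test split are my choices, not necessarily the answerer's:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

# MNIST: 70,000 images, each 28x28 = 784 pixels.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel intensities to [0, 1]

# Plain multinomial logistic regression: one weight per (class, pixel) pair,
# no hidden layers.
clf = LogisticRegression(max_iter=1000)
clf.fit(X[:60000], y[:60000])
print("test accuracy:", clf.score(X[60000:], y[60000:]))  # roughly 0.92

# Reshape each class's 784 weights back into a 28x28 image.
lim = abs(clf.coef_).max()  # symmetric color scale so zero maps to white
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for digit, ax in enumerate(axes.ravel()):
    ax.imshow(clf.coef_[digit].reshape(28, 28), cmap="RdBu", vmin=-lim, vmax=lim)
    ax.set_title(str(digit))
    ax.axis("off")
plt.show()
```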

[Image: the ten learned weight vectors, one per digit class, reshaped to 28×28]

Note, again, that these are the weights.

Now take a look at the above image and focus on the first two digits (i.e. zero and one). Blue weights mean that this pixel's intensity contributes a lot to that class, and red values mean that it contributes negatively.

Now imagine, how does a person draw a 0? They draw a circular shape that's empty in the middle. That's exactly what the weights picked up on. In fact, if someone draws in the middle of the image, it counts negatively as a zero. So to recognize zeros you don't need sophisticated filters or high-level features. You can just look at the drawn pixel locations and judge accordingly.

Same thing for the 1. It always has a straight vertical line in the middle of the image. Everything else counts negatively.
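To make "counts negatively" concrete: the score a class assigns to an image is just a dot product between the pixel values and that class's weights. A tiny self-contained sketch with a made-up weight (the -2.0 is illustrative, not a learned value):

```python
import numpy as np

# Toy illustration, not real MNIST weights: a class score is a plain dot
# product, so one lit pixel shifts the score by exactly that pixel's weight.
w_zero = np.zeros(784)
w_zero[14 * 28 + 14] = -2.0        # pretend the center pixel has weight -2 for "0"

blank = np.zeros(784)
center_stroke = blank.copy()
center_stroke[14 * 28 + 14] = 1.0  # "draw" one stroke in the middle of the canvas

print(w_zero @ blank)              # 0.0: nothing drawn, no evidence either way
print(w_zero @ center_stroke)      # -2.0: center ink counts against "0"
```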

The rest of the digits are a bit more complicated, but with a little imagination you can see the 2, the 3, the 7 and the 8. The remaining numbers are a bit more difficult, which is what actually keeps logistic regression from reaching the high 90s.

Through this you can see that logistic regression has a very good chance of getting a lot of images right, and that's why it scores so high.
