Recognition of Hand-Written Numbers Using Logistic Regression


AI Workshop — Part II

In the second part of the workshop, we learnt about Classification using Logistic Regression, which also comes under Supervised Machine Learning. The first part of the workshop focused on Linear Regression.

Both regression and classification predict an output. Regression predicts the output in a continuous numerical range, while classification predicts the output in a discrete range. For example, predicting the temperature at a place based on weather report data is a regression problem, whereas predicting whether the day’s weather falls under the ‘Sunny’, ‘Cold’, ‘Windy’ or ‘Rainy’ category is a classification problem.

Here, we learnt to apply logistic regression to recognise a hand-written numerical character (of pixel size 28X28) and classify it as the corresponding number (1, 2, 3 or 4).

1. Reading the data

m data samples were used for the prediction model, where each data sample is an image of pixel size 28X28 and each image is a hand-written number in white (pixel value = 255) on a black background (pixel value = 0). Here, m=3599. The array for each image comprises the number written on the image followed by the pixel values of the image, i.e., it has 784+1 elements (the 1st element is the number written on the image, followed by the 28X28=784 pixel values of the image). The raw data is a matrix (3599 rows, 785 columns) read from a CSV file that comprises the pixel values for each image.
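This reading step can be sketched in Python with NumPy. The CSV contents below are a small stand-in for the real 3599-row file, which is not included here:

```python
import io
import numpy as np

# Stand-in for the real CSV: each row is [label, 784 pixel values],
# where pixels are 0 (black background) or 255 (white digit).
csv_text = "\n".join(
    ",".join([str(label)] + ["255" if i < 40 else "0" for i in range(784)])
    for label in (1, 2, 3, 4)
)

# In the workshop the real file held 3599 such rows; np.loadtxt reads
# the full matrix (one row per image, 785 columns each).
raw = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(raw.shape)  # (4, 785)
```

With the real file, `np.loadtxt("file.csv", delimiter=",")` would return the (3599, 785) matrix directly.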

2. Separating input (X) and output (Y)

The first column was extracted as the expected output (Y) and the rest of the array elements (corresponding pixel values) were extracted as the input (X) used to train and test this prediction model.

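In NumPy terms, the separation is a column slice. A sketch with a miniature matrix in place of the real 3599×785 one:

```python
import numpy as np

# Miniature stand-in for the raw matrix: 6 samples, 1 label + 784 pixels.
raw = np.zeros((6, 785))
raw[:, 0] = [1, 2, 3, 4, 1, 2]   # column 0 holds the written digit

Y = raw[:, 0]                    # expected output: the digit per image
X = raw[:, 1:]                   # input: the 784 pixel values per image
print(X.shape, Y.shape)  # (6, 784) (6,)
```

Note that `Y` comes out as a flat vector; reshape it to (m, 1) if a column matrix is preferred.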

Thus,

size(X) = (3599, 784)

size(Y) = (3599, 1)

In this classification problem, the model predicts the number written on an image, so the output of this model is a number. From all the m=3599 data samples, the set of unique classes was found. In this case, the classes were 1, 2, 3 and 4.

3. Splitting the raw data for training and testing

75% of the data samples were used for training (m=2699) and the rest were used for testing (m=900). Here, there are 784 features. As seen in the case of Regression, the prediction model is


Eq(1): f(X) = X * Transpose(θ)

Here,

Eq(2): f(X) = θ0 + x1*θ1 + x2*θ2 + … + x784*θ784

where,

Eq(3): X = [1, x1, x2, … xn], Size(X) = 1 row, n+1 columns (n=784)

Eq(4): θ = [θ0, θ1, θ2, … θn], Size(θ) = 1 row, n+1 columns (n=784)
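Eqs (1)–(4) can be sketched directly in NumPy. The θ values below are illustrative placeholders, not trained parameters:

```python
import numpy as np

n = 784
x = np.zeros(n + 1)
x[0] = 1.0               # Eq(3): x = [1, x1, ..., xn]; the leading 1 is the bias
theta = np.zeros(n + 1)  # Eq(4): theta = [θ0, θ1, ..., θn]
theta[0] = 0.5           # placeholder value for illustration only

f = x @ theta.T          # Eq(1): f(X) = X * Transpose(θ)
print(f)  # 0.5
```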

4. Logistic function, g(z) — Sigmoid function

The error cost function in the case of Linear Regression was

Eq(5): J = (1/2m) * Σ ( f(x) - y )²

For classification, the model should instead return a value between 0 and 1: the objective is to measure the difference between the predicted output and the original output and express it on a 0-to-1 scale. For this purpose, the logistic function used here is the sigmoid function g(z).

Eq(6): g(z) = 1 / ( 1 + e^-z )

The sigmoid function was coded as

[Image: sigmoid function code]
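The screenshot above is not reproduced here; an equivalent NumPy implementation of Eq(6) might look like this:

```python
import numpy as np

def sigmoid(z):
    """Eq(6): g(z) = 1 / (1 + e^-z); works elementwise on NumPy arrays."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                        # 0.5 at z = 0
print(sigmoid(np.array([-10.0, 10.0])))  # close to [0, 1] at the extremes
```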

An intermediate variable z can be defined as

Eq(7): z = X * Transpose(θ)

5. Error function — J(θ)

The ‘one versus all’ (one versus rest) concept was used here: a separate binary classifier is trained for each class. If the hand-written number is 2, then the image is compared against each class in turn.

[Figure: comparing the expected output versus the predicted output]
  • When the expected output is “YES” (Y=1), the error ranges from 0 to ∞ when the predicted output ranges from 1 (YES) to 0 (NO) correspondingly. Mathematically, the error can be given as


Eq(8): J(θ) = -log(g(z))

when Expected output is “YES”

  • When the expected output is “NO” (Y=0), the error ranges from 0 to ∞ when the predicted output ranges from 0 to 1 correspondingly. Mathematically, the error can be given as


Eq(9): J(θ) = -log(1 - g(z))

when Expected output is “NO”

The error cost function is

Eq(10):

J(θ) = -log(g(z)), when Y = 1
J(θ) = -log(1 - g(z)), when Y = 0

Generalising Eq(10) by taking Y into account,

Eq(11): J(θ) = y * ( -log(g(z)) ) + (1-y) * ( -log(1 - g(z)) )

Calculating the error for the dataset, taking all the m entries into consideration:

Eq(12): J(θ) = -(1/m) * Σ [ y * log(g(z)) + (1-y) * log(1 - g(z)) ], summed over all m data samples (final error cost function)

Equation 12 is coded as

[Image: error cost function J(θ) code]
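The screenshot is not reproduced here; a NumPy version of Eq(12) could read:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Eq(12): J(θ) = -(1/m) * Σ [ y*log(g(z)) + (1-y)*log(1-g(z)) ], z = X·θᵀ."""
    m = len(y)
    g = sigmoid(X @ theta)
    return -np.sum(y * np.log(g) + (1 - y) * np.log(1 - g)) / m

# With θ = 0 every prediction is 0.5, so the cost is log(2) ≈ 0.693.
J = cost(np.zeros(3), np.ones((4, 3)), np.array([1.0, 0.0, 1.0, 0.0]))
print(round(J, 3))  # 0.693
```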

6. Gradient

The gradient is

Eq(13): Gradient = ∂J(θ)/∂θ

Eq(14): Gradient = ( ∂J(θ)/∂z ) * ( ∂z/∂θ )

Eq(15): ∂J(θ)/∂z = g(z) - y, and ∂z/∂θ = x

Eq(16): Gradient = x * ( g(x*Transpose(θ)) - y )

Eq(17): Gradient = x*Error

Finding the gradient for all the m values and including the learning rate c, the final gradient becomes

Eq(18): Gradient = (c/m) * Σ [ x * ( g(x*Transpose(θ)) - y ) ], summed over all m data samples

For each class, the gradient is minimised, during which the 785 parameters are optimised and stored in the new_theta matrix.
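A minimal one-vs-all training loop along these lines is sketched below. The name new_theta follows the article; the learning rate c and the iteration count are assumptions, and the update step implements Eq(18):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, Y, classes, c=0.1, iters=500):
    """Per class: θ := θ - (c/m) * Xᵀ (g(Xθ) - y), with y the one-vs-all target."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])         # prepend the bias column
    new_theta = np.zeros((len(classes), n + 1))  # one row of n+1 parameters per class
    for k, cls in enumerate(classes):
        y = (Y == cls).astype(float)             # 1 for this class, 0 otherwise
        theta = np.zeros(n + 1)
        for _ in range(iters):
            grad = Xb.T @ (sigmoid(Xb @ theta) - y) / m
            theta -= c * grad
        new_theta[k] = theta
    return new_theta
```

For the digit data, X would be the 2699×784 training block and classes would be [1, 2, 3, 4], giving a 4×785 new_theta.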

7. Real-time prediction

An image of pixel size 28X28 is taken as the input. If it is black-coloured text written on a white background, the image is inverted. The image is converted to a row matrix x, with a bias term (a leading 1) inserted. For each class, the prediction output is given as

prediction = sigmoid(x*new_theta.T)
# where new_theta.T means transpose of new_theta

prediction represents the probabilities of a match between the number written in x and each class. The class with the maximum confidence, or probability of matching, is chosen as the predicted output. The predicted results were compared with the original results to calculate the accuracy.
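Putting the prediction step together (the new_theta values here are hand-picked toy parameters for illustration, not the trained 4×785 matrix):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy parameters: 2 classes x (1 bias + 3 features), chosen for illustration.
new_theta = np.array([[0.0,  2.0, 0.0, 0.0],
                      [0.0, -2.0, 0.0, 0.0]])
classes = np.array([1, 2])

x = np.array([1.0, 3.0, 0.0, 0.0])     # row matrix with the bias term in front
prediction = sigmoid(x @ new_theta.T)  # probability of matching each class
best = classes[np.argmax(prediction)]
print(best)  # 1
```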

Challenges Faced

  • Logistic regression is more complicated than linear regression. It became easier to code after first working out the math for the error function J(θ) and the gradient ∂J(θ)/∂θ.

  • If the image is coloured, it needs to be converted to grayscale and then to a binary (black-and-white) image, and it must be ensured that the character is white on a black background, before subjecting it to prediction.
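The preprocessing in that bullet can be sketched as follows. The threshold of 128 and the majority-pixel inversion test are assumptions, not the article's exact method:

```python
import numpy as np

def prepare(img):
    """Binarise a 28x28 grayscale image and ensure a white digit on black.

    img: 2-D array with values in 0..255 (already grayscale).
    """
    binary = np.where(img >= 128, 255, 0)  # threshold of 128 is an assumption
    # If most pixels are white, the digit is dark on light: invert it.
    if (binary == 255).sum() > binary.size / 2:
        binary = 255 - binary
    return binary.reshape(1, -1)           # row matrix x, ready for the bias term

# A black digit on a white background gets inverted to white-on-black.
img = np.full((28, 28), 255)
img[10:15, 10:15] = 0
x = prepare(img)
print(x.shape)  # (1, 784)
```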

Translated from: https://medium.com/@aiswarya1998m/recognition-of-hand-written-numbers-using-logistic-regression-4562eb6c2dc5
