Recognizing Handwritten Digits in Python Using scikit-learn

In this article, I show how to use scikit-learn to run a machine learning classifier on the Digits dataset of handwritten digits. Any handwritten-digit dataset would work, but here I use Digits. There is no need to download it separately: a copy ships with scikit-learn, and I will show how to load it.

Let's start by loading the dataset into memory with scikit-learn.
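The original code screenshot is not available, so here is a minimal sketch of the loading step, assuming the standard `load_digits` helper from `sklearn.datasets`. The variable name `digits` is my own choice.

```python
from sklearn.datasets import load_digits

# load_digits reads the copy of the Digits dataset bundled with scikit-learn
digits = load_digits()

# Each sample is available as an 8x8 image and as an integer label
print(digits.images[0])
print("Label of the first image:", digits.target[0])
```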

Now that the dataset is loaded, let's see how many images and labels it contains.

There are 1797 images and 1797 labels in the dataset.
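One way to check these counts is to print the array shapes. This is a sketch of that check rather than the author's exact code.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each image is an 8x8 grid flattened into a row of 64 pixel values
print("Image data shape:", digits.data.shape)    # (1797, 64)
print("Label data shape:", digits.target.shape)  # (1797,)
```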

Showing the images and labels

Plotting a few images together with their labels is a quick way to visualize the dataset.
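A possible way to draw such a figure with Matplotlib is sketched below. Showing the first five samples, the figure size, and the title text are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

# Show the first five images in a row, each titled with its label
fig, axes = plt.subplots(1, 5, figsize=(20, 4))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.imshow(image, cmap=plt.cm.gray)
    ax.set_title(f"Training: {label}", fontsize=20)
    ax.axis("off")
plt.show()
```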

Now let's split the data into a training set and a test set. Holding out a test set lets us check that, after training, the model generalizes well to new data.
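The split can be sketched with `train_test_split`. The 75%/25% proportion comes from the article; `random_state=0` is an assumed seed, added only so the split is reproducible.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# Hold out 25% of the samples for testing; random_state is an assumed seed
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
print("Training samples:", len(x_train))
print("Test samples:", len(x_test))
```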

We will use logistic regression to train our model, so first we have to import it.

Let's make an instance of the model.
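A sketch of the import and the instance follows. The `max_iter=10000` argument is my assumption, not something the article specifies: it gives the solver more iterations than the default so it can converge on the unscaled pixel values.

```python
from sklearn.linear_model import LogisticRegression

# max_iter is raised above its default (an assumption, not from the article)
# so the solver has room to converge on unscaled pixel intensities
model = LogisticRegression(max_iter=10000)
print(model)
```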

Now let's train the model on the training data. The fitted model stores what it has learned from the data.
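Training amounts to a single `fit` call. The sketch below repeats the loading and splitting steps so it runs on its own; `random_state=0` and `max_iter=10000` are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0  # assumed seed
)

model = LogisticRegression(max_iter=10000)  # assumed iteration limit
# fit learns one weight vector per digit class and stores it on the model
model.fit(x_train, y_train)

print("Classes seen during training:", model.classes_)
print("Learned weight matrix shape:", model.coef_.shape)
```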

Now let's predict the labels of new data using what the model learned during training.
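Prediction can be sketched with `predict`, first for a single image and then for the whole test set. As before, the seed and `max_iter` are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0  # assumed seed
)
model = LogisticRegression(max_iter=10000).fit(x_train, y_train)

# predict expects a 2D array, so a single image is reshaped to one row
single_prediction = model.predict(x_test[0].reshape(1, -1))
print("Predicted:", single_prediction[0], "Actual:", y_test[0])

# Predict the labels of every image in the test set at once
predictions = model.predict(x_test)
print(predictions[:10])
```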

It's time to measure the performance of the model. There are various ways to do this, but I will use a simple one: accuracy.

Accuracy is the fraction of correct predictions:

accuracy = correct predictions / total number of data points

In my case the accuracy is around 96.44%.
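To make the definition concrete, the sketch below computes accuracy by hand and compares it with the model's `score` method. The seed and `max_iter` are assumptions, so the printed figure may differ from the one reported in the article.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0  # assumed seed
)
model = LogisticRegression(max_iter=10000).fit(x_train, y_train)
predictions = model.predict(x_test)

# Accuracy by definition: correct predictions / total number of data points
correct = (predictions == y_test).sum()
manual_accuracy = correct / len(y_test)

# score computes the same mean accuracy on the given data and labels
score = model.score(x_test, y_test)
print(f"Manual accuracy: {manual_accuracy:.4f}")
print(f"model.score:     {score:.4f}")
```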

Let's compute the confusion matrix as well. A confusion matrix is a table that describes the performance of a model on a set of test data for which the true values are known. I will display it in two ways, using two Python packages: Seaborn and Matplotlib.

Before forming the confusion matrix, we import the necessary packages.

Let's form the confusion matrix using Seaborn. The output shows the accuracy score along with the confusion matrix.
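A Seaborn version might look like the sketch below, which computes the matrix with `sklearn.metrics.confusion_matrix` and draws it as an annotated heatmap. The colormap, figure size, and axis labels are illustrative choices, and the seed and `max_iter` are assumptions.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0  # assumed seed
)
model = LogisticRegression(max_iter=10000).fit(x_train, y_train)
predictions = model.predict(x_test)
score = model.score(x_test, y_test)

# Rows are the actual digits, columns are the predicted digits
cm = metrics.confusion_matrix(y_test, predictions)

plt.figure(figsize=(9, 9))
sns.heatmap(cm, annot=True, fmt="d", linewidths=0.5, square=True, cmap="Blues_r")
plt.ylabel("Actual label")
plt.xlabel("Predicted label")
plt.title(f"Accuracy Score: {score:.4f}", size=15)
plt.show()
```

The diagonal cells count the correct predictions, so a strong model shows large numbers on the diagonal and small ones elsewhere.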

Now let's form the confusion matrix using Matplotlib.
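The same matrix can be drawn with Matplotlib alone. This is a sketch under the same assumptions (seed, `max_iter`); the colormap and annotation style are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0  # assumed seed
)
model = LogisticRegression(max_iter=10000).fit(x_train, y_train)
predictions = model.predict(x_test)
cm = metrics.confusion_matrix(y_test, predictions)

plt.figure(figsize=(9, 9))
plt.imshow(cm, interpolation="nearest", cmap="Pastel1")
plt.title("Confusion matrix", size=15)
plt.colorbar()
tick_marks = np.arange(10)
plt.xticks(tick_marks, tick_marks)
plt.yticks(tick_marks, tick_marks)
plt.ylabel("Actual label")
plt.xlabel("Predicted label")

# Write each count into its cell: x is the column, y is the row
for row in range(cm.shape[0]):
    for col in range(cm.shape[1]):
        plt.annotate(str(cm[row, col]), xy=(col, row), ha="center", va="center")
plt.show()
```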

So far we have used 75% of the data for training and 25% for testing, and with that split we got an accuracy of around 96.44%.

Let's find the accuracy for two other splits as well: 70% training with 30% testing, and 80% training with 20% testing.

We start with the case of 80% training and 20% testing.

We have already created the logistic regression instance and imported the modules and packages we need, so there is no need to do that again. It's time to fit the logistic regression model to the new training set.

Then we predict on the new test set.

Now let's look at the accuracy we got with 80% training and 20% testing.

We got an accuracy of around 96.94%.
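The 80/20 case can be sketched by changing only `test_size` and refitting. The seed and `max_iter` are assumptions, so the printed accuracy may not match the article's figure exactly.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()

# 80% of the samples for training, 20% for testing
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.20, random_state=0  # assumed seed
)

model = LogisticRegression(max_iter=10000)  # assumed iteration limit
model.fit(x_train, y_train)        # refitting replaces any earlier weights
predictions = model.predict(x_test)
score = model.score(x_test, y_test)
print(f"Accuracy with an 80/20 split: {score:.4f}")
```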

This time I will form the confusion matrix using only Seaborn. It reports the same 96.94% accuracy as above.

We could form the confusion matrix using Matplotlib as well, as discussed earlier.

So far we have found the accuracy for the 75/25 split and for the 80/20 split. Let's now take the case of 70% training and 30% testing.

After this we proceed as before: we fit the model on the training set and then predict on the test set.

Now let's look at the accuracy we got from the 70% training and 30% testing split.

The accuracy is around 96.29%.
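To compare the three splits side by side, the steps can be wrapped in a loop. This is a sketch, not the author's code: the `results` dictionary, the seed, and `max_iter` are all assumptions, and the exact accuracies depend on them.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
results = {}

# Repeat split, fit, and score for each test-set fraction used in the article
for test_size in (0.25, 0.20, 0.30):
    x_train, x_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=test_size, random_state=0
    )
    model = LogisticRegression(max_iter=10000).fit(x_train, y_train)
    results[test_size] = model.score(x_test, y_test)

for test_size, score in results.items():
    train_pct = round((1 - test_size) * 100)
    test_pct = round(test_size * 100)
    print(f"{train_pct}/{test_pct} split: accuracy {score:.4f}")
```

Differences of a fraction of a percent between splits are small enough that they can come from which samples land in the test set, so they should not be read as one split being better than another.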

I will again form the confusion matrix using Seaborn, this time for the 70/30 train-test split. For that, let's load the libraries again, although this is not strictly necessary: once imported, they stay available for as long as the kernel is running.

Closing Thoughts

In this article we used scikit-learn for machine learning classification. There is not much to memorize, and if you use it regularly you will grow fond of it. Please let me know if you get stuck along the way. I will definitely look into your problem.

Thank you so much for reading this article.


The source code for this article is available on GitHub.

Translated from: https://medium.com/@yash.dlh12/recognizing-handwritten-digits-in-python-using-scikit-learn-5714567e331e
