100 Days of ML Code: The 100-Day Machine Learning Challenge (Days 7-11)

iamsrc

Article link: https://blog.csdn.net/iamsrc/article/details/81663808

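This is a translation of days 7 through 11 of the 100 Days of ML Code challenge. After this post I may pause updates for a few days, because on some days the original author posted only a single sentence, and I want to dig deeper into the theory myself. I have not yet reached the corresponding part of Professor Hung-yi Lee's course either; too much froth has built up, so I need to let it settle. I will be back soon. Watching the videos at 2.0x speed will not make you miss anything; it only makes you concentrate harder.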

The original project is here; the author has done a very good job.

Please credit the source when reposting.

Day 7: K-Nearest Neighbors

What is k-NN

The k-nearest neighbors (k-NN) algorithm is a simple yet very widely used classification algorithm.

It can also be used for regression.

k-NN is non-parametric (it makes no assumptions about the underlying data distribution), instance-based (it does not explicitly learn a model; instead, it memorizes the training instances), and used in a supervised learning setting.

Suppose we want to classify a grey point into one of three classes: light green, green, and red.

Start by computing the distance between the grey point and the other points to find its k nearest neighbors.

Making predictions

To classify an unlabeled object, the distance from this object to each labeled object is computed, its k nearest neighbors are identified, and the class label held by the majority of those neighbors is assigned to the object.

For real-valued input variables, the most popular distance measure is Euclidean distance.
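The prediction steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the original author's code: the function name `knn_predict` and the toy points are made up for this example, and `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest labeled points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three classes named after the colors in the text
train_points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
train_labels = ["light green", "light green", "green", "green", "red", "red"]

prediction = knn_predict(train_points, train_labels, query=(5.1, 4.5), k=3)
print(prediction)  # green
```

With k=3 the query's neighbors are the two green points and one red point, so the vote comes out green.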

Value of k

Finding the right value of k is not easy.

A small value of k means that noise will have a higher influence on the result, while a large value makes prediction more computationally expensive and can blur the boundaries between classes.

It depends a lot on your individual case; sometimes it is best to run through each candidate value of k and decide for yourself.
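One way to "run through each possible value" is to score every candidate k with cross-validation and keep the best one. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in; the variable names and the range of k are choices made for this illustration, not something from the original project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y)
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

scores_by_k = {}
for k in range(1, 22, 2):  # odd values of k avoid tied votes with two classes
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    scores_by_k[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores_by_k, key=scores_by_k.get)
print(best_k, round(scores_by_k[best_k], 3))
```

The best k depends on the data, so the printed value will differ from one dataset to another.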

k-NN is also called a lazy algorithm because it is instance-based: it defers all the work until a prediction is requested.

How does the k-NN algorithm work

When k-NN is used for classification, the output is a class membership (it predicts a class, which is a discrete value).

There are three key elements of this approach: a set of labeled objects (e.g., a set of stored records), a distance or similarity metric for computing the distance between objects, and the value of k, the number of nearest neighbors.

The distance

Euclidean distance is calculated as the square root of the sum of the squared differences between a new point and an existing point across all input attributes.
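Written as a formula, with x the new point, x' an existing point, and n the number of input attributes (symbols chosen for this illustration), the Euclidean distance is the square root of the summed squared differences:

```latex
d(x, x') = \sqrt{\sum_{i=1}^{n} \left(x_i - x'_i\right)^2}
```

For example, for x = (1, 2) and x' = (4, 6) this gives sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16) = 5.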

Other popular distance measures include:

Hamming distance

Manhattan distance

Minkowski distance
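These measures are short enough to write out directly. The functions below are an illustrative sketch with made-up names, not a library API. Minkowski distance generalizes the others: p=1 gives Manhattan distance and p=2 gives Euclidean distance.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (Minkowski with p=1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski_distance(a, b, p=2):
    """Generalized distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


p1, p2 = (1, 2), (4, 6)
print(manhattan_distance(p1, p2))              # 7
print(minkowski_distance(p1, p2, p=2))         # 5.0
print(hamming_distance("karolin", "kathrin"))  # 3
```

Hamming distance is the natural choice for categorical or binary attributes, while the other two apply to real-valued attributes.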

Day 8: The Math Behind Logistic Regression

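/* Translator's note:

Today the author only gave a URL, saying he had dug into the math behind logistic regression. However, the site would not open for me, thanks to China's great project, the Golden Shield Project. This is why I am pausing updates. I want to learn the theory as well, but I could not read the article, so all I can do is watch the videos. */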

#100DaysOfMLCode To clarify my understanding of logistic regression, I searched the internet for a resource or article and came across this article (https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc) by Saishruthi Swaminathan.

It gives a detailed description of logistic regression. Do check it out.

Day 9: Support Vector Machines

Got an intuition for what an SVM is and how it is used to solve classification problems.

/* Just one sentence again, so studying on your own really matters. */

Day 10: SVM and KNN

Learned more about how SVM works and implemented the k-NN algorithm.

/* You only said one sentence; how much am I supposed to read? */

Day 11: Implementing KNN

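/* The dataset is the same one used yesterday, so I will not describe it again here; see the previous post if you want the details. Remember to put the dataset file in the same directory as the KNN code. */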

Code

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
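Scaling matters for k-NN because the distance is otherwise dominated by whichever feature has the largest numeric range. As a small check of what `StandardScaler` does, the sketch below compares it with the manual formula (x - mean) / std on made-up values; the array `X_demo` is hypothetical and is not taken from Social_Network_Ads.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values: two features on very different scales
X_demo = np.array([[20.0, 20000.0],
                   [30.0, 50000.0],
                   [40.0, 80000.0]])

# StandardScaler learns the per-column mean and standard deviation, then rescales
scaled = StandardScaler().fit_transform(X_demo)

# The same transformation written by hand (population standard deviation)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

In the code above, the scaler is fitted on the training set only and then applied to the test set with `transform`, so no information from the test set leaks into the scaling.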

Fitting K-NN to the Training set

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

Predicting the Test set results

y_pred = classifier.predict(X_test)

Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Miscellaneous notes

As before, the code above does not display any result; add one line:

print(cm)
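To read the printed matrix: in scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes. The sketch below derives accuracy, precision, and recall from a matrix whose numbers are made up purely for illustration; they are not the output of the code above.

```python
import numpy as np

# Made-up 2x2 confusion matrix for illustration only.
# Rows are true classes, columns are predicted classes.
cm_demo = np.array([[50, 5],
                    [10, 35]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tn + tp) / cm_demo.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)            # share of predicted positives that are right
recall = tp / (tp + fn)               # share of actual positives that were found

print(accuracy, precision, recall)
```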

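If you have questions about the output, see the previous post. I have been writing for a long time, it is mealtime, and I am starving, so I will stop rambling and go eat.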