100 Days of ML Code: The 100-Day Machine Learning Challenge (Days 7-11)

This is the translation of Days 7 through 11 of the 100-day machine learning challenge. The series may pause for a few days after this post, because on several days the original author gave only a single sentence, and I want to dig deeper into the theory myself. I also haven't reached the corresponding point in Professor Hung-yi Lee's course, and too much froth has built up, so it's time to let things settle. I'll be back soon; watching the videos at 2x speed won't make you miss anything, it just keeps you more focused.

The original project is here; the author's write-up is quite good.

Please credit the source when reposting.

Day 7: The k-Nearest Neighbors Algorithm

What is k-NN?

The k-Nearest Neighbors algorithm is a simple yet widely used classification algorithm.

It can also be used for regression.

k-NN is non-parametric (meaning it makes no assumptions about the underlying data distribution) and instance-based (meaning the algorithm does not explicitly learn a model; instead, it memorizes the training instances), and it is used in a supervised learning setting.

We want to classify the grey point into one of three classes: light green, green, and red.

Start by calculating the distances between the grey point and its k nearest points.

Making predictions

To classify an unlabeled object, we compute its distance to the labeled objects, identify its k nearest neighbors, and then use the majority class label among those nearest neighbors to determine the object's class label.

For real-valued input variables, the most popular distance measure is Euclidean distance.
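To make this majority-vote procedure concrete, here is a minimal from-scratch sketch in NumPy; the function name knn_predict and the toy data are my own illustration, not code from the original project.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among the neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # prints 0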

The value of k

Finding the right value of k is not easy.

A small value of k means that noise will have a higher influence on the result, while a large value makes the computation expensive.

It depends a lot on your individual case; sometimes it is best to run through each possible value of k and decide for yourself.
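One simple way to run that sweep is to cross-validate each candidate k, as sketched below with scikit-learn; the synthetic dataset and the candidate range 1-20 are stand-in assumptions, not from the original post.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# synthetic data stands in for a real dataset here
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# score each candidate k with 5-fold cross-validation and keep the best
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])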

k-NN is also called a lazy algorithm because it is instance-based.

How does the k-NN algorithm work?

When k-NN is used for classification, the output is a class membership (it predicts a class, i.e., a discrete value).

There are three key elements to this approach: a set of labeled objects (e.g., a set of stored records), a distance metric to compute the distance between objects, and the value of k, the number of nearest neighbors.

The distance

Euclidean distance is calculated as the square root of the sum of the squared differences between a new point and an existing point across all input attributes.
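Written out for two points x and y with n attributes, that is:

d(x, y) = sqrt((x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2)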

Other popular distance measures include:

Hamming distance

Manhattan distance

Minkowski distance
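To get a feel for how these metrics differ, SciPy's distance module implements all of them; SciPy is not used in the original post, and the sample points here are arbitrary.

import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0, 2.0])
v = np.array([2.0, 1.0, 0.0])

print(distance.euclidean(u, v))               # square root of the sum of squared differences
print(distance.cityblock(u, v))               # Manhattan: sum of absolute differences
print(distance.minkowski(u, v, p=3))          # Minkowski of order p (p=2 recovers Euclidean)
print(distance.hamming([1, 0, 1], [1, 1, 0])) # fraction of positions that differ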

Day 8: The Math Behind Logistic Regression

/* Translator's note: Today the original author only gave a URL, saying he studied the mathematical principles of logistic regression in depth there. Unfortunately, the site won't open from here, courtesy of China's great Golden Shield Project. That is why I'm pausing the series: I want to study the theory too, but since I can't read the article, I'll have to rely on videos instead. */

#100DaysOfMLCode To clear up my understanding of logistic regression, I was searching the internet for some resource or article, and I came across this article (https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc) by Saishruthi Swaminathan.

It gives a detailed description of Logistic Regression. Do check it out.

Day 9: Support Vector Machines

Got an intuition of what an SVM is and how it is used to solve classification problems.

/* Again, just one sentence from the author, so studying on your own really matters. */

Day 10: SVM and KNN

Learned more about how SVM works, and implemented the KNN algorithm.

/* You give me one sentence; how much studying is that supposed to cover? */

Day 11: Implementing KNN

/* The dataset is the same one used yesterday, so I won't repeat the details here; see the previous post. Remember to put the dataset in the same directory as the KNN implementation code. */

Code

Importing the libraries

import numpy as np                # numerical arrays
import matplotlib.pyplot as plt  # plotting (not used in the snippets below)
import pandas as pd               # data loading and manipulation

Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values  # features: columns 2 and 3 (Age, EstimatedSalary in this dataset)
y = dataset.iloc[:, 4].values       # target: column 4 (Purchased)

Splitting the dataset into the Training set and Test set

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set, then transform it
X_test = sc.transform(X_test)        # reuse the training-set statistics on the test set

Fitting K-NN to the Training set

from sklearn.neighbors import KNeighborsClassifier
# minkowski distance with p = 2 is exactly the Euclidean distance discussed above
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

Predicting the Test set results

y_pred = classifier.predict(X_test)

Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)  # rows: true classes, columns: predicted classes

Closing remarks

Same as before: the code above doesn't print anything on its own, so add one more line:

print(cm)
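If you also want a single summary number, accuracy can be computed from the same predictions; accuracy_score is a standard scikit-learn helper, though this step is my addition rather than part of the original post.

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # fraction of test samples classified correctly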

If you have any questions about the output, see the previous post. I've been writing for a long time and it's dinner time now; I'm starving, so no more rambling. Off to eat.
