knn 机器学习
Introduction
介绍
For this article, I’d like to introduce you to KNN with a practical example.
对于本文,我想通过一个实际的例子向您介绍KNN。
I will consider one of my project that you can find in my GitHub profile. For this project, I used a dataset from Kaggle.
我将考虑可以在我的GitHub个人资料中找到的我的项目之一。 对于这个项目,我使用了Kaggle的数据集。
The dataset is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars organized in three classes. The analysis was done by considering the quantities of 13 constituents found in each of the three types of wines.
该数据集是对意大利同一地区种植的葡萄酒进行化学分析的结果,这些葡萄酒来自三个不同类别的三个品种。 通过考虑三种葡萄酒中每种葡萄酒中13种成分的数量来进行分析。
This article will be structured in three-part. In the first part, I will make a theoretical description of KNN, then I will focus on the part about exploratory data analysis in order to show you the insights that I found and at the end, I will show you the code that I used to prepare and evaluate the machine learning model.
本文将分为三部分。 在第一部分中,我将对KNN进行理论上的描述,然后,我将重点介绍探索性数据分析这一部分,以便向您展示我发现的见解,最后,我将向您展示我曾经使用过的代码准备和评估机器学习模型。
Part I: What is KNN and how it works mathematically?
第一部分:什么是KNN及其在数学上的作用?
The k-nearest neighbour algorithm is not a complex algorithm. The approach of KNN to predict and classify data consists of looking through the training data and finds the k training points that are closest to the new point. Then it assigns to the new data the class label of the nearest training data.
k最近邻居算法不是复杂的算法。 KNN预测和分类数据的方法包括浏览训练数据并找到最接近新点的k个训练点。 然后,它将新的训练数据的类别标签分配给新数据。
But how KNN works? To answer this question we have to refer to the formula of the euclidian distance between two points. Suppose you have to compute the distance between two points A(5,7) and B(1,4) in a Cartesian plane. The formula that you will apply is very simple:
但是KNN是如何工作的? 要回答这个问题,我们必须参考两点之间的欧几里得距离的公式。 假设您必须计算笛卡尔平面中两个点A(5,7)和B(1,4)之间的距离。 您将应用的公式非常