KNN Algorithm - Finding Nearest Neighbors

Latest recommended article published on 2023-11-18 19:25:37

cunzai1985

Tags: algorithms, image recognition, python, machine learning, deep learning

Original link: https://www.tutorialspoint.com/machine_learning_with_python/knn_algorithm_finding_nearest_neighbors.htm

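KNN (K-Nearest Neighbors) is a supervised learning algorithm commonly used for classification. It relies on feature similarity: a new data point is assigned to the class of the training points closest to it. It is both a lazy learning and a non-parametric learning algorithm and has no dedicated training phase. The KNN workflow consists of choosing a value for K, computing the distances between the test data and the training data, and classifying each test point from its K nearest neighbors. In Python, KNN can be used for both classification and regression, but its drawbacks include high computational cost, large storage requirements, and sensitivity to the scale of the data. KNN is widely applied, for example to predict loan approval in banking systems, to calculate credit ratings, and to forecast political elections.

Summary generated by CSDN using intelligent technology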

Introduction

The K-nearest neighbors (KNN) algorithm is a supervised ML algorithm that can be used for both classification and regression problems. In industry, however, it is mainly used for classification. The following two properties characterize KNN well:

Lazy learning algorithm: KNN is a lazy learning algorithm because it has no specialized training phase; it simply stores the training data and uses all of it at classification time.
Non-parametric learning algorithm: KNN is also a non-parametric learning algorithm because it makes no assumptions about the distribution of the underlying data.
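To make the two properties concrete, here is a minimal sketch in Python. The class name `LazyKNN` and its methods are hypothetical, chosen only for illustration, and the majority vote over the K nearest labels is an assumed (though common) way to turn neighbors into a class. Note that `fit` estimates nothing and only stores the data, while all distance computation is deferred to prediction time.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical illustration class, not a library API."""

    def fit(self, rows, labels):
        # "Training" only memorizes the data: no parameters are estimated
        # and no distribution is assumed (lazy, non-parametric).
        self.rows = [tuple(r) for r in rows]
        self.labels = list(labels)
        return self

    def predict_one(self, point, k=3):
        # All the real work happens here, at prediction time.
        order = sorted(
            range(len(self.rows)),
            key=lambda i: math.dist(self.rows[i], point),  # Euclidean distance
        )
        # Assumed decision rule: majority vote among the k nearest labels.
        votes = Counter(self.labels[i] for i in order[:k])
        return votes.most_common(1)[0][0]


model = LazyKNN().fit([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"])
print(model.predict_one((0.2, 0.4), k=3))
```

Because every stored row is scanned for each prediction, the cost of a query grows with the size of the training set, which is the price of skipping the training phase.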

Working of KNN Algorithm

The KNN algorithm uses "feature similarity" to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set. Its operation can be understood through the following steps:

Step 1: Any algorithm needs a dataset, so the first step of KNN is to load the training data as well as the test data.
Step 2: Next, choose the value of K, i.e. the number of nearest data points to consider. K can be any positive integer.
Step 3: For each point in the test data, do the following:

3.1: Calculate the distance between the test point and each row of the training data using one of the following methods: Euclidean, Manhattan, or Hamming distance. Euclidean distance is the most commonly used.

3.2: Sort the training rows in ascending order of their distance values.

3.3: Choose the top K rows from the sorted array.
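Sub-steps 3.1 to 3.3 can be sketched in plain Python as follows. This is only an illustrative sketch: the function names (`euclidean`, `manhattan`, `hamming`, `k_nearest_rows`) and the sample data are assumptions made here, not part of the original tutorial.

```python
import math


def euclidean(a, b):
    # Square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute differences along each feature.
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions where the two sequences differ
    # (typically used for categorical or binary features).
    return sum(x != y for x, y in zip(a, b))


def k_nearest_rows(train_rows, test_point, k, distance=euclidean):
    # 3.1: distance from the test point to every training row.
    scored = [(distance(row, test_point), row) for row in train_rows]
    # 3.2: sort in ascending order of distance.
    scored.sort(key=lambda pair: pair[0])
    # 3.3: keep the top K rows of the sorted list.
    return scored[:k]


# Made-up sample data for illustration only.
train = [(1.0, 1.0), (2.0, 2.0), (6.0, 5.0), (7.0, 7.0)]
for dist, row in k_nearest_rows(train, (1.2, 1.0), k=2):
    print(row, round(dist, 3))
```

Passing `distance=manhattan` or `distance=hamming` swaps the metric without changing the sort-and-select logic.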