The k-Nearest Neighbors (KNN) Algorithm

Data science is hot, and it seems that everybody is working on some sort of project involving the latest state-of-the-art (SOTA) algorithm. With good reason, of course: in many cases we can use data to make very reasonable predictions in almost any field. But while there is a lot of focus lately on SOTA algorithms, the simpler methods are sometimes forgotten.

Recently, I played around with a k-nearest-neighbor (KNN) algorithm and I was amazed at how powerful it can be. The technique itself is used in many other fields. For example, I used it to identify the same particles in consecutive frames of a high-speed recording for one of my research projects during my Ph.D. The coordinates of a particle are known, and we look in the next frame at the closest particles around that position. Of course, when there are multiple particles very close by, you can get into trouble. For that, you can make use of higher-order information from multiple frames, such as the velocity or acceleration vector. For KNN in machine learning, we generally do not have temporal data; therefore, we only use its first-order, simplest form.

When we want to use KNN to classify new data, i.e. make a prediction, we use the already known (or labeled) data as a kind of look-up table. We select the data that is most similar to the new data we want to predict and take the most prominent class from that selection. So we compare an unknown example to an already known dataset. There is no training, no layers, no weights. The only parameter is k, which specifies the number of neighbors to take into consideration when predicting the class. For example, to classify a kind of fruit, we can select the five most similar examples from the dataset. Now we say that the most prominent class among those five selected examples is probably also the class we want to predict. If we have found three apples and two pears, we would predict an apple as well.

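To make this concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier; the fruit measurements (height, weight, width) and labels are made-up values for illustration, not data from the article.

```python
# Minimal KNN sketch with scikit-learn; the fruit data below is made up.
from sklearn.neighbors import KNeighborsClassifier

# Labeled "look-up table": each row is [height_cm, weight_g, width_cm]
X_known = [[7.0, 150, 7.5], [7.5, 170, 7.8], [6.8, 140, 7.2],
           [9.0, 180, 6.0], [9.5, 200, 6.5]]
y_known = ["apple", "apple", "apple", "pear", "pear"]

# k = 5: predict by majority vote over the five most similar known examples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_known, y_known)   # no real training happens, the data is simply stored

print(knn.predict([[7.2, 155, 7.4]]))  # three apples vs. two pears -> 'apple'
```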

Now we come to another problem: how do we select the most similar examples when there are multiple features? With a single feature, e.g. height, this would be very easy: we simply calculate the difference and select the k closest matches. But what do we do when we also have weight and width? We have to quantify the difference for each feature and aggregate the results into a single value. Fortunately, there are many ways to do this. One of the most common is the Euclidean distance, which can be seen as the length of the shortest straight line between two points.

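As a concrete example of this aggregation step, here is a small from-scratch sketch that computes the Euclidean distance over several features and picks the k closest labeled examples; the numbers are illustrative only, and in practice features on very different scales are usually normalized first.

```python
import math

def euclidean_distance(a, b):
    # Square root of the sum of squared per-feature differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(known, new_point, k):
    # Sort the labeled examples by distance to the new point and keep the k closest
    by_distance = sorted(known, key=lambda row: euclidean_distance(row[0], new_point))
    return by_distance[:k]

# (features, label) pairs; values are made up for illustration
known = [([7.0, 150, 7.5], "apple"),
         ([9.0, 180, 6.0], "pear"),
         ([6.8, 140, 7.2], "apple")]

print(k_nearest(known, [7.1, 152, 7.4], k=2))  # the two closest neighbors
```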
