
Algorithm introduction

kNN (k nearest neighbors) is one of the simplest ML algorithms, often taught as one of the first algorithms in introductory courses. It's relatively simple but quite powerful, although little time is usually spent on understanding its computational complexity and practical issues. It can be used for both classification and regression with the same complexity, so for simplicity we'll consider the kNN classifier.

kNN is an associative algorithm: during prediction it searches for the nearest neighbors of the sample and takes their majority vote as the predicted class. A training phase may or may not exist at all, since in general we have two possibilities:

  1. Brute force method: calculate the distance from the new point to every point in the training data matrix X, sort the distances and take the k nearest, then do a majority vote. There is no need for separate training, so we only consider prediction complexity.

  2. Using a data structure: organize the training points from X into an auxiliary data structure for faster nearest-neighbor lookup. This approach uses additional space and time (for creating the data structure during the training phase) in exchange for faster predictions.
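The brute-force prediction in point 1 can be sketched in a few lines of NumPy. This is a minimal illustration using the Euclidean metric; the function name and toy data are mine, not scikit-learn's actual implementation:

```python
import numpy as np

def knn_predict_brute(X_train, y_train, x_query, k):
    """Brute-force kNN: distances to every training point, then a majority vote."""
    # Distance from the query to all n training points: O(n * d)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances (a full sort, shown for clarity)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict_brute(X_train, y_train, np.array([4.8, 5.2]), k=3))  # → 1
```

Note that no work at all happens before the query arrives, which is exactly why the training complexity of this variant is constant.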

We focus on the methods implemented in scikit-learn, the most popular ML library for Python. It supports brute force, k-d tree and ball tree data structures. These are relatively simple, efficient and well suited to the kNN algorithm. The construction of these trees stems from computational geometry rather than machine learning and does not concern us that much, so I'll cover it in less detail, at a more conceptual level. For more details, see the links at the end of the article.
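In scikit-learn, the lookup structure is chosen with the `algorithm` parameter of `KNeighborsClassifier`. A small usage sketch (the toy data is mine):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
y = [0, 0, 1, 1]

# The same estimator supports all three lookup strategies; the default,
# algorithm="auto", picks one heuristically based on the data.
for algorithm in ("brute", "kd_tree", "ball_tree"):
    clf = KNeighborsClassifier(n_neighbors=3, algorithm=algorithm)
    clf.fit(X, y)  # brute: almost no work; trees: builds the structure here
    # all three strategies agree: the query's 3 nearest labels are 1, 1, 0
    print(algorithm, clf.predict([[4.8, 5.2]]))
```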

In all the complexities below, the time to compute a distance is omitted, since it is in most cases negligible compared to the rest of the algorithm. Additionally, we denote:

  • n: number of points in the training dataset

  • d: data dimensionality

  • k: number of neighbors that we consider for voting

Brute force method

Training time complexity: O(1)

Training space complexity: O(1)

Prediction time complexity: O(k · n)

For each query point we compute the distance to all n training points and then select the k nearest: scanning the distance array for the minimum k times gives O(k · n), while a full sort would cost O(n log n) instead.
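Selecting the k smallest distances does not require a full sort; for example, NumPy's `argpartition` performs an average-case O(n) partial selection. A small illustrative sketch (not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(0)
dists = rng.random(1_000)  # stand-in for precomputed query-to-training distances
k = 5

# Full sort of all distances: O(n log n)
by_sort = np.sort(np.argsort(dists)[:k])
# Partial selection: average O(n); the first k entries of the result are
# the indices of the k smallest distances, in no particular order
by_partition = np.sort(np.argpartition(dists, k)[:k])

# Both approaches find the same set of k nearest indices
assert np.array_equal(by_sort, by_partition)
```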