KNN: Strengths, weaknesses, and parameters

Latest recommended article published 2021-03-01 00:10:18

赛博行者


Category: Machine Learning


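The k-nearest neighbors algorithm is easy to understand and often gives reasonable performance without much tuning. The book introduces the algorithm's basic parameters, namely the number of neighbors and the distance measure, and discusses its strengths and weaknesses: the model is intuitive and fast to build, but prediction can be slow, and it performs poorly on high-dimensional datasets or datasets with many features.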


Strengths, weaknesses, and parameters
In principle, there are two important parameters to the KNeighbors classifier: the
number of neighbors and how you measure distance between data points. In practice,
using a small number of neighbors like three or five often works well, but you should
certainly adjust this parameter. Choosing the right distance measure is somewhat
beyond the scope of this book. By default, Euclidean distance is used, which works
well in many settings.
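
To make the two parameters concrete, here is a minimal pure-Python sketch of a k-NN classifier in which both the number of neighbors and the distance measure are arguments. It is not scikit-learn's implementation; the names `knn_predict`, `euclidean`, and `manhattan` and the toy data are assumptions made for illustration. In scikit-learn, the corresponding settings are the `n_neighbors` and `metric` parameters of `KNeighborsClassifier`.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences, shown as an alternative measure.
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(X_train, y_train, query, k=3, distance=euclidean):
    # Rank all training points by distance to the query and keep the k closest.
    ranked = sorted(zip(X_train, y_train), key=lambda pair: distance(pair[0], query))
    neighbors = ranked[:k]
    # Majority vote among the labels of the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
y_train = ["a", "a", "a", "b", "b", "b"]

pred_k3 = knn_predict(X_train, y_train, (2.0, 2.0), k=3)
pred_k1_manhattan = knn_predict(X_train, y_train, (6.0, 5.0), k=1, distance=manhattan)
print(pred_k3, pred_k1_manhattan)
```

Trying several values of `k` on held-out data is one common way to adjust the first parameter. The sketch only shows where each setting enters the computation.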
One of the strengths of k-NN is that the model is very easy to understand, and often
gives reasonable performance without a lot of adjustments. Using this algorithm is a
good baseline method to try before considering more advanced techniques. Building
the nearest neighbors model is usually very fast, but when your training set is very
large (either in number of features or in number of samples) prediction can be slow.
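
The asymmetry between building and predicting can be sketched with a brute-force version, assuming the model simply stores the training data and compares each query against every stored sample. The class name `BruteForceKNN` and the counter `distance_calls` are hypothetical. Real libraries may use tree-based indexes to reduce this cost, so treat this as an illustration rather than a description of any particular implementation.

```python
import math
from collections import Counter


class BruteForceKNN:
    def __init__(self, k=3):
        self.k = k
        self.distance_calls = 0  # counts distance computations, for illustration

    def fit(self, X, y):
        # "Building" the model only stores the training data: no distances yet.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, query):
        # Prediction computes one distance per stored training sample,
        # and each distance costs work proportional to the number of features.
        scored = []
        for point, label in zip(self.X, self.y):
            self.distance_calls += 1
            scored.append((math.dist(point, query), label))
        scored.sort(key=lambda item: item[0])
        votes = Counter(label for _, label in scored[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical data: 1000 samples with 2 features each.
X = [(float(i), float(i % 7)) for i in range(1000)]
y = [i % 2 for i in range(1000)]

model = BruteForceKNN(k=3).fit(X, y)
calls_after_fit = model.distance_calls
model.predict_one((10.2, 3.1))
calls_after_predict = model.distance_calls
print(calls_after_fit, calls_after_predict)
```

In this sketch a single prediction touches every training sample, which is why the cost grows with both the number of samples and the number of features.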
When using the k-NN algorithm, it’s important to preprocess your data (see Chapter 3).
This approach often does not perform well on datasets with many features
(hundreds or more), and it does particularly badly with datasets where most features
are 0 most of the time (so-called sparse datasets).
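
One reason preprocessing matters is that distances are dominated by whichever feature has the largest numeric range. The sketch below assumes standardization (zero mean, unit variance per feature) as the preprocessing step. That is only one possible choice, and the excerpt itself does not say which method to use. The helper names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows, reference):
    # Rescale each feature using the mean and standard deviation of `reference`.
    columns = list(zip(*reference))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def nearest_label(X, y, query):
    # 1-nearest-neighbor lookup with Euclidean distance.
    distances = [math.dist(point, query) for point in X]
    return y[distances.index(min(distances))]


# Hypothetical data: feature 1 is in the thousands, feature 2 lies in [0, 1].
# The label depends on feature 2 only.
X_train = [(1000.0, 0.1), (1200.0, 0.9), (3000.0, 0.2), (2800.0, 0.8)]
y_train = ["x", "y", "x", "y"]
query = (1150.0, 0.15)

# Without scaling, feature 1 dominates the distance.
label_raw = nearest_label(X_train, y_train, query)

# With scaling, both features contribute on a comparable footing.
X_scaled = standardize(X_train, X_train)
query_scaled = standardize([query], X_train)[0]
label_scaled = nearest_label(X_scaled, y_train, query_scaled)

print(label_raw, label_scaled)
```

On this toy data the nearest neighbor changes once the features are put on a comparable scale, because the small-range feature no longer gets drowned out.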
So, while the k-nearest neighbors algorithm is easy to understand, it is not often used
in practice, due to prediction being slow and its inability to handle many features.
The method we discuss next has neither of these drawbacks.