K-Nearest Neighbors Algorithm
Algorithms From Scratch
Introduction
K-Nearest Neighbors is a non-parametric algorithm capable of performing both Classification and Regression. Thomas Cover, a professor at Stanford University, first proposed the idea of the K-Nearest Neighbors algorithm in 1967.
Many refer to K-NN as a lazy learner, or a type of instance-based learner, since all computation is deferred until function evaluation. Personally, I believe this places K-Nearest Neighbors towards the less complex end of Machine Learning algorithms when we begin to conceptualize it.
No matter whether we are tackling Classification or Regression style problems, the input will consist of the k nearest training examples in the original feature space. However, the output of the algorithm will of course depend on the type of problem; see the Terminology section for more on the different outputs.
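To make "the k nearest training examples in the original feature space" concrete, here is a minimal sketch (not the article's linked code) that finds a query point's neighbors by Euclidean distance; the sample data is invented for illustration:

```python
import numpy as np

def k_nearest(X_train, query, k=3):
    """Return indices of the k training points closest to `query`
    (Euclidean distance in the original feature space)."""
    distances = np.linalg.norm(X_train - query, axis=1)
    return np.argsort(distances)[:k]

# Four toy training points in a 2-D feature space
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]])
print(k_nearest(X, np.array([0.2, 0.1]), k=2))  # indices of the 2 closest points
```

Note that nothing is "learned" up front: the distances are computed only when a query arrives, which is exactly the lazy-learning behaviour described above.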
Link to code generated in the Article…
Terminology
K-Nearest Neighbors Classification → The output determines class membership, and the prediction is made by a plurality vote of the instance's neighbors. Therefore, the new instance is assigned to the class most common amongst its k nearest neighbors.
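The plurality vote can be sketched in a few lines (a hypothetical helper, assuming the neighbor labels have already been retrieved):

```python
from collections import Counter

def knn_classify(neighbor_labels):
    """Assign the class most common amongst the k nearest neighbors."""
    # most_common(1) returns [(label, count)] for the top label
    return Counter(neighbor_labels).most_common(1)[0][0]

print(knn_classify(["cat", "dog", "cat"]))  # -> cat
```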
K-Nearest Neighbors Regression → The output determines a property value for the object. Therefore, the new instance is assigned the average of the values of its k nearest neighbors.
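The regression output is simply the mean of the neighbors' target values; as a sketch (again a hypothetical helper, assuming the neighbor values are already in hand):

```python
def knn_regress(neighbor_values):
    """Predict the average of the k nearest neighbors' target values."""
    return sum(neighbor_values) / len(neighbor_values)

print(knn_regress([2.0, 4.0, 6.0]))  # -> 4.0
```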
Instance-Based Learning → A family of Machine Learning algorithms that, instead of performing explicit generalization, compares new problem instances with instances seen in training, which have been stored in memory. (Source: Wikipedia)
Lazy Learning → A Machine Learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager