K-Nearest Neighbor

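This article explains the K-Nearest Neighbor (KNN) algorithm in detail, covering its basic concepts, how it works, its mathematical background and a code implementation. KNN is a basic classification algorithm that is often used as a baseline for comparison with more complex algorithms, and it is widely applied in areas such as economic forecasting and data compression. KNN computes the distance between a new data point and the points in the training set, selects the K closest points, and predicts the majority class among them.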

Hello readers, this is an in-depth discussion of a powerful classification algorithm called K-Nearest Neighbor (KNN). I have tried my best to collect the information in a way that is easy to understand. So let’s begin…

The main contents are:

  • Introduction.
  • What is KNN?
  • How does KNN work?
  • The Mathematics behind KNN.
  • KNN code implementation.

Introduction

The KNN algorithm is one of the most fundamental, robust and versatile classifiers, and it is often used as a benchmark for more complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Despite its simplicity, KNN can outperform more powerful classifiers on some problems, and it is used in a variety of applications such as economic forecasting, data compression and genetics. For example, KNN was used in a 2006 study of functional genomics to assign genes based on their expression profiles.

What is KNN?

Let’s start with a simple example. In the picture below, we have a set of two types of animals: horses and dogs. If you want to know whether a new data point (animal) is a horse or a dog, the KNN algorithm can tell you based on its features (height and weight).

Figure 1: Horses and dogs described by their features (height and weight).
More generally, we will use x to denote a feature (also called a predictor or attribute) and y to denote the target (also called a label or class) that we are trying to predict.
KNN falls in the supervised learning family of algorithms. Informally, this means we are given a labelled dataset consisting of training observations (x, y) and would like to capture the relationship between x and y. More formally, our goal is to learn a function h:X→Y, so that given an unseen observation x, h(x) can confidently predict the corresponding output y.
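To make the notation concrete, here is a minimal Python sketch of the supervised setup: each training observation is a pair (x, y), where x holds the features and y the label, and "learning" produces a function h that maps a new x to a predicted y. The feature values and the `learn` helper are hypothetical, and for brevity h simply copies the label of the single closest training point (the k = 1 case of KNN).

```python
import math
from typing import Callable, List, Tuple

Features = Tuple[float, float]            # x: (height, weight); hypothetical units
Label = str                               # y: "Horse" or "Dog"
Observation = Tuple[Features, Label]      # one labelled training example (x, y)
Hypothesis = Callable[[Features], Label]  # h: X -> Y


def learn(training_data: List[Observation]) -> Hypothesis:
    """Hypothetical helper: return a function h built from the labelled data."""

    def h(x: Features) -> Label:
        # Predict the label of the closest training observation (k = 1).
        # math.dist computes Euclidean distance (Python 3.8+).
        _, nearest_label = min(training_data, key=lambda obs: math.dist(obs[0], x))
        return nearest_label

    return h


training_data: List[Observation] = [
    ((160.0, 450.0), "Horse"),
    ((165.0, 500.0), "Horse"),
    ((50.0, 25.0), "Dog"),
    ((60.0, 30.0), "Dog"),
]
h = learn(training_data)
print(h((155.0, 470.0)))  # Horse
print(h((55.0, 28.0)))    # Dog
```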

Some notation and definitions:

The KNN classifier is a supervised, non-parametric, instance-based (or lazy) learning algorithm. The key terms are:

  • Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.
  • Non-parametric means it makes no explicit assumptions about the functional form of h, avoiding the dangers of mismodeling the underlying distribution of the data. For example, suppose our data is highly non-Gaussian but the learning model we choose assumes a Gaussian form. In that case, our algorithm could make very poor predictions.
  • Instance-based learning means that our algorithm doesn’t explicitly learn a model. Instead, it memorizes the training instances, which are then used as “knowledge” in the prediction phase. Concretely, the algorithm uses the training instances to produce an answer only when a query is made to our database (i.e. when we ask it to predict a label given an input).
  • Lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries.
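To illustrate the instance-based and lazy ideas, here is a minimal Python sketch (not the article’s own implementation) in which fit() only stores the training instances and every distance computation happens inside predict(). The class name LazyKNN, the distance_calls counter, the use of Euclidean distance and the sample height/weight values are all assumptions made for illustration.

```python
import math
from collections import Counter


class LazyKNN:
    """A minimal lazy (instance-based) learner: fit() only memorizes the data."""

    def __init__(self, k=3):
        self.k = k
        self.X = []
        self.y = []
        self.distance_calls = 0  # counts how much work is done, and when

    def fit(self, X, y):
        # No model is built here: the training instances are simply stored.
        self.X = list(X)
        self.y = list(y)
        return self

    def _distance(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)  # Euclidean distance (Python 3.8+)

    def predict(self, x):
        # All the real work is deferred until a query arrives:
        # rank every stored instance by distance, then take a majority vote.
        ranked = sorted(zip(self.X, self.y), key=lambda pair: self._distance(pair[0], x))
        top_labels = [label for _, label in ranked[: self.k]]
        return Counter(top_labels).most_common(1)[0][0]


X = [(160.0, 450.0), (165.0, 500.0), (50.0, 25.0), (60.0, 30.0)]  # (height, weight)
y = ["Horse", "Horse", "Dog", "Dog"]
model = LazyKNN(k=3).fit(X, y)
print(model.distance_calls)          # 0: fitting did no computation
print(model.predict((58.0, 29.0)))   # Dog
print(model.distance_calls)          # 4: one distance per stored instance
```

The counter stays at 0 after fitting and grows by one per stored instance on every query. That is the trade-off of lazy learning: training is essentially free, but each prediction scans the whole training set.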

How does KNN work?

The principle behind K-Nearest Neighbor is to calculate the distance between a new data point X and all the points in the training data, and then predict the majority label of the k closest points.
In the picture below, the red star is a new animal. If we take the 3 closest points (k=3), our animal is more likely to be a horse (the probability of being a horse is 2/3). But with k=6, the new animal is more likely to be a dog (with a probability of 4/6).
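The k=3 versus k=6 example can be reproduced with a short Python sketch of the distance-then-vote procedure. The coordinates below are hypothetical, chosen only so that the red star (placed at the origin) has the neighborhood described in the text: two horses and one dog among its 3 nearest points, and two horses and four dogs among its 6 nearest points. Euclidean distance and the helper name neighbor_vote are assumptions.

```python
import math
from collections import Counter


def neighbor_vote(train_X, train_y, query, k):
    """Return the majority label and the label counts among the k nearest neighbors."""
    # Step 1: rank every training point by its Euclidean distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    # Step 2: count the labels of the k closest points.
    votes = Counter(label for _, label in ranked[:k])
    # Step 3: the prediction is the most common label among them.
    return votes.most_common(1)[0][0], votes


# Hypothetical points at distances 1, 1.5, 2, 2.5, 3, 3.5, 6 and 7 from the star.
train_X = [(1.0, 0.0), (0.0, 1.5), (-2.0, 0.0), (0.0, -2.5),
           (3.0, 0.0), (0.0, 3.5), (6.0, 0.0), (0.0, 7.0)]
train_y = ["Horse", "Dog", "Horse", "Dog", "Dog", "Dog", "Horse", "Horse"]
star = (0.0, 0.0)  # the new animal

for k in (3, 6):
    label, votes = neighbor_vote(train_X, train_y, star, k)
    print(f"k={k}: {label} ({votes[label]}/{k})")
# k=3: Horse (2/3)
# k=6: Dog (4/6)
```

Because the prediction can flip as k changes, k is usually treated as a hyperparameter and chosen by validation.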
