机器学习算法基础 Day3

小浩码出未来！

已于 2022-12-19 21:52:41 修改

阅读量103

点赞数

文章标签：算法 python

于 2022-12-13 22:32:15 首次发布

本文链接：https://blog.csdn.net/weixin_43902376/article/details/128308925

版权

在这里插入图片描述

定义
如果一个样本在特征空间中的k个最相似(即特征空间中最邻近)的样本中的大多数属于某一个类别，则该样本也属于这个类别。

KNN算法最早是由Cover和Hart提出的一种分类算法

计算距离公式
两个样本的距离可以通过如下公式计算，又叫欧式距离

比如说，a(a1,a2,a3),b(b1,b2,b3)

计算距离公式

两个样本的距离可以通过如下公式计算，又叫欧式距离

比如说，a(a1,a2,a3),b(b1,b2,b3)

在这里插入图片描述

K近邻需要做标准化处理

sklearn k-近邻算法API

sklearn.neighbors.KNeighborsClassifier(n_neighbors=5,algorithm=‘auto’)

n_neighbors：int,可选（默认= 5），k_neighbors查询默认使用的邻居数
algorithm：‘auto’，‘ball_tree’，‘kd_tree’，‘brute’}，可选用于计算最近邻居的算法：
‘ball_tree’将会使用 BallTree，
‘kd_tree’将使用 KDTree。
‘auto’将尝试根据传递给fit方法的值来决定最合适的算法。
(不同实现方式影响效率)

在这里插入图片描述

The goal of this competition is to predict which place a person would like to check in to.
本次比赛的目的是预测一个人想要登记的地方。
For the purposes of this competition, Facebook created an artificial world consisting of more than 100,000 places located in a 10 km by 10 km square.
为了本次比赛的目的，Facebook创建了一个人工世界，其中包括10多公里10平方公里的100,000多个地方。
For a given set of coordinates, your task is to return a ranked list of the most likely places.
对于给定的坐标集，您的任务是返回最可能位置的排名列表。
Data was fabricated to resemble location signals coming from mobile devices, giving you a flavor of what it takes to work with real data complicated by inaccurate and noisy values.
数据被制作成类似于来自移动设备的位置信号，让您了解如何处理由不准确和嘈杂的值导致的实际数据。
Inconsistent and erroneous location data can disrupt experience for services like Facebook Check In.
不一致和错误的位置数据可能会破坏Facebook Check In等服务的体验。
We highly encourage competitors to be active on Kaggle Scripts.
我们强烈鼓励竞争对手积极参与Kaggle Scripts。
Your work there will be thoughtfully included in the decision making process.
您在那里的工作将被认真地包含在决策过程中。
Please note: You must compete as an individual in recruiting competitions.
请注意：您必须在招募比赛中作为个人参加比赛。
You may only use the data provided to make your predictions.
您只能使用提供的数据进行预测。

在这里插入图片描述

数据
In this competition, you are going to predict which business a user is checking into based on their location, accuracy, and timestamp.
在本次竞赛中，您将根据用户的位置，准确性和时间戳预测用户正在检查的业务。
The train and test dataset are split based on time, and the public/private leaderboard in the test data are split randomly.
训练和测试数据集根据时间进行划分，测试数据中的公共/私人排行榜随机拆分。<

最低0.47元/天解锁文章

小浩码出未来！

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
机器学习算法基础 Day3

如果一个样本在特征空间中的k个最相似(即特征空间中最邻近)的样本中的大多数属于某一个类别，则该样本也属于这个类别。KNN算法最早是由Cover和Hart提出的一种分类算法计算距离公式两个样本的距离可以通过如下公式计算，又叫欧式距离比如说，a(a1,a2,a3),b(b1,b2,b3)
复制链接

扫一扫