Data Analysis - Algorithms - K-means Clustering (Tianchi: Car Product Clustering Analysis)

Latest recommended article published on 2024-07-24 14:31:04

MENG-


Category: Data Analysis and Mining. Tags: machine learning, clustering, python, data analysis

Article link: https://blog.csdn.net/qq_42320048/article/details/117019004


K-means Clustering

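Principle: K-means judges how similar samples are by computing the distance between them, and samples that are close together are placed in the same cluster. In practice, each sample is assigned to its nearest cluster center, each center is then recomputed as the mean of its members, and the two steps repeat until the assignments stop changing.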
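To make the principle concrete, here is a minimal pure-Python sketch of the assign-then-update loop. It is only an illustration: the function name `kmeans`, the toy points, and the choice of random samples as starting centers are my own assumptions, not code from the competition or from any library.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, n_iter=100):
    """Basic k-means loop; init_centers fixes both k and the starting positions."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)

    def assign(cs):
        # Assignment step: each sample joins its nearest center.
        return [min(range(k), key=lambda j: sq_dist(p, cs[j])) for p in points]

    for _ in range(n_iter):
        labels = assign(centers)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center alone
        if new_centers == centers:  # nothing moved, so the loop has converged
            break
        centers = new_centers
    return assign(centers), centers


# Toy data, made up for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
init = random.Random(0).sample(points, 2)  # assumption: start from 2 random samples
labels, centers = kmeans(points, init)
print(labels)
print(centers)
```

Passing the starting centers in explicitly keeps the sketch deterministic for a given start, which makes it easier to see how much the result depends on that choice.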

Applicable data: numeric data

Advantages: the idea is simple, it is easy to implement, and the results are fairly interpretable.

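Disadvantages: it is sensitive to noise and outliers. K-means splits the space into convex regions around the cluster centers and implicitly assumes roughly spherical clusters, so it cannot handle non-convex distributions and performs poorly on elongated or irregularly shaped data. If two clusters lie close together, k-means also does not separate them well. The choice of initial centers and of k has a large influence on the result, so different runs may give different clusterings. The result may be only a local optimum rather than the global optimum.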
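The sensitivity to initial centers can be seen on a tiny example. The sketch below is only an illustration under my own assumptions (the helper `lloyd`, the four made-up points, and the two starting configurations are all hypothetical): the same data and the same k converge to two different clusterings with very different within-cluster sums of squares (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, init_centers, n_iter=100):
    """Run the k-means loop from the given centers; return labels, centers, SSE."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)

    def assign(cs):
        return [min(range(k), key=lambda j: sq_dist(p, cs[j])) for p in points]

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Four corners of a wide rectangle (made-up data): the natural split is left vs right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: both centers on the left edge -> converges to a bottom/top split.
labels_a, centers_a, sse_a = lloyd(points, [(0.0, 0.0), (0.0, 1.0)])
# Start B: one center on each side -> converges to the left/right split.
labels_b, centers_b, sse_b = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])

print(labels_a, sse_a)  # [0, 1, 0, 1] 16.0
print(labels_b, sse_b)  # [0, 0, 1, 1] 1.0
```

Start A is stuck in a local optimum: no single reassignment improves it, yet its SSE is far worse than the one reached from start B. A common way to soften this is to run the algorithm from several starts and keep the result with the lowest SSE.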

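Dataset: the Tianchi learning-competition dataset "Car Product Clustering Analysis" (interested readers can enter the competition directly to practice).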

Task: run a clustering analysis on this car data and find the competing products of Volkswagen cars.

Data description: car_price.csv contains 26 fields for 205 car models.

No.	Field	Description (type)
1	Car_ID	Unique ID of each observation (Integer)
2	Symboling	Assigned insurance risk rating; a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical)
3	carCompany	Name of the car company (Categorical)
4	fueltype	Car fuel type, i.e. gas or diesel (Categorical)
5	aspiration	Aspiration used in the car (Categorical)
6	doornumber	Number of doors in the car (Categorical)
7	carbody	Body type of the car (Categorical)
8	drivewheel	Type of drive wheel (Categorical)
9	enginelocation	Location of the car engine (Categorical)
10	wheelbase	Wheelbase of the car (Numeric)
