Flink Machine Learning: KMeans Algorithm Implementation

又到花开时节

Tags: machine learning, flink

Article link: https://blog.csdn.net/jzy1210/article/details/109671629

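This article describes in detail how to implement the KMeans clustering algorithm in Flink, including the POJO definitions for 2D coordinate points and cluster centroids, the data preparation, and the KMeans implementation itself. Using sample data, it shows how points are assigned to cluster centroids and how new centroids are computed, and finally runs the iteration and prints the results.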

1. Definition of the K-means clustering algorithm (Baidu)

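The k-means clustering algorithm is an iterative cluster-analysis algorithm. It starts by randomly selecting K objects as the initial cluster centroids,
then computes the distance between every object and each seed centroid and assigns each object to the centroid closest to it.
A centroid together with the objects assigned to it represents one cluster. Each time a sample is assigned, the cluster's centroid is recomputed from the objects currently in that cluster.
This process repeats until some termination condition is met.
The termination condition can be that no (or only a minimal number of) objects are reassigned to a different cluster, that no (or only a minimal number of) centroids change any more, or that the sum of squared errors reaches a local minimum.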
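The loop described above can be sketched in plain Java, without Flink. This is only an illustrative sketch under some assumptions: the class and method names (`KMeansSketch`, `nearest`, `cluster`) are made up for this example, the initial centroids are passed in rather than picked at random so that the result is reproducible, and centroids are recomputed once per pass over all points (the common batch variant) rather than after every single assignment. It stops when no point changes cluster or when `maxIterations` is reached.

```java
import java.util.Arrays;

/** Minimal single-machine k-means sketch (batch variant); names are hypothetical. */
final class KMeansSketch {

    /** Returns the index of the centroid closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dx = p[0] - centroids[c][0];
            double dy = p[1] - centroids[c][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means from the given initial centroids and returns the final centroids. */
    static double[][] cluster(double[][] points, double[][] initial, int maxIterations) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its closest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // termination: no point was reassigned
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sum = new double[centroids.length][2];
            int[] count = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                sum[assignment[i]][0] += points[i][0];
                sum[assignment[i]][1] += points[i][1];
                count[assignment[i]]++;
            }
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) { // an empty cluster keeps its old centroid
                    centroids[c][0] = sum[c][0] / count[c];
                    centroids[c][1] = sum[c][1] / count[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {9, 9}, {10, 10}};
        double[][] initial = {{0, 0}, {10, 10}};
        // prints [[1.0, 1.5], [9.5, 9.5]]
        System.out.println(Arrays.deepToString(cluster(points, initial, 10)));
    }
}
```

Comparing squared distances avoids the square root and gives the same nearest centroid, since the square root is monotonic.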

2. Basic POJO definitions

2.1 POJO for a 2D coordinate point

public class Point {
    public double x, y;

    public Point() {}

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Adds the other point's coordinates to this point in place
    public Point add(Point other) {
        x += other.x;
        y += other.y;
        return this;
    }

    // Used to compute the mean: divides both coordinates in place
    public Point div(long val) {
        x /= val;
        y /= val;
        return this;
    }

    // Euclidean distance
    public double euclideanDistance(Point other) {
        return Math.sqrt((x - other.x) * (x - other.x) + (y - other.y) * (y - other.y));
    }

    public void clear() {
        x = y = 0.0;
    }

    @Override
    public String toString() {
        return x + " " + y;
    }
}
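Note that `add` and `div` modify the receiver and return `this`, so averaging should accumulate into a fresh `Point` rather than into one of the input points. A minimal sketch of that usage follows; the wrapper class `PointMeanSketch` and the helper `mean` are hypothetical names for this example, and the nested `Point` is a trimmed copy of the class above so that the snippet compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

/** Usage sketch for Point.add / Point.div; wrapper and helper names are hypothetical. */
final class PointMeanSketch {

    /** Trimmed copy of the Point POJO, reduced to the members used here. */
    static class Point {
        public double x, y;

        public Point() {}

        public Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        public Point add(Point other) {
            x += other.x;
            y += other.y;
            return this;
        }

        public Point div(long val) {
            x /= val;
            y /= val;
            return this;
        }
    }

    /** Averages the points into a new accumulator so that no input point is mutated. */
    static Point mean(List<Point> points) {
        Point sum = new Point(); // starts at (0.0, 0.0)
        for (Point p : points) {
            sum.add(p); // only the accumulator changes
        }
        return sum.div(points.size());
    }

    public static void main(String[] args) {
        List<Point> pts = Arrays.asList(new Point(1, 2), new Point(3, 4), new Point(5, 6));
        Point m = mean(pts);
        System.out.println(m.x + " " + m.y); // prints "3.0 4.0"
    }
}
```

For an empty list the division is 0.0 / 0, which yields NaN coordinates rather than an exception, so callers may want to guard against that case.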

2.2 POJO for a 2D cluster centroid

public class Centroid extends Point {
    public int id;

    public Centroid() {}

    public Centroid(int id, double x, double y) {
        super(x, y);
        this.id = id;
    }

    public Centroid(int id, Point p) {
        super(p.x, p.y);
        this.id = id;
    }

    @Override
    public String toString() {
        return id + " " + super.toString();
    }
}
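With both POJOs in place, assigning a point to a cluster amounts to scanning the centroids and keeping the `id` of the closest one. The following is a hedged sketch of that step only: `NearestCentroidSketch` and `nearestId` are hypothetical names, the nested classes are trimmed copies of the two POJOs above, and the query point (30, 20) is made up for the example.

```java
/** Sketch of the point-to-centroid assignment step; names are hypothetical. */
final class NearestCentroidSketch {

    /** Trimmed copy of the Point POJO. */
    static class Point {
        public double x, y;

        public Point() {}

        public Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        public double euclideanDistance(Point other) {
            return Math.sqrt((x - other.x) * (x - other.x) + (y - other.y) * (y - other.y));
        }
    }

    /** Trimmed copy of the Centroid POJO. */
    static class Centroid extends Point {
        public int id;

        public Centroid() {}

        public Centroid(int id, double x, double y) {
            super(x, y);
            this.id = id;
        }
    }

    /** Returns the id of the centroid closest to p, or -1 if there are no centroids. */
    static int nearestId(Point p, Centroid[] centroids) {
        double minDistance = Double.MAX_VALUE;
        int closestId = -1;
        for (Centroid c : centroids) {
            double distance = p.euclideanDistance(c);
            if (distance < minDistance) {
                minDistance = distance;
                closestId = c.id;
            }
        }
        return closestId;
    }

    public static void main(String[] args) {
        Centroid[] centroids = {
            new Centroid(1, -31.85, -44.77),
            new Centroid(2, 35.16, 17.46)
        };
        System.out.println(nearestId(new Point(30, 20), centroids)); // prints 2
    }
}
```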

3. Default data preparation

public class KMeansData {
    // We have the data as object arrays so that we can also generate Scala Data Sources from it.
    public static final Object[][] CENTROIDS = new Object[][] {
        new Object[] {1, -31.85, -44.77},
        new Object[] {2, 35.16, 17.46},
        new Object[] {3, -5.16, 21.93},
        new Object[] {4, -24.06, 6.81}
    };

    public static final Object[][] POINTS = new Object[][] {