Creating K-Means Clustering from Scratch


weixin_26726011


Original article: https://medium.com/hands-on-data-science/creating-k-means-clustering-from-scratch-57b9eb6fe20f


K-means clustering is one of the most common unsupervised machine learning algorithms. It is useful for grouping (clustering) data so that points with common features end up together, which gives insight into the structure of the data.


The Code:

Step 1 | Generate Clusters:

import numpy as np

def generate_clusters():
    # Pick k distinct data points as starting centroids (data and k come from k_means)
    return list(np.random.choice(data, size=k, replace=False))

This randomly picks k points from the data to serve as the initial cluster centers (centroids).

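A self-contained sketch of the initialization step, assuming NumPy's Generator API (np.random.default_rng) is acceptable in place of the global np.random functions. The function name init_centroids and its seed parameter are hypothetical; passing a seed makes the starting centroids reproducible, and replace=False prevents two centroids from starting on the same sample.

```python
import numpy as np

def init_centroids(data, k, seed=None):
    """Pick k distinct samples of `data` as starting centroids (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if not 1 <= k <= data.size:
        raise ValueError("k must be between 1 and the number of data points")
    # replace=False draws k different positions, so no two centroids
    # start on the same sample
    return rng.choice(data, size=k, replace=False)
```

Calling init_centroids(data, 3, seed=42) twice returns the same three starting points, which makes experiments repeatable.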

Step 2 | Assign Clusters:

def assign_clusters(cluster_values):
    # One empty list per cluster (data and k come from k_means)
    clusters = [[] for _ in range(k)]
    for point in data:
        # Find the index of the nearest centroid (1-D absolute distance)
        minimum_value = np.inf
        index = 0
        # enumerate gives the correct index even if two centroids are equal
        for i, cluster in enumerate(cluster_values):
            distance = abs(cluster - point)
            if distance < minimum_value:
                minimum_value = distance
                index = i
        clusters[index].append(point)
    return clusters

This function assigns each data point on the number line to the cluster whose center is nearest to it. The absolute difference abs(cluster - point) is a valid distance only in one dimension; for 2- or 3-dimensional data, use the Euclidean distance instead.

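A minimal sketch of the assignment step for points in any number of dimensions, using the Euclidean distance as suggested above. The function name assign_clusters_nd is hypothetical; instead of lists of points it returns one label (cluster index) per point, and it computes all point-to-centroid distances at once with NumPy broadcasting.

```python
import numpy as np

def assign_clusters_nd(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    points = np.asarray(points, dtype=float)        # shape (n, d)
    centroids = np.asarray(centroids, dtype=float)  # shape (k, d)
    # Pairwise differences via broadcasting: shape (n, k, d)
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)       # shape (n, k)
    return np.argmin(distances, axis=1)             # shape (n,)
```

For example, assign_clusters_nd([[0, 0], [0, 1], [10, 10], [9, 10]], [[0, 0], [10, 10]]) labels the first two points 0 and the last two 1.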

Step 3 | Calculate Averages:

def calculate_averages(clusters):
    # The new center of each cluster is the mean of its points
    averages = []
    for cluster in clusters:
        averages.append(np.mean(cluster))
    return averages

This function finds the center of each cluster by taking the mean of its points. Reassigning the points around these updated centers (Step 2) and repeating the two steps moves the centers toward a tighter grouping.

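The update performed by calculate_averages can be written as follows; the notation C_j (the points in cluster j), μ_j (its center) and J (the total within-cluster sum of squares) is introduced here for illustration. The second line shows why the mean is the natural choice: it is the value c that minimizes the sum of squared distances within a cluster, so a round of averaging never increases J.

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x,
\qquad
J = \sum_{j=1}^{k} \sum_{x \in C_j} (x - \mu_j)^2

\frac{d}{dc} \sum_{x \in C_j} (x - c)^2 = -2 \sum_{x \in C_j} (x - c) = 0
\;\Longrightarrow\; c = \frac{1}{|C_j|} \sum_{x \in C_j} x = \mu_j

\text{Example: } C_j = \{1, 2, 6\},\ \mu_j = 3:\quad
\sum_{x \in C_j} (x - 3)^2 = 4 + 1 + 9 = 14 \;<\; \sum_{x \in C_j} (x - 2)^2 = 1 + 0 + 16 = 17
```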

Step 4 | Full Function:

import numpy as np

def k_means(k, data, iterations):
    def generate_clusters():
        # Pick k distinct data points as the initial centroids
        return list(np.random.choice(data, size=k, replace=False))

    def assign_clusters(cluster_values):
        # Put each point in the cluster of its nearest centroid (1-D distance)
        clusters = [[] for _ in range(k)]
        for point in data:
            minimum_value = np.inf
            index = 0
            for i, cluster in enumerate(cluster_values):
                distance = abs(cluster - point)
                if distance < minimum_value:
                    minimum_value = distance
                    index = i
            clusters[index].append(point)
        return clusters

    def calculate_averages(clusters):
        # New centroid of each cluster = mean of its points
        return [np.mean(cluster) for cluster in clusters]

    def calculate_variance(clusters, averages):
        # Within-cluster sum of squared distances to each cluster's mean
        total = 0
        for index, cluster in enumerate(clusters):
            for value in cluster:
                total += (averages[index] - value) ** 2
        return total

    variances = []
    for _ in range(iterations):
        # Random restart: fresh initial centroids on every iteration
        clusters = assign_clusters(generate_clusters())
        while True:
            # Remember the old assignment BEFORE reassigning
            previous_clusters = clusters
            averages = calculate_averages(clusters)
            clusters = assign_clusters(averages)
            # Converged once the assignment no longer changes
            if clusters == previous_clusters:
                break
        variances.append(calculate_variance(clusters, averages))
    # Lowest within-cluster variance found over all restarts
    return min(variances)

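Put all the functions from the previous steps into a k_means function that accepts k, the data and the number of iterations as parameters. Each iteration starts from freshly chosen random centroids and alternates Steps 2 and 3 until the cluster assignments stop changing. Because k-means can get stuck in a poor local solution, the function repeats this from many random starts and returns the lowest total within-cluster variance over all iterations.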

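For comparison, here is a compact NumPy sketch of the same 1-D procedure: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until the assignments stop changing, and keep the best of several random restarts. The function name k_means_1d and the parameters restarts, max_iter and seed are assumptions introduced for illustration. It scores each restart by the within-cluster sum of squared distances and leaves a centroid where it is if its cluster ever becomes empty.

```python
import numpy as np

def k_means_1d(data, k, restarts=10, max_iter=100, seed=0):
    """Best within-cluster sum of squares over random restarts (1-D Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    best = np.inf
    for _ in range(restarts):
        # Initial centroids: k distinct data points (a copy, so data is untouched)
        centroids = rng.choice(data, size=k, replace=False)
        labels = None
        for _ in range(max_iter):
            # Assignment step: index of the nearest centroid for each point
            new_labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # assignments stopped changing
            labels = new_labels
            # Update step: each centroid moves to the mean of its points
            for j in range(k):
                members = data[labels == j]
                if members.size > 0:  # keep the old centroid if a cluster empties
                    centroids[j] = members.mean()
        sse = sum(((data[labels == j] - centroids[j]) ** 2).sum() for j in range(k))
        best = min(best, sse)
    return best
```

For instance, k_means_1d([1, 2, 3, 10, 11, 12], 2) finds the split {1, 2, 3} / {10, 11, 12} and returns 4.0 (2.0 from each cluster). Vectorizing the distance computation avoids the nested Python loops, which matters when k_means is called thousands of times as in Step 5.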

Step 5 | Test for Values of K:

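import numpy as np
from kneed import KneeLocator

data = np.random.randn(100)  # generate once so every k is scored on the same data
k_range = 10
X = [i for i in range(1, k_range)]
variances = []
for i in range(1, k_range):
    variance = k_means(i, data, 1000)
    variances.append(variance)
kn = KneeLocator(X, variances, S=1.0, curve="convex", direction="decreasing")
print(kn.knee)  # suggested K (None if no knee is found)
kn.plot_knee()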

A common way to choose K is the elbow method: plot the total within-cluster variance against K and find the elbow point, after which increasing K gives only small further reductions in variance. That K is taken as the best choice. Locating the elbow automatically is not trivial (see paper here), so I used the kneed library, which implements the Kneedle algorithm.

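If you would rather not add a dependency, a rough elbow estimate can be sketched in a few lines. The function elbow_by_chord_distance is hypothetical and only a simplified version of the idea behind Kneedle: it assumes a decreasing, convex curve (like variance against K), rescales both axes to [0, 1], and picks the K whose point lies farthest below the straight line joining the first and last points. It does not reproduce kneed's smoothing or its sensitivity parameter S, so its answer can differ from KneeLocator's.

```python
import numpy as np

def elbow_by_chord_distance(ks, scores):
    """Return the k farthest below the chord joining the first and last points.

    Assumes `scores` decrease from first to last (convex, decreasing curve).
    A simple elbow heuristic, not kneed's Kneedle implementation.
    """
    x = np.asarray(ks, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Normalize both axes to [0, 1] so their scales are comparable
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # For a decreasing curve the chord runs from (0, 1) to (1, 0): y = 1 - x
    gap = (1.0 - x_n) - y_n  # how far each point sits below the chord
    return ks[int(np.argmax(gap))]
```

With ks = [1, 2, 3, 4, 5, 6] and scores = [100, 30, 20, 15, 12, 10], the point for K = 2 sits farthest below the chord, so the function returns 2. It can be called as elbow_by_chord_distance(X, variances) on the lists built in Step 5.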

With those last few lines of code, I would like to say thank you for reading this article, and I hope you learnt something!
