Creating K-Means Clustering from Scratch

K-means clustering is one of the most widely used unsupervised machine learning algorithms. It groups (clusters) data points together so that points in the same group share common features, which gives insight into the structure of the data.

The Code:

Step 1 | Generate Clusters:

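import numpy as np

def generate_clusters():
    # Pick k distinct data points as starting centers (k and data come from k_means in Step 4)
    cluster_values = np.random.choice(data, size=k, replace=False)
    return cluster_values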

This randomly picks k points from the data to serve as the starting centers of the clusters.
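
You may prefer not to rely on variables from an enclosing function. In that case, the same initialization can take its inputs explicitly. The sketch below assumes NumPy's Generator API (np.random.default_rng), which lets runs be reproduced with a seed. The three-argument signature generate_clusters(data, k, rng) is hypothetical, not the article's.

```python
import numpy as np

def generate_clusters(data, k, rng):
    """Hypothetical standalone variant: pick k distinct data points as starting centers."""
    # replace=False ensures no two starting centers are the same data point
    return rng.choice(np.asarray(data, dtype=float), size=k, replace=False)

rng = np.random.default_rng(42)  # fixed seed so the same centers come back on every run
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = generate_clusters(data, 3, rng)
print(centers)
```

Sampling without replacement matters. Suppose np.random.choice(data) is called k times independently: two clusters can then start on the same point, and one of them ends up empty.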

Step 2 | Assign Clusters:

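import numpy as np

def assign_clusters(cluster_values):
    # One empty list per cluster (k and data come from the enclosing k_means function)
    clusters = [[] for _ in range(k)]
    for point in data:
        minimum_value = np.inf
        index = 0
        # Find the nearest center; on a number line the distance is the absolute difference
        for i, cluster in enumerate(cluster_values):
            distance = abs(cluster - point)
            if distance < minimum_value:
                minimum_value = distance
                # enumerate avoids .index(), which returns the first match when centers repeat
                index = i
        clusters[index].append(point)
    return clusters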

This function assigns each data point on the number line to the cluster whose center is nearest to it. The distance used here, abs(cluster - point), only works in one dimension. For 2- or 3-dimensional data, use Euclidean distance instead.
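
For 2- or 3-dimensional data, the absolute difference can be replaced by Euclidean distance. The sketch below does this with NumPy broadcasting instead of nested loops. The name assign_clusters_nd is hypothetical, and the function returns a cluster index per point rather than lists of points like the function above.

```python
import numpy as np

def assign_clusters_nd(points, centers):
    """Hypothetical helper: label each row of points with its nearest center (Euclidean).

    points has shape (n, d) and centers has shape (k, d); returns n cluster indices.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcast to shape (n, k, d), then take the Euclidean norm over the coordinate axis
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # Index of the closest center for every point
    return np.argmin(distances, axis=1)

points = [[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8]]
centers = [[0.0, 0.0], [5.0, 5.0]]
print(assign_clusters_nd(points, centers))  # → [0 0 1 1]
```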

Step 3 | Calculate Averages:

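import numpy as np

def calculate_averages(clusters):
    averages = []
    for cluster in clusters:
        # Mean of each cluster; an empty cluster gets NaN, so no point is ever assigned to it again
        averages.append(np.mean(cluster) if cluster else np.nan)
    return averages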

This function finds the center (mean) of each cluster. Re-assigning the points around these new centers, and repeating until nothing changes, gives a better grouping than the random starting points.
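
A small 1-D example shows why re-averaging helps. The helper update_step is hypothetical: it runs one assign-then-average pass. Unlike the code above, it keeps a cluster's old center when no points are assigned to it.

```python
import numpy as np

def update_step(data, centers):
    """Hypothetical helper: one assign-then-average pass on a number line."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Label each point with the index of its nearest center (absolute distance in 1-D)
    labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
    # New center = mean of its assigned points; an empty cluster keeps its old center
    new_centers = np.array([
        data[labels == j].mean() if np.any(labels == j) else centers[j]
        for j in range(len(centers))
    ])
    return labels, new_centers

data = [1, 2, 3, 10, 11, 12]
centers = [1.0, 2.0]  # deliberately poor starting centers
for step in range(3):
    labels, centers = update_step(data, centers)
    print(step, labels, centers)
```

Starting from the poor centers 1 and 2, the first pass moves them to 1 and 7.6. The second pass settles them at 2 and 11, splitting the data into {1, 2, 3} and {10, 11, 12}. The third pass changes nothing.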

Step 4 | Full Function:

import numpy as np

def k_means(k, data, iterations):
    def generate_clusters():
        # Pick k distinct data points as the starting centers
        return np.random.choice(data, size=k, replace=False)

    def assign_clusters(cluster_values):
        clusters = [[] for _ in range(k)]
        for point in data:
            minimum_value = np.inf
            index = 0
            # Nearest center on the number line
            for i, cluster in enumerate(cluster_values):
                distance = abs(cluster - point)
                if distance < minimum_value:
                    minimum_value = distance
                    index = i
            clusters[index].append(point)
        return clusters

    def calculate_averages(clusters):
        averages = []
        for cluster in clusters:
            # An empty cluster gets NaN, so no point is ever assigned to it again
            averages.append(np.mean(cluster) if cluster else np.nan)
        return averages

    def calculate_variance(clusters, average):
        # Within-cluster sum of squared deviations from each cluster's mean
        # (squared, because the mean is what minimizes squared deviations)
        variances = []
        for index, cluster in enumerate(clusters):
            variance = 0
            for value in cluster:
                variance += (average[index] - value) ** 2
            variances.append(variance)
        return sum(variances)

    cluster_of_clusters = []
    variances = []
    for _ in range(iterations):
        # Each iteration restarts from fresh random centers
        cluster_values = generate_clusters()
        clusters = assign_clusters(cluster_values)
        while True:
            averages = calculate_averages(clusters)
            # Remember the old assignment BEFORE reassigning, otherwise the check always passes
            previous_clusters = clusters
            clusters = assign_clusters(averages)
            # Stop once no point has moved to a different cluster
            if clusters == previous_clusters:
                break
        variance = calculate_variance(clusters, averages)
        cluster_of_clusters.append(clusters)
        variances.append(variance)
    # Best clustering found (kept for inspection; only the score is returned)
    best_cluster = cluster_of_clusters[variances.index(min(variances))]
    return min(variances)

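Put the functions from the previous steps into a k_means function, together with a calculate_variance helper that totals how far each point lies from its cluster's average. The k_means function takes k, the data and the number of iterations as parameters. Each iteration restarts from random centers and repeats the assign/average steps until the clusters stop changing. The function then returns the smallest total variance found across all iterations.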
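
The same procedure can be written more compactly with NumPy. This is a hypothetical alternative, not the article's code. It scores each run by the within-cluster sum of squared deviations (WCSS, the quantity k-means minimizes). It also caps the number of assign/average passes at max_steps and returns both the best score and the sorted centers.

```python
import numpy as np

def k_means_vectorized(data, k, restarts=10, max_steps=100, seed=None):
    """Hypothetical vectorized 1-D k-means with random restarts.

    Returns (best_wcss, best_centers); wcss is the within-cluster sum of squared deviations.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    best_wcss, best_centers = np.inf, None
    for _ in range(restarts):
        # Start from k distinct data points
        centers = rng.choice(data, size=k, replace=False)
        labels = None
        for _ in range(max_steps):
            new_labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
            # Converged: no point changed cluster
            if labels is not None and np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Recompute centers; an empty cluster keeps its previous center
            centers = np.array([
                data[labels == j].mean() if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
        wcss = np.sum((data - centers[labels]) ** 2)
        if wcss < best_wcss:
            best_wcss, best_centers = wcss, np.sort(centers)
    return best_wcss, best_centers

wcss, centers = k_means_vectorized([1, 2, 3, 10, 11, 12], k=2, restarts=5, seed=0)
print(wcss, centers)
```

On this toy data every pair of starting points converges to the same split. The call therefore returns a WCSS of 4.0 with centers 2 and 11.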

Step 5 | Test Values of K:

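import numpy as np
from kneed import KneeLocator

k_range = 10
X = list(range(1, k_range))  # candidate values of K: 1..9
data = np.random.randn(100)  # generate the data once so every K is scored on the same points
variances = []
for i in X:
    variance = k_means(i, data, 1000)  # k_means from Step 4
    variances.append(variance)
# Locate the elbow of the decreasing, convex variance curve and plot it
kn = KneeLocator(X, variances, S=1.0, curve="convex", direction="decreasing")
kn.plot_knee()
print("Elbow at K =", kn.knee)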

The usual way to choose K is to find the elbow point: the value of K after which adding more clusters barely reduces the variance any further. That value is taken as the optimal K. Detecting the elbow automatically is fairly involved (see paper here), so I just used the kneed library.
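
To avoid the dependency, you can use a simpler elbow heuristic. Normalize both axes, then pick the K whose point lies farthest from the straight line joining the first and last points of the curve. This is an assumed stand-in for illustration, not the Kneedle algorithm that kneed implements, and it can disagree with kneed on some curves. The scores below are made-up numbers, not results from the article.

```python
import numpy as np

def elbow_point(ks, scores):
    """Hypothetical helper: return the k farthest from the chord through the curve's endpoints.

    A simple 'maximum distance to the chord' heuristic, not kneed's Kneedle algorithm.
    """
    x = np.asarray(ks, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Normalize both axes to [0, 1] so neither dominates the distance
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance from each point to the line through the first and last points
    x0, y0, x1, y1 = x_n[0], y_n[0], x_n[-1], y_n[-1]
    distances = np.abs((y1 - y0) * x_n - (x1 - x0) * y_n + x1 * y0 - y1 * x0)
    distances /= np.hypot(x1 - x0, y1 - y0)
    return ks[int(np.argmax(distances))]

ks = list(range(1, 9))
scores = [100, 40, 20, 15, 12, 10, 9, 8]  # illustrative, made-up variances
print(elbow_point(ks, scores))
```

For these illustrative scores the farthest point is K = 3.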

With those last few lines of code, I would like to say thank you for reading this article, and I hope you learnt something!

Translated from: https://medium.com/hands-on-data-science/creating-k-means-clustering-from-scratch-57b9eb6fe20f
