Datacamp 笔记&代码 Unsupervised Learning in Python 第一章 Clustering for dataset exploration

最新推荐文章于 2021-03-21 05:34:58 发布

JinnyR

最新推荐文章于 2021-03-21 05:34:58 发布

阅读量1.1k

点赞数

分类专栏： datacamp 文章标签： datacamp sklearn

本文链接：https://blog.csdn.net/u011292816/article/details/97866293

版权

在Datacamp的课程中，通过Python的sklearn库进行无监督学习的聚类分析。首先对2D点进行聚类，然后检查聚类效果，确定谷物数据的合适簇数并评估其聚类质量。接着对鱼类数据和股票数据进行标准化和聚类，以发现数据中的模式。最后，通过对公司每日股票价格变动的聚类分析，揭示哪些公司的股价走势具有相似性。

摘要由CSDN通过智能技术生成

更多原始数据文档和JupyterNotebook
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python

Datacamp track: Data Scientist with Python - Course 23 (1)

Exercise

Clustering 2D points

From the scatter plot of the previous exercise, you saw that the points seem to separate into 3 clusters. You’ll now create a KMeans model to find 3 clusters, and fit it to the data points from the previous exercise. After the model has been fit, you’ll obtain the cluster labels for some new points using the .predict() method.

You are given the array points from the previous exercise, and also an array new_points.

Instruction

Import KMeans from sklearn.cluster.
Using KMeans(), create a KMeans instance called model to find 3 clusters. To specify the number of clusters, use the n_clusters keyword argument.
Use the .fit() method of model to fit the model to the array of points points.
Use the .predict() method of model to predict the cluster labels of new_points, assigning the result to labels.
Hit ‘Submit Answer’ to see the cluster labels of new_points.

import pandas as pd

df = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2072/datasets/3-point-clouds-in-2d.csv', header=None)
data = df.values
N = 300
points = data[:N,:]
new_points = data[N:,:]

# Import KMeans
from sklearn.cluster import KMeans

# Create a KMeans instance with 3 clusters: model
model = KMeans(n_clusters=3)

# Fit model to points
model.fit(points)

# Determine the cluster labels of new_points: labels
labels = model.predict(new_points)

# Print cluster labels of new_points
print(labels)

[0 2 1 0 2 0 2 2 2 1 0 2 2 1 1 2 1 1 2 2 1 2 0 2 0 1 2 1 1 0 0 2 2 2 1 0 2
 2 0 2 1 0 0 1 0 2 1 1 2 2 2 2 1 1 0 0 1 1 1 0 0 2 2 2 0 2 1 2 0 1 0 0 0 2
 0 1 1 0 2 1 0 1 0 2 1 2 1 0 2 2 2 0 2 2 0 1 1 1 1 0 2 0 1 1 0 0 2 0 1 1 0
 1 1 1 2 2 2 2 1 1 2 0 2 1 2 0 1 2 1 1 2 1 2 1 0 2 0 0 2 1 0 2 0 0 1 2 2 0
 1 0 1 2 0 1 1 0 1 2 2 1 2 1 1 2 2 0 2 2 1 0 1 0 0 2 0 2 2 0 0 1 0 0 0 1 2
 2 0 1 0 1 1 2 2 2 0 2 2 2 1 1 0 2 0 0 0 1 2 2 2 2 2 2 1 1 2 1 1 1 1 2 1 1
 2 2 0 1 0 0 1 0 1 0 1 2 2 1 2 2 2 1 0 0 1 2 2 1 2 1 1 2 1 1 0 1 0 0 0 2 1
 1 1 0 2 0 1 0 1 1 2 0 0 0 1 2 2 2 0 2 1 1 2 0 0 1 0 0 1 0 2 0 1 1 1 1 2 1
 1 2 2 0]

Exercise

Inspect your clustering

Let’s now inspect the clustering you performed in the previous exercise!

A solution to the previous exercise has already run, so new_points is an array of points and labels is the array of their cluster labels.

Instruction

Import matplotlib.pyplot as plt.
Assign column 0 of new_points to xs, and column 1 of new_points to ys.
Make a scatter plot of xs and ys, specifying the c=labels keyword arguments to color the points by their cluster label. Also specify alpha=0.5.
Compute the coordinates of the centroids using the .cluster_centers_ attribute of model.
Assign column 0 of centroids to centroids_x, and column 1 of centroids to centroids_y.
Make a scatter plot of centroids_x and centroids_y, using 'D' (a diamond) as a marker by specifying the marker parameter. Set the size of the markers to be 50using s=50.

# Import pyplot
import matplotlib.pyplot as plt

# Assign the columns of new_points: xs and ys
xs = new_points[:,0]
ys = new_points[:,1]

# Make a scatter plot of xs and ys, using labels to define the colors
plt.scatter(xs, ys, c=labels, alpha=0.5)

# Assign the cluster centers: centroids
centroids = model.cluster_centers_

# Assign the columns of centroids: centroids_x, centroids_y
centroids_x = centroids[:,0]
centroids_y

最低0.47元/天解锁文章

JinnyR

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Datacamp 笔记&代码 Unsupervised Learning in Python 第一章 Clustering for dataset exploration

更多原始数据文档和JupyterNotebookGithub: https://github.com/JinnyR/Datacamp_DataScienceTrack_PythonDatacamp track: Data Scientist with Python - Course 23 (1)ExerciseClustering 2D pointsFrom the scatter pl...
复制链接

扫一扫

专栏目录