Generating Test Data for Machine Learning Clustering
Using sklearn.datasets.make_blobs
Usage
from sklearn.datasets import make_blobs
# Call shown with the default value of each parameter
make_blobs(n_samples=100,
           n_features=2,
           centers=None,
           cluster_std=1.0,
           center_box=(-10.0, 10.0),
           shuffle=True,
           random_state=None
           )
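Parameters
n_samples: total number of samples; when given as an int, they are split evenly across the clusters
n_features: number of features (the dimensionality of the data)
centers: number of cluster centers, or an array of fixed center coordinates
cluster_std: standard deviation of the clusters
center_box: (min, max) bounding box for each cluster center when the centers are generated at random
shuffle: (bool) whether to shuffle the samples
random_state: random seed; pass an int to get reproducible output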
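As a rough illustration of how these parameters interact, the sketch below passes explicit center coordinates and one standard deviation per cluster. The specific values (two centers, 200 samples, seed 42) are assumptions chosen for the example and do not come from the original post.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Explicit center coordinates (illustrative values). When centers is an
# array, the number of features is taken from its second dimension.
centers = [[0.0, 0.0], [5.0, 5.0]]

# One standard deviation per cluster: a tight blob and a wide blob.
X, y = make_blobs(n_samples=200,
                  centers=centers,
                  cluster_std=[0.5, 2.0],
                  random_state=42)

print(X.shape, y.shape)  # (200, 2) (200,)
print(np.bincount(y))    # [100 100], samples are split evenly

# Calling again with the same integer seed reproduces the same data.
X2, y2 = make_blobs(n_samples=200,
                    centers=centers,
                    cluster_std=[0.5, 2.0],
                    random_state=42)
print(np.array_equal(X, X2))  # True
```

Fixing random_state to an integer is usually worth doing in tests, since the default of None produces different data on every run.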
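The function returns two values, x and y: the feature array of shape (n_samples, n_features) and the integer cluster label of each sample.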
from sklearn.datasets import make_blobs
x, y = make_blobs(n_samples=10,
                  n_features=3,
                  centers=3,
                  cluster_std=1.0,
                  center_box=(0, 10),
                  random_state=None)
print(x, y)
'''
x:
[[ 1.84867966 2.50972065 8.81261611]
[ 4.41642014 5.25986382 8.6340203 ]
[ 3.03900151 2.78507042 3.41420868]
[ 1.06376153 4.22600614 1.25715659]
[ 2.62402063 3.47719094 9.10799865]
[ 2.65567624 3.31822041 3.76622607]
[ 5.58539182 6.07267481 8.83147447]
[ 5.63875173 5.57249052 10.6567555 ]
[ 5.42795437 4.75499218 8.91883699]
[ 2.78790159 4.85890851 9.20342921]]
y:
[2 0 1 1 2 1 0 0 0 2]
'''
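Since the point of generating this data is to test clustering, the following is a minimal sketch of one way to do that: fit KMeans on the generated blobs and compare the predicted clusters with the true labels using the adjusted Rand index. The choice of KMeans, the center coordinates, and the seed are assumptions made for illustration; the original post does not specify a clustering algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated blobs (illustrative centers) so the expected
# clustering is unambiguous.
x, y = make_blobs(n_samples=300,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5,
                  random_state=0)

# Cluster the points without looking at the true labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pred = kmeans.fit_predict(x)

# The adjusted Rand index ignores how cluster ids are numbered;
# a value near 1.0 means the partition matches the true labels.
ari = adjusted_rand_score(y, pred)
print(f"Adjusted Rand index: {ari:.3f}")
```

Increasing cluster_std or moving the centers closer together makes the blobs overlap, which is a simple way to see how a clustering algorithm degrades on harder data.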