Synthetic Datasets

Author: 「已注销」 (deactivated account)


Tags: python, matplotlib, machine learning

Article link: https://blog.csdn.net/qq_44425179/article/details/130339749


I: Moons dataset - make_moons

sklearn.datasets.make_moons(n_samples=100, shuffle=True, noise=None, random_state=None)

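Key parameters: n_samples sets the number of samples; noise sets the standard deviation of the Gaussian noise added to the points; random_state seeds the random number generator, and any fixed integer makes the output reproducible. This section focuses on noise.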
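As a quick check on what noise controls, the sketch below compares a noiseless draw with a noisy one and measures the spread of the difference. It assumes shuffle=False so that both calls return the points in the same order; the variable names and the sample size of 2000 are illustrative choices, not part of the original article.

```python
import numpy as np
from sklearn.datasets import make_moons

# shuffle=False keeps the points in the same order in both calls,
# so the two arrays can be subtracted row by row.
X_clean, y_clean = make_moons(n_samples=2000, shuffle=False, noise=None)
X_noisy, y_noisy = make_moons(n_samples=2000, shuffle=False, noise=0.1,
                              random_state=123)

# The difference is the Gaussian noise that was added to each coordinate.
residual = X_noisy - X_clean
noise_std = residual.std()
print(f"empirical noise std: {noise_std:.3f}")  # expected to be close to 0.1
```

The measured value should sit near the requested noise level, which is why a larger noise makes the two moons blur into each other.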

# Import the required libraries
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate 200 samples with noise=0.02
X, yy = make_moons(n_samples=200, random_state=123, noise=0.02)

# Draw one scatter series per class
for class_value in range(2):
    # Boolean mask selecting the rows that belong to this class
    mask = yy == class_value
    # Plot feature 0 against feature 1 for those rows only
    plt.scatter(X[mask, 0], X[mask, 1])
# Display the figure
plt.show()
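The row-selection step inside the loop is the part that tends to raise questions, so here is a small sketch of it in isolation. The arrays X_demo and y_demo are made-up stand-ins for X and yy; the point is only to show that selecting rows through np.where and through a boolean mask yields the same values.

```python
import numpy as np

# A tiny stand-in for X and yy: four samples, two features, two classes.
X_demo = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
y_demo = np.array([0, 1, 0, 1])

# np.where returns a tuple holding the indices where the condition is true.
row_ix = np.where(y_demo == 1)

# Indexing with that tuple picks feature 0 of rows 1 and 3. Because the
# tuple is used as a single index, the result keeps an extra leading axis.
via_where = X_demo[row_ix, 0]

# A boolean mask selects the same rows and returns a flat array.
via_mask = X_demo[y_demo == 1, 0]

print(via_where.shape, via_mask.shape)
print(via_where.ravel(), via_mask)
```

Either form works with scatter; the mask version is shorter and avoids the extra axis.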

[Figure: scatter plot of the moons dataset, noise=0.02 (output_1_0.png)]

1: Comparing the effect of different noise levels

# Moons dataset at two noise levels, stacked vertically
plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_moons(n_samples=200, random_state=123, noise=0.02)
plt.title("noise=0.02")
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.subplot(212)
X1, y1 = make_moons(n_samples=200, random_state=123, noise=0.1)
plt.title("noise=0.1")
plt.scatter(X1[:, 0], X1[:, 1], c=y1)
plt.show()

[Figure: moons dataset with noise=0.02 (top) and noise=0.1 (bottom)]

II: Circles dataset - make_circles()

sklearn.datasets.make_circles(n_samples=100, shuffle=True, noise=None, random_state=None, factor=0.8)

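Key parameters: n_samples sets the number of samples; noise sets the standard deviation of the Gaussian noise added to the points; factor is the scale factor between the inner and outer circles (0 < factor < 1, default 0.8); random_state seeds the random number generator, and any fixed integer makes the output reproducible. This section focuses on noise and factor.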
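To make the meaning of factor concrete, the following sketch generates noiseless circles and measures the radius of each class. The value factor=0.5 mirrors the plots in this section; the variable names are illustrative. It relies on make_circles labelling the outer circle as class 0 and the inner circle as class 1.

```python
import numpy as np
from sklearn.datasets import make_circles

factor = 0.5  # illustrative value, matching the plots in this section
X_c, y_c = make_circles(n_samples=200, noise=None, factor=factor,
                        random_state=123)

# Distance of every point from the origin.
radius = np.hypot(X_c[:, 0], X_c[:, 1])
outer_radius = radius[y_c == 0].mean()  # class 0 is the outer circle
inner_radius = radius[y_c == 1].mean()  # class 1 is the inner circle
print(outer_radius, inner_radius)
```

Without noise, the outer circle has radius 1 and the inner circle has radius equal to factor, so a factor close to 1 pushes the two classes together.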

# Import the required libraries
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

# noise=0.05, scale factor between inner and outer circle = 0.5
X, yy = make_circles(n_samples=200, random_state=123, noise=0.05, factor=0.5)

# Draw one scatter series per class
for class_value in range(2):
    # Boolean mask selecting the rows that belong to this class
    mask = yy == class_value
    # Plot feature 0 against feature 1 for those rows only
    plt.scatter(X[mask, 0], X[mask, 1])
# Display the figure
plt.show()

[Figure: scatter plot of the circles dataset, noise=0.05, factor=0.5]

1: Comparing different noise levels

plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_circles(n_samples=200, random_state=123, noise=0.05, factor=0.5)
plt.title("noise=0.05")
plt.scatter(X[:, 0], X[:, 1], c=y)
# plt.scatter(X[:, 0], X[:, 1], marker='.', c=y)
plt.subplot(212)
X1, y1 = make_circles(n_samples=200, random_state=123, noise=0.1, factor=0.5)
plt.title("noise=0.1")
plt.scatter(X1[:, 0], X1[:, 1], c=y1)
plt.show()

[Figure: circles dataset with noise=0.05 (top) and noise=0.1 (bottom)]

2: Comparing different inner-to-outer radius ratios

plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_circles(n_samples=200, random_state=123, noise=0.05, factor=0.5)
plt.title("factor=0.5")
plt.scatter(X[:, 0], X[:, 1], marker='.', c=y)
plt.subplot(212)
X1, y1 = make_circles(n_samples=200, random_state=123, noise=0.05, factor=0.8)
plt.title("factor=0.8")
plt.scatter(X1[:, 0], X1[:, 1], marker='.', c=y1)
plt.show()

[Figure: circles dataset with factor=0.5 (top) and factor=0.8 (bottom)]

III: make_classification()

sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2,
n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None,
flip_y=0.01, class_sep=1.0, hypercube=True,shift=0.0, scale=1.0,
shuffle=True, random_state=None)

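Purpose: generates a random sample set with class labels, typically used to test classification algorithms.

Parameters:

n_features: total number of features. It must be at least n_informative + n_redundant + n_repeated; any remaining features are filled with random noise.
n_informative: number of informative features
n_redundant: number of redundant features, generated as random linear combinations of the informative features
n_repeated: number of duplicated features, drawn at random from the informative and redundant features
n_classes: number of classes (labels)
n_clusters_per_class: number of clusters that make up each class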
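The feature types listed above can be inspected directly. The sketch below is one way to do that, assuming shuffle=False, which (per the scikit-learn documentation) keeps the columns in the order informative, redundant, repeated, then noise. The specific counts (5 features: 2 informative, 1 redundant, 1 repeated, 1 noise) are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the columns in generation order:
# informative, then redundant, then repeated, then pure-noise features.
X_cls, y_cls = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=1,
    n_repeated=1, n_classes=2, shuffle=False, random_state=40)

# Columns 0-1 are informative and column 2 is a linear combination of them,
# so the first three columns only span two dimensions.
rank_first_three = np.linalg.matrix_rank(X_cls[:, :3])

# Column 3 is a copy of one of the first three columns.
is_copy = any(np.array_equal(X_cls[:, 3], X_cls[:, j]) for j in range(3))

print(X_cls.shape, rank_first_three, is_copy)
```

With the default shuffle=True the same columns are present but in random order, so this kind of inspection is easiest with shuffling turned off.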


from sklearn.datasets import make_classification
from matplotlib import pyplot

# Two features, one of them informative, one cluster per class
X, y = make_classification(n_samples=200, n_features=2, n_informative=1,
                           n_redundant=0, n_clusters_per_class=1, random_state=40)

# Draw one scatter series per class
for class_value in range(2):
    # Boolean mask selecting the rows that belong to this class
    mask = y == class_value
    # Plot feature 0 against feature 1 for those rows only
    pyplot.scatter(X[mask, 0], X[mask, 1])
# Display the figure
pyplot.show()

[Figure: scatter plot of the make_classification dataset, one series per class]

plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_classification(n_samples=200, n_features=2, n_informative=1,
                           n_redundant=0, n_clusters_per_class=1, random_state=40)
plt.title("n_features=2, n_informative=1")
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.subplot(212)
X1, y1 = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, n_clusters_per_class=1, random_state=40)
plt.title("n_features=2, n_informative=2")
plt.scatter(X1[:, 0], X1[:, 1], c=y1)
plt.show()

[Figure: n_features=2, n_informative=1 (top) and n_features=2, n_informative=2 (bottom), 200 samples]

plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_classification(n_samples=400, n_features=2, n_informative=1,
                           n_redundant=0, n_clusters_per_class=1, random_state=40)
plt.title("n_samples=400, n_features=2, n_informative=1")
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.subplot(212)
X1, y1 = make_classification(n_samples=400, n_features=2, n_informative=2,
                             n_redundant=0, n_clusters_per_class=1, random_state=40)
plt.title("n_samples=400, n_features=2, n_informative=2")
plt.scatter(X1[:, 0], X1[:, 1], c=y1)
plt.show()

[Figure: n_features=2, n_informative=1 (top) and n_features=2, n_informative=2 (bottom), 400 samples]

IV: make_blobs dataset

sklearn.datasets.make_blobs(n_samples=100,n_features=2,centers=3, cluster_std=1.0,center_box=(-10.0,10.0),shuffle=True,random_state=None)

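The make_blobs function generates data for clustering: it returns a dataset together with the corresponding labels.

n_samples: number of sample points, default 100

n_features: number of features (attributes) per sample, i.e. the dimensionality of the data, default 2

centers: number of clusters (number of distinct labels), or an array of fixed center locations. The signature above lists a default of 3; in recent scikit-learn versions the default is None, which still produces 3 centers when n_samples is an integer.

cluster_std: standard deviation of each cluster, given as a float or a sequence of floats, default 1.0. For example, to generate two clusters where one is more spread out than the other, set cluster_std to [1.0, 3.0].

center_box: bounding box within which the cluster centers are placed when they are generated at random, default (-10.0, 10.0)

shuffle: whether to shuffle the samples, default True

random_state: the seed of the random number generator, according to the official documentation. With a fixed value, the same dataset is generated on every run; without one, the results may differ from run to run. Setting a value is recommended when using the data generators to practice machine learning algorithms or Python.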
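The two points above that are easiest to verify numerically are the per-cluster cluster_std and the reproducibility given by random_state. The sketch below checks both; the sample size of 2000, the seed 0, and all variable names are illustrative assumptions rather than values from the original article.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two clusters with different spreads; 2000 samples is an illustrative size.
kwargs = dict(n_samples=2000, n_features=2, centers=2,
              cluster_std=[1.0, 3.0], random_state=0)
X_a, y_a = make_blobs(**kwargs)
X_b, y_b = make_blobs(**kwargs)

# Same random_state, same data.
same_data = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)

# Per-cluster standard deviation, computed per feature and then averaged.
std_0 = X_a[y_a == 0].std(axis=0).mean()
std_1 = X_a[y_a == 1].std(axis=0).mean()
print(same_data, round(std_0, 2), round(std_1, 2))
```

The measured spreads should land near 1.0 and 3.0, which shows that cluster_std is a standard deviation and that label k corresponds to the k-th entry of the list.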

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# n_features: dimensionality of the data; centers: number of clusters (labels)
# random_state=None gives a different dataset on every run
X, y = make_blobs(n_samples=400, n_features=3, centers=3, cluster_std=1.0,
                  center_box=(-10.0, 10.0), shuffle=True, random_state=None)

# Only the first two of the three features are plotted here;
# plotting all three would need a 3D axes, e.g. fig.add_subplot(projection='3d')
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

[Figure: first two features of the make_blobs dataset, colored by cluster label]

plt.figure(figsize=(7, 7))
plt.subplot(211)
X, y = make_blobs(n_samples=400, n_features=3, centers=3, cluster_std=1.0,
                  center_box=(-10.0, 10.0), shuffle=True, random_state=None)
plt.title("n_samples=400, n_features=3, centers=3")
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.subplot(212)
X1, y1 = make_blobs(n_samples=400, n_features=3, centers=4, cluster_std=1.0,
                    center_box=(-10.0, 10.0), shuffle=True, random_state=None)
plt.title("n_samples=400, n_features=3, centers=4")
plt.scatter(X1[:, 0], X1[:, 1], c=y1)
plt.show()

[Figure: make_blobs with centers=3 (top) and centers=4 (bottom), first two features]
