make_classification function

Author: BlackStar_L

Last modified: 2022-01-30 06:41:56

First published: 2022-01-30 03:58:32

Article link: https://blog.csdn.net/weixin_44225602/article/details/122726227

sklearn.datasets.make_classification(n_samples=100, n_features=20, *, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)

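| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_samples | int | 100 | Number of samples. |
| n_features | int | 20 | Total number of features. These comprise `n_informative` informative features, `n_redundant` redundant features, `n_repeated` duplicated features, and `n_features - n_informative - n_redundant - n_repeated` useless features drawn at random. |
| n_informative | int | 2 | Number of informative features. |
| n_redundant | int | 2 | Number of redundant features. These are generated as random linear combinations of the informative features. For example, with informative features F1, F2, …, each redundant feature has the form aF1 + bF2 + …, where the coefficients a, b, … are random numbers. |
| n_repeated | int | 0 | Number of duplicated features, drawn randomly from the informative and redundant features. |
| n_classes | int | 2 | Number of classes (or labels) in the classification problem. |
| n_clusters_per_class | int | 2 | Number of clusters per class. |
| random_state | int, RandomState instance or None | None | Seeds the random number generator. Pass an int to get reproducible output across calls. |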
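The sketch below makes a few of the table entries concrete. Two calls that use the same `random_state` return identical data. Parameter combinations that cannot be satisfied are rejected. As far as I know, current scikit-learn releases raise `ValueError` in two cases: when `n_informative + n_redundant + n_repeated` exceeds `n_features`, and when `n_classes * n_clusters_per_class` exceeds `2 ** n_informative`. The seed 7 and the specific invalid values are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same random_state -> identical samples and labels on every call
X1, y1 = make_classification(n_samples=50, random_state=7)
X2, y2 = make_classification(n_samples=50, random_state=7)
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True

# Invalid: 4 informative + 2 redundant + 0 repeated = 6 > n_features=5
try:
    make_classification(n_features=5, n_informative=4, n_redundant=2)
except ValueError as exc:
    print("ValueError:", exc)

# Invalid: 3 classes * 2 clusters = 6 clusters > 2 ** 2 hypercube vertices
try:
    make_classification(n_classes=3, n_clusters_per_class=2, n_informative=2)
except ValueError as exc:
    print("ValueError:", exc)
```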

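| Return value | Type | Description |
| --- | --- | --- |
| X | ndarray of shape (n_samples, n_features) | The generated samples. |
| y | ndarray of shape (n_samples,) | The integer labels for the class membership of each sample. |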
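The next example checks the return values for a small three-class problem. `X` is a float array with one row per sample. `y` is a 1-D integer array whose labels come from `0 … n_classes - 1`. The parameter values and the seed are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=30, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
print(X.shape, X.dtype)  # (30, 4) float64
print(y.shape)           # (30,)
print(np.unique(y))      # labels are drawn from 0, 1, 2
```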

Generates a random $n$-class classification problem.

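Without shuffling (`shuffle=False`), X stacks the features horizontally in the following order:

- first the `n_informative` informative features;
- then `n_redundant` linear combinations of the informative features;
- then `n_repeated` duplicates, drawn randomly with replacement from the informative and redundant features.

The remaining features are filled with random noise. Without shuffling, therefore, all useful features sit in the columns `X[:, :n_informative + n_redundant + n_repeated]`.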
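The column layout can be verified with `shuffle=False`, as the sketch below does. A least-squares fit on the informative block should reproduce the redundant block. Each repeated column should be an exact copy of one of the informative or redundant columns. The sketch assumes the defaults `shift=0.0` and `scale=1.0`. With a non-zero `shift`, the redundant columns would be affine rather than purely linear in the informative ones. The feature counts and the seed are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

n_inf, n_red, n_rep, n_feat = 3, 2, 2, 10
X, y = make_classification(n_samples=200, n_features=n_feat,
                           n_informative=n_inf, n_redundant=n_red,
                           n_repeated=n_rep, shuffle=False, random_state=0)

informative = X[:, :n_inf]
redundant = X[:, n_inf:n_inf + n_red]
repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]

# Redundant columns are linear combinations of the informative columns,
# so a least-squares fit reproduces them up to floating-point error
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
print(np.allclose(informative @ coef, redundant))  # True

# Each repeated column is an exact copy of some informative/redundant column
useful = X[:, :n_inf + n_red]
is_copy = [
    any(np.array_equal(repeated[:, j], useful[:, k]) for k in range(useful.shape[1]))
    for j in range(n_rep)
]
print(is_copy)  # [True, True]
```

Setting `shuffle=True`, the default, permutes the feature columns as well as the rows, so this positional check only makes sense with shuffling turned off.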

```python
from sklearn.datasets import make_classification

# n_samples=6            -> 6 rows, i.e. 6 samples
# n_classes=2            -> labels fall into 2 classes (binary classification)
# n_features=5           -> 5 features per sample
# n_informative=5        -> all 5 features are informative
# n_redundant=0          -> no redundant features
# n_clusters_per_class=1 -> each class forms a single cluster
X, y = make_classification(n_samples=6, n_classes=2, n_features=5, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1)

# display() only exists in Jupyter/IPython; print(repr(...)) works everywhere
print(repr(X))
print(repr(y))

# Output of one run (random_state is not set, so every run differs):
# array([[ 1.10885456, -1.97464085,  2.14372944, -0.08241471, -2.60173628],
#        [ 0.98456921, -4.67257395, -0.10161149,  0.52329866,  2.0178222 ],
#        [-2.92441307, -2.20249011,  0.12827954,  1.90711152,  0.24340137],
#        [ 0.14524134, -1.42685331,  1.92731161, -0.72915701,  1.3529692 ],
#        [-0.09694719, -0.28604481, -2.62609999, -0.46131174,  0.72515074],
#        [ 0.25540393, -2.64589841, -2.05721611,  0.53203936,  0.34273113]])
# array([0, 1, 1, 0, 1, 0])
```