make_classification()

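make_classification is a function that generates a dataset for a random classification problem. It creates samples with informative, redundant, and random noise features. Its parameters include the number of samples, the total number of features, the number of informative features, the number of redundant features, and so on. Adjusting these parameters controls the complexity and noise level of the dataset, which makes it suitable for training and testing machine learning models.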

def make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
):

Generate a random n-class classification problem.

This initially creates clusters of points normally distributed (std=1)
about vertices of an n_informative-dimensional hypercube with sides of
length 2*class_sep and assigns an equal number of clusters to each
class. It introduces interdependence between these features and adds
various types of further noise to the data.

Without shuffling, X horizontally stacks features in the following
order: the primary n_informative features, followed by n_redundant
linear combinations of the informative features, followed by n_repeated
duplicates, drawn randomly with replacement from the informative and
redundant features. The remaining features are filled with random noise.
Thus, without shuffling, all useful features are contained in the columns
X[:, :n_informative + n_redundant + n_repeated].
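
The column layout described above can be checked directly. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the sizes chosen here (200 samples, 10 features, 3 informative, 2 redundant, 1 repeated) are arbitrary illustration values, not anything prescribed by the docstring.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,  # keep the documented column order
    random_state=0,
)

# Slice the columns in the order given by the docstring.
n_base = n_informative + n_redundant
n_useful = n_base + n_repeated
informative = X[:, :n_informative]
redundant = X[:, n_informative:n_base]
repeated = X[:, n_base:n_useful]
noise = X[:, n_useful:]

# Redundant columns should be reproduced by a least-squares fit
# on the informative columns (residual close to zero).
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = np.abs(informative @ coef - redundant).max()

# Each repeated column should be a copy of an informative or redundant column.
is_copy = [
    any(np.array_equal(repeated[:, j], X[:, i]) for i in range(n_base))
    for j in range(n_repeated)
]

print(max_residual, is_copy, noise.shape)
```

With the default shift=0.0 and scale=1.0 the relationships hold exactly; with shuffle=True the same columns exist but their positions are permuted.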

Parameters

n_samples : int, default=100
    The number of samples.
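n_features : int, default=20
    The total number of features, made up of
    n_informative + n_redundant + n_repeated + the remaining noise features:
        n_informative: the informative (useful) features,
        n_redundant: the redundant features,
        n_repeated: the duplicated features,
        n_features - n_informative - n_redundant - n_repeated: the remaining useless features, drawn at random.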
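
A small sketch of the feature budget, assuming scikit-learn is installed. It computes how many noise columns the default settings leave over, and shows that, as far as I can tell from the library source, asking for more informative, redundant, and repeated features than n_features allows raises a ValueError. The variable names are only for illustration.

```python
from sklearn.datasets import make_classification

# Defaults: 20 features, of which 2 informative, 2 redundant, 0 repeated.
n_features, n_informative, n_redundant, n_repeated = 20, 2, 2, 0
n_noise = n_features - n_informative - n_redundant - n_repeated
print(n_noise)  # 16

# 3 + 2 + 1 = 6 useful features do not fit into 5 columns.
try:
    make_classification(n_features=5, n_informative=3, n_redundant=2, n_repeated=1)
    budget_error = None
except ValueError as exc:
    budget_error = str(exc)

print(budget_error)
```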
n_informative : int, default=2
    The number of informative features.

    Official docstring: Each class is composed of a number of Gaussian clusters each located around the vertices of a hypercube in a subspace of dimension ``n_informative``. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.
n_redundant : int, default=2
    The number of redundant features. These features are generated as random linear combinations of the informative features.
n_repeated : int, default=0
    The number of duplicated features, drawn randomly from the informative and the redundant features.
n_classes : int, default=2
    The number of classes (or labels) of the classification problem.
n_clusters_per_class : int, default=2
    The number of clusters per class.
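
Because every cluster sits on its own hypercube vertex, the number of clusters is limited by the number of vertices. To my knowledge scikit-learn enforces n_classes * n_clusters_per_class <= 2**n_informative and raises a ValueError otherwise. The sketch below probes that limit; `can_generate` is a hypothetical helper written for this example, not part of the library.

```python
from sklearn.datasets import make_classification


def can_generate(n_classes, n_clusters_per_class, n_informative):
    """Return True if make_classification accepts this combination."""
    try:
        make_classification(
            n_samples=50,
            n_features=10,
            n_informative=n_informative,
            n_classes=n_classes,
            n_clusters_per_class=n_clusters_per_class,
            random_state=0,
        )
        return True
    except ValueError:
        return False


# 3 classes * 2 clusters = 6 clusters, but 2 informative features give only 4 vertices.
too_many = can_generate(3, 2, 2)
# With 3 informative features there are 8 vertices, enough for 6 clusters.
enough = can_generate(3, 2, 3)
print(too_many, enough)
```

In practice this means that adding classes or clusters usually requires raising n_informative as well.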
shuffle : bool, default=True
    Whether to shuffle the samples and the features.
weights : array-like of shape (n_classes,) or (n_classes - 1,), default=None
    The proportions of samples assigned to each class. If None, then
    classes are balanced. Note that if ``len(weights) == n_classes - 1``,
    then the last class weight is automatically inferred.
    More than ``n_samples`` samples may be returned if the sum of
    ``weights`` exceeds 1. Note that the actual class proportions will
    not exactly match ``weights`` when ``flip_y`` isn't 0.
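
As an illustration of ``weights``, the sketch below (assuming scikit-learn and NumPy) builds an imbalanced two-class problem by passing a single weight, so the weight of the last class is inferred. ``flip_y`` is set to 0 so that label noise does not disturb the proportions; even then the counts may be off by a sample or two because of integer rounding across clusters.

```python
import numpy as np
from sklearn.datasets import make_classification

# One weight for two classes: the second class gets the remainder (about 0.1).
X, y = make_classification(
    n_samples=1000,
    weights=[0.9],
    flip_y=0.0,  # no random label reassignment
    random_state=0,
)

counts = np.bincount(y)
print(counts, counts / counts.sum())
```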

flip_y : float, default=0.01
    The fraction of samples whose class is assigned randomly. Larger
    values introduce noise in the labels and make the classification
    task harder. Note that the default setting flip_y > 0 might lead
    to less than ``n_classes`` in y in some cases.
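
The effect of ``flip_y`` can be made visible by generating the same problem with and without label noise. This is a rough sketch, assuming scikit-learn and NumPy; shuffle=False is used so that the two label vectors line up row by row. Since reassigned labels are drawn at random, some of them land on the original class, so for two classes the share of labels that actually change should be around half of ``flip_y``.

```python
import numpy as np
from sklearn.datasets import make_classification

common = dict(n_samples=1000, shuffle=False, random_state=0)

_, y_clean = make_classification(flip_y=0.0, **common)
_, y_noisy = make_classification(flip_y=0.2, **common)

# Fraction of rows whose label differs from the noise-free labelling.
changed = np.mean(y_clean != y_noisy)
print(changed)
```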

class_sep : float, default=1.0
    The factor multiplying the hypercube size.  Larger values spread
    out the clusters/classes and make the classification task easier.
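
One simple way to see what ``class_sep`` does is to measure how far apart the class means end up. The sketch below assumes scikit-learn and NumPy; `class_mean_distance` is a hypothetical helper for this example, and the distance between class means is only a crude proxy for how easy the task is.

```python
import numpy as np
from sklearn.datasets import make_classification


def class_mean_distance(class_sep):
    """Euclidean distance between the two class means for a given class_sep."""
    X, y = make_classification(
        n_samples=1000,
        n_features=5,
        n_informative=2,
        n_redundant=0,
        n_clusters_per_class=1,  # one cluster per class keeps the picture simple
        flip_y=0.0,
        class_sep=class_sep,
        random_state=0,
    )
    return np.linalg.norm(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))


narrow = class_mean_distance(0.5)
wide = class_mean_distance(3.0)
print(narrow, wide)
```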

hypercube : bool, default=True
    If True, the clusters are put on the vertices of a hypercube. If
    False, the clusters are put on the vertices of a random polytope.

shift : float, ndarray of shape (n_features,) or None, default=0.0
    Shift features by the specified value. If None, then features
    are shifted by a random value drawn in [-class_sep, class_sep].

scale : float, ndarray of shape (n_features,) or None, default=1.0
    Multiply features by the specified value. If None, then features
    are scaled by a random value drawn in [1, 100]. Note that scaling
    happens after shifting.
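
The note that scaling happens after shifting can be verified numerically. In this sketch (assuming scikit-learn and NumPy) the same random_state is used twice, so the underlying data is identical and only the shift and scale differ; the values 5.0 and 2.0 are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

base, _ = make_classification(n_samples=100, shift=0.0, scale=1.0, random_state=0)
moved, _ = make_classification(n_samples=100, shift=5.0, scale=2.0, random_state=0)

# Shift first, then scale: X_out = (X + shift) * scale
shift_then_scale = np.allclose(moved, (base + 5.0) * 2.0)
# The opposite order would give X * scale + shift, which should not match.
scale_then_shift = np.allclose(moved, base * 2.0 + 5.0)
print(shift_then_scale, scale_then_shift)
```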

random_state : int, RandomState instance or None, default=None
    Determines random number generation for dataset creation. Pass an int
    for reproducible output across multiple function calls.
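
A short check of the reproducibility claim, assuming scikit-learn and NumPy: the same integer seed should give identical arrays, while a different seed should give different data. The seeds 42 and 7 are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

X1, y1 = make_classification(random_state=42)
X2, y2 = make_classification(random_state=42)
X3, y3 = make_classification(random_state=7)

same = np.array_equal(X1, X2) and np.array_equal(y1, y2)
different = not np.array_equal(X1, X3)
print(same, different)
```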

Returns

X : ndarray of shape (n_samples, n_features)
    The generated samples.

y : ndarray of shape (n_samples,)
    The integer labels for class membership of each sample.
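
For completeness, a minimal usage sketch showing what comes back, assuming scikit-learn and NumPy. The parameter values are arbitrary; the point is only the shapes of X and y and the integer class labels.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=150,
    n_features=8,
    n_informative=4,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

print(X.shape, y.shape)  # (150, 8) (150,)
labels = np.unique(y)
print(labels, y.dtype)
```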