make_classification()

Offer.harvester

Category: Machine Learning & Deep Learning. Tags: data processing, classification

Article link: https://blog.csdn.net/qq_39072627/article/details/120747966

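make_classification is a function that generates a dataset for a random classification problem. It creates samples whose features are informative, redundant, or random noise. Its parameters include the number of samples, the total number of features, the number of informative features, and the number of redundant features. Adjusting them controls the complexity and noise level of the dataset, which makes it useful for training and testing machine learning models.

(Summary generated automatically by CSDN.)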

make_classification()

def make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
):

Generate a random n-class classification problem.

This initially creates clusters of points normally distributed (std=1)
about vertices of an n_informative-dimensional hypercube with sides of
length 2*class_sep and assigns an equal number of clusters to each
class. It introduces interdependence between these features and adds
various types of further noise to the data.

Without shuffling, X horizontally stacks features in the following
order: the primary n_informative features, followed by n_redundant
linear combinations of the informative features, followed by n_repeated
duplicates, drawn randomly with replacement from the informative and
redundant features. The remaining features are filled with random noise.
Thus, without shuffling, all useful features are contained in the columns
X[:, :n_informative + n_redundant + n_repeated].
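
To make the column layout concrete, here is a minimal sketch that generates data with shuffle=False and inspects the three "useful" blocks. The feature counts (3, 2, 1) and the variable names are my own choices for illustration; the check assumes that the redundant columns are plain linear combinations of the informative ones with no intercept, which is what the description above states.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1

X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,   # keep the documented column order
    random_state=0,
)

# Slice the columns in the documented order.
informative = X[:, :n_informative]
redundant = X[:, n_informative:n_informative + n_redundant]
useful = X[:, :n_informative + n_redundant]
repeated = X[:, n_informative + n_redundant:n_informative + n_redundant + n_repeated]

# Redundant columns should be reproducible from the informative ones
# by a least-squares fit with (numerically) zero residual.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = np.abs(informative @ coef - redundant).max()

# Every repeated column should be an exact copy of an informative or redundant column.
repeated_is_copy = all(
    any(np.array_equal(repeated[:, j], useful[:, i]) for i in range(useful.shape[1]))
    for j in range(repeated.shape[1])
)

print(max_residual, repeated_is_copy)
```

The remaining 10 - 3 - 2 - 1 = 4 columns are the noise features.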

Parameters

n_samples : int, default=100
    The number of samples.

n_features : int, default=20
    The total number of features. These comprise:
        n_informative: the informative (useful) features,
        n_redundant: the redundant features,
        n_repeated: the duplicated features,
        n_features - n_informative - n_redundant - n_repeated: the remaining useless features, filled with random noise.

n_informative : int, default=2
    The number of informative features.

    Official note: Each class is composed of a number of gaussian clusters each located around the vertices of a hypercube in a subspace of dimension ``n_informative``. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.

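n_redundant : int, default=2
    The number of redundant features. These features are generated as random linear combinations of the informative features.

n_repeated : int, default=0
    The number of duplicated features, drawn randomly from the informative and the redundant features.

n_classes : int, default=2
    The number of classes (or labels) of the classification problem.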

n_clusters_per_class : int, default=2
    The number of clusters per class.
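
Because the clusters sit on the vertices of an n_informative-dimensional hypercube, there are only 2**n_informative vertices to go around. In the scikit-learn versions I am aware of, asking for more clusters than that raises a ValueError. The sketch below shows this with values I picked for illustration (3 classes x 2 clusters = 6 clusters, but only 2**2 = 4 vertices).

```python
from sklearn.datasets import make_classification

try:
    # 3 * 2 = 6 clusters do not fit on the 2**2 = 4 vertices of a 2-D hypercube.
    make_classification(
        n_informative=2,
        n_classes=3,
        n_clusters_per_class=2,
        random_state=0,
    )
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Raising n_informative to 3 (8 vertices) would make the same call valid.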

shuffle : bool, default=True
    Whether to shuffle the samples and the features.

weights : array-like of shape (n_classes,) or (n_classes - 1,), default=None
    The proportions of samples assigned to each class. If None, then
    classes are balanced. Note that if ``len(weights) == n_classes - 1``,
    then the last class weight is automatically inferred.
    More than ``n_samples`` samples may be returned if the sum of
    ``weights`` exceeds 1. Note that the actual class proportions will
    not exactly match ``weights`` when ``flip_y`` isn't 0.
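
As a quick illustration of weights, the sketch below requests a 90/10 split and counts the labels. flip_y is set to 0.0 so that random label noise does not disturb the proportions; the sample size and seed are arbitrary choices of mine. The counts may be off by a sample or two because the samples are divided among clusters with integer rounding.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    weights=[0.9, 0.1],  # class 0 gets about 90% of the samples
    flip_y=0.0,          # no label noise, so proportions stay close to weights
    random_state=0,
)

# Number of samples per class.
counts = np.bincount(y, minlength=2)
print(counts)
```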

flip_y : float, default=0.01
    The fraction of samples whose class is assigned randomly. Larger
    values introduce noise in the labels and make the classification
    task harder. Note that the default setting flip_y > 0 might lead
    to less than ``n_classes`` in y in some cases.
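
To see flip_y at work, the sketch below generates labels twice with the same seed, once without and once with label noise, and measures how many labels differ. It assumes that with shuffle=False the labels before flipping do not depend on flip_y, which matches the scikit-learn versions I have looked at. Since a randomly reassigned label can land on its original class, with two balanced classes only about half of the selected samples actually change.

```python
import numpy as np
from sklearn.datasets import make_classification

# Shared settings; shuffle=False keeps the sample order comparable between calls.
common = dict(n_samples=10_000, shuffle=False, random_state=0)

_, y_clean = make_classification(flip_y=0.0, **common)
_, y_noisy = make_classification(flip_y=0.2, **common)

# Fraction of samples whose label actually changed.
changed = np.mean(y_clean != y_noisy)
print(changed)
```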

class_sep : float, default=1.0
    The factor multiplying the hypercube size.  Larger values spread
    out the clusters/classes and make the classification task easier.
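
One rough way to see the effect of class_sep is to compare the distance between the two class means for a small and a large value. The helper name class_mean_distance and the settings (one cluster per class, no redundant features, no label noise) are my own simplifications for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification


def class_mean_distance(class_sep):
    """Euclidean distance between the two class means for a given class_sep."""
    X, y = make_classification(
        n_samples=2000,
        n_features=5,
        n_informative=2,
        n_redundant=0,
        n_clusters_per_class=1,
        class_sep=class_sep,
        flip_y=0.0,
        shuffle=False,
        random_state=0,
    )
    return np.linalg.norm(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))


near = class_mean_distance(0.5)
far = class_mean_distance(2.0)
print(near, far)
```

The larger class_sep pushes the class means further apart, which is why the task becomes easier.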

hypercube : bool, default=True
    If True, the clusters are put on the vertices of a hypercube. If
    False, the clusters are put on the vertices of a random polytope.

shift : float, ndarray of shape (n_features,) or None, default=0.0
    Shift features by the specified value. If None, then features
    are shifted by a random value drawn in [-class_sep, class_sep].

scale : float, ndarray of shape (n_features,) or None, default=1.0
    Multiply features by the specified value. If None, then features
    are scaled by a random value drawn in [1, 100]. Note that scaling
    happens after shifting.
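
The order "shift first, then scale" can be checked directly. The sketch below generates the same data twice with a fixed seed, once with the defaults and once with shift=3.0 and scale=10.0 (values chosen arbitrarily), and compares the result against (X + shift) * scale.

```python
import numpy as np
from sklearn.datasets import make_classification

common = dict(n_samples=50, n_features=5, shuffle=False, random_state=0)

X_base, _ = make_classification(shift=0.0, scale=1.0, **common)
X_moved, _ = make_classification(shift=3.0, scale=10.0, **common)

# Shifting happens before scaling, so the expected result is (X + shift) * scale.
matches = np.allclose(X_moved, (X_base + 3.0) * 10.0)
print(matches)
```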

random_state : int, RandomState instance or None, default=None
Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls.
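
A small check of the reproducibility claim: two calls with the same integer random_state should return identical arrays. The seed 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

X1, y1 = make_classification(random_state=42)
X2, y2 = make_classification(random_state=42)

# Same seed, same data and labels.
same = np.array_equal(X1, X2) and np.array_equal(y1, y2)
print(same)
```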

Returns

X : ndarray of shape (n_samples, n_features)
    The generated samples.

y : ndarray of shape (n_samples,)
    The integer labels for class membership of each sample.
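
Putting it together, a minimal usage sketch with the default sizes. Only random_state is added by me so that the output is repeatable.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# X holds the samples, y holds one integer class label per sample.
labels = set(np.unique(y).tolist())
print(X.shape, y.shape, labels)
```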
