Machine learning: using the sklearn.model_selection.train_test_split function

splitting = train_test_split(*arrays, **options)

For example:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, shuffle=False)

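Parameters and return values (name, description, notes)

splitting (return value; X_train, X_test, y_train, y_test in the example)

        list, length=2 * len(arrays)
        List containing train-test split of inputs.

    X_train: training-set features (X) after the split
    X_test: test-set features (X) after the split
    y_train: training-set targets (y) after the split
    y_test: test-set targets (y) after the split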
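
The "length=2 * len(arrays)" rule is easiest to see with more than two inputs. The sketch below is only an illustration: the third array `w` (per-sample weights) is a hypothetical addition that does not appear in the example above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features; X[i, 0] == 2 * i
y = np.arange(10)                  # targets, y[i] == i
w = np.linspace(0.0, 1.0, 10)      # hypothetical per-sample weights

# Three input arrays produce 2 * 3 = 6 outputs, ordered train/test per input.
parts = train_test_split(X, y, w, test_size=2, random_state=0)
X_train, X_test, y_train, y_test, w_train, w_test = parts

print(len(parts))                  # number of returned pieces
print(X_train.shape, X_test.shape)

# The same row indices are used for every array, so rows stay aligned.
rows_aligned = bool(np.array_equal(X_train[:, 0], 2 * y_train))
print(rows_aligned)
```

Because every array is indexed with the same train and test indices, a sample's features, target, and weight always land in the same subset.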
*arrays (X and y in the example)

        sequence of indexables with same length / shape[0]
        Allowed inputs are lists, numpy arrays, scipy-sparse
        matrices or pandas dataframes.

    X: the sample features to split
    y: the sample targets to split
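
As a small sketch of the "same length" requirement: a NumPy array and a plain list can be split together, while inputs with different numbers of samples are rejected. The variable names here (`y_list`, `y_bad`, `length_error`) are made up for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(5, 2)    # 5 samples
y_list = [0, 1, 2, 3, 4]           # a plain list with the same length works

X_train, X_test, y_train, y_test = train_test_split(
    X, y_list, test_size=2, random_state=0)
print(type(X_train).__name__, type(y_train).__name__)

# Inputs with a different number of samples raise a ValueError.
y_bad = [0, 1, 2, 3]               # only 4 samples
try:
    train_test_split(X, y_bad)
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```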
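test_size: float or int, default=None

        If float, should be between 0.0 and 1.0 and represent the proportion
        of the dataset to include in the test split. If int, represents the
        absolute number of test samples. If None, the value is set to the
        complement of the train size. If ``train_size`` is also None, it will
        be set to 0.25.

    Note: a float between 0 and 1 is the fraction of the original samples that goes to the test set; an int is the absolute number of test samples. If left unset, it is the complement of train_size, and it falls back to 0.25 only when train_size is also unset.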
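train_size: float or int, default=None

        If float, should be between 0.0 and 1.0 and represent the
        proportion of the dataset to include in the train split. If
        int, represents the absolute number of train samples. If None,
        the value is automatically set to the complement of the test size.

    Note: a float between 0 and 1 is the fraction of the original samples that goes to the training set; an int is the absolute number of training samples. If left unset, it is the complement of the test size, which is 0.75 only when test_size is also left at its default.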
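
The sizing rules can be checked directly on a small data set. This is a minimal sketch assuming 20 samples (chosen so that the fractions divide evenly); the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # 20 samples, 2 features
y = np.arange(20)

# Neither size given: test_size falls back to 0.25.
X_tr, X_te, _, _ = train_test_split(X, y, random_state=0)
default_sizes = (len(X_tr), len(X_te))

# Float test_size: a fraction of the data; train_size is its complement.
X_tr, X_te, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
float_sizes = (len(X_tr), len(X_te))

# Int test_size: an absolute number of test samples.
X_tr, X_te, _, _ = train_test_split(X, y, test_size=4, random_state=0)
int_sizes = (len(X_tr), len(X_te))

# Both given: they may sum to less than the whole data set,
# in which case the remaining samples are simply not used.
X_tr, X_te, _, _ = train_test_split(
    X, y, train_size=0.5, test_size=0.25, random_state=0)
both_sizes = (len(X_tr), len(X_te))

print(default_sizes, float_sizes, int_sizes, both_sizes)
```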
random_state: int, RandomState instance or None, default=None

        Controls the shuffling applied to the data before applying the split.
        Pass an int for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.

    Note: this is the random seed. Different seeds generally produce different splits, and passing the same int reproduces the same split on every call. It has no effect when shuffle=False.
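
A quick way to see what the seed does is to call the function twice with the same int and compare the results. This is a sketch on a toy array; the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.arange(20)

# Same seed, two separate calls: the split is identical.
train_a, test_a = train_test_split(y, random_state=42)
train_b, test_b = train_test_split(y, random_state=42)
same_split = bool(np.array_equal(train_a, train_b)
                  and np.array_equal(test_a, test_b))
print(same_split)

# Whatever the seed, every sample ends up in exactly one of the two parts.
covers_all = sorted(np.concatenate([train_a, test_a]).tolist()) == y.tolist()
print(covers_all)
```

With random_state=None (the default), repeated calls are free to return different splits, which makes results harder to reproduce.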

stratify: array-like, default=None

        If not None, data is split in a stratified fashion, using this as
        the class labels.
        Read more in the :ref:`User Guide <stratification>`.

    Note: stratify preserves the class distribution that existed before the split. For example, if classes A and B occur in a 4:1 ratio (80:20) in the full data set, the training set and the test set each keep that 4:1 ratio. It is typically used when the class distribution is imbalanced.
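
The 80:20 case can be sketched with synthetic labels. The data below is made up for the illustration: 80 samples of class 0 and 20 samples of class 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)      # imbalanced labels, 80:20

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Count each class in both parts; the 4:1 ratio is kept in each.
train_counts = np.bincount(y_train).tolist()
test_counts = np.bincount(y_test).tolist()
print(train_counts, test_counts)
```

Without stratify, the class counts in a small test set depend on the shuffle and can drift away from the original ratio.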

shuffle: bool, default=True

        Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.

    Note: with shuffle=False the data keeps its original order, so the training set is the leading part and the test set is the trailing part.
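
A small sketch of both statements: without shuffling, the split is a plain head/tail cut, and combining shuffle=False with stratify is rejected. The `labels` array is hypothetical and exists only to trigger the error.

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.arange(10)

# No shuffling: the first 7 samples are the training set, the last 3 the test set.
train, test = train_test_split(y, test_size=3, shuffle=False)
print(train.tolist(), test.tolist())

# shuffle=False together with stratify raises a ValueError.
labels = np.array([0, 1] * 5)          # hypothetical class labels
try:
    train_test_split(y, test_size=3, shuffle=False, stratify=labels)
    raised = False
except ValueError:
    raised = True
print(raised)
```

An ordered split like this is useful when the order of the samples carries meaning, for example data sorted by time.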

Examples
--------
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)

>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]

>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]

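# Complete example: split noisy linear data, fit a linear regression, and plot the result.
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

np.random.seed(0)  # seed() returns None, so there is nothing to assign
x = np.linspace(-10, 10, 100)
y = 0.85 * x - 0.72
e = np.random.normal(loc=0, scale=0.5, size=x.shape)  # Gaussian noise
y += e
# plt.plot(x, y)
# plt.show()
x = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
lr = LinearRegression()
# shuffle=False keeps the original order: the first 75% of the samples form
# the training set, the last 25% the test set, and random_state has no effect.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0, shuffle=False)
# print(x_train)
# print(x_test)
# print(y_train)
# print(y_test)
lr.fit(x_train, y_train)
# print('weight', lr.coef_)
# print('intercept', lr.intercept_)
y_hat = lr.predict(x_test)
# print("actual:", y_test.ravel()[:10])
# print("predicted:", y_hat[:10])
plt.plot(x_train, y_train)     # training data
plt.plot(x_test, y_test, '.')  # held-out test points
plt.plot(x_test, y_hat)        # predictions over the test range
plt.show()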
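
The script above only plots the fit. As a hedged follow-up, the sketch below scores the same kind of model on the held-out data, once with the ordered split and once with a shuffled split. It is not part of the original script: the use of `np.random.default_rng`, the variable names, and the choice of R² (via `LinearRegression.score`) as the metric are assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)     # assumption: a local generator instead of np.random.seed
x = np.linspace(-10, 10, 100).reshape(-1, 1)
y = 0.85 * x.ravel() - 0.72 + rng.normal(0, 0.5, size=100)

# Ordered split: train on the first 75 points, test on the last 25.
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.25, shuffle=False)
ordered_model = LinearRegression().fit(x_tr, y_tr)
ordered_r2 = ordered_model.score(x_te, y_te)

# Shuffled split: the test points are spread over the whole x range.
xs_tr, xs_te, ys_tr, ys_te = train_test_split(
    x, y, test_size=0.25, random_state=0)
shuffled_model = LinearRegression().fit(xs_tr, ys_tr)
shuffled_r2 = shuffled_model.score(xs_te, ys_te)

print("ordered  split: slope", ordered_model.coef_[0], "R^2", ordered_r2)
print("shuffled split: slope", shuffled_model.coef_[0], "R^2", shuffled_r2)
```

With the ordered split the model is tested on x values larger than any it was trained on, so the score measures extrapolation; with the shuffled split the test points fall inside the training range.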

 
