Sklearn's train_test_split

zuiqingxuan

https://blog.csdn.net/fxlou/article/details/79189106

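In machine learning, this function randomly splits a sample set into a training set and a test set according to a user-specified ratio, and returns the resulting training and test data.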

Syntax

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

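Parameters

Name	Description
X	Feature set of the samples to be split
y	Labels of the samples to be split
test_size	If a float between 0 and 1, the fraction of the original samples assigned to the test set; if an integer, the absolute number of test samples.
random_state	Seed for the random number generator; a fixed value makes the split reproducible
X_train	Training set features (return value)
X_test	Test set features (return value)
y_train	Training set labels (return value)
y_test	Test set labels (return value)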
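The two interpretations of test_size described above can be checked with a small sketch. The 20-sample dataset and the variable names frac_sizes and int_sizes are made up for illustration; they are not part of the original article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples with 2 features each, and 20 labels.
X = np.arange(40).reshape((20, 2))
y = np.arange(20)

# Float test_size: interpreted as a fraction of the samples (0.25 * 20 = 5).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
frac_sizes = (len(X_train), len(X_test))

# Integer test_size: interpreted as an absolute number of test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=4, random_state=0)
int_sizes = (len(X_train), len(X_test))

print("float test_size=0.25 ->", frac_sizes)
print("int test_size=4      ->", int_sizes)
```

Note that test_size and random_state have to be passed by keyword, since any positional arguments are treated as arrays to split.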

Code example
Input:

import numpy as np
from sklearn.model_selection import train_test_split

# Create a dataset X with its labels y; X contains 100 samples
X, y = np.arange(200).reshape((100, 2)), range(100)

# Split into training and test sets with train_test_split; the test set takes 0.33 of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Print the number of samples in the original set, the training set, and the test set
print("The length of original data X is:", X.shape[0])
print("The length of train Data is:", X_train.shape[0])
print("The length of test Data is:", X_test.shape[0])

 

Output:

The length of original data X is: 100
The length of train Data is: 67
The length of test Data is: 33
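As a further illustration of what random_state does, the sketch below repeats the split with the same seed and checks that the result is identical, and that each feature row stays paired with its label. The variable names same_split and aligned are hypothetical, and y is built with np.arange here (rather than range) only so the returned labels are arrays that are easy to compare.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same shape of data as the example above: row i is [2*i, 2*i + 1], label is i.
X, y = np.arange(200).reshape((100, 2)), np.arange(100)

# Two independent calls with the same seed.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.33, random_state=42)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.33, random_state=42)

# A fixed random_state should give the same split on every call.
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)

# X and y are shuffled together, so row [2*i, 2*i + 1] should still carry label i.
aligned = np.array_equal(X_test_a[:, 0] // 2, y_test_a)

print("Same split with the same seed:", same_split)
print("Features still aligned with labels:", aligned)
```

Leaving random_state unset generally produces a different split on each run, which is why a fixed seed is useful when results need to be reproduced.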
