Randomly sampling half of the data in Python: the random state used when splitting a dataset

Latest recommended article published on 2023-08-23 12:11:49

weixin_39976153

I'm fairly new to Python. Can anyone tell me why we set random_state to zero when splitting data into train and test sets?

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

I have also seen cases like this one, where random_state is set to one:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

What effect does this random state have in cross-validation?

Solution

It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that it is set to the same value each time if you want to validate your processing over multiple runs of the code. Incidentally, I have seen random_state=42 used in many official scikit-learn examples as well as elsewhere.
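As a minimal sketch of that point, the snippet below splits a small made-up dataset twice with the same seed and checks that the held-out labels match. The toy arrays and the helper name split_test_labels are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for this illustration: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)


def split_test_labels(seed):
    """Return the test labels produced by a 70/30 split with the given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test


first = split_test_labels(0)
second = split_test_labels(0)

# The same seed gives the same split on every call.
repeatable = np.array_equal(first, second)
print("same split with the same seed:", repeatable)

# A different seed is equally valid; it will usually select different rows.
print("seed 0 test labels:", first)
print("seed 1 test labels:", split_test_labels(1))
```

Which integer is used is a matter of taste; the reproducibility comes from reusing it.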

random_state, as the name suggests, is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states:

If random_state is None or np.random, then a randomly-initialized RandomState object is returned.

If random_state is an integer, then it is used to seed a new RandomState object.

If random_state is a RandomState object, then it is passed through.
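The three cases quoted above can be observed directly with scikit-learn's helper sklearn.utils.check_random_state, which applies this conversion. The following is only a small sketch; the variable names are chosen for illustration.

```python
import numpy as np
from sklearn.utils import check_random_state

# Case 1: None returns a RandomState that is not seeded by the caller.
rng_default = check_random_state(None)
is_random_state = isinstance(rng_default, np.random.RandomState)

# Case 2: an integer seeds a new RandomState, so two generators built
# from the same integer produce the same sequence of draws.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
same_draws = np.array_equal(
    rng_a.randint(0, 100, size=5),
    rng_b.randint(0, 100, size=5),
)

# Case 3: an existing RandomState object is returned unchanged.
existing = np.random.RandomState(1)
passed_through = check_random_state(existing) is existing

print(is_random_state, same_draws, passed_through)
```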

This makes it possible to check and validate results when the code is run multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same on every run, which helps in verifying the output.
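The same reasoning appears to carry over to the cross-validation part of the question. As a hedged sketch, assuming KFold from sklearn.model_selection with shuffle=True (random_state only affects KFold when shuffling is enabled), a fixed seed reproduces the same folds on each run. The toy data and the helper name fold_test_indices are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, invented for this illustration: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)


def fold_test_indices(seed):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]


folds_a = fold_test_indices(0)
folds_b = fold_test_indices(0)

# With the same seed, every fold contains the same samples on each run.
same_folds = folds_a == folds_b
print("identical folds with the same seed:", same_folds)
print(folds_a)
```

With shuffle=False the folds are taken in order and no seed is involved, so they are reproducible without setting random_state.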
