1 train_test_split in the cross-validation package
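Set the test-set proportion with test_size; the samples are drawn at random from the original data. Passing stratify keeps the class proportions in the resulting splits the same as in the original data; without it they match only approximately. An example follows.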
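A minimal self-contained sketch of the stratified case, assuming scikit-learn 0.18 or later (where train_test_split is importable from sklearn.model_selection). It counts the classes on each side of the split. The names features, labels and the fixed random_state are illustrative choices, not part of the original example.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data (illustrative): 100 samples of class 1 and 50 of class 2, a 2:1 ratio.
labels = [1] * 100 + [2] * 50
features = [[i] for i in range(len(labels))]

# stratify=labels asks for the 2:1 class ratio to be kept in both splits.
# random_state is fixed only so that repeated runs give the same split.
feat_train, feat_test, lab_train, lab_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0
)

# Count how many samples of each class landed in each split.
train_counts = Counter(lab_train)
test_counts = Counter(lab_test)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

Dropping the stratify argument and rerunning with different random_state values should show the per-class counts drifting from the 2:1 ratio, which is the difference the stratified split is meant to remove.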
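import numpy as np
# In scikit-learn releases before 0.18 this lived in sklearn.cross_validation.
from sklearn.model_selection import train_test_split

# 100 samples of class 1 and 50 samples of class 2 (a 2:1 ratio)
data_x = [['this is class 1 ']] * 100 + [['this is class 2']] * 50
data_y = [[1]] * 100 + [[2]] * 50
data_y
# stratify=data_y keeps the 2:1 class ratio in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.3, stratify=data_y)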
X_train
#sum(y_train==1)/len(y_train)
print('numbers of positive class in training data:', sum( np.mat(y_train)==1 )[