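- K-fold cross-validation
Split the original data into K groups (folds). Each fold serves once as the validation set while the remaining K-1 folds form the training set, which yields K models. The validation scores of the K models (classification accuracy for a classifier, mean squared error for the regressor used here) are averaged to give the K-fold cross-validation performance estimate.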
from sklearn.model_selection import KFold
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# 10-fold split. The row indices produced from new_train_pca_16 are reused to slice
# train and target, so all three are assumed to have the same rows in the same order.
kf = KFold(n_splits=10)
for k, (train_index, test_index) in enumerate(kf.split(new_train_pca_16)):
    train_data, test_data = train.values[train_index], train.values[test_index]
    train_target, test_target = target[train_index], target[test_index]
    # Fit a fresh model on the K-1 training folds
    clf = SGDRegressor(max_iter=1000, tol=1e-3)
    clf.fit(train_data, train_target)
    train_pred = clf.predict(train_data)
    test_pred = clf.predict(test_data)
    # mean_squared_error expects (y_true, y_pred)
    train_score = mean_squared_error(train_target, train_pred)
    test_score = mean_squared_error(test_target, test_pred)
    print(k, 'train MSE:', train_score)
    print(k, 'validation MSE:', test_score)
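The description above says the K validation scores are averaged, while the loop only prints them per fold. A minimal sketch of the averaging step follows. It assumes synthetic data in place of `train`/`target`, and it swaps in `LinearRegression` for `SGDRegressor` so the result is deterministic; the helper name `kfold_mean_mse` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def kfold_mean_mse(X, y, n_splits=5, seed=0):
    """Return the per-fold validation MSEs and their mean (hypothetical helper)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_mse = []
    for train_idx, val_idx in kf.split(X):
        # Train on K-1 folds, score on the held-out fold
        model = LinearRegression()
        model.fit(X[train_idx], y[train_idx])
        val_pred = model.predict(X[val_idx])
        fold_mse.append(mean_squared_error(y[val_idx], val_pred))
    return fold_mse, float(np.mean(fold_mse))


# Synthetic regression data (an assumption, not the data used in these notes)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

fold_mse, mean_mse = kfold_mean_mse(X_demo, y_demo)
print('per-fold MSE:', fold_mse)
print('mean MSE:', mean_mse)
```

The mean over folds is the single number usually reported as the K-fold score; the spread of the per-fold values gives a rough sense of how stable that estimate is.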
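- Leave-one-out cross-validation
The training set consists of all samples except one, and the single held-out sample forms the validation set. For a dataset of N samples, this yields N different training sets and N different validation sets.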
from sklearn.model_selection import LeaveOneOut

# SGDRegressor and mean_squared_error are imported in the K-fold example above
loo = LeaveOneOut()
for k, (train_index, test_index) in enumerate(loo.split(new_train_pca_16)):
    train_data, test_data = train.values[train_index], train.values[test_index]
    train_target, test_target = target[train_index], target[test_index]
    clf = SGDRegressor(max_iter=1000, tol=1e-3)
    clf.fit(train_data, train_target)
    train_pred = clf.predict(train_data)
    test_pred = clf.predict(test_data)
    # mean_squared_error expects (y_true, y_pred); the validation set holds one sample
    train_score = mean_squared_error(train_target, train_pred)
    test_score = mean_squared_error(test_target, test_pred)
    print(k, 'train MSE:', train_score)
    print(k, 'validation MSE:', test_score)
    # Full leave-one-out fits N models; stop once k exceeds 9 to keep the demo short
    if k > 9:
        break
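To make the "N training sets and N validation sets" statement concrete, here is a small sketch on a toy array (an assumption, not the data used in these notes). It also checks that leave-one-out produces the same splits as unshuffled K-fold with K equal to the number of samples.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Toy data: 6 samples, 2 features
X_small = np.arange(12).reshape(6, 2)

loo = LeaveOneOut()
n_loo = loo.get_n_splits(X_small)  # one split per sample

loo_splits = [(tr.tolist(), va.tolist()) for tr, va in loo.split(X_small)]
kf_splits = [
    (tr.tolist(), va.tolist())
    for tr, va in KFold(n_splits=len(X_small)).split(X_small)
]

for train_idx, val_idx in loo_splits:
    print('train:', train_idx, 'validation:', val_idx)
print('same as KFold(n_splits=N):', loo_splits == kf_splits)
```

Because one model is fitted per sample, the cost grows linearly with N, which is why the loop above stops early instead of running over the whole dataset.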
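- Leave-P-out cross-validation
Remove P samples from the complete dataset to form the validation set and train on the rest, generating every possible training/validation pair. For N samples this gives C(N, P) splits, so the number of splits grows very quickly with N and P.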
from sklearn.model_selection import LeavePOut

# SGDRegressor and mean_squared_error are imported in the K-fold example above
lpo = LeavePOut(p=10)
# Iterate over the leave-P-out splitter (lpo), not the leave-one-out splitter
for k, (train_index, test_index) in enumerate(lpo.split(new_train_pca_16)):
    train_data, test_data = train.values[train_index], train.values[test_index]
    train_target, test_target = target[train_index], target[test_index]
    clf = SGDRegressor(max_iter=1000, tol=1e-3)
    clf.fit(train_data, train_target)
    train_pred = clf.predict(train_data)
    test_pred = clf.predict(test_data)
    # mean_squared_error expects (y_true, y_pred)
    train_score = mean_squared_error(train_target, train_pred)
    test_score = mean_squared_error(test_target, test_pred)
    print(k, 'train MSE:', train_score)
    print(k, 'validation MSE:', test_score)
    # Enumerating every split is infeasible; stop once k exceeds 9
    if k > 9:
        break
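The early `break` matters because leave-P-out enumerates every way of choosing P validation samples. The sketch below counts the splits on a toy array (an assumption, not the data used in these notes) and compares the count with the binomial coefficient from `math.comb`.

```python
import math

import numpy as np
from sklearn.model_selection import LeavePOut

# Toy data: 6 samples, 2 features
X_small = np.arange(12).reshape(6, 2)

lpo = LeavePOut(p=2)
splits = [(tr.tolist(), va.tolist()) for tr, va in lpo.split(X_small)]
n_splits = lpo.get_n_splits(X_small)

print('splits generated:', len(splits))
print('get_n_splits:', n_splits)
print('C(6, 2):', math.comb(6, 2))

# The same formula shows how fast the count grows for larger N and P
print('C(100, 10):', math.comb(100, 10))
```

Unlike K-fold, the validation sets here overlap, so leave-P-out is usually only practical on very small datasets or when truncated as in the loop above.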