Implementing scikit-learn's k-fold cross-validation scorer in Python (model evaluation)
A particularly useful feature of the cross_val_score method is that it can distribute the evaluation of the individual folds across multiple CPUs. With n_jobs=1, only a single CPU is used for the performance evaluation, just as in the StratifiedKFold example shown earlier. Setting n_jobs=2 distributes the 10 rounds of cross-validation across two CPUs (provided the system has that many), and n_jobs=-1 uses all CPUs available on the machine in parallel (see the sketch after the output below).
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.model_selection import cross_val_score
# Load the Breast Cancer Wisconsin (Diagnostic) dataset; the file has no header row
df = pd.read_csv('xxx\\wdbc.data',
                 header=None)
print(df.head())

# Column 1 holds the class label (M = malignant, B = benign); columns 2-31 hold the 30 features
X = df.loc[:, 2:].values
y = df.loc[:, 1].values

# Encode the string labels M/B as integers 1/0
le = LabelEncoder()
y = le.fit_transform(y)

# Hold out 20% of the data as a test set, preserving the class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.20,
                                                    stratify=y,
                                                    random_state=1)

# Pipeline: standardize the features, project onto 2 principal components, then fit logistic regression
pipe_lr = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        LogisticRegression(random_state=1))

# 10-fold cross-validation on the training set; n_jobs=1 uses a single CPU
scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=1)
print('CV accuracy scores: %s' % scores)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
Output:
0 1 2 3 4 … 27 28 29 30 31
0 842302 M 17.99 10.38 122.80 … 0.6656 0.7119 0.2654 0.4601 0.11890
1 842517 M 20.57 17.77 132.90 … 0.1866 0.2416 0.1860 0.2750 0.08902
2 84300903 M 19.69 21.25 130.00 … 0.4245 0.4504 0.2430 0.3613 0.08758
3 84348301 M 11.42 20.38 77.58 … 0.8663 0.6869 0.2575 0.6638 0.17300
4 84358402 M 20.29 14.34 135.10 … 0.2050 0.4000 0.1625 0.2364 0.07678
[5 rows x 32 columns]
CV accuracy scores: [0.93478261 0.93478261 0.95652174 0.95652174 0.93478261 0.95555556
0.97777778 0.93333333 0.95555556 0.95555556]
CV accuracy: 0.950 +/- 0.014
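To switch to parallel evaluation, only the n_jobs argument needs to change; everything else in the pipeline above is reused unchanged. A minimal sketch, assuming pipe_lr, X_train, and y_train from the code above are still in scope:

# Same 10-fold cross-validation, but let joblib spread the folds over all available CPU cores
scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=-1)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))

The resulting scores and their mean should match the single-CPU run above, since n_jobs only changes how the folds are scheduled across cores, not how the data is split or scored.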