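cross_val_score is a helper function that wraps scikit-learn's cross-validation objects (for example KFold, StratifiedKFold). It returns an array of scores, one per fold, computed according to the scoring parameter (for classification problems the default is accuracy, because it falls back to the estimator's own score method).
The object returned by cross_val_score does not give you access to the underlying folds or models used during cross-validation, which means you cannot get the coefficients of each model from it.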
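If all you need is the fitted model from each fold, one alternative worth knowing is cross_validate with return_estimator=True, which hands back the per-fold estimators alongside the scores. Below is a minimal sketch of that idea, not the approach used in the rest of this answer. The synthetic data from make_classification and the names X_demo, y_demo, fold_scores and fold_coefs are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (an assumption; substitute your own X and y).
X_demo, y_demo = make_classification(n_samples=60, n_features=4, random_state=0)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
result = cross_validate(
    LogisticRegression(),
    X_demo,
    y_demo,
    cv=cv,
    return_estimator=True,  # keep the fitted model from every fold
)

fold_scores = result["test_score"]  # one score per fold
fold_coefs = [est.coef_ for est in result["estimator"]]  # one coefficient array per fold

for score, coef in zip(fold_scores, fold_coefs):
    print(score, coef)
```

This returns the models but not the train/test indices of each fold, so looping over the splitter yourself is still the way to go if you also want to inspect the splits.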
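To get the coefficients for each cross-validation split, you need to use KFold (or StratifiedKFold if your classes are imbalanced) and fit the model on each split yourself.

import pandas as pd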
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
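df = pd.read_clipboard()  # the sample data copied from the question
file = pd.concat([df, df, df]).reset_index(drop=True)  # drop the old index so it does not end up as a feature column
X = file.drop(columns=['Result'])
y = file['Result']
# random_state is only accepted together with shuffle=True in recent scikit-learn versions
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)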
models, coefs = [], []  # in case you want to inspect the models later, too
for train, test in skf.split(X, y):
    print(train, test)
    # the default lbfgs solver does not support an L1 penalty; liblinear does
    clf = LogisticRegression(penalty='l1', solver='liblinear')
    clf.fit(X.loc[train