Here is a small code snippet that illustrates how to use DecisionTreeRegressor together with cross-validation.
A. In the first code block, "cross_val_score" is used. Note that the R² score it returns can be negative, which tells us the model is fitting poorly (worse than simply predicting the mean).
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeRegressor
import numpy as np
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.20, random_state=0)
# criterion "mae" was renamed to "absolute_error" in scikit-learn 1.0
dt = DecisionTreeRegressor(random_state=0, criterion="absolute_error")
dt_fit = dt.fit(X_train, y_train)
dt_scores = cross_val_score(dt_fit, X_train, y_train, cv = 5)
print("mean cross validation score: {}".format(np.mean(dt_scores)))
print("score without cv: {}".format(dt_fit.score(X_train, y_train)))
# on the test or hold-out set
from sklearn.metrics import r2_score
print(r2_score(y_test, dt_fit.predict(X_test)))
print(dt_fit.score(X_test, y_test))
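To see why R² can go negative: it compares the model against the baseline of always predicting the mean of the targets, so any model worse than that baseline scores below zero. A minimal sketch with hand-made numbers (not from the original data) makes this concrete:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
# These predictions are worse than always predicting the mean (2.5),
# so R^2 = 1 - SS_res/SS_tot drops below zero.
y_bad = np.array([4.0, 1.0, 4.0, 1.0])
print(r2_score(y_true, y_bad))  # → -3.0
```

Here SS_tot = 5 and SS_res = 20, giving 1 - 20/5 = -3.0, which is exactly the kind of negative cross-validation score described above.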
B. In the next section, a grid search with cross-validation is run over the parameter "min_samples_split", and the best estimator is then used.
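The grid-search step described above can be sketched with GridSearchCV. The dataset here is synthetic (make_regression) and the candidate values for min_samples_split are an assumption, since the original does not list them:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; replace with the real X, y.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Candidate values for min_samples_split (illustrative choices).
param_grid = {"min_samples_split": [2, 5, 10, 20, 50]}

# 5-fold cross-validated grid search; scoring defaults to the
# estimator's .score, i.e. R^2 for a regressor.
gs = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=5)
gs.fit(X_train, y_train)

print("best min_samples_split:", gs.best_params_["min_samples_split"])
# The best estimator is refit on the full training set and can be
# evaluated on the hold-out set directly.
print("test-set R^2:", gs.best_estimator_.score(X_test, y_test))
```

GridSearchCV refits the best parameter combination on the whole training set by default (refit=True), so gs.best_estimator_ is ready to score on the hold-out data.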