I. Learning Content
Task 4: Modeling and Parameter Tuning
Linear regression model:
- Feature requirements for linear regression;
- Handling long-tailed distributions;
- Understanding the linear regression model;
Model performance validation:
- Evaluation functions vs. objective functions;
- Cross-validation;
- Leave-one-out validation;
- Validation for time-series problems;
- Plotting learning curves;
- Plotting validation curves;
Embedded feature selection:
- Lasso regression;
- Ridge regression;
- Decision trees;
Model comparison:
- Common linear models;
- Common nonlinear models;
Parameter tuning:
- Greedy tuning;
- Grid search;
- Bayesian optimization;
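Among the validation topics above, time-series problems need special care: rows from the future must never leak into the training folds. A minimal sketch using scikit-learn's TimeSeriesSplit on synthetic data (the data and model here are placeholders, not the competition data):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# synthetic time-ordered data: a noisy upward trend
rng = np.random.RandomState(0)
X = np.arange(100).reshape(-1, 1).astype(float)
y = 0.5 * X.ravel() + rng.normal(scale=2.0, size=100)

# each split trains only on rows that come strictly before the test rows
tscv = TimeSeriesSplit(n_splits=5)
maes = []
for train_idx, test_idx in tscv.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    maes.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
print('per-fold MAE:', np.round(maes, 3))
```

Unlike ordinary KFold, the test indices of each split always follow the training indices, so the evaluation respects temporal order.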
II. Code
1. Linear Model
import numpy as np
from sklearn.linear_model import LinearRegression

# note: `normalize=True` was removed from LinearRegression in scikit-learn 1.2;
# standardize features beforehand (e.g. with StandardScaler) if needed
model = LinearRegression()
model = model.fit(train_X, train_y)
print('intercept:', model.intercept_)
# feature weights sorted from largest to smallest
print(sorted(dict(zip(continuous_feature_names, model.coef_)).items(),
             key=lambda x: x[1], reverse=True))

# log-transform the target so it is closer to a normal distribution
train_y_ln = np.log(train_y + 1)
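To see why the log transform above helps, one can compare the skewness of a long-tailed target before and after the transform; a small sketch on simulated log-normal data (train_y here is synthetic, standing in for the real target):

```python
import numpy as np
from scipy.stats import skew

# simulate a long-tailed (log-normal) target, like a price variable
rng = np.random.RandomState(0)
train_y = np.exp(rng.normal(loc=8, scale=1, size=10000))

# same transform as above
train_y_ln = np.log(train_y + 1)
print('skew before:', round(skew(train_y), 3))
print('skew after :', round(skew(train_y_ln), 3))
```

np.log1p and np.expm1 are numerically stable equivalents of this transform and its inverse.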
2. Comparing Multiple Models
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, mean_absolute_error

models = [LinearRegression(),
          Lasso(),
          Ridge()]
result = dict()
for model in models:
    model_name = str(model).split('(')[0]
    scores = cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5,
                             scoring=make_scorer(mean_absolute_error))
    result[model_name] = scores
    print(model_name + ' is finished!')
result = pd.DataFrame(result)
result.index = ['cv' + str(x) for x in range(1, 6)]
print(result)
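The learning and validation curves listed in section I can be produced with scikit-learn's learning_curve helper, which scores a model on growing fractions of the training set. A sketch on synthetic regression data (train_X / train_y_ln are simulated here):

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import Ridge

# synthetic linear data standing in for train_X / train_y_ln
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

# evaluate the model on 10%, 32.5%, ..., 100% of the training data
sizes, train_scores, val_scores = learning_curve(
    Ridge(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5),
    scoring='neg_mean_absolute_error')
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f'{n:4d} samples  train MAE {-tr:.4f}  cv MAE {-va:.4f}')
```

Plotting the two curves against the training size (e.g. with matplotlib) shows whether the model would benefit from more data or is under/overfitting.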
3. Nonlinear Models
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from xgboost.sklearn import XGBRegressor
from lightgbm.sklearn import LGBMRegressor

models = [LinearRegression(),
          DecisionTreeRegressor(),
          RandomForestRegressor(),
          GradientBoostingRegressor(),
          MLPRegressor(solver='lbfgs', max_iter=100),
          XGBRegressor(n_estimators=100, objective='reg:squarederror'),
          LGBMRegressor(n_estimators=100)]
result = dict()
for model in models:
    model_name = str(model).split('(')[0]
    scores = cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5,
                             scoring=make_scorer(mean_absolute_error))
    result[model_name] = scores
    print(model_name + ' is finished!')
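The embedded feature selection from section I falls out of the L1 penalty for free: Lasso drives the coefficients of uninformative features to exactly zero, so the surviving features are the selected ones. A sketch on synthetic data (the data and alpha value are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 6))
# only the first three features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=300)

lasso = Lasso(alpha=0.1).fit(X, y)
# features with a nonzero coefficient are "selected"
selected = [i for i, c in enumerate(lasso.coef_) if abs(c) > 1e-6]
print('nonzero coefficients at features:', selected)
```

Ridge, by contrast, shrinks coefficients toward zero without zeroing them out, so it regularizes but does not select.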
4. Parameter Tuning
from lightgbm.sklearn import LGBMRegressor
from sklearn.model_selection import GridSearchCV

# candidate values for each hyperparameter (illustrative grids; adjust as needed)
objective = ['regression', 'regression_l1', 'huber']
num_leaves = [3, 5, 10, 15, 20, 40, 55]
max_depth = [3, 5, 10, 15, 20, 40, 55]

parameters = {'objective': objective, 'num_leaves': num_leaves, 'max_depth': max_depth}
model = LGBMRegressor()
clf = GridSearchCV(model, parameters, cv=5)
clf = clf.fit(train_X, train_y_ln)
print(clf.best_params_)
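Greedy tuning, the other method listed in section I, is cheaper than a full grid search: sweep one hyperparameter at a time, fix its best value, then move on to the next. A minimal sketch using a DecisionTreeRegressor on synthetic data for self-containedness (the parameter grids and data are illustrative only):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=400)

def cv_mae(params):
    # 5-fold cross-validated MAE for a given parameter setting
    model = DecisionTreeRegressor(random_state=0, **params)
    return -cross_val_score(model, X, y, cv=5,
                            scoring='neg_mean_absolute_error').mean()

# tune one parameter at a time, keeping each best value fixed afterwards
best = {}
for name, grid in [('max_depth', [2, 4, 6, 8]),
                   ('min_samples_leaf', [1, 5, 10, 20])]:
    scores = {v: cv_mae({**best, name: v}) for v in grid}
    best[name] = min(scores, key=scores.get)
print('greedy choice:', best, ' MAE:', round(cv_mae(best), 4))
```

Greedy search evaluates len(grid1) + len(grid2) settings instead of the product, at the cost of possibly missing interactions between parameters; Bayesian optimization (e.g. the bayes_opt package) addresses the same cost problem by modeling the score surface and choosing the next setting to try.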