Algorithm Practice DAT2: Model Building 2.0

Latest recommended article published 2023-09-02 23:00:28

WhoIsTing


Article link: https://blog.csdn.net/weixin_43399785/article/details/85799710



Task:
Build four models (random forest, GBDT, XGBoost and LightGBM) and score each one. Any scoring method is allowed, for example accuracy and AUC.

Key points: the four models and their corresponding scores.
1. [Import the packages]

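import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier  # RF and GBDT
from xgboost import XGBClassifier    # third-party package, installed separately
from lightgbm import LGBMClassifier  # third-party package, installed separately
from sklearn import metrics          # accuracy_score, precision_score
import matplotlib.pyplot as plt      # imported, but not used in the code below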

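The struggle of importing Anaconda packages:
Importing the packages caused a lot of trouble. I had been using PyCharm with Anaconda, but PyCharm turned out to be unfriendly for beginners: even after spending a long time on configuration, the very first line, import pandas as pd, raised an error. I then switched to Spyder, the Python IDE that ships with Anaconda, and could use Anaconda's bundled packages (including pandas and sklearn) right away. The third-party packages xgboost and lightgbm had to be downloaded and then installed from the Anaconda Prompt with conda install.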
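Before running the rest of the post, it can help to confirm from inside the IDE that every package actually imports. The sketch below is an assumption-laden helper, not part of the original workflow: the function name `check_packages` and the `REQUIRED` mapping are hypothetical, and it assumes the distribution names are `scikit-learn` for `sklearn` and identical to the import name for the others. It only inspects the environment; installing still happens through conda install as described above.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Hypothetical mapping: import name -> distribution name used for the version lookup
REQUIRED = {'pandas': 'pandas', 'sklearn': 'scikit-learn', 'xgboost': 'xgboost',
            'lightgbm': 'lightgbm', 'matplotlib': 'matplotlib'}


def check_packages(required=REQUIRED):
    """Map each import name to its installed version, or None if it cannot be found."""
    report = {}
    for module_name, dist_name in required.items():
        # find_spec returns None for a top-level module that is not installed
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable, but no distribution metadata under that name
            report[module_name] = 'unknown'
    return report


for name, ver in check_packages().items():
    print(f"{name:12s} {ver if ver else 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED points at the package that still needs a conda install in the active environment.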

[Figure: screenshot of the Spyder interface]

2. [Same as the first assignment: read the dataset and split it 70/30 into training and test sets]

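data_all = pd.read_csv('./data_all.csv')
X = data_all.drop(['status'], axis=1)  # drop the 'status' label column; the remaining columns are features
y = data_all['status']                 # binary label to predict
# 70/30 split: test_size=0.3 holds out 30% of rows; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)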

print('The shape of X: ', X.shape)
print('proportion of label 1:', len(y[y==1])/len(y))

Output:
The shape of X: (4754, 84)
proportion of label 1: 0.2509465713083719
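Only about 25% of the labels are 1, so a plain random 70/30 split can leave the training and test sets with slightly different positive rates. One optional variant (not what the post uses) is to pass `stratify=y` to `train_test_split`. Since data_all.csv is not available here, this sketch assumes a synthetic stand-in from `make_classification` with the same shape and roughly the same class balance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data_all.csv (assumption): 4754 rows, 84 features, ~25% positives
X, y = make_classification(n_samples=4754, n_features=84, n_informative=10,
                           weights=[0.75], flip_y=0, random_state=2018)

# stratify=y keeps the label proportions (almost) identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y)

overall = y.mean()
train_ratio = y_train.mean()
test_ratio = y_test.mean()
print(f"overall={overall:.4f} train={train_ratio:.4f} test={test_ratio:.4f}")
```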

3. [Build the random forest, GBDT, XGBoost and LightGBM models]

# Random forest
# n_estimators='warn' was an old scikit-learn placeholder meaning 10 trees, so 10 is spelled out here.
# max_features='auto' (equal to 'sqrt' for classifiers) and min_impurity_split are no longer accepted.
rf_model = RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2,
                                  min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt',
                                  max_leaf_nodes=None, min_impurity_decrease=0.0,
                                  bootstrap=True, oob_score=False, n_jobs=None, random_state=2018, verbose=0,
                                  warm_start=False, class_weight=None)

# GBDT
# loss='deviance' was renamed 'log_loss'; min_impurity_split and presort were removed.
gbdt_model = GradientBoostingClassifier(loss='log_loss', learning_rate=0.1, n_estimators=100, subsample=1.0,
                                        criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1,
                                        min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0,
                                        init=None, random_state=2018, max_features=None,
                                        verbose=0, max_leaf_nodes=None, warm_start=False,
                                        validation_fraction=0.1, n_iter_no_change=None, tol=0.0001)

# XGBoost
# silent, nthread and seed were deprecated aliases of verbosity, n_jobs and random_state;
# missing=None meant np.nan, which is already the default.
xgb_model = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, verbosity=0,
                          objective='binary:logistic', booster='gbtree', n_jobs=1, gamma=0,
                          min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1,
                          colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5,
                          random_state=2018)

# LightGBM
# silent is no longer accepted by newer LightGBM releases; verbose=-1 suppresses the log output instead.
lgbm_model = LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1,
                            n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None,
                            min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0,
                            subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0,
                            random_state=2018, n_jobs=-1, verbose=-1, importance_type='split')
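Most of the arguments above only restate library defaults. A small helper can report which parameters actually differ from a fresh instance of the same class. The name `non_default_params` is hypothetical, and this sketch covers only the two scikit-learn models so that it runs without xgboost or lightgbm installed.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def non_default_params(estimator):
    """Return the parameters of `estimator` whose values differ from its class defaults."""
    defaults = type(estimator)().get_params()
    return {k: v for k, v in estimator.get_params().items() if defaults.get(k) != v}


rf = RandomForestClassifier(n_estimators=10, random_state=2018)
gbdt = GradientBoostingClassifier(random_state=2018)
print(non_default_params(rf))    # {'n_estimators': 10, 'random_state': 2018}
print(non_default_params(gbdt))  # {'random_state': 2018}
```

Writing only these non-default arguments keeps the model definitions short and makes the real choices (here, the tree count and the seed) easy to spot.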

4. [Train the models]
A for loop trains and evaluates the four models one after another. The evaluation metrics are accuracy and precision.
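For reference, the two metrics can be written out by hand: accuracy = correct predictions / all predictions, and precision = TP / (TP + FP), the share of predicted positives that are truly positive. This stdlib-only sketch (the function name `accuracy_and_precision` and the toy labels are hypothetical) mirrors what `metrics.accuracy_score` and `metrics.precision_score` compute for binary 0/1 labels.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Compute accuracy and precision for binary labels from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Return 0.0 when nothing is predicted positive, to avoid dividing by zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(acc, prec)  # 0.625 0.6
```

With a 25% positive rate, accuracy alone can look decent even for a weak model (always predicting 0 already scores about 0.75), which is why precision is reported alongside it.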

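models = {'RF': rf_model, 'GBDT': gbdt_model, 'XGBoost': xgb_model, 'LightGBM': lgbm_model}

for name, clf in models.items():
    # Fit on the training split, then predict hard 0/1 labels for the test split
    clf.fit(X_train, y_train)
    y_test_pred = clf.predict(X_test)

    acc = metrics.accuracy_score(y_test, y_test_pred)
    p = metrics.precision_score(y_test, y_test_pred)  # precision of the positive class (status == 1)

    print(name, 'Accuracy:', acc)
    print(name, 'Precision:', p)

Output:

RF Accuracy: 0.7694463910301331
RF Precision: 0.59375
GBDT Accuracy: 0.7806587245970568
GBDT Precision: 0.6116504854368932
XGBoost Accuracy: 0.7855641205325858
XGBoost Precision: 0.6305418719211823
LightGBM Accuracy: 0.7701471618780659
LightGBM Precision: 0.5701357466063348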
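The task statement also suggests AUC, which the loop above does not report. AUC needs a score for the positive class rather than hard labels, so it is computed from `predict_proba(...)[:, 1]`. This sketch is an assumption-based illustration: it uses synthetic data from `make_classification` instead of data_all.csv, and only the two scikit-learn models so it runs without xgboost or lightgbm; `XGBClassifier` and `LGBMClassifier` also provide `predict_proba`, so the same loop body should apply to them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): ~25% positives, like the post's dataset
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.75], random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

models = {
    'RF': RandomForestClassifier(n_estimators=100, random_state=2018),
    'GBDT': GradientBoostingClassifier(random_state=2018),
}

auc_scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    # Column 1 of predict_proba is the predicted probability of label 1
    y_test_proba = clf.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, y_test_proba)
    print(name, 'AUC:', auc_scores[name])
```

Unlike accuracy and precision, AUC does not depend on the 0.5 decision threshold that `predict` applies, so it is a useful complement on an imbalanced label.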
