A Simple GBDT Implementation
The Gradient Boosting Decision Tree (GBDT) algorithm has attracted a great deal of attention in recent years, largely because of its strong predictive performance and its excellent track record in data mining and machine learning competitions. Many open-source implementations of GBDT have been developed; the most popular are XGBoost and Microsoft's LightGBM.
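Before turning to the library call below, the core idea of gradient boosting can be sketched in a few lines: each new tree is fit to the residuals of the current ensemble, which for squared-error loss are exactly the negative gradient. The toy data and hyperparameters here are purely illustrative, not taken from the article:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_estimators = 50

# Start from the mean prediction, then repeatedly fit a shallow tree
# to the residuals (the negative gradient of the squared-error loss)
# and add a damped version of its prediction to the ensemble.
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_estimators):
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

mse = np.mean((y - pred) ** 2)
print("training MSE after boosting:", mse)
```

The learning rate damps each tree's contribution, trading more boosting rounds for better generalization; this is the same role the `learning_rate` parameter plays in scikit-learn's `GradientBoostingClassifier` used below.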
Hands-On Example
# -*- encoding: utf-8 -*-
'''
# @author:Little_Devil
# @file: GBDT.py
# @time:2018/12/3/17:18
'''
# 1. Load the data
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
y = digits.target
# Split the data into training and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# 2. Build the model
from sklearn.ensemble import GradientBoostingClassifier
# Note: loss='deviance' and presort='auto' from the original listing were
# removed in recent scikit-learn releases; the remaining parameters are
# shown at their explicit values.
dtc = GradientBoostingClassifier(learning_rate=0.005, n_estimators=100,
                                 subsample=1.0, min_samples_split=2,
                                 min_samples_leaf=1, min_weight_fraction_leaf=0.,
                                 max_depth=3, init=None, random_state=None,
                                 max_features=None, verbose=0,
                                 max_leaf_nodes=None, warm_start=False)
# 3. Train the model
dtc.fit(X_train, y_train)
# 4. Predict on the test set
y_pred = dtc.predict(X_test)
# 5. Evaluate the model
print("Model in train score is:", dtc.score(X_train, y_train))
print("Model in test score is:", dtc.score(X_test, y_test))
from sklearn.metrics import classification_report
print("report is:", classification_report(y_test, y_pred))
# ---------------------------Execution results--------------------------------------------
# Model in train score is: 0.929343308396
# Model in test score is: 0.875420875421
# report is: precision recall f1-score support
#
# 0 0.98 0.91 0.94 55
# 1 0.89 0.87 0.88 55
# 2 0.91 0.79 0.85 52
# 3 0.88 0.88 0.88 56
# 4 0.92 0.91 0.91 64
# 5 0.97 0.81 0.88 73
# 6 0.87 0.95 0.91 57
# 7 0.81 0.92 0.86 62
# 8 0.81 0.92 0.86 52
# 9 0.77 0.82 0.79 68
#
# avg / total 0.88 0.88 0.88 594
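The gap between the training score (~0.93) and the test score (~0.88) above suggests the model has room to improve with more boosting stages or a different learning rate. As a sketch (the learning rate of 0.1 and 50 estimators here are assumptions chosen for speed, not the article's settings), `staged_predict` lets you track test accuracy after each boosting stage:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.33, random_state=42)

# Hypothetical settings: a larger learning rate than the article's 0.005
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=50,
                                 max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each stage,
# so we can watch accuracy evolve as trees are added.
stage_acc = [accuracy_score(y_test, y_pred)
             for y_pred in clf.staged_predict(X_test)]
print("accuracy after %d trees: %.3f" % (len(stage_acc), stage_acc[-1]))
```

Plotting `stage_acc` against the stage index is a quick way to decide whether more estimators would still help or whether the curve has flattened.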
Thanks for reading — I hope this was helpful!