xgboost installation and plotting notes

System: Ubuntu 16.04

The official documentation is already quite good: [url]https://xgboost.readthedocs.io/en/latest/build.html[/url]

[size=large][color=blue]1. Download the source[/color][/size]
One command does it; the source is cloned into an xgboost directory under the current folder:
git clone --recursive https://github.com/dmlc/xgboost



[b]Increasing the precision of dumped models[/b]
In src/tree/tree_model.cc, modify the following method by adding the line fo.precision(20); so that dumped values are written with 20 significant digits instead of the stream default:
std::string RegTree::Dump2Text(const FeatureMap& fmap, bool with_stats) const {
  std::stringstream fo("");
  fo.precision(20);
  for (int i = 0; i < param.num_roots; ++i) {
    DumpRegTree2Text(fo, *this, fmap, i, 0, with_stats);
  }
  return fo.str();
}
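Why this matters: C++ streams default to six significant digits, so split thresholds and leaf values get rounded in the dump. A minimal Python analogy of the same rounding effect (the threshold value below is made up for illustration):

```python
# A float value similar to a dumped tree threshold (made-up for illustration).
x = 0.28867512941360474

# Default %g formatting keeps ~6 significant digits, like an unmodified stream:
print('%g' % x)     # 0.288675

# With 20 significant digits the value survives a dump/parse round trip intact:
print('%.20g' % x)
```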

[size=large][color=blue]2. Build the shared library[/color][/size]
cd xgboost; make -j4


[size=large][color=blue]3. Install the Python package[/color][/size]
cd python-package; sudo python setup.py install



[size=large][color=blue]4. Example[/color][/size]
First, the decision tree it produces:
[img]http://dl2.iteye.com/upload/attachment/0120/5296/d6edff4d-d243-3cfc-9f02-4ae146f80182.png[/img]

# Adapted from: https://xgboost.readthedocs.io/en/latest/get_started/index.html

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import xgboost as xgb
from sklearn.metrics import roc_auc_score

xgFolder='/home/XXX/tools/xgboost/'

# read in data
dtrain = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.train')
# The first line of the training file reads: 1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1
# The leading 1 is the label; 3:1 means feature 3 has value 1, 10:1 means feature 10 has value 1, and so on (LIBSVM sparse format).
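```python
# Hedged aside (not part of the original script): a stdlib-only sketch of how
# one LIBSVM-format line decomposes into a label plus sparse index:value pairs.
def parse_libsvm_line(line):
    parts = line.split()
    label = int(parts[0])
    features = {int(i): float(v) for i, v in (tok.split(':') for tok in parts[1:])}
    return label, features

label, feats = parse_libsvm_line('1 3:1 10:1 11:1 21:1')
print(label, feats)  # 1 {3: 1.0, 10: 1.0, 11: 1.0, 21: 1.0}
```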

weights=dtrain.get_weight()# per-sample weights, a numpy.ndarray; not the input data itself, and [] if never set
labels=dtrain.get_label()# labels, a numpy.ndarray
print(dtrain.get_base_margin())
print(weights)
print(labels[0])
dtest = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.test')

# specify parameters via map
# parameter tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
# full parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
booster='dart'
# booster='gbtree'
# booster='gblinear'

param = {'max_depth':3, 'eta':1, 'silent':0, 'objective':'binary:logistic','booster':booster }
num_round = 2

bst = xgb.train(param, dtrain, num_round)
# make prediction
preds = bst.predict(dtest)
print('AUC: %.4f'% roc_auc_score(dtest.get_label(), preds))
print('DONE')
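```python
# Hedged aside (not in the original script): roc_auc_score above computes the
# probability that a randomly chosen positive example gets a higher score than
# a randomly chosen negative one (ties count half). A stdlib-only sketch:
def manual_auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy ranking: one positive out-ranked by one negative, so 3 of 4 pairs win.
print(manual_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```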

#######################################################
# https://xgboost.readthedocs.io/en/latest/python/python_intro.html
# Plot feature importance and the decision tree:
import matplotlib.pyplot as plt
ax=xgb.plot_importance(bst)
plt.show() # without this call the figure only shows up in debug mode...

# ax=xgb.plot_tree(bst, num_trees=1)
ax=xgb.plot_tree(bst)
plt.show()


# Save the decision tree to an image file
g=xgb.to_graphviz(bst)
with open('xgb_tree.png', 'wb') as f:
    f.write(g.pipe('png'))


Output (results only):
[list]
[*]AUC: 1.0000
[*]DONE
[/list]

[size=large][color=blue]5. Useful resources[/color][/size]
Python API: [url]http://xgboost.readthedocs.io/en/latest/python/index.html[/url]

Parameter tuning: [url]https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html[/url]
Full parameter reference: [url]https://xgboost.readthedocs.io/en/latest/parameter.html[/url]

Introduction to boosted trees: [url]https://xgboost.readthedocs.io/en/latest/tutorials/index.html[/url]

Awesome XGBoost:[url]https://github.com/dmlc/xgboost/tree/master/demo[/url]


Using the C/C++ API:
[url]http://stackoverflow.com/questions/36071672/using-xgboost-in-c[/url]
[url]http://qsalg.com/?p=388[/url]
[url]http://stackoverflow.duapp.com/questions/35289674/create-xgboost-dmatrix-in-c/37416279[/url]