Installing the Python xgboost package and basic usage


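1. Installing xgboost

xgboost stands for eXtreme Gradient Boosting. It was proposed by Tianqi Chen (陈天奇) and, thanks to its strong performance, is widely used in data mining. This section covers installing the xgboost package for Python.
Installing with pip

Enter the following directly in the cmd command line:
pip install xgboost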
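
After pip finishes, a quick way to confirm that the package is visible to the interpreter is to look it up without importing it first. The sketch below uses only the standard library; the helper name is_installed is an illustration and is not part of xgboost or pip.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can find `package_name`."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("xgboost"):
    import xgboost
    print("xgboost version:", xgboost.__version__)
else:
    print("xgboost is not installed for this interpreter")
```

If the second message appears even though pip reported success, pip and python most likely belong to different Python installations.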

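If that does not work, you can try a manual install.

In the file names below, cp36 stands for Python 3.6.
Download the install packages listed below. In fact, numpy and scipy can be installed directly with pip (pip install numpy, pip install scipy), so the main thing to download is the xgboost package.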

numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl (if you want to install numpy manually)
scipy-0.19.1-cp36-cp36m-win_amd64.whl (if you want to install scipy manually)
xgboost-0.6-cp36-cp36m-win_amd64.whl
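
A wheel only installs when its tags match the running interpreter, so it can help to read the tags out of the file name before downloading. The following is a minimal sketch that assumes the standard wheel naming layout (name, version, Python tag, ABI tag, platform tag, separated by dashes) and a CPython interpreter; the function names are made up for illustration.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three fields are always: Python tag, ABI tag, platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def current_cpython_tag():
    """Python tag of the running interpreter, assuming CPython (e.g. cp36)."""
    return "cp%d%d" % (sys.version_info[0], sys.version_info[1])


info = parse_wheel_name("xgboost-0.6-cp36-cp36m-win_amd64.whl")
print(info["python"], info["platform"])
print("this interpreter:", current_cpython_tag())
```

If the two Python tags differ, pip will refuse the wheel as unsupported on this platform.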

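Download the xgboost package, then in cmd, from the directory containing the downloaded files, run the following (the file names must match the files you actually downloaded):

F:\下载软件> pip install numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl
F:\下载软件> pip install scipy-0.19.1-cp36-cp36m-win_amd64.whl
F:\下载软件> pip install xgboost-0.6+20171121-cp36-cp36m-win_amd64.whl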

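2. Trying out xgboost

xgboost can be used through two interfaces: 1. the native interface (train); 2. the scikit-learn wrapper (fit).

The problems handled here are of two kinds: 1. classification; 2. regression.

Import xgboost:
import xgboost as xgb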

2.1 Demo 1: classification with the native XGBoost interface
xgb.train

# xgboost classification
from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# read in the iris data
iris = load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234565)

params = {
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'num_class': 3,
    'gamma': 0.1,
    'max_depth': 6,
    'lambda': 2,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'min_child_weight': 3,
    'silent': 1,  # quiet mode; newer xgboost releases use 'verbosity' instead
    'eta': 0.1,
    'seed': 1000,
    'nthread': 4,
}

dtrain = xgb.DMatrix(X_train, y_train)
num_rounds = 500
# Pass the dict itself: in Python 3, params.items() is a view, not a list
model = xgb.train(params, dtrain, num_rounds)

# Predict on the test set
dtest = xgb.DMatrix(X_test)
ans = model.predict(dtest)

# Compute the accuracy
cnt1 = 0
cnt2 = 0
for i in range(len(y_test)):
    if ans[i] == y_test[i]:
        cnt1 += 1
    else:
        cnt2 += 1

print("Accuracy: %.2f %% " % (100 * cnt1 / (cnt1 + cnt2)))

# Plot feature importance
plot_importance(model)
plt.show()

Result:

[Figure: output of running Demo 1]
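
The counting loop in Demo 1 computes plain accuracy: the share of test samples whose predicted label equals the true label. A compact equivalent might look like the sketch below. The helper name and the sample labels are hypothetical; the predictions are written as floats because the native predict call returns floating-point class labels.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(1 for p, t in zip(predicted, actual) if p == t)
    return matches / len(actual)


# Hypothetical labels, only for illustration
pred = [0.0, 1.0, 2.0, 1.0, 0.0]
true = [0, 1, 2, 2, 0]
print("Accuracy: %.2f %%" % (100 * accuracy(pred, true)))
```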

2.2 Demo 2: classification with the scikit-learn interface
XGBClassifier.fit

# Classification with the scikit-learn interface

from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# read in the iris data
iris = load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the model
# (silent=True is quiet mode; newer xgboost releases use verbosity instead)
model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1, n_estimators=160, silent=True, objective='multi:softmax')
model.fit(X_train, y_train)

# Predict on the test set
ans = model.predict(X_test)

# Compute the accuracy
cnt1 = 0
cnt2 = 0
for i in range(len(y_test)):
    if ans[i] == y_test[i]:
        cnt1 += 1
    else:
        cnt2 += 1

print("Accuracy: %.2f %% " % (100 * cnt1 / (cnt1 + cnt2)))

# Plot feature importance
plot_importance(model)
plt.show()
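
Both classification demos set the objective to 'multi:softmax', which makes prediction return a class label directly, while the related 'multi:softprob' objective returns one probability per class. The sketch below shows how the two relate, in plain Python with hypothetical per-class scores. It is an illustration of the idea, not xgboost's internal code.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    top = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Index of the most probable class (what a softmax objective reports)."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


raw_scores = [1.0, 2.0, 0.5]  # hypothetical scores for 3 classes
probs = softmax(raw_scores)
label = predict_class(raw_scores)
print(probs, label)
```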

The next two demos implement regression:

2.3 Demo 3: regression with the scikit-learn interface
XGBRegressor.fit

# Regression with the scikit-learn interface

import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import csv

# Read the raw data; every field is converted to float
X = []
y = []
with open(r"G:\训练小样本3_label.csv", "r") as csvFile:
    reader = csv.reader(csvFile)
    for item in reader:
        X.append([float(ii) for ii in item])

# The last column of each row is the label: move it from X into y
for i in range(len(X)):
    y.append(X[i].pop())

# XGBoost training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# reg:gamma (gamma regression) is meant for positive-valued targets
# (silent=True is quiet mode; newer xgboost releases use verbosity instead)
model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160, silent=True,
                         objective='reg:gamma')
model.fit(X_train, y_train)

# Predict on the test set
ans = model.predict(X_test)

# Plot feature importance
plot_importance(model)
plt.show()
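
The CSV files used in the regression demos are not included, so the loading step is hard to try as written. The sketch below reproduces the same idea (every field parsed as float, last column taken as the label) on a small in-memory sample. The sample values and the function name are hypothetical.

```python
import csv
import io


def load_features_and_labels(csv_file):
    """Read numeric rows; the last column of each row becomes the label."""
    X, y = [], []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row]
        y.append(values.pop())  # last column is the label
        X.append(values)        # remaining columns are the features
    return X, y


# Hypothetical two-feature data standing in for the real file
sample = io.StringIO("1.0,2.0,10.5\n3.0,4.0,20.5\n")
X, y = load_features_and_labels(sample)
print(X)
print(y)
```

The function accepts any open text file, so the same call works with a file opened through open() on a real path.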

2.4 Demo 4: regression with the native XGBoost interface

# Regression with the native XGBoost interface

import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import csv

# Read the raw data; every field is converted to float
X = []
y = []
with open(r"G:\训练小样本4large_label.csv", "r") as csvFile:
    reader = csv.reader(csvFile)
    for item in reader:
        X.append([float(ii) for ii in item])

# The last column of each row is the label: move it from X into y
for i in range(len(X)):
    y.append(X[i].pop())

# XGBoost training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

params = {
    'booster': 'gbtree',
    'objective': 'reg:gamma',
    'gamma': 0.1,
    'max_depth': 5,
    'lambda': 3,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'min_child_weight': 3,
    'silent': 1,  # quiet mode; newer xgboost releases use 'verbosity' instead
    'eta': 0.1,
    'seed': 1000,
    'nthread': 4,
}

dtrain = xgb.DMatrix(X_train, y_train)
# num_rounds = 300
num_rounds = 126
# Pass the dict itself: in Python 3, params.items() is a view, not a list
model = xgb.train(params, dtrain, num_rounds)

# Predict on the test set
dtest = xgb.DMatrix(X_test)
ans = model.predict(dtest)

# Plot feature importance
plot_importance(model)
plt.show()
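
The two regression demos stop at the prediction step without scoring it. One common way to score regression output is the root mean squared error, sketched below in plain Python. The function name and the sample numbers are hypothetical; in the demos the arguments would be ans and y_test.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions and targets
pred = [2.5, 0.0, 2.0]
true = [3.0, 0.5, 2.0]
print("RMSE: %.4f" % rmse(pred, true))
```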

3. Other usage

Define the model and obtain a gbtree booster.

# xgboost usage
import pandas as pd
import numpy as np
import xgboost as xgb
# sklearn.cross_validation was removed from scikit-learn; use model_selection
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from xgboost import plot_importance

train_df = pd.read_csv(r'G:\训练小样本.csv')
x_list = ['210X1', '220X2', '310X2', '311X72', '261X237']
X_train = train_df[x_list].values
y_train = train_df.Y.values

X_dtrain, X_deval, y_dtrain, y_deval = train_test_split(X_train, y_train, random_state=1026, test_size=0.3)

dtrain = xgb.DMatrix(X_dtrain, y_dtrain)

deval = xgb.DMatrix(X_deval, y_deval)
watchlist = [(deval, 'eval')]
params = {
    'booster': 'gbtree',
    'objective': 'reg:linear',  # renamed 'reg:squarederror' in newer xgboost releases
    'subsample': 0.8,
    'colsample_bytree': 0.85,
    'eta': 0.05,
    'max_depth': 7,
    'seed': 2016,
    'silent': 0,  # print messages; newer xgboost releases use 'verbosity' instead
    'eval_metric': 'rmse'
}
# A single sample to predict; DMatrix needs 2-D input, so wrap it as one row
df_test = np.array([[100, 1.5, 1.5, 90, 95]])
# Note: with only 50 boosting rounds, early_stopping_rounds=50 never triggers
clf = xgb.train(params, dtrain, 50, watchlist, early_stopping_rounds=50)
pred = clf.predict(xgb.DMatrix(df_test))

# Plot feature importance
plot_importance(clf)
plt.show()
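
The early_stopping_rounds argument used above stops training once the evaluation metric has gone a given number of rounds without improving. The sketch below simulates that rule on a hypothetical list of per-round RMSE values, where lower is better. It illustrates the stopping rule only and is not xgboost's implementation; the function name is made up.

```python
def stopping_round(scores, patience):
    """Return (stop_round, best_round) for a lower-is-better metric.

    stop_round is None when the patience is never exhausted.
    """
    best_score = float("inf")
    best_round = -1
    for i, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, i
        elif i - best_round >= patience:
            return i, best_round
    return None, best_round


# Hypothetical validation RMSE per boosting round
history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(stopping_round(history, patience=3))
print(stopping_round(history, patience=50))
```

With a patience of 3, the simulation stops at round 5 and reports round 2 as the best. With a patience as large as the number of rounds, it never stops early.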



