XGBoost: Multi-class Classification

Latest recommended article published on 2024-05-27 12:03:38

夏天7788

Reposted from: http://blog.csdn.net/leo_xu06/article/details/52424924

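The following uses the UCI Dermatology dataset to demonstrate multi-class classification with XGBoost.

First install the C++ version of XGBoost and the corresponding Python module, then run the script below. If the training data is not available locally, runexp.sh downloads the dataset from https://archive.ics.uci.edu/ml/datasets/Dermatology and then calls train.py.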

Run runexp.sh

./runexp.sh

Code of runexp.sh:

#!/bin/bash
# Download the dataset only if it is not already present
if [ -f dermatology.data ]
then
    echo "use existing data to run multi class classification"
else
    echo "getting data from uci, make sure you are connected to internet"
    wget https://archive.ics.uci.edu/ml/machine-learning-databases/dermatology/dermatology.data
fi
python train.py
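
For readers without bash or wget, the same "download only if missing, then train" logic can be sketched in Python. This is only an illustration and assumes Python 3; the names `ensure_data` and `run_experiment` are hypothetical helpers and are not part of the original demo. The URL and file name are the ones used in runexp.sh.

```python
import os
import subprocess
import sys
from urllib.request import urlretrieve

# Same URL and file name as in runexp.sh
DATA_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/dermatology/dermatology.data"
DATA_FILE = "dermatology.data"


def ensure_data(path=DATA_FILE, url=DATA_URL, fetch=urlretrieve):
    """Hypothetical helper: download the dataset only when it is not on disk.

    Returns True if a download was performed, False if the file already existed.
    The fetch argument exists so the download step can be swapped out in tests.
    """
    if os.path.isfile(path):
        print("use existing data to run multi class classification")
        return False
    print("getting data from uci, make sure you are connected to internet")
    fetch(url, path)
    return True


def run_experiment():
    """Hypothetical helper: mirror runexp.sh (fetch data if needed, then run train.py)."""
    ensure_data()
    # Use the interpreter that is running this script to launch train.py
    subprocess.check_call([sys.executable, "train.py"])
```

Calling `run_experiment()` from the directory that contains train.py would correspond to running `./runexp.sh`.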
 

Code of train.py:

#!/usr/bin/python
import numpy as np
import xgboost as xgb


def is_missing(x):
    # column 33 marks missing values with '?'; turn it into a 0/1 indicator
    # (loadtxt may pass bytes or str to converters depending on the NumPy version)
    return int(x in ('?', b'?'))


# labels need to be in 0 .. num_class-1, so the 1-based class column is shifted down by one
data = np.loadtxt('./dermatology.data', delimiter=',',
                  converters={33: is_missing, 34: lambda x: int(x) - 1})
sz = data.shape

# first 70% of the rows for training, the rest for testing
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]

# columns 0..32 are the features, column 34 is the label
train_X = train[:, 0:33]
train_Y = train[:, 34]
test_X = test[:, 0:33]
test_Y = test[:, 34]

xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)
# setup parameters for xgboost
param = {}
# use softmax multi-class classification
param['objective'] = 'multi:softmax'
# learning rate (step size shrinkage)
param['eta'] = 0.1
param['max_depth'] = 6
# quiet mode; newer XGBoost releases use 'verbosity' instead of 'silent'
param['silent'] = 1
param['nthread'] = 4
param['num_class'] = 6

watchlist = [(xg_train, 'train'), (xg_test, 'test')]
num_round = 5
bst = xgb.train(param, xg_train, num_round, evals=watchlist)
# get prediction: multi:softmax returns the predicted class index for each row
pred = bst.predict(xg_test)
print('predicting, classification error=%f' % (sum(int(pred[i]) != test_Y[i] for i in range(len(test_Y))) / float(len(test_Y))))

# do the same thing again, but output probabilities
param['objective'] = 'multi:softprob'
bst = xgb.train(param, xg_train, num_round, evals=watchlist)
# Note: this convention has been changed since xgboost-unity
# get prediction, this is in 1D array, need reshape to (ndata, nclass)
yprob = bst.predict(xg_test).reshape(test_Y.shape[0], 6)
ylabel = np.argmax(yprob, axis=1)
print('predicting, classification error=%f' % (sum(int(ylabel[i]) != test_Y[i] for i in range(len(test_Y))) / float(len(test_Y))))
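
The last step of train.py turns the multi:softprob output into class labels by reshaping it to (ndata, nclass) and taking the argmax of each row. The sketch below spells out that step in plain Python so the layout of the flat array is easier to see. The function names `softprob_to_labels` and `error_rate` are hypothetical, and the probabilities are made-up numbers for illustration, not output from XGBoost.

```python
def softprob_to_labels(flat_probs, num_class):
    """Group a flat probability list into rows of num_class values
    and return the index of the largest value in each row."""
    if len(flat_probs) % num_class != 0:
        raise ValueError("length of flat_probs must be a multiple of num_class")
    labels = []
    for start in range(0, len(flat_probs), num_class):
        # probabilities of one sample occupy num_class consecutive positions
        row = flat_probs[start:start + num_class]
        # on ties the first maximum wins, the same rule np.argmax uses
        labels.append(max(range(num_class), key=lambda k: row[k]))
    return labels


def error_rate(pred_labels, true_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(int(p) != int(t) for p, t in zip(pred_labels, true_labels))
    return wrong / float(len(true_labels))


# Made-up probabilities for 3 samples and 3 classes (illustration only)
flat = [0.7, 0.2, 0.1,
        0.1, 0.3, 0.6,
        0.2, 0.5, 0.3]
labels = softprob_to_labels(flat, num_class=3)
print(labels)
print(error_rate(labels, [0, 2, 2]))
```

With multi:softmax the argmax step happens inside XGBoost, which is why the first half of train.py can compare `pred` with `test_Y` directly.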
