Classifying with XGBoost: predicting whether a patient has diabetes

This post shows how to build and tune an XGBoost model on the Pima Indians diabetes dataset. First, the data is split into training and test sets, and an XGBClassifier is fit and scored for accuracy. Next, eval_set is used to monitor model performance during training, with early stopping to avoid overfitting. Finally, GridSearchCV is used for hyperparameter tuning, in particular of the learning rate, to find the best model configuration.

The full code is below:

# First XGBoost model for Pima Indians dataset
# Use XGBoost with default parameters for binary classification:
# does this patient have diabetes or not?
from numpy import loadtxt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# load the CSV file
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split data into X and y
X = dataset[:, 0:8]  # first 8 columns are features
Y = dataset[:, 8]    # last column is the label

# split data into train and test sets
seed = 7           # random seed
test_size = 0.33   # 67% training data, 33% test data
X_train, X_test, y_train, y_test = \
    train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()  # XGBoost classifier with default parameters
# fit model
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)

# print with 2 decimal places
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Output:
[screenshot of the printed accuracy]

With eval_set, the model's performance on a held-out set can be monitored every time a new boosting round (tree) is added.

# First XGBoost model for Pima Indians dataset
# Monitor performance on a held-out set after every boosting round.
from numpy import loadtxt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split data into X and y
X = dataset[:, 0:8]  # first 8 columns are features
Y = dataset[:, 8]    # last column is the label

# split data into train and test sets
seed = 7           # random seed
test_size = 0.33   # 67% training data, 33% test data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
# the model is evaluated against eval_set after each boosting round
eval_set = [(X_test, y_test)]
# stop training if the validation logloss has not improved for 10 consecutive rounds
# evaluation metric: logloss
# verbose=True prints the metric after every round
# note: in XGBoost >= 2.0, early_stopping_rounds and eval_metric must be passed
# to the XGBClassifier constructor instead of fit()
model.fit(X_train, y_train, early_stopping_rounds=10, eval_metric="logloss",
          eval_set=eval_set, verbose=True)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
# print with 2 decimal places
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Use plot_importance to visualize how important each feature is.

from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_importance  # plot_importance visualizes feature importance
from matplotlib import pyplot

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split data into X and y
X = dataset[:, 0:8]  # first 8 columns are features
y = dataset[:, 8]    # last column is the label

# fit model on the full dataset
model = XGBClassifier()
model.fit(X, y)
# plot feature importance
plot_importance(model)
pyplot.show()

[feature importance plot]

Use GridSearchCV to pick the best learning rate.

from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV  # exhaustively searches the grid to pick the best learning rate
from sklearn.model_selection import StratifiedKFold

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split data into X and y
X = dataset[:, 0:8]  # first 8 columns are features
y = dataset[:, 8]    # last column is the label

model = XGBClassifier()

learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]

# wrap the candidate learning rates in a parameter grid (dict)
param_grid = dict(learning_rate=learning_rate)
# stratified 10-fold cross-validation
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
# n_jobs=-1: use all available CPU cores
grid_search = GridSearchCV(model, param_grid, scoring='neg_log_loss', n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
params = grid_result.cv_results_['params']
for mean, param in zip(means, params):
    print("%f with %r" % (mean, param))

Output:
[screenshot of the grid search results]
