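This post walks through a simple example of how to use xgboost through sklearn.
1. First, load the data. Here we use pandas and assume that the last column of the data file holds each sample's label.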
from xgboost import XGBClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
# Load data from a CSV file and split it into features and labels.
def loadData(fileFullPath):
    if fileFullPath is None:
        print('file path is empty, please check path!')
        return None, None
    # Read the CSV file with pandas; the result is a DataFrame.
    dataSet = pd.read_csv(fileFullPath)
    pd.set_option('display.max_columns', 20)
    # Split features and labels: the last column is the label.
    featureNum = dataSet.shape[1]
    trainData = dataSet.iloc[:, 0:featureNum-1]
    trainLabel = dataSet.iloc[:, -1]
    return trainData, trainLabel
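To check the splitting logic, the minimal sketch below applies the same last-column-as-label convention to a tiny in-memory CSV. The column names (f1, f2, label) and the values are made up for illustration and are not taken from the data set used in this post.

```python
import io
import pandas as pd

# Hypothetical toy data: two feature columns, with the label in the last column.
csv_text = "f1,f2,label\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"

data_set = pd.read_csv(io.StringIO(csv_text))

# Every column except the last is a feature; the last column is the label.
features = data_set.iloc[:, :-1]
labels = data_set.iloc[:, -1]

print(features.shape)   # (3, 2)
print(labels.tolist())  # [0, 1, 0]
```

The slice `iloc[:, :-1]` selects the same columns as `iloc[:, 0:featureNum-1]` in loadData, without needing the column count.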
The sample set looks like this: