I. Algorithm Principles
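XGBoost is a boosting algorithm. The idea behind boosting is to combine many weak learners (weak classifiers, in the classification setting) into a single strong one. Because XGBoost is a boosted-tree model, its weak learners are trees: it builds many tree models and adds their outputs together to form a strong predictor. The tree model it uses is the CART regression tree.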
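To make the idea of adding many weak trees together concrete, here is a minimal sketch of boosting with shallow regression trees under squared-error loss. It is not XGBoost's actual implementation (which also uses second-order gradients and regularization); the helper names `fit_boosted_trees` and `predict_boosted_trees`, the synthetic data, and the use of scikit-learn's `DecisionTreeRegressor` as the CART tree are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Toy boosting loop: each shallow tree is fit to the current residuals."""
    base = float(np.mean(y))              # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred               # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return {"base": base, "learning_rate": learning_rate, "trees": trees}


def predict_boosted_trees(model, X):
    """Sum the constant base prediction and the scaled output of every tree."""
    pred = np.full(X.shape[0], model["base"])
    for tree in model["trees"]:
        pred = pred + model["learning_rate"] * tree.predict(X)
    return pred


# Synthetic data, used only for this illustration
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_boosted_trees(X_demo, y_demo)
mse_constant = np.mean((y_demo - y_demo.mean()) ** 2)
mse_boosted = np.mean((y_demo - predict_boosted_trees(model, X_demo)) ** 2)
print("training MSE, constant prediction:", mse_constant)
print("training MSE, boosted trees:     ", mse_boosted)
```

Each individual tree is weak (depth 2), but because every new tree corrects the residual error of the ensemble so far, the combined training error ends up lower than that of the constant starting prediction.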
II. Worked Example
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
import pickle
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error,make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold,train_test_split
from scipy.sparse import csr_matrix,hstack
from xgboost import XGBRegressor
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
# This may raise an exception in earlier versions of Jupyter
%config InlineBackend.figure_format = 'retina'
1. Data preprocessing
train = pd.read_csv('train.csv')
pd.set_option('display.max_columns', 150)  # show up to 150 columns when printing
train.head()
(1) Log transformation
train['log_loss'] = np.log(train['loss'])
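The snippet below is a small sketch of what this transformation does and how to undo it. The values in `demo` are made up and merely stand in for the `loss` column of `train.csv`; only `np.log`, `np.exp`, and pandas' `Series.skew` are used. A model trained on `log_loss` predicts on the log scale, so its predictions need `np.exp` applied before they can be compared with the original losses.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed values standing in for train['loss']
demo = pd.DataFrame({'loss': [120.5, 980.0, 2300.75, 15000.0, 410.2]})

# Same transformation as in the article; np.log requires strictly positive values
demo['log_loss'] = np.log(demo['loss'])

# np.exp is the inverse: it maps log-scale values back to the original units
recovered = np.exp(demo['log_loss'])

# Sample skewness before and after the transformation
skew_raw = demo['loss'].skew()
skew_log = demo['log_loss'].skew()
print("skewness of loss:    ", skew_raw)
print("skewness of log_loss:", skew_log)
```

If the target could contain zeros, `np.log1p` paired with `np.expm1` would be one possible alternative; whether that is needed depends on the data, which this sketch does not assume.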
(2) Split the features into continuous and discrete (categorical) features
features &#