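Boosting repeatedly increases the weights of misclassified samples and decreases the weights of correctly classified ones, so that each round trains a different weak classifier. The weak classifiers are then combined, for example by majority vote, to produce the final classification.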
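The reweighting idea can be sketched with a single AdaBoost-style update. This is a minimal illustration, not the code used in the rest of this notebook: the function name `reweight`, the toy labels, and the hypothetical weak-classifier predictions are all assumptions, and labels are assumed to lie in {-1, +1} with a weighted error strictly between 0 and 1.

```python
import numpy as np

def reweight(weights, y_true, y_pred):
    """One AdaBoost-style reweighting step (labels assumed to be -1 or +1)."""
    miss = y_true != y_pred
    # Weighted error rate of the current weak classifier (assumed 0 < err < 1).
    err = np.sum(weights[miss]) / np.sum(weights)
    # Vote weight of this weak classifier in the final combination.
    alpha = 0.5 * np.log((1.0 - err) / err)
    # Misclassified samples are scaled by exp(+alpha), correct ones by exp(-alpha).
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    # Renormalize so the weights sum to 1 again.
    return new_w / new_w.sum(), alpha

# Toy data (hypothetical): the weak classifier gets only sample 1 wrong.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])
w0 = np.full(5, 0.2)  # start from uniform weights

w1, alpha = reweight(w0, y_true, y_pred)
print("updated weights:", w1)
print("classifier vote weight:", alpha)
```

After the update, the misclassified sample carries more weight than each correctly classified one, so the next weak classifier is pushed to focus on it.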
# Import the data science toolkits:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline
import seaborn as sns
# Load the training data:
wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", header=None)
wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash',
                'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid phenols',
                'Proanthocyanins', 'Color intensity', 'Hue',
                'OD280/OD315 of diluted wines', 'Proline']
# Inspect the data:
print("Class labels", np.unique(wine["Class label"]))
wine.head()
# Use the class label as the target and two of the features as inputs:
y = wine['Class label'].values
X = wine[['Alcohol', 'OD280/OD315 of diluted wines']]