UCI数据集地址:http://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29
Random Forest:在Bagging策略的基础上进行修改后的一种算法
- 从样本集中用Bootstrap采样选出n个样本;
- 从所有属性中随机选择K个属性,选择出最佳分割属性作为节点创建决策树;
- 重复以上两步m次,即建立m棵决策树;
- 这m个决策树形成随机森林,通过投票表决结果决定数据属于那一类
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.preprocessing import Imputer
from sklearn.preprocessing import label_binarize
from sklearn import metrics
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
names = [u'Age', u'Number of sexual partners', u'First sexual intercourse',
u'Num of pregnancies', u'Smokes', u'Smokes (years)',
u'Smokes (packs/year)', u'Hormonal Contraceptives',
u'Hormonal Contraceptives (years)', u'IUD', u'IUD (years)', u'STDs',
u'STDs (number)', u'STDs:condylomatosis',
u'STDs:cervical condylomatosis', u'STDs:vaginal condylomatosis',
u'STDs:vulvo-perineal condylomatosis', u'STDs:syphilis',
u'STDs:pelvic inflamm