Today I made a quick attempt at the Titanic survival problem using the decision tree part of the sklearn module.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier as DTC
# Load the training data and drop the ID column, which is not used as a feature
data=pd.read_csv('train.csv')
data=data.drop('PassengerId',axis=1)
# Encode Sex as a number: male -> 1, female -> 0
data['Sex']=data['Sex'].map({'male':1,'female':0})
# Feature columns and target column
x=data.loc[:,['Pclass','Sex','Parch','Fare','SibSp']]
y=data.loc[:,'Survived']
#dtc=DTC(criterion='entropy')
dtc=DTC(criterion='gini')
dtc.fit(x,y)
# Accuracy measured on the training set itself
print(dtc.score(x,y))
# Load the test set and apply the same Sex encoding
test=pd.read_csv('test.csv')
test['Sex']=test['Sex'].map({'male':1,'female':0})
# Take the first 100 rows by position (rows 0..99)
testpart=test[['Pclass','Sex','Parch','Fare','SibSp']].iloc[:100]
#print(testpart)
print(dtc.predict(testpart))
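The script keeps a commented-out `criterion='entropy'` line next to the `gini` one. A rough way to compare the two settings is cross-validation. The sketch below is only an illustration: it runs on synthetic data from `make_classification` (an assumption, since `train.csv` is not bundled here), and the sizes and `random_state` values are arbitrary. To try it on the real data, pass the `x` and `y` from the script instead of `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Titanic set): 500 rows, 5 numeric features.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

scores = {}
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # Mean accuracy over 5 cross-validation folds
    scores[criterion] = cross_val_score(tree, X, y, cv=5).mean()

print(scores)
```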
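The samples have many attributes. I picked a few simple ones for this test, and converted the Sex attribute to 0/1 (female = 0, male = 1).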
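Since the same column selection and Sex encoding have to be applied to both the training set and the test set, one option is to put them in a small function. This is a minimal sketch: `prepare_features` and `FEATURES` are hypothetical names that do not appear in the original script, and the `demo` rows are made up for illustration.

```python
import pandas as pd

# The five columns used in the post
FEATURES = ["Pclass", "Sex", "Parch", "Fare", "SibSp"]

def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: select the feature columns and encode Sex (male=1, female=0)."""
    out = df[FEATURES].copy()  # copy so the caller's frame is left unchanged
    out["Sex"] = out["Sex"].map({"male": 1, "female": 0})
    return out

# Made-up rows standing in for train.csv / test.csv
demo = pd.DataFrame({
    "PassengerId": [1, 2],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Parch": [0, 0],
    "Fare": [7.25, 71.2833],
    "SibSp": [1, 1],
})

print(prepare_features(demo))
```

With this in place, the training features would be `prepare_features(data)` and the test features `prepare_features(test)`, so the two encodings cannot drift apart.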
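From the test set I took the first 100 rows for prediction. The results are as follows:
The prediction accuracy on the training set is roughly 0.922558922559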
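The accuracy above is measured on the same rows the tree was fitted on, so it says little about unseen passengers. A common alternative is to hold out part of the training data for validation. The sketch below shows the idea on synthetic data from `make_classification` (an assumption, used only so the example runs without `train.csv`). The 80/20 split and the `random_state` values are arbitrary choices, not something from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (x, y); replace with the real features and labels.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Keep 20% of the rows aside; the tree never sees them during fitting.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # accuracy on rows used for fitting
val_acc = tree.score(X_val, y_val)        # accuracy on held-out rows

print("train accuracy:", train_acc)
print("validation accuracy:", val_acc)
```

An unpruned tree can usually memorize its training rows, so the gap between the two numbers gives a rough sense of how much it overfits.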
Dataset download: http://download.csdn.net/download/cool_jia/10268620