Kaggle Case Study (1): Titanic: Machine Learning from Disaster

1. Case Overview

The Titanic competition is Kaggle's introductory case study: https://www.kaggle.com/c/titanic . The description below is excerpted from the official site:
[Image: competition description from the Kaggle site]

2. Analyzing the Data

2.1 Reading the Data

Load the training data

import pandas as pd
data_train = pd.read_csv("./input/train.csv")

Preview the data

data_train.head()

Description of the training set columns:
[Image: table describing the training set columns]

Inspect the dataset summary

data_train.info()


List the columns that contain missing values

data_train.columns[data_train.isnull().any()].tolist()


Count the missing values

age_null_count = data_train.Age.isnull().sum()
cabin_null_count = data_train.Cabin.isnull().sum()
embarked_null_count = data_train.Embarked.isnull().sum()
print('Missing values in Age: %s' % age_null_count)
print('Missing values in Cabin: %s' % cabin_null_count)
print('Missing values in Embarked: %s' % embarked_null_count)
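
The per-column counts above can also be obtained in one pass over the whole DataFrame. The following is a minimal sketch on a small hand-made DataFrame (`toy` and its values are invented for illustration and are not the real training set):

```python
import pandas as pd

# Toy data for illustration only; not the real Titanic training set.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Count missing values in every column at once,
# then keep only the columns that actually have gaps.
null_counts = toy.isnull().sum()
null_counts = null_counts[null_counts > 0]
print(null_counts)
```

Applied to `data_train`, the same two lines should list Age, Cabin and Embarked with their respective counts.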


2.2 Processing the Data

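Missing values in the Age column
Fill the missing values with the median of the Age column. fillna returns a new Series, so the result has to be assigned back

data_train['Age'] = data_train['Age'].fillna(data_train['Age'].median())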
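
To see why the assignment matters, here is a small sketch on a made-up Series (`ages` and its values are assumptions for illustration). It shows that `median()` skips missing values and that `fillna` leaves the original object unchanged:

```python
import pandas as pd

ages = pd.Series([22.0, None, 26.0, 35.0, None])  # toy values, not real data

median_age = ages.median()        # NaN is skipped: median of 22, 26, 35
filled = ages.fillna(median_age)  # returns a new Series; `ages` is untouched

print(median_age)
print(ages.isnull().sum(), filled.isnull().sum())
```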

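Missing values in the Cabin column
The Cabin column is missing many entries, so compare the Survived counts for passengers with and without Cabin data

Survived_cabin = data_train.Survived[pd.notnull(data_train.Cabin)].value_counts()
print(Survived_cabin)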


Survived_nocabin = data_train.Survived[pd.isnull(data_train.Cabin)].value_counts()
print(Survived_nocabin)


Passengers with Cabin information were more likely to survive. Whether Cabin is recorded can therefore be treated as a categorical feature.
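
One possible way to turn this observation into a feature is a 0/1 indicator column. The sketch below uses invented toy rows, and the column name `Has_Cabin` is a hypothetical choice, not something from the original post:

```python
import pandas as pd

# Toy rows for illustration only; not the real Titanic training set.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1],
    "Cabin": ["C85", None, "E46", None, None, None],
})

# 1 if a cabin is recorded, 0 otherwise. "Has_Cabin" is a hypothetical name.
toy["Has_Cabin"] = toy["Cabin"].notnull().astype(int)

# The mean of a 0/1 column is the survival rate within each group.
rate_by_cabin = toy.groupby("Has_Cabin")["Survived"].mean()
print(rate_by_cabin)
```

Comparing rates rather than raw counts makes the two groups comparable even though far more passengers lack Cabin data.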

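Missing values in the Embarked column
Fill the missing values with the mode of the Embarked column. mode() returns a Series, so take its first element and assign the result back

data_train['Embarked'] = data_train['Embarked'].fillna(data_train['Embarked'].mode()[0])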
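
The mode fill has one subtlety worth illustrating: `Series.mode()` returns a Series (there can be ties), so a single scalar has to be picked before it is passed to `fillna`. A minimal sketch on made-up values (`ports` is an assumption for illustration):

```python
import pandas as pd

ports = pd.Series(["S", "C", None, "S", "Q", None])  # toy values, not real data

mode_values = ports.mode()         # a Series, because ties are possible
most_common = mode_values.iloc[0]  # pick the first (here: only) mode

filled = ports.fillna(most_common)
print(filled.tolist())
```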

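2.3 Visualizing the Data

Number of survivors

# Plot the number of survivors and non-survivors
import matplotlib.pyplot as plt
data_train.Survived.value_counts().sort_index().plot(kind='bar')
plt.title("Survival")
plt.xticks([0,1], ["Did not survive","Survived"], rotation=0)
plt.ylabel("Passengers")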

Age distribution by passenger class

data_train.Age[data_train.Pclass == 1].plot(kind='kde')
data_train.Age[data_train.Pclass == 2].plot(kind='kde')
data_train.Age[data_train.Pclass == 3].plot(kind='kde')
plt.xlabel("Age")
plt.ylabel("Density")
plt.title("Age distribution by passenger class")
plt.legend(('1st class', '2nd class', '3rd class'), loc='best')


Survival by passenger class

Survived_0 = data_train.Pclass[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Pclass[data_train.Survived == 1].value_counts()
df = pd.DataFrame({'Survived': Survived_1, 'Did not survive': Survived_0})
df.plot(kind='bar', stacked=True)
plt.title("Survival by passenger class")
plt.xlabel("Passenger class")
plt.ylabel("Passengers")
plt.xticks(rotation=0)
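
The stacked bars show counts. If survival rates per class are also of interest, `pd.crosstab` with `normalize="index"` is one way to get them. The sketch below runs on invented toy rows, not the real training set:

```python
import pandas as pd

# Toy rows for illustration only; not the real Titanic training set.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# Rows are classes, columns are Survived (0/1), cells are passenger counts.
counts = pd.crosstab(toy["Pclass"], toy["Survived"])

# normalize="index" divides each row by its total, giving per-class rates.
rates = pd.crosstab(toy["Pclass"], toy["Survived"], normalize="index")

print(counts)
print(rates)
```

The same call with `data_train["Pclass"]` and `data_train["Survived"]` would give the table behind the chart above, and swapping in Sex or Embarked covers the other breakdowns in this section.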


Number of passengers per port of embarkation

data_train.Embarked.value_counts().plot(kind='bar')
plt.title("Passengers per port of embarkation")
plt.ylabel("Passengers")
plt.xticks(rotation=0)


Survival by port of embarkation

Survived_0 = data_train.Embarked[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Embarked[data_train.Survived == 1].value_counts()
df = pd.DataFrame({'Survived': Survived_1, 'Did not survive': Survived_0})
df.plot(kind='bar', stacked=True)
plt.title("Survival by port of embarkation")
plt.xlabel("Port of embarkation")
plt.ylabel("Passengers")
plt.xticks(rotation=0)


Survival by sex

Survived_m = data_train.Survived[data_train.Sex == 'male'].value_counts()
Survived_f = data_train.Survived[data_train.Sex == 'female'].value_counts()
df = pd.DataFrame({'Male': Survived_m, 'Female': Survived_f})
df.plot(kind='bar', stacked=True)
plt.title("Survival by sex")
plt.xlabel("Outcome")
plt.ylabel("Passengers")
plt.xticks([0,1], ["Did not survive","Survived"], rotation=0)


Survival by SibSp (number of siblings/spouses aboard)

SibSp_0 = data_train.SibSp[data_train.Survived == 0].value_counts()
SibSp_1 = data_train.SibSp[data_train.Survived == 1].value_counts()
SibSp_df = pd.DataFrame({'Did not survive': SibSp_0, 'Survived': SibSp_1})
SibSp_df.plot(kind='bar', stacked=True)
plt.title("Survival by number of siblings/spouses")
plt.xlabel("Number of siblings/spouses")
plt.ylabel("Passengers")
plt.xticks(rotation=0)


Survival by Parch (number of parents/children aboard)

Parch_0 = data_train.Parch[data_train.Survived == 0].value_counts()
Parch_1 = data_train.Parch[data_train.Survived == 1].value_counts()
Parch_df = pd.DataFrame({'Did not survive': Parch_0, 'Survived': Parch_1})
Parch_df.plot(kind='bar', stacked=True)
plt.title("Survival by number of parents/children")
plt.xlabel("Number of parents/children")
plt.ylabel("Passengers")
plt.xticks(rotation=0)


Reposted from: https://blog.51cto.com/12631595/2391944
