Titanic Survival Prediction (Fixes for Problems Encountered When Running in Anaconda)


VIP article · jiang cheng 828


Tags: fixes for problems running the Titanic survival prediction code in Anaconda

Article link: https://blog.csdn.net/kjcm123456/article/details/92079326


Machine Learning in Practice (5): Fixing Problems Running the Titanic Survival Prediction Code

I. Read the data and inspect its distribution

import pandas  # run in an IPython/Jupyter notebook
titanic = pandas.read_csv("titanic_train.csv")
print(titanic.head(5))
# print(titanic.describe())  # summary statistics for each numeric column
# print(titanic.shape)       # (891, 12)
# The output is shown in the figure below:

[Figure: output of titanic.head(5)]

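1. Analysis:

Survived: 1 means survived, 0 means died.
Sex: stored as text, which is awkward for analysis, so it will probably need to be mapped to numeric values.
Age: more than a hundred values are missing. Age is plausibly an important feature, so the missing values need to be filled.
Ticket: the ticket number; it shows no obvious pattern.
Fare: the ticket price is related to the cabin class (Pclass) and to the length of the voyage (Embarked).
Cabin: too many values are missing and the meaning is unclear, so ignore it for now.
Embarked: the port of embarkation, with three values, C/S/Q. It is stored as text, so it will probably need to be mapped to numeric values, and it has 2 missing values.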
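The observations above can be checked in code instead of being read off the printed table. The following is only a sketch: `sample` is a small made-up frame that borrows the column names of `titanic_train.csv`, not the real data, and the variable names `missing_counts` and `text_columns` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_train.csv: same column names, made-up rows.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Number of missing values in each column.
missing_counts = sample.isnull().sum()
print(missing_counts)

# Columns that still hold text and must be mapped to numbers before modelling.
text_columns = [
    col for col in ["Sex", "Embarked"]
    if not pd.api.types.is_numeric_dtype(sample[col])
]
print(text_columns)
```

With the real data, the same two checks should work by replacing `sample` with the `titanic` frame loaded earlier.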

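II. Data preprocessing

1. Filling missing values: the options include the mean, the median, or the mode. For Age, both the mean and the median are worth trying (pick whichever works better in practice). Embarked is missing only two values and takes just three discrete values, so filling with the mode is the sensible choice.

1. Age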

titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())  # fill missing ages with the median
print (titanic.describe())
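Since the choice between mean and median is left open, a small sketch may help show how the two differ when the column contains an outlier. The ages below are made up for illustration and are not taken from the dataset; `filled_with_mean` and `filled_with_median` are names introduced here.

```python
import numpy as np
import pandas as pd

# Made-up ages with one outlier and two gaps (not from the real dataset).
ages = pd.Series([20.0, 22.0, 24.0, np.nan, 80.0, np.nan], name="Age")

# Both statistics skip NaN by default.
mean_age = ages.mean()      # (20 + 22 + 24 + 80) / 4
median_age = ages.median()  # midpoint of the two middle values, 22 and 24

filled_with_mean = ages.fillna(mean_age)
filled_with_median = ages.fillna(median_age)

print(mean_age, median_age)
print(filled_with_median.isnull().sum())
```

Here the single large value pulls the mean well above the median, which is one reason the median is often the more cautious default for a skewed column such as Age.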

2. Embarked

print(titanic['Embarked'].unique())  # possible values: ['S' 'C' 'Q' nan]
print(titanic['Embarked'].mode())    # the mode is 'S', so fill with 'S'
titanic['Embarked'] = titanic['Embarked'].fillna('S')
print(titanic['Embarked'].describe())

'''Result:
['S' 'C' 'Q' nan]
0    S
dtype: object
count     891
unique      3
top         S
freq      646
Name: Embarked, dtype: object'''
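The fill value does not have to be typed in by hand. As a sketch under the assumption that a single most frequent value exists, the mode can be taken from the column itself; the series below is made up, and `most_common` and `embarked_filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up port codes with two gaps (not the real column).
embarked = pd.Series(["S", "C", "S", "Q", np.nan, "S", np.nan], name="Embarked")

# mode() ignores NaN and returns a Series (ties would give several rows),
# so take the first entry.
most_common = embarked.mode()[0]
embarked_filled = embarked.fillna(most_common)

print(most_common)
print(embarked_filled.isnull().sum())
```

This keeps the code correct if the data changes, at the cost of hiding which value was actually used, so printing `most_common` is worthwhile.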

2. Mapping text to numeric values

(1) Sex: male-0, female-1

print(titanic["Sex"].unique())  # possible values of Sex
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0    # male -> 0: replace every occurrence of "male" with 0
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1  # female -> 1
# Result:
# ['male' 'female']
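One possible source of trouble with the `.loc` approach is that it writes integers into a text column, so the column typically keeps its text (`object`) dtype, and depending on the pandas version the assignment may warn or fail. A hedged alternative, not the original author's method, is to build a new integer column with `Series.map`. The data below is made up, and `sex_codes` and `sex_numeric` are names introduced for this sketch.

```python
import pandas as pd

# Made-up values standing in for the Sex column.
sex = pd.Series(["male", "female", "female", "male"], name="Sex")

# Same encoding as in the text: male -> 0, female -> 1.
sex_codes = {"male": 0, "female": 1}

# map() returns a new Series; astype(int) makes the integer dtype explicit.
# Any value missing from sex_codes would become NaN and make astype(int) fail,
# which surfaces unexpected categories early.
sex_numeric = sex.map(sex_codes).astype(int)

print(sex_numeric.tolist())
print(sex_numeric.dtype)
```

On the real frame this would be written as `titanic["Sex"] = titanic["Sex"].map(sex_codes).astype(int)`, assuming the column contains only the two values printed by `unique()`.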

(2) Port: S-0, C-1, Q-2

print(titanic["Embarked"].unique())  # possible values of Embarked
titanic.loc[titanic['Embarked'] == 'S', 'Embarked'] = 0
titanic.loc[titanic['Embarked'] == 'C', 'Embarked'] = 1
titanic.loc[titanic['Embarked'] == 'Q', 'Embarked'] = 2
print(titanic['Embarked'].describe())

Result: ['S' 'C' 'Q']
count 89
