Exercise 7 - Visualization
Exploring the Titanic disaster data
Step 1 Import the necessary libraries
Run the following code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline
Step 2 Import the data from the following path
Run the following code
path7 = '…/input/pandas_exercise/pandas_exercise/exercise_data/train.csv'  # train.csv
Step 3 Name the DataFrame titanic
Run the following code
titanic = pd.read_csv(path7)
titanic.head()
   PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S
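To try the loading step without the file at hand, a minimal sketch can read the same column layout from an in-memory string. The `csv_text` below is a hypothetical stand-in for train.csv, built from four of the rows shown above (the row with the truncated name is left out). The names `sample` and `missing_per_column` are illustrative and not part of the exercise.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: four of the rows shown above.
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
'''

# read_csv accepts any file-like object, so a StringIO works like a path.
sample = pd.read_csv(io.StringIO(csv_text))

# Empty Cabin fields are parsed as NaN, like the NaN entries in the output above.
missing_per_column = sample.isna().sum()
print(sample.shape)
print(missing_per_column)
```

Names that contain commas have to be quoted in the CSV text, otherwise read_csv would split them into two fields.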
Step 4 Set PassengerId as the index
Run the following code
# set_index returns a new DataFrame; titanic itself is not modified
titanic.set_index('PassengerId').head()
             Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
PassengerId
1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S
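The call in this step only displays a re-indexed copy, because set_index returns a new DataFrame and leaves the original untouched. A small sketch of that behavior follows, using a hypothetical `sample` frame with three columns taken from the five rows shown above. The name `indexed` is illustrative.

```python
import pandas as pd

# Hypothetical sample: three columns from the five rows shown above.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Keep the result in a new variable; sample keeps its default 0..4 index.
indexed = sample.set_index("PassengerId")

print(list(sample.columns))    # PassengerId is still a column here
print(list(indexed.columns))   # PassengerId has moved into the index
print(indexed.loc[4, "Sex"])   # label-based lookup by PassengerId
```

To keep the new index for later steps, the result would have to be assigned back, for example `titanic = titanic.set_index('PassengerId')`.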
Step 5 Create a pie chart showing the proportion of male and female passengers
Run the following code
# count the male and female passengers
males = (titanic['Sex'] == 'male').sum()
females = (titanic['Sex'] == 'female').sum()
# put them into a list called proportions
proportions = [males, females]
# create a pie chart
plt.pie(
    # using proportions
    proportions,
    # with the labels in the same order as proportions
    labels = ['Males', 'Females'],
    # with no shadows
    shadow = False,
    # with colors
    colors = ['blue','red'],
    # with one slice exploded out
    explode = (0.15 , 0),
    # with the start angle at 90 degrees
    startangle = 90,
    # with each share shown as a percentage with one decimal place
    autopct = '%1.1f%%'
)
# use an equal aspect ratio so the pie is drawn as a circle
plt.axis('equal')
# set the title
plt.title("Sex Proportion")
# show the plot
plt.tight_layout()
plt.show()
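The two counts can also be obtained in one call with value_counts, which is one possible alternative to the boolean sums used in this step. The sketch below assumes a hypothetical `sex` Series holding the Sex values of the five rows shown in Step 3. `counts` and `shares` are illustrative names.

```python
import pandas as pd

# Hypothetical sample: the Sex column of the five rows shown in Step 3.
sex = pd.Series(["male", "female", "female", "female", "male"], name="Sex")

# Absolute counts per category, most frequent first.
counts = sex.value_counts()

# Shares in percent, the quantity that autopct prints on each slice.
shares = sex.value_counts(normalize=True) * 100

print(counts)
print(shares.round(1))
```

Because the category names travel with the counts in the Series index, they can be reused as slice labels (for example `labels=list(counts.index)`), which avoids mismatching a hand-written label list with the order of the values.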
Step 6 Create a scatter plot of ticket Fare against passenger Age, colored by Sex
Run the following code
# create the plot; fit_reg=False leaves out the regression line
lm = sns.lmplot(x = 'Age', y = 'Fare', data = titanic, hue = 'Sex', fit_reg=False)
# set the title
lm.set(title = 'Fare x Age')
# get the axes object and adjust its limits
axes = lm.axes
axes[0,0].set_ylim(-5,)
axes[0,0].set_xlim(-5,85)
(-5, 85)
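A roughly equivalent picture can be drawn with matplotlib alone by scattering each sex separately. This is only a sketch of the idea, not what lmplot does internally. The `sample` frame is a hypothetical stand-in built from the Sex, Age and Fare values of the five rows shown in Step 3.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: Sex, Age and Fare of the five rows shown in Step 3.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()

# One scatter call per sex, so each group gets its own color and legend entry.
for sex, group in sample.groupby("Sex"):
    ax.scatter(group["Age"], group["Fare"], label=sex)

ax.set_title("Fare x Age")
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
ax.set_xlim(-5, 85)
ax.set_ylim(bottom=-5)   # only the lower limit is fixed, as in the step above
ax.legend(title="Sex")
```

In a notebook with `%matplotlib inline` the figure is displayed when the cell finishes. In a plain script a final `plt.show()` would be needed.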
Step 7 How many people survived?
Run the following code
titanic.Survived.sum()
342
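Summing works here because Survived is coded as 0 or 1, so the sum counts the survivors. Under the same coding, the mean gives the survival rate and a groupby splits the count by category. A minimal sketch on a hypothetical `sample` frame built from the five rows shown in Step 3 (the names `survivors`, `rate` and `by_sex` are illustrative):

```python
import pandas as pd

# Hypothetical sample: Survived and Sex of the five rows shown in Step 3.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
})

survivors = sample["Survived"].sum()                 # number of 1s
rate = sample["Survived"].mean()                     # share of 1s
by_sex = sample.groupby("Sex")["Survived"].sum()     # survivors per sex

print(survivors, rate)
print(by_sex)
```

These figures describe the five sample rows only and say nothing about the full dataset.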
Step 8 Create a histogram of the ticket fares
Run the following code
# sort the fares from highest to lowest
df = titanic.Fare.sort_values(ascending = False)
df
# create the bin edges using numpy
binsVal = np.arange(0,600,10)
binsVal
# create the plot
plt.hist(df, bins = binsVal)
# set the title and labels
plt.xlabel('Fare')
plt.ylabel('Frequency')
plt.title('Fare Paid Histogram')
# show the plot
plt.show()
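To see what the bin edges do without drawing anything, the counts behind a histogram can be computed directly with np.histogram. The sketch below uses a hypothetical `fares` array holding the Fare values of the five rows shown in Step 3. `edges` and `counts` are illustrative names.

```python
import numpy as np

# Hypothetical sample: the Fare values of the five rows shown in Step 3.
fares = np.array([7.25, 71.2833, 7.925, 53.1, 8.05])

# Same edges as binsVal: 0, 10, ..., 590, which gives 59 bins of width 10.
edges = np.arange(0, 600, 10)

# counts[i] is the number of fares in the bin starting at edges[i].
counts, _ = np.histogram(fares, bins=edges)

print(len(edges), len(counts))
for start, n in zip(edges[:-1], counts):
    if n > 0:
        print(f"[{start}, {start + 10}): {n}")
```

Since np.arange excludes the stop value, the last edge is 590, so any fare above 590 would fall outside these bins. Sorting the fares beforehand does not change the counts.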