Data Analysis, Part 3: Data Restructuring
1. Data Merging
First read in the data, then stitch the four CSV pieces back together into one table.
1.1 Importing the data
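import pandas as pd

# The training set is split into four pieces:
# left/right split the columns, up/down split the rows
text_left_up = pd.read_csv("data/train-left-up.csv")
text_left_down = pd.read_csv("data/train-left-down.csv")
text_right_up = pd.read_csv("data/train-right-up.csv")
text_right_down = pd.read_csv("data/train-right-down.csv")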
1.2 Concatenating the CSVs with concat
Horizontal concatenation (side by side, axis=1)
# axis=1: place the left and right pieces side by side, matching rows by index
list_up = [text_left_up, text_right_up]
result_up = pd.concat(list_up, axis=1)
result_up.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S
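A minimal sketch of how `axis=1` behaves, using two tiny hypothetical frames (`left`, `right`) instead of the real CSV pieces. Rows are matched by index label; a label present in only one piece produces NaN, and the NaN also turns integer columns into float64.

```python
import pandas as pd

# Two hypothetical "left" and "right" pieces sharing the same row index
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
right = pd.DataFrame({"Sex": ["male", "female", "female"],
                      "Fare": [7.25, 71.2833, 7.925]})

# axis=1 places the pieces side by side, aligning rows on the index labels
wide = pd.concat([left, right], axis=1)
print(wide.shape)  # (3, 4)

# If the indexes do not line up, concat takes the union of labels
# and fills the gaps with NaN instead of raising an error
shifted = right.set_axis([1, 2, 3], axis=0)
misaligned = pd.concat([left, shifted], axis=1)
print(misaligned.shape)  # (4, 4)
print(misaligned["PassengerId"].dtype)  # float64, because of the NaN in row 3
```

So before concatenating along `axis=1`, it is worth checking that both pieces have the same index.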
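Vertical concatenation (stacking rows, axis=0)
# Build the lower half the same way, then stack the two halves
result_down = pd.concat([text_left_down, text_right_down], axis=1)
result = pd.concat([result_up, result_down])
result.head()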
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S
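A small sketch of the default `axis=0` stacking with hypothetical `up`/`down` frames. Each piece keeps its own index, so labels can repeat after stacking; `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

# Two hypothetical row blocks with identical columns
up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 0]})

# Default axis=0 stacks rows; the original index labels are kept
stacked = pd.concat([up, down])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# With repeated labels, .loc[0] returns two rows, not one
print(stacked.loc[0].shape)  # (2, 2)

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
renumbered = pd.concat([up, down], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```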
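1.3 Concatenating with join and append
# join matches rows on the index (a left join by default)
result_up = text_left_up.join(text_right_up)
result_down = text_left_down.join(text_right_down)
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on newer versions use pd.concat([result_up, result_down]) instead
result = result_up.append(result_down)
result.head()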
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S
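A sketch of the same join-then-stack pattern on tiny hypothetical frames, written with `pd.concat` in place of `append` so it also runs on pandas 2.x. The lower pieces are given index labels 2 and 3 only to keep the combined index unique.

```python
import pandas as pd

# Hypothetical quadrants: left/right split columns, up/down split rows
left_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 0]}, index=[2, 3])
right_down = pd.DataFrame({"Sex": ["female", "male"]}, index=[2, 3])

# DataFrame.join matches rows on the index (how="left" by default)
up = left_up.join(right_up)
down = left_down.join(right_down)

# pd.concat stacks the rows, as DataFrame.append used to
result = pd.concat([up, down])
print(result)
```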
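1.4 Concatenating with merge and append
# left_index/right_index: merge on the row index of both frames
result_up = pd.merge(text_left_up, text_right_up, left_index=True, right_index=True)
result_down = pd.merge(text_left_down, text_right_down, left_index=True, right_index=True)
# DataFrame.append was removed in pandas 2.0; there, use pd.concat([result_up, result_down])
result = result_up.append(result_down)
result.head()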
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S
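A sketch comparing `merge` on both indexes with `join`, on hypothetical frames. When the indexes match exactly they give the same table; they differ when the indexes disagree, because `merge` defaults to `how="inner"` while `join` defaults to `how="left"`.

```python
import pandas as pd

left = pd.DataFrame({"PassengerId": [1, 2, 3]})
right = pd.DataFrame({"Sex": ["male", "female", "female"]})

# Identical indexes: merge on both indexes and join produce the same frame
via_merge = pd.merge(left, right, left_index=True, right_index=True)
via_join = left.join(right)
print(via_merge.equals(via_join))  # True

# Shifted index: inner merge drops unmatched rows, left join keeps them as NaN
right_shifted = right.set_axis([1, 2, 3], axis=0)
inner = pd.merge(left, right_shifted, left_index=True, right_index=True)
left_kept = left.join(right_shifted)
print(len(inner), len(left_kept))  # 2 3
```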
2. Data Aggregation and Operations
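2.1 Average fare by sex
# `text` is assumed to hold the full training set (all rows, all columns)
fare_by_sex = text['Fare'].groupby(text['Sex'])
means = fare_by_sex.mean()
means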
Sex
female 44.479818
male 25.523893
Name: Fare, dtype: float64
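The grouping above is split-apply-combine: split rows by `Sex`, take the mean `Fare` of each group, and combine the results into one Series. A minimal sketch on a hypothetical four-row `sample`, using the equivalent `groupby("Sex")["Fare"]` spelling and checking one group against a hand-computed boolean mask:

```python
import pandas as pd

# A tiny hypothetical sample with the same column names as the Titanic data
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.2833, 7.925, 8.05],
})

# Split rows by Sex, then take the mean Fare of each group
means = sample.groupby("Sex")["Fare"].mean()

# The same value for one group, computed by hand with a boolean mask
female_mean = sample.loc[sample["Sex"] == "female", "Fare"].mean()
print(means["female"], female_mean)
```

Group keys come back sorted (`female` before `male`) because `groupby` sorts them by default.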
2.2 Number of survivors by sex
survived_sex = text['Survived'].groupby(text['Sex']).sum()
survived_sex.head()
Sex
female 233
male 109
Name: Survived, dtype: int64
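Because `Survived` is coded 1 for survived and 0 otherwise, the per-group sum counts survivors and the per-group mean is the survival rate. A sketch on a hypothetical five-row `sample`:

```python
import pandas as pd

sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 0, 0],
})

grouped = sample.groupby("Sex")["Survived"]
# Summing a 0/1 column counts the 1s, i.e. the survivors in each group
survivors = grouped.sum()
# Averaging a 0/1 column gives the fraction of 1s, i.e. the survival rate
rate = grouped.mean()
print(survivors)
print(rate)
```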
2.3 Number of survivors by passenger class
survived_pclass = text['Survived'].groupby(text['Pclass'])
survived_pclass.sum()
Pclass
1 136
2 87
3 119
Name: Survived, dtype: int64
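To put the survivor counts next to the class sizes, `agg` can apply several reductions to each group in one call. A sketch on a hypothetical `sample`; on the real data the `sum` column would correspond to the counts printed above.

```python
import pandas as pd

sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# sum = survivors, count = passengers, mean = survival rate, per class
summary = sample.groupby("Pclass")["Survived"].agg(["sum", "count", "mean"])
print(summary)
```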