Hands-on data analysis Task03

import numpy as np
import pandas as pd
df1=pd.read_csv(r"C:\Jupyter.Data\titanic\hands-on-data-analysis-master\data.charpter2\train-left-down.csv")
df2=pd.read_csv(r"C:\Jupyter.Data\titanic\hands-on-data-analysis-master\data.charpter2\train-left-up.csv")
df3=pd.read_csv(r"C:\Jupyter.Data\titanic\hands-on-data-analysis-master\data.charpter2\train-right-down.csv")
df4=pd.read_csv(r"C:\Jupyter.Data\titanic\hands-on-data-analysis-master\data.charpter2\train-right-up.csv")

cancat 方法实现

df1.head()
PassengerIdSurvivedPclassName
044002Kvillner, Mr. Johan Henrik Johannesson
144112Hart, Mrs. Benjamin (Esther Ada Bloomfield)
244203Hampe, Mr. Leon
344303Petterson, Mr. Johan Emil
444412Reynaldo, Ms. Encarnacion
df2.head()
PassengerIdSurvivedPclassName
0103Braund, Mr. Owen Harris
1211Cumings, Mrs. John Bradley (Florence Briggs Th...
2313Heikkinen, Miss. Laina
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)
4503Allen, Mr. William Henry
df3.head()
SexAgeSibSpParchTicketFareCabinEmbarked
0male31.000C.A. 1872310.500NaNS
1female45.011F.C.C. 1352926.250NaNS
2male20.0003457699.500NaNS
3male25.0103470767.775NaNS
4female28.00023043413.000NaNS
df4.head()
SexAgeSibSpParchTicketFareCabinEmbarked
0male22.010A/5 211717.2500NaNS
1female38.010PC 1759971.2833C85C
2female26.000STON/O2. 31012827.9250NaNS
3female35.01011380353.1000C123S
4male35.0003734508.0500NaNS
df_result_up = pd.concat([df2,df4],axis=1)
df_result_up.to_csv("result_up.csv")
df_result_down = pd.concat([df1,df3],axis=1)
df_result_down.to_csv("result_down.csv")
df = pd.concat([df_result_down,df_result_up],axis=0)
df.to_csv("result.csv")
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
044002Kvillner, Mr. Johan Henrik Johannessonmale31.000C.A. 1872310.500NaNS
144112Hart, Mrs. Benjamin (Esther Ada Bloomfield)female45.011F.C.C. 1352926.250NaNS
244203Hampe, Mr. Leonmale20.0003457699.500NaNS
344303Petterson, Mr. Johan Emilmale25.0103470767.775NaNS
444412Reynaldo, Ms. Encarnacionfemale28.00023043413.000NaNS
df.shape
(891, 12)

append 和 join方法实现

df_down = df1.join(df3)
df_up = df2.join(df4)
df = df_down.append(df_up)
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
044002Kvillner, Mr. Johan Henrik Johannessonmale31.000C.A. 1872310.500NaNS
144112Hart, Mrs. Benjamin (Esther Ada Bloomfield)female45.011F.C.C. 1352926.250NaNS
244203Hampe, Mr. Leonmale20.0003457699.500NaNS
344303Petterson, Mr. Johan Emilmale25.0103470767.775NaNS
444412Reynaldo, Ms. Encarnacionfemale28.00023043413.000NaNS

merge和append方法实现

df_down = pd.merge(df1,df3,left_index=True,right_index=True)
df_up = pd.merge(df2,df4,left_index=True,right_index=True)
df= df_up.append(df_down)
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS

【思考】单独使用不能完成,接不上。

df.to_csv("result.csv")

将数据转为Series类型

df.stack().head(20) #stack就是将每条数据串起来
0  PassengerId                                                    1
   Survived                                                       0
   Pclass                                                         3
   Name                                     Braund, Mr. Owen Harris
   Sex                                                         male
   Age                                                           22
   SibSp                                                          1
   Parch                                                          0
   Ticket                                                 A/5 21171
   Fare                                                        7.25
   Embarked                                                       S
1  PassengerId                                                    2
   Survived                                                       1
   Pclass                                                         1
   Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
   Sex                                                       female
   Age                                                           38
   SibSp                                                          1
   Parch                                                          0
   Ticket                                                  PC 17599
dtype: object
df.stack().to_csv("stack.csv")
stack = pd.read_csv("stack.csv")
stack.head()
0PassengerId1
00Survived0
10Pclass3
20NameBraund, Mr. Owen Harris
30Sexmale
40Age22.0

数据聚合与运算

Groupby类似于excel数据透视表和分类汇总,只是功能更加强大了些。
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
df.groupby("Sex")["Fare"].mean()
Sex
female    44.479818
male      25.523893
Name: Fare, dtype: float64
df.groupby("Sex")["Survived"].sum()
Sex
female    233
male      109
Name: Survived, dtype: int64
df.groupby("Embarked")["Survived"].sum()
Embarked
C     93
Q     30
S    217
Name: Survived, dtype: int64
结论:(1)女性平均票价高于男性;(2)女性存活人数高于男性;(3)S登记仓位存活人数远高于其他舱位。
Table = df.groupby("Sex").agg({"Fare":["mean"],"Survived":["sum"]})
Table = Table.rename(columns={"Fare":"Fare_mean","Survived":"Survived_mean"})
Table
Fare_meanSurvived_mean
meansum
Sex
female44.479818233
male25.523893109
df.groupby(['Pclass','Age'])['Fare'].mean().head()
Pclass  Age  
1       0.92     151.5500
        2.00     151.5500
        4.00      81.8583
        11.00    120.0000
        14.00    120.0000
Name: Fare, dtype: float64
df_sex_fare = df.groupby("Sex")["Fare"].mean()
df_sex_survived = df.groupby("Sex")["Survived"].sum()
df_fare_survived = pd.merge(df_sex_fare.reset_index(),df_sex_survived.reset_index(),on="Sex") 

#不reset_index会报错,原因未知。
df_fare_survived.head()
SexFareSurvived
0female44.479818233
1male25.523893109
df_fare_survived.to_csv("Fare_survived.csv")

存活率最大的年龄段

df.describe()
PassengerIdSurvivedPclassAgeSibSpParchFare
count891.000000891.000000891.000000714.000000891.000000891.000000891.000000
mean446.0000000.3838382.30864229.6991180.5230080.38159432.204208
std257.3538420.4865920.83607114.5264971.1027430.80605749.693429
min1.0000000.0000001.0000000.4200000.0000000.0000000.000000
25%223.5000000.0000002.00000020.1250000.0000000.0000007.910400
50%446.0000000.0000003.00000028.0000000.0000000.00000014.454200
75%668.5000001.0000003.00000038.0000001.0000000.00000031.000000
max891.0000001.0000003.00000080.0000008.0000006.000000512.329200
df['AgeBand'] = pd.cut(df['Age'],[0,5,15,30,55,80],labels = ["Baby","Younth","Adult","Middle-age","Old-man"])
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBand
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNSAdult
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85CMiddle-age
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNSAdult
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123SMiddle-age
4503Allen, Mr. William Henrymale35.0003734508.0500NaNSMiddle-age
df_rate = df.groupby("AgeBand")["Survived"].agg({"sum","count",lambda x:x.sum()/x.count()})
df_rate.columns = df_rate.columns.map(''.join).str.replace('<lambda>','survived_rate') 
df_rate
sumcountsurvived_rate
AgeBand
Baby31440.704545
Younth18390.461538
Adult1173260.358896
Middle-age1122650.422642
Old-man12400.300000
df_rate["survived_rate"].sort_values(ascending=False)
max_group = df_rate.index[0]
max_rate = df_rate.survived_rate[0]
print("存活率最高的组为:%s;存活率为:%.2f%%"%(max_group,100*max_rate))
存活率最高的组为:Baby;存活率为:70.45%
  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值