Pandas Basics: Common Operations with Custom Functions (Part 3)

The data table used in this post is shown below. If you would like a copy of it to practice with, please leave a comment.

# Custom functions in pandas
import pandas as pd
import numpy as np
titanic = pd.read_csv('titanic_train.csv')
titanic.head()
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500    NaN         S

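To use a custom function, pass the function itself (its name, without parentheses) to the apply method that pandas provides. By default, apply calls the function once per column and passes each column in as a Series.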

# First, define a function that returns the value in the second row of a column
# (.loc[1] looks up the index label 1, which is the second row under the default 0-based index)
def goal_value(column):
    return column.loc[1]
titanic.apply(goal_value)
PassengerId                                                    2
Survived                                                       1
Pclass                                                         1
Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                       female
Age                                                           38
SibSp                                                          1
Parch                                                          0
Ticket                                                  PC 17599
Fare                                                     71.2833
Cabin                                                        C85
Embarked                                                       C
dtype: object
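
To make it easier to see what apply hands to the function, here is a minimal sketch on a small hand-made table. The toy DataFrame and the function names second_by_label and second_by_position are illustrative assumptions, not part of the Titanic file. It shows that .loc[1] selects by index label, while .iloc[1] selects by position, so the two only agree when the index starts at 0.

```python
import pandas as pd

# Toy data, made up for illustration (not the Titanic file)
toy = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Age": [30, 41, 25]},
    index=[1, 2, 3],  # labels deliberately start at 1, not 0
)

def second_by_label(column):
    # .loc looks up the index label 1, which here is the FIRST row
    return column.loc[1]

def second_by_position(column):
    # .iloc looks up position 1, which is always the second row
    return column.iloc[1]

# apply calls each function once per column
by_label = toy.apply(second_by_label)
by_position = toy.apply(second_by_position)
print(by_label)
print(by_position)
```

If "the second row" is meant literally, .iloc[1] is probably the safer choice once the table has been filtered or re-indexed.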

Define a custom function that returns the number of missing values in each column

# Define a function that returns the count of NaN values in a column
def nan_count(column):
    return len(column[pd.isnull(column)])
titanic.apply(nan_count)
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
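
As a cross-check, the same counts can be obtained without a custom function. The sketch below uses a small made-up table (an assumption for illustration, not the Titanic file) and compares nan_count with the built-in isnull().sum().

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 35.0],
        "Cabin": [np.nan, np.nan, "C85"],
        "Sex": ["male", "female", "male"],
    }
)

def nan_count(column):
    # Keep only the missing entries of the column, then count them
    return len(column[pd.isnull(column)])

via_apply = toy.apply(nan_count)     # custom function, one call per column
via_builtin = toy.isnull().sum()     # built-in equivalent: True counts as 1

print(via_apply)
print(via_builtin)
```

Both approaches should give identical counts. The built-in form is shorter, while the apply form is the pattern that generalizes to arbitrary per-column logic.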

Define a custom function that converts the Pclass values 1, 2, 3 into 'First class', 'Second class', 'Third class'

# Convert the Pclass values 1, 2, 3 into 'First class', 'Second class', 'Third class'
def transition(rows):
    if pd.isnull(rows['Pclass']):
        return 'Unknown class'
    else:
        if rows['Pclass'] == 1:
            return 'First class'
        elif rows['Pclass'] == 2:
            return 'Second class'
        else:
            return 'Third class'  
titanic.apply(transition, axis=1).head()  # axis=1 calls the function once per row
0    Third class
1    First class
2    Third class
3    First class
4    Third class
dtype: object
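
For a plain value-to-label lookup like this one, a dictionary with Series.map is a common alternative to a row-wise apply. The sketch below runs on a made-up Pclass column, and the names labels and class_label are illustrative assumptions. One difference from transition above: a value that is not in the dictionary ends up as 'Unknown class' here, instead of falling through to 'Third class'.

```python
import numpy as np
import pandas as pd

# Toy column, made up for illustration; the NaN stands in for a missing class
toy = pd.DataFrame({"Pclass": [3, 1, 2, np.nan]})

# Lookup table from class number to label
labels = {1: "First class", 2: "Second class", 3: "Third class"}

# map looks every value up in the dictionary; values without a match
# (including NaN) become NaN, which fillna then replaces
class_label = toy["Pclass"].map(labels).fillna("Unknown class")
print(class_label)
```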

Define a custom function that classifies passengers as Adult, Minor, or Unknown

# Classify each passenger as 'Adult', 'Minor', or 'Unknown' based on Age
def is_adult(rows):
    if pd.isnull(rows['Age']):
        return 'Unknown'
    else:
        return 'Adult' if rows['Age'] >= 18 else 'Minor'  # conditional (ternary) expression, covered in my earlier post on control flow
titanic.apply(is_adult,axis=1).head()
0    Adult
1    Adult
2    Adult
3    Adult
4    Adult
dtype: object
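
The missing-value check has to come first because any comparison with NaN evaluates to False. The sketch below uses a made-up Series of ages, and the function is_adult_no_check is a hypothetical variant written only to show the problem: without the check, a passenger with no recorded age is silently labeled 'Minor'.

```python
import numpy as np
import pandas as pd

# Toy ages, made up for illustration; the last one is missing
ages = pd.Series([22.0, 2.0, np.nan])

def is_adult_no_check(age):
    # NaN >= 18 is False, so a missing age falls through to 'Minor'
    return "Adult" if age >= 18 else "Minor"

def is_adult(age):
    # Handle the missing value before comparing
    if pd.isnull(age):
        return "Unknown"
    return "Adult" if age >= 18 else "Minor"

without_check = ages.apply(is_adult_no_check)
with_check = ages.apply(is_adult)
print(without_check)
print(with_check)
```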

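With the passengers labeled by age in this way, we can compute the survival rate of each of the three groups.

# Label every passenger by age, then compute the survival rate of each group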
age_labels = titanic.apply(is_adult,axis = 1)
titanic['Age_Label'] = age_labels
titanic.head(10)  # Age_Label has now been added to the original table
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked  Age_Label
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S      Adult
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C      Adult
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S      Adult
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000   C123         S      Adult
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500    NaN         S      Adult
5            6         0       3                                   Moran, Mr. James    male   NaN      0      0            330877   8.4583    NaN         Q    Unknown
6            7         0       1                            McCarthy, Mr. Timothy J    male  54.0      0      0             17463  51.8625    E46         S      Adult
7            8         0       3                     Palsson, Master. Gosta Leonard    male   2.0      3      1            349909  21.0750    NaN         S      Minor
8            9         1       3  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0      0      2            347742  11.1333    NaN         S      Adult
9           10         1       2                Nasser, Mrs. Nicholas (Adele Achem)  female  14.0      1      0            237736  30.0708    NaN         C      Minor

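Compute the survival rate of each of the three groups. Survived is 0 or 1, so its mean within a group is the proportion of that group who survived.

# Compute the survival rate of each of the three groups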
titanic.pivot_table(values='Survived',index='Age_Label',aggfunc=np.average)
           Survived
Age_Label
Adult      0.381032
Minor      0.539823
Unknown    0.293785
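
The same per-group average can also be written with groupby. The sketch below uses a tiny made-up table (an assumption for illustration, so its numbers differ from the Titanic result above) and passes the string 'mean' as the aggregation, which behaves like np.average for this unweighted case.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 1, 0, 0],
        "Age_Label": ["Adult", "Adult", "Minor", "Minor", "Unknown", "Adult"],
    }
)

# pivot_table: one row per Age_Label, mean of Survived in each group
by_pivot = toy.pivot_table(values="Survived", index="Age_Label", aggfunc="mean")

# groupby: the same aggregation, returned as a Series instead of a DataFrame
by_group = toy.groupby("Age_Label")["Survived"].mean()

print(by_pivot)
print(by_group)
```

pivot_table is convenient when the result should be a table, possibly with several value columns. groupby tends to read more directly when only one column is being aggregated.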