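Common Pandas Operations: Custom Functions (Part 3)
The data table used in this article is shown below. If you would like a copy of it to practice with, please leave a comment.
# Custom functions in pandas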
import pandas as pd
import numpy as np
titanic = pd.read_csv('titanic_train.csv')
titanic.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
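To run a custom function over a table, pass the function itself (its name, without parentheses) to the apply method that pandas provides. By default apply calls the function once per column.
# First, define a function that returns the value in the second row (index label 1) of each column
def goal_value(column):
    return column.loc[1]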
titanic.apply(goal_value)
PassengerId 2
Survived 1
Pclass 1
Name Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex female
Age 38
SibSp 1
Parch 0
Ticket PC 17599
Fare 71.2833
Cabin C85
Embarked C
dtype: object
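To make the direction of apply concrete, here is a minimal sketch on a small made-up table. The `toy` DataFrame, its values, and the helper names `second_value` and `row_total` are hypothetical and are not part of the Titanic data. With the default axis=0 the function receives each column as a Series; with axis=1 it receives each row.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate apply (not from titanic_train.csv)
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def second_value(column):
    # With the default axis=0, `column` is one column of the frame as a Series
    return column.loc[1]

def row_total(row):
    # With axis=1, `row` is one row of the frame, indexed by column name
    return row["a"] + row["b"]

per_column = toy.apply(second_value)    # one result per column
per_row = toy.apply(row_total, axis=1)  # one result per row
print(per_column)
print(per_row)
```

The first result is indexed by column name and the second by row label, which is why the row-wise examples in this article all pass axis=1.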
Define a custom function that returns the number of missing values in each column
# Keep only the missing entries of the column, then count them
def nan_count(column):
    return len(column[pd.isnull(column)])
titanic.apply(nan_count)
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
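For this particular task a custom function is not strictly required. As a rough sketch, assuming a small hypothetical table (`toy` and its values are made up for illustration), the built-in isnull().sum() chain should give the same per-column counts as the apply version:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values (not from titanic_train.csv)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

def nan_count(column):
    # Same logic as in the article: keep the missing entries and count them
    return len(column[pd.isnull(column)])

via_apply = toy.apply(nan_count)   # custom function, one call per column
via_builtin = toy.isnull().sum()   # True counts as 1 when summed
print(via_builtin)
```

The custom function is still a useful pattern when the per-column logic is something pandas does not already offer.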
Define a custom function that converts the Pclass values 1, 2, 3 into 'First class', 'Second class', 'Third class'
# The function receives one row at a time; a missing Pclass is labeled 'Unknown class'
def transition(rows):
    if pd.isnull(rows['Pclass']):
        return 'Unknown class'
    else:
        if rows['Pclass'] == 1:
            return 'First class'
        elif rows['Pclass'] == 2:
            return 'Second class'
        else:
            return 'Third class'
titanic.apply(transition, axis=1).head()  # axis=1 applies the function to each row
0 Third class
1 First class
2 Third class
3 First class
4 Third class
dtype: object
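When the conversion is a plain lookup on a single column, a dictionary passed to Series.map is a common alternative to a row-wise apply. The sketch below uses a hypothetical `toy` table and a hypothetical `class_names` dictionary. One difference from transition is worth noting: any value that is not a key of the dictionary becomes 'Unknown class' here, whereas transition labels every non-missing value other than 1 and 2 as 'Third class'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from titanic_train.csv); NaN stands for a missing class
toy = pd.DataFrame({"Pclass": [3, 1, 2, np.nan]})

# Hypothetical lookup table from class number to label
class_names = {1: "First class", 2: "Second class", 3: "Third class"}

# map looks each value up in the dictionary; values without a match become NaN,
# which fillna then replaces with the fallback label
labels = toy["Pclass"].map(class_names).fillna("Unknown class")
print(labels)
```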
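Define a custom function that classifies each passenger as Adult, Minor, or Unknown
# Passengers with a missing Age are labeled 'Unknown'; 18 and over counts as 'Adult'
def is_adult(rows):
    if pd.isnull(rows['Age']):
        return 'Unknown'
    else:
        return 'Adult' if rows['Age'] >= 18 else 'Minor'  # conditional (ternary) expression, covered in my earlier post on control flow
titanic.apply(is_adult, axis=1).head()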
0 Adult
1 Adult
2 Adult
3 Adult
4 Adult
dtype: object
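A row-wise apply calls the Python function once per row, which can be slow on large tables. As a hedged sketch of one alternative (not the approach used in this article), the same three-way labeling can be written with numpy.select on the whole column. The `toy` table and the names `conditions` and `choices` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from titanic_train.csv)
toy = pd.DataFrame({"Age": [22.0, np.nan, 2.0, 18.0, 14.0]})

def is_adult(rows):
    # Row-wise version, as in the article
    if pd.isnull(rows["Age"]):
        return "Unknown"
    return "Adult" if rows["Age"] >= 18 else "Minor"

via_apply = toy.apply(is_adult, axis=1)

# Column-wise version: conditions are checked in order, the first match wins,
# and rows matching no condition get the default label
conditions = [toy["Age"].isnull(), toy["Age"] >= 18]
choices = ["Unknown", "Adult"]
via_select = pd.Series(
    np.select(conditions, choices, default="Minor"), index=toy.index
)
print(via_select)
```

Both versions should produce the same labels; the apply form is easier to read, while the column-wise form avoids the per-row function call.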
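With every passenger labeled by age group, we can compute the survival rate of each of the three groups.
# Attach the age labels to the table as a new column
age_labels = titanic.apply(is_adult, axis=1)
titanic['Age_Label'] = age_labels
titanic.head(10)  # Age_Label now appears as a new column in the original table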
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Age_Label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | Adult |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | Adult |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | Adult |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | Adult |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | Adult |
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q | Unknown |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | Adult |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | Minor |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S | Adult |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C | Minor |
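Compute the survival rate of each of the three groups
# Survived is 0 or 1, so its average within a group is that group's survival rate
titanic.pivot_table(values='Survived', index='Age_Label', aggfunc=np.average)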
Age_Label | Survived |
---|---|
Adult | 0.381032 |
Minor | 0.539823 |
Unknown | 0.293785 |
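The same per-group average can also be obtained with groupby. The sketch below uses a small hypothetical table (`toy` and its values are made up, so the numbers differ from the Titanic results above) to show that pivot_table and groupby agree:

```python
import pandas as pd

# Hypothetical toy data (not from titanic_train.csv)
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age_Label": ["Adult", "Adult", "Minor", "Unknown", "Minor", "Adult"],
})

# pivot_table returns a DataFrame with one row per Age_Label
via_pivot = toy.pivot_table(values="Survived", index="Age_Label", aggfunc="mean")

# groupby returns a Series indexed by Age_Label
via_groupby = toy.groupby("Age_Label")["Survived"].mean()
print(via_groupby)
```

pivot_table is convenient when several value columns or a second grouping key are needed; for a single column grouped by a single key, the groupby form is often shorter.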