Understanding apply and map in pandas: what they do and how they differ

Latest recommended article published on 2024-01-26 14:55:20

邓旭东HIT

Article link: https://blog.csdn.net/weixin_38008864/article/details/103692404

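Data types in pandas

For simplicity, we can treat pandas data as coming in just two kinds:

One-dimensional Series
Two-dimensional DataFrame

Both types have methods for applying a function to their data. This post covers:

Series: apply and map
DataFrame: apply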

Series.map

Series.map(arg)

arg: a function, a dict, or a Series
Returns a Series

import pandas as pd
import numpy as np
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
print(type(s))
s

<class 'pandas.core.series.Series'>
0       cat
1       dog
2       NaN
3    rabbit
dtype: object

s.map({'cat':'kitten',
       'dog':'puppy'})

0    kitten
1     puppy
2       NaN
3       NaN
dtype: object
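The dict lookup above leaves any value without a matching key ('rabbit' here) as NaN. As a minimal sketch, here is the same mapping next to the function form of arg. The `na_action='ignore'` argument is a real `Series.map` parameter that this post does not otherwise cover; the names `by_dict` and `by_func` are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# Dict form: values with no matching key ('rabbit') become NaN.
by_dict = s.map({'cat': 'kitten', 'dog': 'puppy'})

# Function form: na_action='ignore' leaves NaN as NaN
# instead of passing it to the function.
by_func = s.map(lambda name: f'I am a {name}', na_action='ignore')

print(by_dict.tolist())
print(by_func.tolist())
```

Without `na_action='ignore'`, the function would also be called on the missing value and would format it into the string.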

Series.apply

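Series.apply(func, args=(), **kwds)

func: a function (a Python function or a NumPy ufunc)
args: positional arguments passed to func after each element, as a tuple
kwds: keyword arguments passed to func (collected as a dict)
Returns a Series, or a DataFrame if func returns a Series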
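To make the args and kwds parameters concrete, here is a small sketch. The helper functions `subtract` and `add_all` and the keyword names are hypothetical, chosen only for illustration.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

def subtract(x, amount):
    # x is one element of the Series; amount arrives through args
    return x - amount

def add_all(x, **extras):
    # every extra keyword given to apply is forwarded here
    for value in extras.values():
        x += value
    return x

shifted = s.apply(subtract, args=(5,))                  # 20-5, 21-5, 12-5
summed = s.apply(add_all, june=30, july=20, august=25)  # each element + 75

print(shifted.tolist())
print(summed.tolist())
```

Note that args must be a tuple, so a single extra argument is written as `(5,)`.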

import pandas as pd
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
s

London      20
New York    21
Helsinki    12
dtype: int64

A function with one input and one output

import pandas as pd
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
def func1(x):
    return x**2
s.apply(func1)

London      400
New York    441
Helsinki    144
dtype: int64

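The function we defined takes a single element and returns a single element. In this case Series.map and Series.apply give the same result.

Now let's see whether Series.map and Series.apply differ when the function takes one element but returns a Series (in effect, several elements).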
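A quick way to check this claim is to run both methods with the same element-wise function and compare the results. This is only a sketch; `square` is a stand-in for func1.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

def square(x):
    # one element in, one element out
    return x ** 2

via_map = s.map(square)
via_apply = s.apply(square)

# Series.equals compares values, index and dtype
print(via_map.equals(via_apply))
```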

import pandas as pd
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
def func2(x):
    return pd.Series([x, x])
s.map(func2)

London      0    20
1    20
dtype: int64
New York    0    21
1    21
dtype: int64
Helsinki    0    12
1    12
dtype: int64
dtype: object

import pandas as pd
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
def func3(x):
    return pd.Series([x, x])
s.apply(func3)

	0	1
London	20	20
New York	21	21
Helsinki	12	12
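The two outputs above differ in type, not just in layout: map keeps each returned Series as a single element, while apply expands the returned Series into columns. The sketch below inspects both results, assuming a pandas version where Series.apply still expands a Series return value into a DataFrame (the behavior shown in this post). `pair` is a hypothetical stand-in for func2 and func3.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

def pair(x):
    # hypothetical helper: returns two copies of x as a Series
    return pd.Series([x, x])

mapped = s.map(pair)     # a Series of dtype object; each element is itself a Series
applied = s.apply(pair)  # the returned Series become the columns of a DataFrame

print(type(mapped).__name__, mapped.dtype)
print(type(applied).__name__, applied.shape)
```

So when the function returns several values per element and you want them as columns, apply is the one to reach for.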

DataFrame.apply()

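DataFrame.apply(func, axis=0, args=(), **kwds)

func: the function to apply
axis: the direction of the operation. The default, axis=0, passes each column to func; axis=1 passes each row
args: positional arguments passed to func, as a tuple
kwds: keyword arguments passed to func (collected as a dict)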
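The key point about axis is what func receives: a whole column (as a Series) when axis=0, and a whole row (as a Series) when axis=1. Here is a minimal sketch that also passes extra arguments through args and kwds; the function `value_range` and its `scale` parameter are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [3, 2], [5, 7]], columns=['a', 'b'])

def value_range(values, scale):
    # values is a Series: one column when axis=0, one row when axis=1
    return (values.max() - values.min()) * scale

per_column = df.apply(value_range, args=(1,))      # axis=0 by default
per_row = df.apply(value_range, axis=1, scale=10)  # scale passed as a keyword

print(per_column)
print(per_row)
```

The result is indexed by column label for axis=0 and by row label for axis=1.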

import pandas as pd
df = pd.DataFrame([[4, 9], [3, 2], [5, 7]], 
                  columns=['a', 'b'])
df

	a	b
0	4	9
1	3	2
2	5	7

def func4(row):
    # row is a Series indexed by column label, so select by label
    return row['a'] + row['b']
df.apply(func4, axis=1)

0    13
1     5
2    12
dtype: int64

def func5(x):
    return x+1
df.apply(func5, axis=1)

	a	b
0	5	10
1	4	3
2	6	8

df.apply(lambda x: x.max()+x.min())

a     8
b    11
dtype: int64

df.apply(lambda x: x.max()+x.min(), axis=1)

0    13
1     5
2    12
dtype: int64
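As a cross-check on the last two calls, the same numbers can be obtained from the built-in DataFrame.max and DataFrame.min reductions. This is a sketch for verification, not something the post itself does.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [3, 2], [5, 7]], columns=['a', 'b'])

# apply with a lambda, as in the examples above
col_apply = df.apply(lambda x: x.max() + x.min())
row_apply = df.apply(lambda x: x.max() + x.min(), axis=1)

# the same computation with built-in reductions
col_builtin = df.max() + df.min()
row_builtin = df.max(axis=1) + df.min(axis=1)

print(col_apply.tolist() == col_builtin.tolist())
print(row_apply.tolist() == row_builtin.tolist())
```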