pandas' Three Power Tools: A Plain-Language Guide to map, apply, and applymap

Latest recommended article published on 2024-04-07 15:15:37

Xiaofei@IDO

Article link: https://blog.csdn.net/nixiang_888/article/details/123150256

Preface

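The main tools for data processing today include R and Python, among others, and I switch between the two regularly. For statistical analysis I usually reach for R first, because it ships with a very large number of statistical functions and has many contributors. As Python's user base has surged, more and more programming work involves Python to some degree, and it consistently ranks in the top three of programming-language popularity lists.
In Python, numpy and pandas are the workhorses of data processing.
Below I describe the three pandas functions commonly used for batch processing of data: map, apply, and applymap. They help you avoid writing inefficient for loops (or so everyone says) and make programs simpler.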
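
To make the "avoid for loops" claim concrete, here is a minimal sketch that computes the same result three ways. The `prices` data is made up for illustration; the point is only that the loop, `Series.map`, and the vectorized expression agree, and the later forms need less code.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
prices = pd.Series([1.5, 2.0, 3.25])

# 1) Explicit Python for loop
looped_values = []
for value in prices:
    looped_values.append(value * 2)
looped = pd.Series(looped_values, index=prices.index)

# 2) Element-wise mapping with Series.map
mapped = prices.map(lambda value: value * 2)

# 3) Vectorized arithmetic on the whole Series
vectorized = prices * 2

print(looped.equals(mapped), mapped.equals(vectorized))
```

When a vectorized form like the third one exists, it is usually the fastest option; map and apply are most useful when the per-element logic cannot be expressed that way.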

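1. The built-in map function

Syntax
Basic syntax of the built-in map() function: map(function, iterable[, iterable1, iterable2,..., iterableN])
Basic semantics
① function is applied to every item of the iterable, and a new map object (a lazy iterator) is returned
② The number of parameters function accepts must match the number of iterables passed
③ Iteration stops as soon as the shortest iterable is exhausted; there is no broadcasting
④ function can be any callable: built-in functions, classes, methods, lambda functions, and user-defined functions

Examples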

# Single iterable
# User-defined function
>>> def square(number):
...     return number ** 2
...
>>> numbers = [1, 2, 3, 4, 5]
>>> squared = map(square, numbers)
>>> list(squared)
[1, 4, 9, 16, 25]
# Built-in function
>>> words = ["Welcome", "to", "Real", "Python"]
>>> list(map(len, words))
[7, 2, 4, 6]
# lambda function
>>> numbers = [1, 2, 3, 4, 5]
>>> squared = map(lambda num: num ** 2, numbers)
>>> list(squared)
[1, 4, 9, 16, 25]
# Multiple iterables (iteration stops at the shortest one)
>>> first_it = [1, 2, 3]
>>> second_it = [4, 5, 6, 7]
>>> list(map(pow, first_it, second_it))
[1, 32, 729]

>>> list(map(lambda x, y: x - y, [2, 4, 6], [1, 3, 5]))
[1, 1, 1]
>>> list(map(lambda x, y, z: x + y + z, [2, 4], [1, 3], [7, 8]))
[10, 15]

# Transforming Iterables of Strings With Python’s map()
>>> string_it = ["processing", "strings", "with", "map"]
>>> list(map(str.capitalize, string_it))
['Processing', 'Strings', 'With', 'Map']

>>> list(map(str.upper, string_it))
['PROCESSING', 'STRINGS', 'WITH', 'MAP']

>>> list(map(str.lower, string_it))
['processing', 'strings', 'with', 'map']

>>> with_spaces = ["processing ", "  strings", "with   ", " map   "]
>>> list(map(str.strip, with_spaces))
['processing', 'strings', 'with', 'map']

>>> with_dots = ["processing..", "...strings", "with....", "..map.."]
>>> list(map(lambda s: s.strip("."), with_dots))
['processing', 'strings', 'with', 'map']
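
Two properties from the semantics list are easy to miss in the examples above: the map object is a one-shot lazy iterator, and shorter iterables silently truncate the result. The sketch below illustrates both; using `itertools.zip_longest` to pad the shorter input is one possible workaround, not something map does itself.

```python
from itertools import zip_longest

numbers = [1, 2, 3]
squared = map(lambda n: n ** 2, numbers)

first_pass = list(squared)    # consumes the iterator
second_pass = list(squared)   # already exhausted, so this is empty

# An equivalent list comprehension, often considered more readable
comprehension = [n ** 2 for n in numbers]

# map truncates to the shortest iterable
truncated = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))

# Padding the shorter input with a fill value keeps every element
padded = [x + y for x, y in zip_longest([1, 2, 3], [10, 20], fillvalue=0)]

print(first_pass, second_pass, truncated, padded)
```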

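2. The pandas.Series.map method

Syntax
Series.map(arg, na_action=None)
Basic semantics
① Each value in the Series is replaced by the value it corresponds to under the mapping arg
② arg can be a function, a dict, or a Series
③ na_action controls how missing values are handled: with 'ignore', missing values are propagated as NaN without being passed to the mapping; with the default None, they are passed to the mapping like any other value (a formatting function, for example, turns NaN into the string 'nan')
④ Returns a Series with the same index as the original

Examples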

# Create the example object
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
s
0      cat
1      dog
2      NaN
3   rabbit
dtype: object

## Mapping given as a dict or a Series
# map accepts a dict or a Series.
# Values that are not found in the dict are converted to NaN,
# unless the dict has a default value (e.g. defaultdict).
# Put another way, each value in the Series is used as a key into the mapping;
# if the key does not exist, the result is a missing value.
s.map({'cat': 'kitten', 'dog': 'puppy'})
0   kitten
1    puppy
2      NaN
3      NaN
dtype: object

## Mapping given as a function
# Each element of the Series is passed to the function as its argument
s.map('I am a {}'.format)
0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object

# Setting na_action so that missing values are skipped
s.map('I am a {}'.format, na_action='ignore')
0     I am a cat
1     I am a dog
2            NaN
3  I am a rabbit
dtype: object
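
The comments above mention that a Series or a dict with a default value can also serve as the mapping, but only the plain dict is demonstrated. The following sketch covers the other two cases; the `'unknown'` default is an arbitrary choice made for illustration.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# A Series as the mapping: its index plays the role of the dict keys
lookup = pd.Series({'cat': 'kitten', 'dog': 'puppy'})
by_series = s.map(lookup)

# A defaultdict as the mapping: missing keys fall back to the default
# ('unknown' is an assumed placeholder, not a pandas convention)
with_default = defaultdict(lambda: 'unknown', {'cat': 'kitten', 'dog': 'puppy'})
by_defaultdict = s.map(with_default)

print(by_series)
print(by_defaultdict)
```

With the Series mapping, 'rabbit' has no matching index label and becomes NaN, just as with a plain dict. With the defaultdict, 'rabbit' receives the default value instead.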

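3. The pandas.Series.apply method

Syntax
Series.apply(func, convert_dtype=True, args=(), **kwargs)  (signature as of pandas 1.x)
Basic semantics
① func can be a user-defined Python function or a NumPy function
② A user-defined Python function is applied to each element of the Series; a NumPy ufunc (such as np.log) is applied to the Series as a whole
③ convert_dtype: tries to find a better dtype for the element-wise results
④ args: a tuple of positional arguments passed to func after the first argument (the Series element)
⑤ **kwargs: additional keyword arguments passed through to func

Examples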

# Create the data
s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
s
London      20
New York    21
Helsinki    12
dtype: int64

# User-defined function: with a single parameter, it receives each element of the Series
def square(x):
    return x ** 2
s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64
# lambda function
s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64
# Passing extra positional arguments with args
def subtract_custom_value(x, custom_value):
    return x - custom_value
s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64
# Passing extra keyword arguments with **kwargs
def add_custom_values(x, **kwargs):
    for month in kwargs:
        x += kwargs[month]  # add the value supplied for each keyword
    return x
s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
# NumPy function
s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
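
Since a NumPy ufunc operates on the whole Series, calling it directly gives the same result as going through apply. The sketch below checks that, and also shows args and keyword arguments used together; `scale_and_shift` is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# A ufunc through apply versus called directly on the Series
via_apply = s.apply(np.log)
direct = np.log(s)

# Hypothetical helper: x is the Series element, factor arrives via args,
# offset arrives via **kwargs
def scale_and_shift(x, factor, offset=0):
    return x * factor + offset

combined = s.apply(scale_and_shift, args=(2,), offset=1)

print(np.allclose(via_apply, direct))
print(combined)
```

For ufuncs, the direct call `np.log(s)` is the simpler spelling; apply earns its keep when the function is ordinary Python code.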

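4. The pandas.DataFrame.apply method

Syntax
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
Basic semantics
① Applies a function along an axis of the DataFrame, that is, to each column (0) or each row (1)
② axis: 0 applies the function to each column; 1 applies it to each row
③ raw: with the default False, each row or column is passed to the function as a Series; with True, it is passed as a NumPy ndarray
④ result_type (only takes effect when axis=1): 'expand' turns list-like results into columns; 'broadcast' broadcasts the results to the shape of the original DataFrame and keeps its original index and columns
⑤ args: a tuple of positional arguments passed to the function
⑥ **kwargs: keyword arguments passed to the function

Examples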

# Create the data
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
   A  B
0  4  9
1  4  9
2  4  9
# Using a NumPy function
df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

# Sum each column
df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

# Sum each row
df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
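
The raw and result_type parameters are described in the semantics list but not exercised in the examples. Here is a small sketch of both on the same DataFrame; the `total` function and the `seen_types` list are illustrative names, used only to record what type the function receives.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# raw=True: each column arrives as a NumPy ndarray rather than a Series
seen_types = []

def total(values):
    seen_types.append(type(values).__name__)
    return values.sum()

col_totals = df.apply(total, raw=True)

# Returning a list per row without result_type gives a Series of lists
as_lists = df.apply(lambda row: [row['A'] + 1, row['B'] * 2], axis=1)

# result_type='expand' turns those list-like results into columns
expanded = df.apply(lambda row: [row['A'] + 1, row['B'] * 2],
                    axis=1, result_type='expand')

# result_type='broadcast' keeps the original shape, index, and columns
broadcast = df.apply(lambda row: [1, 2], axis=1, result_type='broadcast')

print(col_totals, expanded, broadcast, sep='\n')
```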

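5. The pandas.DataFrame.applymap method

Syntax
DataFrame.applymap(func, na_action=None, **kwargs)
Basic semantics
① Applies the function to every element of the DataFrame
② The na_action parameter has the same meaning as in pandas.Series.map

Examples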

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
df
       0      1
0  1.000  2.120
1  3.356  4.567

df.applymap(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

# Like Series.map, NA values can be ignored:
df_copy = df.copy()
df_copy.iloc[0, 0] = pd.NA
df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
      0  1
0  <NA>  4
1     5  5

# Note that a vectorized version of func often exists, 
# which will be much faster. You could square each number elementwise.
df.applymap(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

# But it’s better to avoid applymap in that case.
df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
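
One caveat for readers on newer releases: since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. The sketch below uses a hypothetical `elementwise` helper (not part of pandas) that picks whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])

def elementwise(frame, func, **kwargs):
    """Hypothetical helper: prefer DataFrame.map, fall back to applymap."""
    # Check the class, not the instance, so a column named 'map'
    # cannot be mistaken for the method
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func, **kwargs)
    return frame.applymap(func, **kwargs)

lengths = elementwise(df, lambda x: len(str(x)))
print(lengths)
```

If the code only ever runs on one pandas version, calling the matching method directly is simpler than a wrapper like this.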
