Pandas data transformation functions: map, apply, applymap

Wangsh@

Article link: https://blog.csdn.net/qq_48391148/article/details/124675064

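Comparison of the data transformation functions map, apply, and applymap:

map: used on a Series to map each value to a new value;
apply: on a Series, processes each value; on a DataFrame, processes the Series along the chosen axis;
applymap: DataFrame only; processes every element of the DataFrame. Since pandas 2.1 this method is deprecated in favor of DataFrame.map, which does the same thing.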
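To make the comparison concrete, here is a minimal sketch of the calls side by side. The small DataFrame is hypothetical and is not the CSV used later in this article; the `elementwise` name is only an illustrative helper that picks `DataFrame.map` when the installed pandas has it and falls back to `applymap` otherwise.

```python
import pandas as pd

# Hypothetical toy data, not the CSV used later in this article.
df = pd.DataFrame({"isp": ["cmnet", "ctc"], "pv": [2000, 2500], "uv": [1000, 1200]})

# Series.map: value -> value mapping on one column.
isp_upper = df["isp"].map(str.upper)

# Series.apply: call a function on each value of one column.
pv_thousands = df["pv"].apply(lambda v: v / 1000)

# DataFrame.apply: the function receives a whole row (axis=1) as a Series.
row_totals = df[["pv", "uv"]].apply(lambda row: row.sum(), axis=1)

# Element-wise over a DataFrame: DataFrame.map on pandas 2.1+, applymap before that.
elementwise = pd.DataFrame.map if hasattr(pd.DataFrame, "map") else pd.DataFrame.applymap
doubled = elementwise(df[["pv", "uv"]], lambda v: v * 2)
```

The first two calls and the last one hand the function a single value; only DataFrame.apply hands it a Series.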

1. map: transforming the values of a Series

Example: convert the English ISP codes in the isp column to their Chinese names

Either Series.map(dict) or Series.map(function) works.

>>> import pandas as pd
>>> df = pd.read_csv('/lianxi/datas/read_test.csv')
>>> df.head()
         date   prov    isp    pv    uv
0  2020-04-26  hunan  cmnet  2000  1000
1  2020-04-26  hunan  cmnet  3000  1500
2  2020-04-26  hunan   cmcc  4000  1000
3  2020-04-26  hubei    ctc  2500  1000
4  2020-04-26  hubei   cmcc  2000  1000

#################################

Mapping from ISP codes to Chinese names; note that the keys are lowercase

>>> dict_isp_names = {
...     "cmnet": "中国移动",
...     "cmcc": "中国联通",
...     "ctc": "中国电信"
... }

Method 1: Series.map(dict)

>>> df['isp1']=df['isp'].map(dict_isp_names)
>>> df.head()
         date   prov    isp    pv    uv  isp1
0  2020-04-26  hunan  cmnet  2000  1000  中国移动
1  2020-04-26  hunan  cmnet  3000  1500  中国移动
2  2020-04-26  hunan   cmcc  4000  1000  中国联通
3  2020-04-26  hubei    ctc  2500  1000  中国电信
4  2020-04-26  hubei   cmcc  2000  1000  中国联通

#################################

Method 2: Series.map(function)

The function receives the value of each element of the Series.

>>> df['isp2']=df['isp'].map(lambda x : dict_isp_names[x])
>>> df.head()
         date   prov    isp    pv    uv  isp1  isp2
0  2020-04-26  hunan  cmnet  2000  1000  中国移动  中国移动
1  2020-04-26  hunan  cmnet  3000  1500  中国移动  中国移动
2  2020-04-26  hunan   cmcc  4000  1000  中国联通  中国联通
3  2020-04-26  hubei    ctc  2500  1000  中国电信  中国电信
4  2020-04-26  hubei   cmcc  2000  1000  中国联通  中国联通
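The two methods give the same result above because every code appears in the dict. They behave differently when a code is missing, which the sketch below illustrates. The data is hypothetical: "cucc" is deliberately left out of the mapping, and "unknown" is just a placeholder fallback.

```python
import pandas as pd

# Hypothetical data: "cucc" is deliberately absent from the mapping.
isp = pd.Series(["cmnet", "ctc", "cucc"])
names = {"cmnet": "中国移动", "ctc": "中国电信"}

# Method 1: a value that is missing from the dict becomes NaN.
by_dict = isp.map(names)

# Method 2 with names[x]: the missing key raises KeyError.
try:
    isp.map(lambda x: names[x])
    raised = False
except KeyError:
    raised = True

# Method 2 with dict.get: an explicit fallback replaces the missing key.
by_get = isp.map(lambda x: names.get(x, "unknown"))
```

Which behavior is preferable depends on whether an unmapped code should be tolerated or treated as a data error.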

>>> df['float_column']=5.67435
>>> df.head()
         date   prov    isp    pv    uv  isp1  isp2  float_column
0  2020-04-26  hunan  cmnet  2000  1000  中国移动  中国移动       5.67435
1  2020-04-26  hunan  cmnet  3000  1500  中国移动  中国移动       5.67435
2  2020-04-26  hunan   cmcc  4000  1000  中国联通  中国联通       5.67435
3  2020-04-26  hubei    ctc  2500  1000  中国电信  中国电信       5.67435
4  2020-04-26  hubei   cmcc  2000  1000  中国联通  中国联通       5.67435

>>> df['float_column_1'] = df['float_column'].map(lambda x: '%.3f'%x)
>>> df.head()
         date   prov    isp    pv    uv  isp1  isp2  float_column float_column_1
0  2020-04-26  hunan  cmnet  2000  1000  中国移动  中国移动       5.67435          5.674
1  2020-04-26  hunan  cmnet  3000  1500  中国移动  中国移动       5.67435          5.674
2  2020-04-26  hunan   cmcc  4000  1000  中国联通  中国联通       5.67435          5.674
3  2020-04-26  hubei    ctc  2500  1000  中国电信  中国电信       5.67435          5.674
4  2020-04-26  hubei   cmcc  2000  1000  中国联通  中国联通       5.67435          5.674
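One thing worth noting about the formatting step: '%.3f' % x returns text, so float_column_1 holds strings rather than numbers. If the rounded values are still needed for arithmetic, Series.round is one alternative. A small sketch with hypothetical values:

```python
import pandas as pd

values = pd.Series([5.67435, 1.5])  # hypothetical values

# '%.3f' % x returns a str, so every element of the result is text.
as_text = values.map(lambda x: "%.3f" % x)

# Series.round keeps the float dtype, so arithmetic still works afterwards.
as_number = values.round(3)
```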

#################################

2. apply: transforming a Series or a DataFrame

Series.apply(function): the function receives each value
DataFrame.apply(function): the function receives a Series

Series.apply(function)

The function receives each value of the Series.

>>> df["isp3"] = df["isp"].apply(
...     lambda x : dict_isp_names[x])
>>> df.head()
         date   prov    isp    pv    uv  isp1  isp2  float_column float_column_1  isp3
0  2020-04-26  hunan  cmnet  2000  1000  中国移动  中国移动       5.67435          5.674  中国移动
1  2020-04-26  hunan  cmnet  3000  1500  中国移动  中国移动       5.67435          5.674  中国移动
2  2020-04-26  hunan   cmcc  4000  1000  中国联通  中国联通       5.67435          5.674  中国联通
3  2020-04-26  hubei    ctc  2500  1000  中国电信  中国电信       5.67435          5.674  中国电信
4  2020-04-26  hubei   cmcc  2000  1000  中国联通  中国联通       5.67435          5.674  中国联通
>>>
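Series.apply can also pass extra arguments to the function, which avoids relying on a global dict inside a lambda. The sketch below assumes a hypothetical helper `to_name` and hypothetical data; `args` is the Series.apply parameter for additional positional arguments, and other keyword arguments are forwarded to the function.

```python
import pandas as pd

isp = pd.Series(["cmnet", "ctc"])  # hypothetical data
names = {"cmnet": "中国移动", "ctc": "中国电信"}


def to_name(code, mapping, default="unknown"):
    """Look up one ISP code; `default` covers codes missing from `mapping`."""
    return mapping.get(code, default)


# Each value is passed as the first argument; `args` supplies the remaining positional ones.
isp_names = isp.apply(to_name, args=(names,))

# Extra keyword arguments are forwarded to the function as well.
isp_names_na = pd.Series(["cmnet", "xx"]).apply(to_name, args=(names,), default="N/A")
```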

#################################

DataFrame.apply(function)

The function receives the Series along the chosen axis.

>>> df['total'] = df[['pv','uv']].apply(lambda x : x.sum(), axis = 1)
>>> df.head()
         date   prov    isp    pv    uv  isp1  isp2  float_column float_column_1  isp3  total
0  2020-04-26  hunan  cmnet  2000  1000  中国移动  中国移动       5.67435          5.674  中国移动   3000
1  2020-04-26  hunan  cmnet  3000  1500  中国移动  中国移动       5.67435          5.674  中国移动   4500
2  2020-04-26  hunan   cmcc  4000  1000  中国联通  中国联通       5.67435          5.674  中国联通   5000
3  2020-04-26  hubei    ctc  2500  1000  中国电信  中国电信       5.67435          5.674  中国电信   3500
4  2020-04-26  hubei   cmcc  2000  1000  中国联通  中国联通       5.67435          5.674  中国联通   3000

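Notes on this code:
1. apply is called on the DataFrame df[['pv','uv']], not on df itself;
2. x in lambda x is a Series. With axis=1 the function is applied to each row, so x spans the columns; with axis=0 it is applied to each column, so x spans the rows.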
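The effect of axis is easiest to see on a tiny frame. The following sketch uses hypothetical data and also computes the same row sums with DataFrame.sum, which is offered here only as an alternative for the specific case of sums.

```python
import pandas as pd

df = pd.DataFrame({"pv": [2000, 3000], "uv": [1000, 1500]})  # hypothetical data

# axis=1: the function is called once per row; x is indexed by column name.
row_sums = df.apply(lambda x: x["pv"] + x["uv"], axis=1)

# axis=0: the function is called once per column; x is indexed by row label.
col_sums = df.apply(lambda x: x.sum(), axis=0)

# For plain sums the built-in reduction gives the same values without calling a lambda per row.
fast_row_sums = df.sum(axis=1)
```

With axis=1 the result has one entry per row, which is why it can be assigned as a new column; with axis=0 it has one entry per column, which is why it can be assigned as a new row.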

>>> df.loc['total'] = df[['pv','uv']].apply(lambda x : x.sum(), axis = 0)
>>> df.tail()
             date   prov   isp       pv       uv  isp1  isp2  float_column float_column_1  isp3   total
8      2020-04-27  hunan  cmcc   2800.0   1600.0  中国联通  中国联通       5.67435          5.674  中国联通  4400.0
9      2020-04-27  hubei   ctc   2600.0   1400.0  中国电信  中国电信       5.67435          5.674  中国电信  4000.0
10     2020-04-27  hubei  cmcc   3800.0   1900.0  中国联通  中国联通       5.67435          5.674  中国联通  5700.0
11     2020-04-27  hubei   ctc   2400.0   1900.0  中国电信  中国电信       5.67435          5.674  中国电信  4300.0
total         NaN    NaN   NaN  34400.0  17100.0   NaN   NaN           NaN            NaN   NaN     NaN
>>>

#################################

3. applymap: transforming every value of a DataFrame

>>> sub_df = df[['pv', 'uv']]
>>> sub_df.head()
       pv      uv
0  2000.0  1000.0
1  3000.0  1500.0
2  4000.0  1000.0
3  2500.0  1000.0
4  2000.0  1000.0
>>> sub_df = sub_df.applymap(lambda x : int(x))
>>> sub_df.head()
     pv    uv
0  2000  1000
1  3000  1500
2  4000  1000
3  2500  1000
4  2000  1000
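In the conversion above the lambda is called once for every cell. When the goal is only a type change, DataFrame.astype converts whole columns at once and gives the same values. A minimal sketch with hypothetical data standing in for sub_df:

```python
import pandas as pd

# Hypothetical stand-in for sub_df: float columns holding whole numbers.
sub_df = pd.DataFrame({"pv": [2000.0, 3000.0], "uv": [1000.0, 1500.0]})

# astype converts each column in one step instead of calling int() per cell.
converted = sub_df.astype(int)
```

An element-wise function is still the right tool when the transformation cannot be expressed as a dtype change or a vectorized operation.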