PyPackage01---Pandas10_apply Method Usage

Intro

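  R's apply family of functions is very powerful. I used to assume Python's counterpart was a cut-down version, but it turns out to be just as capable. Lesson learned: read the documentation more carefully.
Environment and package versions: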

import sys
import pandas as pd
import numpy as np 
print("Python版本:",sys.version)
print("pandas版本:",pd.__version__)
print("numpy版本:",np.__version__)
Python版本: 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
pandas版本: 0.23.4
numpy版本: 1.17.4

Parameter description

Parameters
func:function
Function to apply to each column or row.

axis:{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:

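  • 0 or ‘index’: apply function to each column. Each column is passed to the function in turn.
  • 1 or ‘columns’: apply function to each row. Confusingly, the option is called ‘columns’ even though it operates on rows: the function is applied along the columns axis, so each call receives one row spanning all the columns.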
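To see which object the function actually receives for each axis value, here is a small sketch. The helper `labels_of` is hypothetical, written only for this illustration; it reports the index labels of the Series that apply() passes in.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def labels_of(s):
    # Hypothetical helper: report the index labels of the Series apply() passed in
    return ",".join(str(label) for label in s.index)

# axis=0: each call receives one column, indexed by the row labels 0, 1, 2
by_column = df.apply(labels_of, axis=0)
# axis=1: each call receives one row, indexed by the column labels "A", "B"
by_row = df.apply(labels_of, axis=1)

print(by_column.to_dict())  # {'A': '0,1,2', 'B': '0,1,2'}
print(by_row.to_dict())     # {0: 'A,B', 1: 'A,B', 2: 'A,B'}
```

With axis=1 every call sees the labels "A" and "B", which is why row["A"] works inside a row-wise function.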

raw:bool, default False
Determines if row or column is passed as a Series or ndarray object:

  • False : passes each row or column as a Series to the function.
  • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
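  • Note: with raw=True the function receives a plain NumPy ndarray instead of a Series, so label-based access such as row["A"] no longer works. For NumPy reductions like np.sum the results are the same; the documented benefit is speed.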
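A minimal sketch of the difference, reusing the example DataFrame from this post: the lambda only reports the type of what it receives, and the row sums are compared with and without raw=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# raw=False (default): each row arrives as a Series, so row["A"] would work
types_default = df.apply(lambda row: type(row).__name__, axis=1)
# raw=True: each row arrives as a plain NumPy ndarray (no labels)
types_raw = df.apply(lambda row: type(row).__name__, axis=1, raw=True)
print(types_default.tolist())  # ['Series', 'Series', 'Series']
print(types_raw.tolist())      # ['ndarray', 'ndarray', 'ndarray']

# For a NumPy reduction the numbers are identical either way
sums_series = df.apply(np.sum, axis=1)
sums_raw = df.apply(np.sum, axis=1, raw=True)
print(sums_series.tolist(), sums_raw.tolist())  # [13, 13, 13] [13, 13, 13]
```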

result_type:{‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):

  • ‘expand’ : list-like results will be turned into columns.
  • ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
  • ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
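The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
New in version 0.23.0.
This parameter controls the form of the result. In the examples below, ‘broadcast’ keeps the original shape of the DataFrame, ‘expand’ turns list-like results into columns, and ‘reduce’ (like the default None) returns a Series of lists.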

args:tuple
Positional arguments to pass to func in addition to the array/series.

**kwds
Additional keyword arguments to pass as keywords arguments to func.
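A short sketch of how args and extra keyword arguments reach the function. The function `shifted_diff` and its parameters `offset` and `scale` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def shifted_diff(row, offset, scale=1):
    # Hypothetical function: offset is supplied through args=,
    # scale through an extra keyword argument that apply() forwards
    return (row["A"] + offset - row["B"]) * scale

result = df.apply(shifted_diff, axis=1, args=(10,), scale=2)
print(result.tolist())  # [10, 10, 10]  since (4 + 10 - 9) * 2 = 10
```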

Returns
Series or DataFrame
Result of applying func along the given axis of the DataFrame.

Applying a function to every element

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
   A  B
0  4  9
1  4  9
2  4  9
df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
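Because np.sqrt is an element-wise NumPy ufunc, applying it through apply() gives the same DataFrame as calling it on the DataFrame directly. A quick check, as a sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

via_apply = df.apply(np.sqrt)  # element-wise ufunc passed to apply()
direct = np.sqrt(df)           # the same ufunc applied to the whole DataFrame
print(via_apply.equals(direct))  # True
```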

Row-wise operations

Row sums

df.apply(np.sum,axis=1)
0    13
1    13
2    13
dtype: int64

Largest element in each row

df.apply(np.max,axis=1)
0    9
1    9
2    9
dtype: int64
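For simple reductions like these, pandas also has built-in methods that take the same axis argument. The sketch below checks that they match the apply() results; using df.sum and df.max here is an alternative, not something from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

row_sum_apply = df.apply(np.sum, axis=1)
row_max_apply = df.apply(np.max, axis=1)

# The same results from pandas' built-in reductions
row_sum = df.sum(axis=1)
row_max = df.max(axis=1)

print(row_sum_apply.tolist(), row_sum.tolist())  # [13, 13, 13] [13, 13, 13]
print(row_max_apply.tolist(), row_max.tolist())  # [9, 9, 9] [9, 9, 9]
```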

Column-wise operations

df.apply(np.sum,axis=0,result_type="expand")
A    12
B    27
dtype: int64
df.apply(np.sum,axis=0,result_type="broadcast")
    A   B
0  12  27
1  12  27
2  12  27

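Other, more complex operations

Some operations are more involved, for example working on a few specific columns within each row. In that case, just pass in your own function. For example: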

def test_f(row):
    return row["A"]+10-row["B"]
df.apply(test_f,axis=1)
0    5
1    5
2    5
dtype: int64
def test_f2(row):
    return [1,2,3,4]
df.apply(test_f2,axis=1)
0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object
df.apply(test_f2,axis=1,result_type="expand")
   0  1  2  3
0  1  2  3  4
1  1  2  3  4
2  1  2  3  4
df.apply(test_f2,axis=1,result_type="reduce")
0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object
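As the parameter notes above say, when the function returns a Series instead of a list, the default result_type=None already expands it into columns, and the Series index becomes the column names. A sketch with a hypothetical function `named_parts`:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def named_parts(row):
    # Hypothetical function: returning a Series (not a list) makes apply()
    # expand the result into columns named after the Series' index
    return pd.Series({"total": row["A"] + row["B"], "diff": row["B"] - row["A"]})

out = df.apply(named_parts, axis=1)
print(out)
#    total  diff
# 0     13     5
# 1     13     5
# 2     13     5
```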
def test_f3(row):
    return [1,2]
df.apply(test_f3,axis=1,result_type="broadcast")
   A  B
0  1  2
1  1  2
2  1  2

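With result_type="broadcast", a list-like return value apparently must have exactly as many elements as the DataFrame has columns (here 2); otherwise pandas raises a ValueError rather than padding or truncating the result.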
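A sketch of both cases: a hypothetical function `three_values` returns 3 values for a 2-column DataFrame and triggers the error, while a scalar return value is broadcast across every column.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def three_values(row):
    # Hypothetical function: returns 3 values, but df has only 2 columns
    return [1, 2, 3]

try:
    df.apply(three_values, axis=1, result_type="broadcast")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A scalar return value is broadcast to every column of its row
zeros = df.apply(lambda row: 0, axis=1, result_type="broadcast")
print(zeros)
```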

Ref

[1] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
[2] https://blog.csdn.net/qq_19528953/article/details/79348929?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522159419042819724839247314%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=159419042819724839247314&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2allfirst_rank_ecpm_v3~pc_rank_v3-2-79348929.pc_ecpm_v3_pc_rank_v3&utm_term=pandas+apply

                             2020-07-08, Jiulonghu, Jiangning District, Nanjing
