PyPackage01---Pandas10_Using the apply method

维格堂406小队

Article link: https://blog.csdn.net/wendaomudong_l2d4/article/details/107306738

Intro

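The apply family of functions in R is very powerful. I used to assume the Python counterpart was a cut-down version, but it turns out to be just as capable; it pays to read the documentation more carefully.
Environment and package information: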

import sys
import pandas as pd
import numpy as np
print("Python version:", sys.version)
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)

Python version: 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
pandas version: 0.23.4
numpy version: 1.17.4

Parameters
func:function
Function to apply to each column or row.

axis:{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:

0 or ‘index’ : apply function to each column.
1 or ‘columns’ : apply function to each row.

raw:bool, default False
Determines if row or column is passed as a Series or ndarray object:

False : passes each row or column as a Series to the function.
True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
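I did not fully understand this parameter at first. It only changes the type of object handed to func (Series or ndarray), so for simple NumPy functions the results look the same and everyday use is rarely affected.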
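To make the effect of raw visible, here is a minimal sketch that records the type of the object each call receives. The helper name describe_input is made up for illustration and the DataFrame is the same small one used later in this post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def describe_input(values):
    # Hypothetical helper: report the type of whatever apply() passes in.
    return type(values).__name__

# raw=False (default): each column arrives as a pandas Series.
types_default = df.apply(describe_input, raw=False)
# raw=True: each column arrives as a plain NumPy ndarray.
types_raw = df.apply(describe_input, raw=True)

print(types_default.tolist())  # ['Series', 'Series']
print(types_raw.tolist())      # ['ndarray', 'ndarray']
```

With raw=True the function cannot use Series features such as label-based access, which is why it is mainly useful for pure NumPy reductions.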

result_type:{‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):

‘expand’ : list-like results will be turned into columns.
‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
New in version 0.23.0.
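This parameter controls the form of the returned result. For functions that return a scalar, the options other than 'broadcast' appear to give the same output; they only differ when func returns a list-like.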

args:tuple
Positional arguments to pass to func in addition to the array/series.

**kwds:
Additional keyword arguments to pass as keyword arguments to func.
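As a rough illustration of args and keyword arguments, the sketch below forwards one positional and one keyword argument to a row function. The function scaled_diff and its parameters offset and scale are invented for this example.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def scaled_diff(row, offset, scale=1):
    # Hypothetical helper: (A + offset - B) * scale for a single row.
    return (row["A"] + offset - row["B"]) * scale

# args supplies extra positional arguments; unknown keywords are forwarded to func.
result = df.apply(scaled_diff, axis=1, args=(10,), scale=2)

print(result.tolist())  # [10, 10, 10]
```

Note that args must be a tuple, so a single extra argument is written as (10,).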

Returns
Series or DataFrame
Result of applying func along the given axis of the DataFrame.

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df

df.apply(np.sqrt)
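The tables produced by the two calls above are not shown, so here is a small sketch that reproduces the element-wise square root and prints it. The variable name roots is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# A NumPy ufunc such as np.sqrt is applied element-wise,
# so the result keeps the shape of the original DataFrame.
roots = df.apply(np.sqrt)

print(roots)
#      A    B
# 0  2.0  3.0
# 1  2.0  3.0
# 2  2.0  3.0
```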

Sum of each row

df.apply(np.sum,axis=1)

0    13
1    13
2    13
dtype: int64

Take the largest element in each row

df.apply(np.max,axis=1)

0    9
1    9
2    9
dtype: int64

df.apply(np.sum,axis=0,result_type="expand")

A    12
B    27
dtype: int64

df.apply(np.sum,axis=0,result_type="broadcast")
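The output of the broadcast call is not shown above. The sketch below is one way to see it: each column sum is a scalar, and 'broadcast' copies it into every row of a DataFrame with the original index and columns. A lambda calling the Series sum method stands in for np.sum here, which is an assumption made only to keep the example version-neutral.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Each column reduces to one number; 'broadcast' repeats it down the column.
broadcast_sums = df.apply(lambda col: col.sum(), axis=0, result_type="broadcast")

print(broadcast_sums)
#     A   B
# 0  12  27
# 1  12  27
# 2  12  27
```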

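Some operations are more involved, for example working on a few specific columns within each row. In that case, just pass in your own function. For example: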

def test_f(row):
    return row["A"]+10-row["B"]

df.apply(test_f,axis=1)

0    5
1    5
2    5
dtype: int64

def test_f2(row):
    return [1,2,3,4]

df.apply(test_f2,axis=1)

0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object

df.apply(test_f2,axis=1,result_type="expand")

df.apply(test_f2,axis=1,result_type="reduce")

0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object
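To compare the three calls side by side, here is a minimal sketch that checks the type and shape of each result. The function name four_values is a stand-in for test_f2 above.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def four_values(row):
    # Same idea as test_f2: return a fixed list for every row.
    return [1, 2, 3, 4]

as_series = df.apply(four_values, axis=1)                        # default: Series of lists
expanded = df.apply(four_values, axis=1, result_type="expand")   # lists become columns
reduced = df.apply(four_values, axis=1, result_type="reduce")    # stays a Series of lists

print(type(as_series).__name__)   # Series
print(expanded.shape)             # (3, 4)
print(list(expanded.columns))     # [0, 1, 2, 3]
print(type(reduced).__name__)     # Series
```

So 'expand' is the option to reach for when a row function returns several values that should end up in separate columns.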

def test_f3(row):
    return [1,2]

df.apply(test_f3,axis=1,result_type="broadcast")

'broadcast' seems to work only when the lengths match: with axis=1, the length of the returned list must equal the number of columns.
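The length restriction can be checked with a short sketch like the one below. The function names two_values and four_values are made up for illustration, and the exact error message may differ between pandas versions, so only the exception type is inspected.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def two_values(row):
    # Length 2 matches the two columns A and B.
    return [1, 2]

def four_values(row):
    # Length 4 does not match the two columns.
    return [1, 2, 3, 4]

ok = df.apply(two_values, axis=1, result_type="broadcast")

try:
    df.apply(four_values, axis=1, result_type="broadcast")
    mismatch_error = None
except ValueError as exc:
    # pandas refuses to broadcast a result of the wrong length.
    mismatch_error = exc

print(ok)
#    A  B
# 0  1  2
# 1  1  2
# 2  1  2
print(type(mismatch_error).__name__)  # ValueError
```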

2020-07-08, Jiulonghu, Jiangning District, Nanjing