The pandas apply function

Latest recommended article published 2024-06-30 10:22:14

马行处

Article link: https://blog.csdn.net/qq_37928340/article/details/88756456

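Source code (from an older pandas release; the broadcast and reduce parameters were later deprecated in favor of result_type):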
def apply(self, func, axis=0, broadcast=False, raw=False, reduce=None,
          args=(), **kwds):

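"""
Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index
either the DataFrame's index (axis=0) or the columns (axis=1).
Return type depends on whether passed function aggregates, or the
reduce argument if the DataFrame is empty.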

Parameters
----------
func : function
    Function to apply to each column/row
axis : {0 or 'index', 1 or 'columns'}, default 0
    * 0 or 'index': apply function to each column
    * 1 or 'columns': apply function to each row
broadcast : boolean, default False
    For aggregation functions, return object of same size with values
    propagated
raw : boolean, default False
    If False, convert each row or column into a Series. If raw=True the
    passed function will receive ndarray objects instead. If you are
    just applying a NumPy reduction function this will achieve much
    better performance
reduce : boolean or None, default None
    Try to apply reduction procedures. If the DataFrame is empty,
    apply will use reduce to determine whether the result should be a
    Series or a DataFrame. If reduce is None (the default), apply's
    return value will be guessed by calling func on an empty Series (note:
    while guessing, exceptions raised by func will be ignored). If
    reduce is True a Series will always be returned, and if False a
    DataFrame will always be returned.
args : tuple
    Positional arguments to pass to function in addition to the
    array/series
Additional keyword arguments will be passed as keywords to the function

Notes
-----
In the current implementation apply calls func twice on the
first column/row to decide whether it can take a fast or slow
code path. This can lead to unexpected behavior if func has
side-effects, as they will take effect twice for the first
column/row.

Examples
--------
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

See also
--------
DataFrame.applymap: For elementwise operations

Returns
-------
applied : Series or DataFrame
"""
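
To make the parameters above concrete, here is a minimal sketch on a small made-up frame. The DataFrame `df` and the helper functions `span` and `scale` are illustrative assumptions, not part of the original post. It shows how `axis` selects columns or rows, how `raw=True` hands the function NumPy arrays instead of Series, and how `args` and extra keyword arguments are forwarded to `func`.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'x': [1.0, 4.0, 9.0], 'y': [16.0, 25.0, 36.0]})

# axis=0 (default): func receives each column as a Series.
col_sums = df.apply(np.sum, axis=0)
# axis=1: func receives each row as a Series.
row_sums = df.apply(np.sum, axis=1)
# A function that returns an array of the same length keeps the DataFrame shape.
roots = df.apply(np.sqrt)


def span(values):
    # With raw=True, values is a NumPy ndarray rather than a Series.
    return values.max() - values.min()


col_span = df.apply(span, raw=True)


def scale(col, factor, offset=0.0):
    # factor arrives through args, offset through the extra keyword argument.
    return col * factor + offset


scaled = df.apply(scale, args=(2,), offset=1.0)

print(col_sums)
print(row_sums)
print(col_span)
print(scaled)
```

Here `col_sums` should agree with `df.sum(0)` and `row_sums` with `df.sum(1)`, as the docstring examples state.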

Example:

Data:

Yr Mo Dy   RPT   VAL   ROS   KIL   SHA   BIR   DUB   CLA   MUL   CLO   BEL   MAL
61  1  1 15.04 14.96 13.17  9.29   NaN  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71   NaN 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25   NaN  8.50  7.67 12.75 12.71
61  1  4 10.58  6.63 11.75  4.58  4.54  2.88  8.63  1.79  5.83  5.88  5.46 10.88
61  1  5 13.33 13.25 11.42  6.17 10.71  8.21 11.92  6.54 10.92 10.34 12.92 11.83
61  1  6 13.21  8.12  9.96  6.67  5.37  4.50 10.67  4.42  7.17  7.50  8.12 13.17
61  1  7 13.50 14.29  9.50  4.96 12.29  8.33  9.17  9.29  7.58  7.96 13.96 13.79

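import numpy as np
import pandas as pd

# Merge the Yr, Mo, Dy columns into a single datetime column named Yr_Mo_Dy.
# Note: nested-list parse_dates is deprecated in pandas 2.x, and two-digit
# years may be parsed as 20xx; the output below assumes 19xx dates.
wind = pd.read_csv('../wind.csv', sep=r'\s+', parse_dates=[[0, 1, 2]])
# Use the date as the index so that only numeric columns are passed to apply.
wind = wind.set_index('Yr_Mo_Dy')

print(wind.apply(np.sqrt))         # element-wise square root, returns a DataFrame
print(wind.apply(np.sum, axis=0))  # sum of each column
print(wind.apply(np.sum, axis=1))  # sum of each row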

RPT     81200.10
VAL     69943.79
ROS     76632.98
KIL     41427.19
SHA     68715.74
BIR     46624.48
DUB     64378.34
CLA     55829.49
MUL     55811.38
CLO     57233.29
BEL     86257.50
MAL    102485.95
dtype: float64

Yr_Mo_Dy
1961-01-01    143.20
1961-01-02    124.70
1961-01-03    128.06
1961-01-04     79.43
1961-01-05    127.56
1961-01-06     98.88
1961-01-07    124.62
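
The listing above depends on a wind.csv file that is not included here. As a self-contained sketch, the seven sample rows shown earlier can be loaded from a string instead. The date handling is an assumption: it presumes the two-digit years belong to the 1900s, which matches the 1961 dates in the output.

```python
import io

import numpy as np
import pandas as pd

# The seven sample rows from the post, inlined so the example runs without wind.csv.
SAMPLE = """\
Yr Mo Dy   RPT   VAL   ROS   KIL   SHA   BIR   DUB   CLA   MUL   CLO   BEL   MAL
61  1  1 15.04 14.96 13.17  9.29   NaN  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71   NaN 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25   NaN  8.50  7.67 12.75 12.71
61  1  4 10.58  6.63 11.75  4.58  4.54  2.88  8.63  1.79  5.83  5.88  5.46 10.88
61  1  5 13.33 13.25 11.42  6.17 10.71  8.21 11.92  6.54 10.92 10.34 12.92 11.83
61  1  6 13.21  8.12  9.96  6.67  5.37  4.50 10.67  4.42  7.17  7.50  8.12 13.17
61  1  7 13.50 14.29  9.50  4.96 12.29  8.33  9.17  9.29  7.58  7.96 13.96 13.79
"""

wind = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')

# Assumption: two-digit years are 19xx, as in the 1961 dates of the output above.
dates = pd.to_datetime(
    pd.DataFrame({'year': 1900 + wind['Yr'], 'month': wind['Mo'], 'day': wind['Dy']})
)
wind = wind.drop(columns=['Yr', 'Mo', 'Dy'])
wind.index = pd.DatetimeIndex(dates, name='Yr_Mo_Dy')

# Sum each row; np.sum on a Series defers to Series.sum, which skips NaN.
row_sums = wind.apply(np.sum, axis=1)
print(row_sums.round(2))
```

Because missing values are skipped, the first row still sums to 143.20 even though its SHA reading is NaN, which agrees with the per-day totals listed above.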

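def plus(row, n, m):
    # With axis=1, apply passes each row to the function as a Series.
    row['c'] = (row['a'] + row['b']) * m
    row['d'] = (row['a'] + row['b']) * n
    return row

list_plus = [[1, 3], [7, 8], [4, 5]]

df1 = pd.DataFrame(list_plus, columns=['a', 'b'])
print(df1)
# The values in args follow the row argument, so n=3 and m=2.
df1 = df1.apply(plus, axis=1, args=(3, 2))
print(df1)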
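
With n=3 and m=2, the call above should yield c = (a + b) * 2 and d = (a + b) * 3. Row-wise apply runs a Python function once per row; for simple arithmetic like this, a column-wise expression gives the same values. The sketch below is an alternative for comparison, not the author's method, and `plus_row` is a hypothetical variant of `plus` that builds a new Series instead of modifying the row it receives.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [7, 8], [4, 5]], columns=['a', 'b'])
n, m = 3, 2


def plus_row(row, n, m):
    # Hypothetical variant of plus(): returns a new Series, leaves the row untouched.
    total = row['a'] + row['b']
    return pd.Series({'a': row['a'], 'b': row['b'], 'c': total * m, 'd': total * n})


# Row-wise version, one Python call per row.
by_row = df1.apply(plus_row, axis=1, args=(n, m))

# Column-wise version, computed on whole columns at once.
total = df1['a'] + df1['b']
vectorized = df1.assign(c=total * m, d=total * n)

print(vectorized)
# c is [8, 30, 18] and d is [12, 45, 27]
```

Both `by_row` and `vectorized` hold the same four columns; the column-wise form is usually the faster choice on large frames.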