pandas 运算

m0_58852358

于 2022-06-05 20:45:52 发布

阅读量172

点赞数

分类专栏： pandas 文章标签： python 开发语言

本文链接：https://blog.csdn.net/m0_58852358/article/details/125136064

版权

pandas 专栏收录该内容

13 篇文章 0 订阅

订阅专栏

统计

一般情况下，运算时排除缺失值。

描述性统计：

In [61]: df.mean()
Out[61]: 
A   -0.004474
B   -0.383981
C   -0.687758
D    5.000000
F    3.000000
dtype: float64

在另一个轴(即，行)上执行同样的操作：

In [62]: df.mean(1)
Out[62]: 
2013-01-01    0.872735
2013-01-02    1.431621
2013-01-03    0.707731
2013-01-04    1.395042
2013-01-05    1.883656
2013-01-06    1.592306
Freq: D, dtype: float64

不同维度对象运算时，要先对齐。此外，Pandas 自动沿指定维度广播。

.shift() 整列向上或下移动。

df.sub(s, axis='index') 按S为索引，显示S索引的df数据，作为子数据显示。

In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

In [64]: s
Out[64]: 
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64

In [65]: df.sub(s, axis='index')
Out[65]: 
                   A         B         C    D    F
2013-01-01       NaN       NaN       NaN  NaN  NaN
2013-01-02       NaN       NaN       NaN  NaN  NaN
2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
2013-01-06       NaN       NaN       NaN  NaN  NaN

Apply 函数

Apply 处理函数：apply是沿DataFrame的轴应用功能，传递给函数的对象是Series对象，其索引为DataFrame的索引（axis = 0''）或DataFrame的列（axis = 1''）

np.cumsum(a, axis = 0)用于将数组按行累加； axis = 1用于将数组按列累加。

lambda x: 定义一个函数。因为有一些函数非常简单，并且只会在这一个地方被用到，所以单独拿出来 def f(x) 命名和定义显得非常啰嗦。所以lambda表达式也叫匿名函数。

In [66]: df.apply(np.cumsum)
Out[66]: 
                   A         B         C   D     F
2013-01-01  0.000000  0.000000 -1.509059   5   NaN
2013-01-02  1.212112 -0.173215 -1.389850  10   1.0
2013-01-03  0.350263 -2.277784 -1.884779  15   3.0
2013-01-04  1.071818 -2.984555 -2.924354  20   6.0
2013-01-05  0.646846 -2.417535 -2.648122  25  10.0
2013-01-06 -0.026844 -2.303886 -4.126549  30  15.0

In [67]: df.apply(lambda x: x.max() - x.min())
Out[67]: 
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64

直方图

In [68]: s = pd.Series(np.random.randint(0, 7, size=10))

In [69]: s
Out[69]: 
0    4
1    2
2    1
3    2
4    6
5    4
6    4
7    6
8    4
9    4
dtype: int64

In [70]: s.value_counts()
Out[70]: 
4    5
6    2
2    2
1    1
dtype: int64

字符串方法

Series 的 str 属性包含一组字符串处理功能，如下列代码所示。注意，str 的模式匹配默认使用正则表达式。详见矢量字符串方法。

.str.lower() 全改为小写。

In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [72]: s.str.lower()
Out[72]: 
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object