pandas study notes (3): shift and apply

Kylo_Cheok

Article link: https://blog.csdn.net/zy714816/article/details/83023882

1. shift()

df.shift(periods=1, freq=None, axis=0)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
print(df)
    A   B   C   D
a   1   2   3   4
b   5   6   7   8
c   9  10  11  12
d  13  14  15  16

print(df.shift(3))
     A    B    C    D
a  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN
d  1.0  2.0  3.0  4.0

print(df.shift(2,axis = 1))
    A   B     C     D
a NaN NaN   1.0   2.0
b NaN NaN   5.0   6.0
c NaN NaN   9.0  10.0
d NaN NaN  13.0  14.0

print(df.shift(-1))
      A     B     C     D
a   5.0   6.0   7.0   8.0
b   9.0  10.0  11.0  12.0
c  13.0  14.0  15.0  16.0
d   NaN   NaN   NaN   NaN

import datetime

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('10/1/2018', '10/4/2018'))
print(df)
             A   B   C   D
2018-10-01   1   2   3   4
2018-10-02   5   6   7   8
2018-10-03   9  10  11  12
2018-10-04  13  14  15  16

print(df.shift(freq=datetime.timedelta(1)))
             A   B   C   D
2018-10-02   1   2   3   4
2018-10-03   5   6   7   8
2018-10-04   9  10  11  12
2018-10-05  13  14  15  16

print(df.shift(freq=datetime.timedelta(-1)))
             A   B   C   D
2018-09-30   1   2   3   4
2018-10-01   5   6   7   8
2018-10-02   9  10  11  12
2018-10-03  13  14  15  16

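As the name suggests, shift moves (shifts) things.

The parameters mean the following:

periods: how far to shift. It can be positive or negative and defaults to 1, meaning a shift of one step. Positions that have no corresponding value after the shift are filled with NaN.

freq: a DateOffset, timedelta, or offset string (such as 'D'); optional, default None. It only applies to time series, i.e. when the index is datetime-like.

axis: the direction of the shift. 0 shifts along the rows (up/down); 1 shifts along the columns (left/right).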
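A common use of periods is comparing each row with its neighbour. The sketch below uses a small hypothetical Series (the values and labels are made up for illustration): shift(1) lines each value up with the previous one, and a negative periods pulls the next value up instead.

```python
import numpy as np
import pandas as pd

# Hypothetical daily values, used only for illustration
s = pd.Series([10, 12, 9, 15], index=['d1', 'd2', 'd3', 'd4'])

prev = s.shift(1)   # each position now holds the previous row's value; first is NaN
diff = s - prev     # row-to-row change; first entry is NaN (no previous row)
nxt = s.shift(-1)   # negative periods: each position holds the next row's value; last is NaN

print(pd.DataFrame({'s': s, 'prev': prev, 'diff': diff, 'next': nxt}))
```

The s - s.shift(1) pattern gives the same result as the built-in s.diff().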

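How periods differs from freq:

Without freq, shifting by periods moves only the data; the row and column labels stay where they are.

With freq, only the index moves and the data stays unchanged; this works only when the index is datetime-like. In that case the index labels are shifted by periods × freq.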
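To see the two behaviours side by side, here is a minimal sketch on a one-column DataFrame with a daily DatetimeIndex (the column name A and the dates are chosen only for illustration). The same periods=1 either moves the data down or moves the dates forward, depending on whether freq is given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9, 13]}, index=pd.date_range('2018-10-01', periods=4))

by_periods = df.shift(1)         # data moves down one row; index labels unchanged
by_freq = df.shift(1, freq='D')  # index moves one day later; data unchanged

print(by_periods)
print(by_freq)
```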

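Thinking about a periods shift:

The whole block of data moves. For example, shifting down by 3 rows slides the entire block down: the original bottom three rows move out of the fixed 4×4 frame and are no longer visible, while the positions above the original first row never had any data, so they are still empty after the shift. pandas marks such empty positions with NaN. Because NaN is a float, the integer columns are upcast to float64, which is why the output shows 1.0 instead of 1.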
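The "whole block moves" picture can be checked directly. This sketch rebuilds the 4×4 example and confirms that after shift(k) the top k rows are all NaN and the remaining rows hold the original first n - k rows; shifting by the full row count leaves nothing visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=list('ABCD'), index=list('abcd'))
k = 3
shifted = df.shift(k)

# The first k rows have no source data above them, so they are entirely NaN
top = shifted.iloc[:k]
# The remaining rows hold the original first (n - k) rows; the last k original rows fall off
rest = shifted.iloc[k:].to_numpy()

print(shifted)
```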

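Thinking about a freq shift:

Shifting time naturally means moving to the next or previous day, so timedelta(1) moves every index label one day later and timedelta(-1) moves it one day earlier.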
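A sketch of how freq combines with periods, reusing the dated example above. It assumes the offset alias 'D' (one calendar day) as a string alternative to datetime.timedelta(1); the total shift applied to the index is periods × freq, and the data values never move.

```python
import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=list('ABCD'),
                  index=pd.date_range('10/1/2018', '10/4/2018'))

one_day_later = df.shift(freq=datetime.timedelta(1))     # periods defaults to 1
one_day_earlier = df.shift(freq=datetime.timedelta(-1))
two_days_later = df.shift(2, freq='D')                   # index moves by 2 * 1 day

print(two_days_later)
```

Note that df.shift(-1, freq='D') gives the same index as freq=datetime.timedelta(-1): the sign can sit on either periods or freq.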

2. map(), apply(), applymap()

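import numpy as np
import pandas as pd

def change(x):
    # Label values greater than 2 as 'big'; everything else (including NaN, since NaN > 2 is False) as 'small'
    if x > 2:
        return 'big'
    return 'small'

s = pd.Series([1, 2, 3, np.nan])
frame = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=list('abcd'), index=['A', 'B', 'C', 'D'])
s2 = s.map(change, na_action=None)      # NaN is passed into change()
s3 = s.map(change, na_action='ignore')  # NaN is kept as NaN without calling change()
print(s2, s3)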
0    small
1    small
2      big
3    small
dtype: object 

0    small
1    small
2      big
3      NaN
dtype: object

print(frame)
    a   b   c   d
A   1   2   3   4
B   5   6   7   8
C   9  10  11  12
D  13  14  15  16

print(frame.apply(lambda x:x.max()/x.min()))
a    13.0
b     7.0
c     5.0
d     4.0
dtype: float64

print(frame.applymap(change))  # pandas >= 2.1 deprecates applymap in favor of frame.map(change)
       a      b    c    d
A  small  small  big  big
B    big    big  big  big
C    big    big  big  big
D    big    big  big  big
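The two DataFrame examples above differ in what the function receives: apply (default axis=0) calls it once per column with a whole Series, which is why x.max()/x.min() works, while applymap calls it once per element with a scalar. This sketch checks that; it assumes pandas 2.1+ where DataFrame.map is the element-wise method, and falls back to applymap on older versions.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(1, 17).reshape(4, 4), columns=list('abcd'), index=['A', 'B', 'C', 'D'])

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise the older applymap
elementwise = frame.map if hasattr(pd.DataFrame, 'map') else frame.applymap

ratios = frame.apply(lambda col: col.max() / col.min())           # one call per column
is_series_col = frame.apply(lambda col: isinstance(col, pd.Series))
is_series_elem = elementwise(lambda v: isinstance(v, pd.Series))  # one call per cell

print(ratios)
```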

Series.map(arg, na_action=None)

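When na_action is None, NaN values are passed to the function like any other value; when it is 'ignore', NaN values are propagated to the result unchanged without being passed to the function.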
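One way to confirm which values reach the function is to record every argument it receives. In this sketch, record is a hypothetical helper that applies the same 'big'/'small' rule as change() and appends its input to a list; with na_action='ignore' the NaN never shows up in that list.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan])
seen = []  # every value the function actually receives

def record(x):
    # Hypothetical helper: same rule as change(), but logs its input
    seen.append(x)
    return 'big' if x > 2 else 'small'

with_none = s.map(record, na_action=None)
n_none = len(seen)        # NaN was passed in, so 4 calls

seen.clear()
with_ignore = s.map(record, na_action='ignore')
n_ignore = len(seen)      # NaN was skipped, so 3 calls

print(with_none, with_ignore)
```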

Series.apply(func, convert_dtype=True, args=(), **kwds)

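func: function
convert_dtype: bool, default True. Try to find a better dtype for the element-wise function results; if False, leave them as dtype=object. (Deprecated since pandas 2.1.)
args: tuple. Positional arguments passed to func after the Series value.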
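A short sketch of how args and **kwds reach the function. label is a hypothetical function whose first parameter receives each Series value; args supplies the remaining positional parameters, and extra keyword arguments to apply are forwarded to it as keywords.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

def label(x, threshold, big='big', small='small'):
    # x comes from the Series; threshold comes from args; big/small can come from **kwds
    return big if x > threshold else small

r1 = s.apply(label, args=(2,))                        # threshold=2
r2 = s.apply(label, args=(3,), big='B', small='S')    # threshold=3, custom labels

print(r1, r2)
```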