Summary: pandas time series


Timestamp

import pandas as pd

pd.Timestamp(2020, 4, 1)
pd.Timestamp(2020, 4, 1, 0, 0, 10)

from datetime import datetime
pd.Timestamp(datetime(2020, 4, 1))
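
A Timestamp can also be built from a string and exposes its components as attributes. The following is a minimal sketch; the variable names are only illustrative.

```python
import pandas as pd

# Parsing a string gives the same object as passing the components one by one.
t = pd.Timestamp('2020-04-01 00:00:10')
same = t == pd.Timestamp(2020, 4, 1, 0, 0, 10)

# Individual fields are available as attributes.
parts = (t.year, t.month, t.day, t.second)

# day_name() returns the English weekday name by default.
weekday = t.day_name()
```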

Period

pd.Period('2020-4')  # freq is inferred from the string, here 'M' (monthly)

pd.Period('2020-4', freq='D')
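
Unlike a Timestamp, a Period stands for a span of time, and its frequency decides how long that span is. The sketch below illustrates this with the two periods created above; the variable names are assumptions made for the example.

```python
import pandas as pd

month = pd.Period('2020-04', freq='M')   # the whole of April 2020
day = pd.Period('2020-04', freq='D')     # only the first day, 2020-04-01

# A Period covers an interval with a start and an end.
month_start = month.start_time           # first instant of April
month_end = month.end_time               # last instant of 30 April

# Adding an integer moves by whole units of the period's own frequency.
next_month = month + 1
next_day = day + 1
```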

Series with datetime values

dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.to_datetime(dates)

seri = pd.Series(['2020-4-1', '2020-4-2'])
pd.to_datetime(seri)

df = pd.DataFrame({'year': [2020, 2021], 'month': [4, 5], 'day': [1, 2], 'hour': [10, 10]})
pd.to_datetime(df)
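
Real input often contains entries that cannot be parsed. As a hedged sketch (the sample strings are made up), `errors='coerce'` turns such entries into NaT instead of raising an exception:

```python
import pandas as pd

raw = pd.Series(['2020-04-01', '2020-04-02', 'not a date'])

# An explicit format avoids guessing; unparseable entries become NaT.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

n_missing = int(parsed.isna().sum())     # how many entries failed to parse
first_day = parsed.iloc[0]               # a pd.Timestamp
```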

DatetimeIndex

dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.DatetimeIndex(dates)

pd.date_range('2020-4-1', '2020-4-3', freq='D')  # freq='M' would give month-end dates

pd.bdate_range('2020-4-1', periods=100)  # business days only

pd.period_range('2020-4-1', periods=100)

Series indexed by time

import numpy as np

dates = [pd.Timestamp(2020, 4, 1), pd.Timestamp(2020, 4, 2), pd.Timestamp(2020, 4, 3)]
series = pd.Series(np.random.randn(len(dates)), dates)

dates = [pd.Period('2020-4'), pd.Period('2020-5'), pd.Period('2020-6')]
series = pd.Series(np.random.randn(len(dates)), dates)

prng = pd.period_range('2020Q1', '2022Q4', freq='Q-NOV')
ps = pd.DataFrame(np.random.randn(len(prng)), columns=['A'], index=prng)
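
The two kinds of index shown above can be converted into each other. A small sketch with fixed values (the names `by_time`, `by_month` and `back` are only illustrative):

```python
import pandas as pd

idx = pd.date_range('2020-04-01', periods=3, freq='D')
by_time = pd.Series([1.0, 2.0, 3.0], index=idx)

# DatetimeIndex -> PeriodIndex: each timestamp becomes the month that contains it.
by_month = by_time.to_period('M')

# PeriodIndex -> DatetimeIndex: each period becomes its starting timestamp.
back = by_month.to_timestamp(how='start')
```

Note that the round trip is lossy here: all three rows end up on 2020-04-01 because the day information is dropped when converting to monthly periods.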

Working with time-indexed objects

# Sample data: 100 daily values starting on 2020-04-01
ts = pd.Series(np.random.randn(100), pd.date_range('2020-4-1', periods=100, freq='D'))

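# Selection
ts[:5]
ts[::2]
ts['2020-7-2']
ts.iloc[[1, 3, 5]]  # positional selection; plain ts[[1, 3, 5]] is ambiguous on a DatetimeIndex
ts['2020-4']  # all of April 2020
ts.truncate(before='2020-4-1', after='2020-4-10')   # slice by date, both ends included
ts['2020-4-1' : '2020-4-10']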

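# Shifting with shift
ts.shift(1)  # values move down by one row; the index is unchanged
ts.shift(1, freq='D')  # the index moves forward by one day; values are unchanged (try freq='W' too)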
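
The difference between the two forms of `shift` is easier to see on a tiny series with fixed values. This is a sketch with made-up data:

```python
import pandas as pd

idx = pd.date_range('2020-04-01', periods=3, freq='D')
s = pd.Series([10, 20, 30], index=idx)

# Without freq: values slide down one row, the index stays put,
# and the first row becomes NaN.
moved_values = s.shift(1)

# With freq: the index moves forward one day and every value
# stays attached to its own row, so nothing is lost.
moved_index = s.shift(1, freq='D')
```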

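# Resampling with resample
# Downsampling: longer interval between records, fewer records, coarser time granularity
ts.resample('W').sum()  # weekly totals
ts.resample('M').sum()  # monthly totals
ts.resample('W').mean()
ts.resample('W').ohlc()  # open, high, low and close of the values in each bin

# Upsampling: shorter interval between records, more records, finer time granularity
ts.resample('12H').asfreq()
ts.resample('12H').ffill()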
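
Because `ts` holds random numbers, the effect of resampling is hard to check by eye. The sketch below repeats both directions on fixed values. It passes an offset object (`pd.offsets.Hour(12)`) instead of the `'12H'` string, which is an assumption made here so that the example does not depend on how a given pandas version spells the hourly alias.

```python
import pandas as pd

idx = pd.date_range('2020-04-01', periods=4, freq='D')
s = pd.Series([1, 2, 3, 4], index=idx)

# Downsampling: four daily values collapse into two 2-day bins.
down = s.resample('2D').sum()

# Upsampling to 12-hour steps creates new, empty slots.
half_day = pd.offsets.Hour(12)
up_empty = s.resample(half_day).asfreq()   # new slots are NaN
up_filled = s.resample(half_day).ffill()   # new slots copy the previous value
```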

Date arithmetic

pandas offset classes

These are commonly used to shift timestamps and time indexes.

from pandas.tseries.offsets import DateOffset
d = pd.Timestamp(2020, 4, 1, 0, 0, 10)

d
Out[49]: Timestamp('2020-04-01 00:00:10')

d + DateOffset()
Out[50]: Timestamp('2020-04-02 00:00:10')

d + DateOffset(months=1, days=1)
Out[53]: Timestamp('2020-05-02 00:00:10')


from pandas.tseries.offsets import BDay
d + BDay()
Out[52]: Timestamp('2020-04-02 00:00:10')

d + 10 * BDay()
Out[54]: Timestamp('2020-04-15 00:00:10')

from pandas.tseries.offsets import BMonthEnd
d + BMonthEnd() * 2
Out[57]: Timestamp('2020-05-29 00:00:10')


from pandas.tseries.offsets import BYearEnd
d + BYearEnd()
Out[66]: Timestamp('2020-12-31 00:00:10')

d + BYearEnd() * 2
Out[67]: Timestamp('2021-12-31 00:00:10')

d + BYearEnd(month=1)
Out[71]: Timestamp('2021-01-29 00:00:10')



from pandas.tseries.offsets import Week
d - Week()
Out[73]: Timestamp('2020-03-25 00:00:10')

d - Week(weekday=3)  # roll back to the previous Thursday (weekday=3)
Out[76]: Timestamp('2020-03-26 00:00:10')


from pandas.tseries.offsets import Minute
d + Minute(10)
Out[104]: Timestamp('2020-04-01 00:10:10')
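
Offsets work on a whole index as well as on a single Timestamp, and they can snap a date onto the nearest valid point. A minimal sketch, with dates chosen only for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

idx = pd.DatetimeIndex(['2020-04-02', '2020-04-03'])   # a Thursday and a Friday

# Adding an offset shifts every element; BDay skips the weekend.
next_bday = idx + BDay(1)

# rollforward() moves a date onto the offset only if it is not already on it.
saturday = pd.Timestamp('2020-04-04')
rolled = BDay().rollforward(saturday)                            # the following Monday
unchanged = MonthEnd().rollforward(pd.Timestamp('2020-04-30'))   # already a month end
```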

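Common frequency aliases

Alias     Description
B         business day frequency
C         custom business day frequency
D         calendar day frequency
W         weekly frequency
M         month end frequency
SM        semi-month end frequency (15th and end of month)
BM        business month end frequency
CBM       custom business month end frequency
MS        month start frequency
SMS       semi-month start frequency (1st and 15th)
BMS       business month start frequency
CBMS      custom business month start frequency
Q         quarter end frequency
BQ        business quarter end frequency
QS        quarter start frequency
BQS       business quarter start frequency
A         year end frequency
BA        business year end frequency
AS        year start frequency
BAS       business year start frequency
BH        business hour frequency
H         hourly frequency
T, min    minutely frequency
S         secondly frequency
L, ms     milliseconds
U, us     microseconds
N         nanoseconds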
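
Each alias corresponds to an offset class, and the class can be passed as `freq` directly. This may be useful because the string spellings above match the pandas version used for this post; newer releases (2.2 and later, as far as I know) rename several of them, for example 'M' to 'ME' and 'H' to 'h'. A sketch using three of the classes:

```python
import pandas as pd
from pandas.tseries.offsets import BusinessDay, MonthBegin, MonthEnd

# Offset objects in place of the 'M', 'MS' and 'B' aliases.
month_ends = pd.date_range('2020-04-01', periods=3, freq=MonthEnd())
month_starts = pd.date_range('2020-04-01', periods=3, freq=MonthBegin())
bdays = pd.date_range('2020-04-03', periods=2, freq=BusinessDay())

# Render as strings for easy inspection.
month_end_labels = list(month_ends.strftime('%Y-%m-%d'))
month_start_labels = list(month_starts.strftime('%Y-%m-%d'))
bday_labels = list(bdays.strftime('%Y-%m-%d'))
```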


pd.date_range('2020-4-1', periods=10, freq='B')
Out[106]: 
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-06',
               '2020-04-07', '2020-04-08', '2020-04-09', '2020-04-10',
               '2020-04-13', '2020-04-14'],
              dtype='datetime64[ns]', freq='B')

pd.date_range('2020-4-1', periods=10, freq='D')
Out[108]: 
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range('2020-4-1', periods=10, freq='W')
Out[109]: 
DatetimeIndex(['2020-04-05', '2020-04-12', '2020-04-19', '2020-04-26',
               '2020-05-03', '2020-05-10', '2020-05-17', '2020-05-24',
               '2020-05-31', '2020-06-07'],
              dtype='datetime64[ns]', freq='W-SUN')

pd.date_range('2020-4-1', periods=10, freq='M')
Out[110]: 
DatetimeIndex(['2020-04-30', '2020-05-31', '2020-06-30', '2020-07-31',
               '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30',
               '2020-12-31', '2021-01-31'],
              dtype='datetime64[ns]', freq='M')

pd.date_range('2020-4-1', periods=10, freq='SM')
Out[111]: 
DatetimeIndex(['2020-04-15', '2020-04-30', '2020-05-15', '2020-05-31',
               '2020-06-15', '2020-06-30', '2020-07-15', '2020-07-31',
               '2020-08-15', '2020-08-31'],
              dtype='datetime64[ns]', freq='SM-15')

pd.date_range('2020-4-1', periods=10, freq='BM')
Out[112]: 
DatetimeIndex(['2020-04-30', '2020-05-29', '2020-06-30', '2020-07-31',
               '2020-08-31', '2020-09-30', '2020-10-30', '2020-11-30',
               '2020-12-31', '2021-01-29'],
              dtype='datetime64[ns]', freq='BM')

pd.date_range('2020-4-1', periods=10, freq='MS')
Out[113]: 
DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
               '2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
               '2020-12-01', '2021-01-01'],
              dtype='datetime64[ns]', freq='MS')

pd.date_range('2020-4-1', periods=10, freq='Q')
Out[114]: 
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
               '2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
               '2022-06-30', '2022-09-30'],
              dtype='datetime64[ns]', freq='Q-DEC')

pd.date_range('2020-4-1', periods=10, freq='QS')
Out[115]: 
DatetimeIndex(['2020-04-01', '2020-07-01', '2020-10-01', '2021-01-01',
               '2021-04-01', '2021-07-01', '2021-10-01', '2022-01-01',
               '2022-04-01', '2022-07-01'],
              dtype='datetime64[ns]', freq='QS-JAN')

pd.date_range('2020-4-1', periods=10, freq='BQ')
Out[116]: 
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
               '2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
               '2022-06-30', '2022-09-30'],
              dtype='datetime64[ns]', freq='BQ-DEC')

pd.date_range('2020-4-1', periods=10, freq='BH')
Out[117]: 
DatetimeIndex(['2020-04-01 09:00:00', '2020-04-01 10:00:00',
               '2020-04-01 11:00:00', '2020-04-01 12:00:00',
               '2020-04-01 13:00:00', '2020-04-01 14:00:00',
               '2020-04-01 15:00:00', '2020-04-01 16:00:00',
               '2020-04-02 09:00:00', '2020-04-02 10:00:00'],
              dtype='datetime64[ns]', freq='BH')

pd.date_range('2020-4-1', periods=10, freq='T')
Out[118]: 
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:01:00',
               '2020-04-01 00:02:00', '2020-04-01 00:03:00',
               '2020-04-01 00:04:00', '2020-04-01 00:05:00',
               '2020-04-01 00:06:00', '2020-04-01 00:07:00',
               '2020-04-01 00:08:00', '2020-04-01 00:09:00'],
              dtype='datetime64[ns]', freq='T')

pd.date_range('2020-4-1', periods=10, freq='L')
Out[120]: 
DatetimeIndex([       '2020-04-01 00:00:00', '2020-04-01 00:00:00.001000',
               '2020-04-01 00:00:00.002000', '2020-04-01 00:00:00.003000',
               '2020-04-01 00:00:00.004000', '2020-04-01 00:00:00.005000',
               '2020-04-01 00:00:00.006000', '2020-04-01 00:00:00.007000',
               '2020-04-01 00:00:00.008000', '2020-04-01 00:00:00.009000'],
              dtype='datetime64[ns]', freq='L')

pd.date_range('2020-4-1', periods=10, freq='S')
Out[121]: 
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:00:01',
               '2020-04-01 00:00:02', '2020-04-01 00:00:03',
               '2020-04-01 00:00:04', '2020-04-01 00:00:05',
               '2020-04-01 00:00:06', '2020-04-01 00:00:07',
               '2020-04-01 00:00:08', '2020-04-01 00:00:09'],
              dtype='datetime64[ns]', freq='S')

pd.date_range('2020-4-1', periods=10, freq='N')
Out[122]: 
DatetimeIndex([          '2020-04-01 00:00:00',
               '2020-04-01 00:00:00.000000001',
               '2020-04-01 00:00:00.000000002',
               '2020-04-01 00:00:00.000000003',
               '2020-04-01 00:00:00.000000004',
               '2020-04-01 00:00:00.000000005',
               '2020-04-01 00:00:00.000000006',
               '2020-04-01 00:00:00.000000007',
               '2020-04-01 00:00:00.000000008',
               '2020-04-01 00:00:00.000000009'],
              dtype='datetime64[ns]', freq='N')

pd.date_range('2020-4-1', periods=10, freq='1D1H10T10U')
Out[125]: 
DatetimeIndex([       '2020-04-01 00:00:00', '2020-04-02 01:10:00.000010',
               '2020-04-03 02:20:00.000020', '2020-04-04 03:30:00.000030',
               '2020-04-05 04:40:00.000040', '2020-04-06 05:50:00.000050',
               '2020-04-07 07:00:00.000060', '2020-04-08 08:10:00.000070',
               '2020-04-09 09:20:00.000080', '2020-04-10 10:30:00.000090'],
              dtype='datetime64[ns]', freq='90600000010U')

Add a suffix to the alias to change its default anchor point

pd.date_range('2020-4-1', periods=10, freq='W-Wed')
Out[126]: 
DatetimeIndex(['2020-04-01', '2020-04-08', '2020-04-15', '2020-04-22',
               '2020-04-29', '2020-05-06', '2020-05-13', '2020-05-20',
               '2020-05-27', '2020-06-03'],
              dtype='datetime64[ns]', freq='W-WED')
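
The same suffix mechanism applies to other weekdays and to quarterly periods, such as the 'Q-NOV' frequency used earlier in this post. A short sketch; the chosen dates are arbitrary:

```python
import pandas as pd

# 'W-FRI' anchors weekly dates on Fridays instead of the default Sunday.
fridays = pd.date_range('2020-04-01', periods=3, freq='W-FRI')
weekdays = fridays.dayofweek.tolist()    # Monday is 0, so Friday is 4

# For quarters, the suffix names the month in which the fiscal year ends.
# With 'Q-NOV', the first quarter of 2020 runs from December 2019 to February 2020.
q = pd.Period('2020Q1', freq='Q-NOV')
q_start = q.start_time
q_end = q.end_time
```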

Resampling with aggregation

ts.resample('M').sum()
Out[129]: 
2020-04-30   -0.247197
2020-05-31    1.055703
2020-06-30   -0.221805
2020-07-31    1.433503
Freq: M, dtype: float64

ts.resample('M').agg(['sum', 'mean'])
Out[132]: 
                 sum      mean
2020-04-30 -0.247197 -0.008240
2020-05-31  1.055703  0.034055
2020-06-30 -0.221805 -0.007394
2020-07-31  1.433503  0.159278
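
For a result that can be checked by hand, the same aggregation can be run on a few fixed values that straddle a month boundary. This is a sketch with made-up data; it passes `pd.offsets.MonthEnd()` as the rule, an assumption made so the example does not depend on the spelling of the month-end alias.

```python
import pandas as pd

idx = pd.date_range('2020-04-29', periods=4, freq='D')   # 29 April to 2 May
s = pd.Series([1.0, 3.0, 5.0, 7.0], index=idx)

# Several aggregations at once produce one column per function.
monthly = s.resample(pd.offsets.MonthEnd()).agg(['sum', 'mean', 'count'])

april = monthly.loc[pd.Timestamp('2020-04-30')]
may = monthly.loc[pd.Timestamp('2020-05-31')]
```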
