1. The to_datetime method, used to create time points
import numpy as np
import pandas as pd
pd.to_datetime('2020/1/1')
Timestamp('2020-01-01 00:00:00')
pd.Series(range(2),index=pd.to_datetime(['2020/1/1','2020/1/2']))
2020-01-01 0
2020-01-02 1
dtype: int64
df = pd.DataFrame({'year':[2020,2020],'month':[1,1],'day':[1,2]})
df
|   | year | month | day |
|---|---|---|---|
| 0 | 2020 | 1 | 1 |
| 1 | 2020 | 1 | 2 |
pd.to_datetime(df)
0 2020-01-01
1 2020-01-02
dtype: datetime64[ns]
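When the input strings are ambiguous or may contain bad values, it can help to state the format explicitly and decide how failures are handled. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import pandas as pd

# Day-first strings: an explicit format avoids guessing between d/m and m/d.
# (The '%d/%m/%Y' layout is an assumption about the raw data.)
parsed = pd.to_datetime(['01/02/2020', '15/02/2020'], format='%d/%m/%Y')

# errors='coerce' turns entries that do not match into NaT instead of raising.
with_bad = pd.to_datetime(['2020-01-01', 'not a date'],
                          format='%Y-%m-%d', errors='coerce')

print(parsed)
print(with_bad)
```

Here `parsed` holds 1 and 15 February 2020, and the second entry of `with_bad` is `NaT`.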
2. The date_range method
The key parameters of this method are start, end, periods (the number of time points), and freq (the spacing between points)
pd.date_range(start='2020/1/1',end='2020/1/10',periods=3)
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-05 12:00:00',
'2020-01-10 00:00:00'],
dtype='datetime64[ns]', freq=None)
pd.date_range(start='2020/1/1',end='2020/1/10',freq='D')
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range(start='2020/1/1',periods=3,freq='D')
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq='D')
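The three calls above each supply three of the four parameters, and pandas works out the fourth. The sketch below illustrates this with another combination (end plus periods plus freq) and shows that supplying all four at once is rejected. Variable names are illustrative only.

```python
import pandas as pd

# start + end + periods: evenly spaced points, spacing is derived.
by_count = pd.date_range(start='2020-01-01', end='2020-01-10', periods=3)

# end + periods + freq: the range is built backwards from the end point.
backwards = pd.date_range(end='2020-01-10', periods=3, freq='D')

# All four parameters together over-specify the range.
try:
    pd.date_range(start='2020-01-01', end='2020-01-10', periods=3, freq='D')
    over_specified_ok = True
except ValueError:
    over_specified_ok = False

print(by_count)
print(backwards)
print(over_specified_ok)
```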
3. DateOffset objects
The optional parameters of DateOffset include years/months/weeks/days/hours/minutes/seconds
pd.Timestamp('2020-01-01')
Timestamp('2020-01-01 00:00:00')
pd.Timestamp('2020-01-01')+pd.DateOffset(minutes = 20)-pd.DateOffset(days = 1)
Timestamp('2019-12-31 00:20:00')
pd.Timestamp('2020-01-01') + pd.offsets.Week(2)
Timestamp('2020-01-15 00:00:00')
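One reason to reach for DateOffset rather than a fixed duration is that it follows the calendar. As a small sketch (the dates are chosen only for illustration), adding one month to 31 January lands on the last day of February, whereas adding a fixed 30 days does not:

```python
import pandas as pd

jan31 = pd.Timestamp('2020-01-31')

# Calendar-aware: one month later, clipped to the end of February (2020 is a leap year).
plus_month = jan31 + pd.DateOffset(months=1)

# Fixed duration: exactly 30 * 24 hours later.
plus_30d = jan31 + pd.Timedelta(days=30)

print(plus_month)  # 2020-02-29 00:00:00
print(plus_30d)    # 2020-03-01 00:00:00
```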
Offset operations on a sequence
Applying the offset to each element (older pandas versions also expose this as the offset's apply method)
pd.date_range('20200101',periods=3,freq = 'Y')
DatetimeIndex(['2020-12-31', '2021-12-31', '2022-12-31'], dtype='datetime64[ns]', freq='A-DEC')
pd.Series([t + pd.offsets.BYearBegin(3) for t in pd.date_range('20200101',periods=3,freq = 'Y')])
0 2023-01-02
1 2024-01-01
2 2025-01-01
dtype: datetime64[ns]
pd.date_range('20200101',periods=3,freq='Y')+pd.offsets.BYearBegin(3)
DatetimeIndex(['2023-01-02', '2024-01-01', '2025-01-01'], dtype='datetime64[ns]', freq='A-DEC')
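The two approaches above should give the same dates. A quick sketch to check this, building the index explicitly with to_datetime so that it does not depend on a frequency alias:

```python
import pandas as pd

idx = pd.to_datetime(['2020-12-31', '2021-12-31', '2022-12-31'])
offset = pd.offsets.BYearBegin(3)  # third business-year start after each date

# Whole-index arithmetic.
vectorised = idx + offset

# Element-by-element arithmetic, collected back into an index.
elementwise = pd.DatetimeIndex([t + offset for t in idx])

print(vectorised)
print(list(vectorised) == list(elementwise))
```

Depending on the pandas version, the whole-index form may emit a performance warning for offsets that are not vectorised, but the result is the same.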
Indexing and attributes of time series
1. Index slicing
rng = pd.date_range('2020','2021',freq='W')
ts = pd.Series(np.random.rand(len(rng)),index=rng)
ts.head()
2020-01-05 0.009639
2020-01-12 0.061814
2020-01-19 0.470897
2020-01-26 0.803914
2020-02-02 0.104896
Freq: W-SUN, dtype: float64
ts['2020-01-05']
0.009639339138300618
2. Subset indexing
ts['2020-7']
2020-07-05 0.073073
2020-07-12 0.593621
2020-07-19 0.028066
2020-07-26 0.537048
Freq: W-SUN, dtype: float64
ts['2011-1':'20200726'].head()
2020-01-05 0.009639
2020-01-12 0.061814
2020-01-19 0.470897
2020-01-26 0.803914
2020-02-02 0.104896
Freq: W-SUN, dtype: float64
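A partial date string selects every row that falls inside the period it names, and a slice of partial strings includes the whole of the end period. The sketch below uses integer values instead of random ones so the result is reproducible, and goes through .loc, which makes the label-based lookup explicit.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2020-01-01', '2020-12-31', freq='W')  # weekly, on Sundays
ts = pd.Series(np.arange(len(rng)), index=rng)

# Every Sunday in July 2020.
july = ts.loc['2020-07']

# Everything from the start of January to the end of June.
first_half = ts.loc['2020-01':'2020-06']

print(july)
print(first_half.index[-1])
```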
3. Attributes of time points
Use the dt accessor to get information about the time
pd.Series(ts.index).head()
0 2020-01-05
1 2020-01-12
2 2020-01-19
3 2020-01-26
4 2020-02-02
dtype: datetime64[ns]
pd.Series(ts.index).dt.month.head()
0 1
1 1
2 1
3 1
4 2
dtype: int64
pd.Series(ts.index).dt.day.head()
0 5
1 12
2 19
3 26
4 2
dtype: int64
Use strftime to change the time format
pd.Series(ts.index).dt.strftime('%Y*%m*%d').head()
0 2020*01*05
1 2020*01*12
2 2020*01*19
3 2020*01*26
4 2020*02*02
dtype: object
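The dt accessor exposes more than month and day. As a short sketch on two of the dates above, the following pulls out the month, the weekday name, and a reformatted string (the '%Y/%m/%d' pattern is just one possible choice):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2020-01-05', '2020-02-02']))

months = dates.dt.month                   # integer month numbers
weekdays = dates.dt.day_name()            # weekday names as strings
labels = dates.dt.strftime('%Y/%m/%d')    # formatted strings, object dtype

print(months.tolist())
print(weekdays.tolist())
print(labels.tolist())
```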
For a DatetimeIndex, the same information is available directly through attributes
pd.date_range('2020','2021',freq='W')
DatetimeIndex(['2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26',
'2020-02-02', '2020-02-09', '2020-02-16', '2020-02-23',
'2020-03-01', '2020-03-08', '2020-03-15', '2020-03-22',
'2020-03-29', '2020-04-05', '2020-04-12', '2020-04-19',
'2020-04-26', '2020-05-03', '2020-05-10', '2020-05-17',
'2020-05-24', '2020-05-31', '2020-06-07', '2020-06-14',
'2020-06-21', '2020-06-28', '2020-07-05', '2020-07-12',
'2020-07-19', '2020-07-26', '2020-08-02', '2020-08-09',
'2020-08-16', '2020-08-23', '2020-08-30', '2020-09-06',
'2020-09-13', '2020-09-20', '2020-09-27', '2020-10-04',
'2020-10-11', '2020-10-18', '2020-10-25', '2020-11-01',
'2020-11-08', '2020-11-15', '2020-11-22', '2020-11-29',
'2020-12-06', '2020-12-13', '2020-12-20', '2020-12-27'],
dtype='datetime64[ns]', freq='W-SUN')
pd.date_range('2020','2021',freq='W').month
Int64Index([ 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4,
5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8,
8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12,
12],
dtype='int64')
Resampling
1. Resampling with the resample function
df_r = pd.DataFrame(np.random.rand(1000,3),index=pd.date_range('1/1/2020',freq='S',periods=1000),columns=['A','B','C'])
df_r.head()
|   | A | B | C |
|---|---|---|---|
| 2020-01-01 00:00:00 | 0.030495 | 0.346511 | 0.329326 |
| 2020-01-01 00:00:01 | 0.834827 | 0.302838 | 0.550707 |
| 2020-01-01 00:00:02 | 0.041700 | 0.200662 | 0.000873 |
| 2020-01-01 00:00:03 | 0.483835 | 0.211402 | 0.447749 |
| 2020-01-01 00:00:04 | 0.599844 | 0.850822 | 0.650199 |
r = df_r.resample('3min')
r
<pandas.core.resample.DatetimeIndexResampler object at 0x0000023ABC3C19E8>
r.sum()
|   | A | B | C |
|---|---|---|---|
| 2020-01-01 00:00:00 | 92.942598 | 89.039870 | 90.154015 |
| 2020-01-01 00:03:00 | 89.688035 | 86.098906 | 89.821036 |
| 2020-01-01 00:06:00 | 90.771298 | 90.452599 | 90.166902 |
| 2020-01-01 00:09:00 | 93.041682 | 93.163446 | 90.303107 |
| 2020-01-01 00:12:00 | 92.641535 | 95.147615 | 89.392215 |
| 2020-01-01 00:15:00 | 55.440978 | 50.845603 | 50.553468 |
r.mean()
|   | A | B | C |
|---|---|---|---|
| 2020-01-01 00:00:00 | 0.516348 | 0.494666 | 0.500856 |
| 2020-01-01 00:03:00 | 0.498267 | 0.478327 | 0.499006 |
| 2020-01-01 00:06:00 | 0.504285 | 0.502514 | 0.500927 |
| 2020-01-01 00:09:00 | 0.516898 | 0.517575 | 0.501684 |
| 2020-01-01 00:12:00 | 0.514675 | 0.528598 | 0.496623 |
| 2020-01-01 00:15:00 | 0.554410 | 0.508456 | 0.505535 |
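In the sums above, the last bin is noticeably smaller than the others because 1000 seconds do not divide evenly into 3-minute bins, so the final bin holds fewer rows. A small reproducible sketch of the same effect, using a series of ones at one-minute spacing (an illustrative stand-in for the random data):

```python
import pandas as pd

idx = pd.date_range('2020-01-01', periods=7, freq='min')
ones = pd.Series(1.0, index=idx)

# Seven one-minute rows fall into bins starting at 00:00, 00:03 and 00:06.
sums = ones.resample('3min').sum()
means = ones.resample('3min').mean()

print(sums)   # the last bin contains a single row
print(means)  # the mean is unaffected by the bin size
```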
df_r2 = pd.DataFrame(np.random.randn(200,3),index=pd.date_range('1/1/2020',freq= 'D',periods=200),columns=['A','B','C'])
df_r2.head()
|   | A | B | C |
|---|---|---|---|
| 2020-01-01 | 0.917372 | -0.305394 | -1.163468 |
| 2020-01-02 | 1.027062 | -0.722735 | -0.390128 |
| 2020-01-03 | 0.902275 | 0.306910 | -0.482234 |
| 2020-01-04 | -0.362833 | 0.583678 | 0.716035 |
| 2020-01-05 | -0.467158 | -1.345731 | -2.380988 |
r = df_r2.resample('CBMS')
r.sum()
|   | A | B | C |
|---|---|---|---|
| 2020-01-01 | 1.068820 | 3.800052 | -5.188159 |
| 2020-02-03 | 2.292636 | 3.275062 | -7.076462 |
| 2020-03-02 | 8.565083 | -1.366455 | -0.495671 |
| 2020-04-01 | 6.730568 | 2.981884 | -4.715256 |
| 2020-05-01 | 6.593104 | 4.132095 | -13.792500 |
| 2020-06-01 | 11.134183 | -1.903034 | -19.861671 |
| 2020-07-01 | -0.863740 | -1.153881 | -4.164100 |
2. Aggregating resampled data
r = df_r.resample('3T')
r['A'].mean()
2020-01-01 00:00:00 0.516348
2020-01-01 00:03:00 0.498267
2020-01-01 00:06:00 0.504285
2020-01-01 00:09:00 0.516898
2020-01-01 00:12:00 0.514675
2020-01-01 00:15:00 0.554410
Freq: 3T, Name: A, dtype: float64
r['A'].agg([np.sum,np.mean,np.std])
|   | sum | mean | std |
|---|---|---|---|
| 2020-01-01 00:00:00 | 92.942598 | 0.516348 | 0.288201 |
| 2020-01-01 00:03:00 | 89.688035 | 0.498267 | 0.293031 |
| 2020-01-01 00:06:00 | 90.771298 | 0.504285 | 0.280504 |
| 2020-01-01 00:09:00 | 93.041682 | 0.516898 | 0.287934 |
| 2020-01-01 00:12:00 | 92.641535 | 0.514675 | 0.278168 |
| 2020-01-01 00:15:00 | 55.440978 | 0.554410 | 0.260155 |
r.agg({'A':np.sum,'B':lambda x:max(x)-min(x)})
|   | A | B |
|---|---|---|
| 2020-01-01 00:00:00 | 92.942598 | 0.997325 |
| 2020-01-01 00:03:00 | 89.688035 | 0.996741 |
| 2020-01-01 00:06:00 | 90.771298 | 0.993118 |
| 2020-01-01 00:09:00 | 93.041682 | 0.988965 |
| 2020-01-01 00:12:00 | 92.641535 | 0.993741 |
| 2020-01-01 00:15:00 | 55.440978 | 0.975447 |
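The dictionary form of agg applies a different function to each column. Below is a small reproducible sketch with hand-picked numbers (not the random data above). It passes the aggregation as the string 'sum' rather than np.sum, which newer pandas versions appear to prefer, and computes the range of B with Series methods.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0, 4.0], 'B': [10.0, 30.0, 20.0, 50.0]},
    index=pd.date_range('2020-01-01', periods=4, freq='min'),
)

# Two-minute bins: rows 0-1 and rows 2-3.
out = df.resample('2min').agg({
    'A': 'sum',                        # total of A per bin
    'B': lambda x: x.max() - x.min(),  # range of B per bin
})

print(out)
```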
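3. Iterating over resampled groups
Iterating over resampled groups works much like iterating over a groupby: each group can be processed in turn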
small = pd.Series(range(6),index=pd.to_datetime(['2020-01-01 00:00:00', '2020-01-01 00:30:00'
, '2020-01-01 00:31:00','2020-01-01 01:00:00'
,'2020-01-01 03:00:00','2020-01-01 03:05:00']))
resampled = small.resample('H')
for name, group in resampled:
    print("Group: ", name)
    print("-" * 27)
    print(group, end="\n\n")
Group: 2020-01-01 00:00:00
---------------------------
2020-01-01 00:00:00 0
2020-01-01 00:30:00 1
2020-01-01 00:31:00 2
dtype: int64
Group: 2020-01-01 01:00:00
---------------------------
2020-01-01 01:00:00 3
dtype: int64
Group: 2020-01-01 02:00:00
---------------------------
Series([], dtype: int64)
Group: 2020-01-01 03:00:00
---------------------------
2020-01-01 03:00:00 4
2020-01-01 03:05:00 5
dtype: int64
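Because each iteration yields a (bin label, sub-series) pair, the loop can also collect a result per group rather than print it. A minimal sketch that records the number of rows in each hourly bin follows; the data and the name `sizes` are illustrative, and '60min' is used as the bin width.

```python
import pandas as pd

small = pd.Series(
    range(4),
    index=pd.to_datetime([
        '2020-01-01 00:00', '2020-01-01 00:30',
        '2020-01-01 02:10', '2020-01-01 02:20',
    ]),
)

# Map each bin's starting timestamp to the number of rows that fall in it.
sizes = {name: len(group) for name, group in small.resample('60min')}

for name, n in sizes.items():
    print(name, n)
```

As in the output above, a bin with no rows (here the 01:00 hour) may still show up as an empty group.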
Window functions
rolling/expanding
s = pd.Series(np.random.rand(1000),index=pd.date_range('1/1/2020',periods=1000))
s.head()
2020-01-01 0.842824
2020-01-02 0.826125
2020-01-03 0.860557
2020-01-04 0.511902
2020-01-05 0.144901
Freq: D, dtype: float64
s.rolling(window=50)
Rolling [window=50,center=False,axis=0]
s.rolling(window=50).mean()
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
...
2022-09-22 0.498686
2022-09-23 0.511147
2022-09-24 0.514356
2022-09-25 0.509788
2022-09-26 0.505171
Freq: D, Length: 1000, dtype: float64
s.rolling(window=50,min_periods=3).mean().head()
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 0.843169
2020-01-04 0.760352
2020-01-05 0.637262
Freq: D, dtype: float64
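The effect of min_periods is easier to see on a tiny series. In this sketch (values chosen for illustration), the default requires a full window before producing a value, while min_periods=1 starts averaging from the first row:

```python
import pandas as pd

s_small = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: min_periods equals the window size, so the first two results are NaN.
strict = s_small.rolling(window=3).mean()

# min_periods=1: partial windows at the start are averaged over the rows available.
relaxed = s_small.rolling(window=3, min_periods=1).mean()

print(strict.tolist())
print(relaxed.tolist())
```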
The plain expanding function is equivalent to rolling(window = len(s),min_periods = 1); it computes cumulative statistics over the series
s.rolling(window=len(s),min_periods=1).sum().head()
2020-01-01 0.842824
2020-01-02 1.668950
2020-01-03 2.529507
2020-01-04 3.041409
2020-01-05 3.186310
Freq: D, dtype: float64
s.expanding().sum().head()
2020-01-01 0.842824
2020-01-02 1.668950
2020-01-03 2.529507
2020-01-04 3.041409
2020-01-05 3.186310
Freq: D, dtype: float64
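The equivalence can be checked directly. The sketch below uses a small deterministic series (an illustrative stand-in for the random one) and also compares against cumsum, which gives the same running total:

```python
import numpy as np
import pandas as pd

s_check = pd.Series(np.arange(1.0, 11.0),
                    index=pd.date_range('2020-01-01', periods=10))

via_rolling = s_check.rolling(window=len(s_check), min_periods=1).sum()
via_expanding = s_check.expanding().sum()
via_cumsum = s_check.cumsum()

print(np.allclose(via_rolling, via_expanding))
print(np.allclose(via_expanding, via_cumsum))
```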
Reference: https://github.com/datawhalechina/joyful-pandas