Getting Started with the pandas Data Analysis Library, Part 9: Period and Time Index Types, the shift Function, and the resample Function

1379号程序员

Last modified 2023-04-05 17:58:47


First published 2022-11-13 23:05:52

Article link: https://blog.csdn.net/weixin_45447382/article/details/127746900


Getting Started with Pandas 9: Date and Time Data Types and Processing, Part 2

Pandas Period and Time Index Types
The shift and resample Functions

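Earlier articles in this series covered other parts of pandas. The previous one introduced the date and time types of Python's standard time and datetime modules, how to work with them, and the pandas Timestamp. This article continues with the pandas Period type, the period and timestamp index objects, and two related functions: shift and resample.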

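Pandas Period and Time Index Types

In pandas, a Period object represents a span of time. There are also two special index types: PeriodIndex, whose elements are Period objects, and DatetimeIndex, whose elements are the Timestamp objects introduced in the previous article. They are created as follows: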

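pd.Period(value, freq, ordinal, year, month, quarter, day, hour, minute, second)
value: a Period object or a string that represents a time span
freq: a string giving the length of the span
year and the other field parameters: numbers that specify the year, month, and so on
Returns a Period object, which has many attributes and a few methods.
pd.period_range(start, end, periods, freq, name)
Returns a PeriodIndex whose values are Period objects.
start/end: the first/last period
periods: a positive integer, the number of periods to generate
freq: a string or DateOffset object giving the frequency; if omitted, it is taken from start or end when those are Period objects
name: the name of the Index
pd.date_range(start, end, periods, freq, tz, normalize=False, name, closed, **kwargs)
Returns a DatetimeIndex whose values are Timestamp objects, stored with numpy's datetime64 data type.
start/end: the first/last timestamp of the index
periods: a positive integer, the number of timestamps to generate
freq: a string or DateOffset object giving the frequency
tz: the time zone
name: the name of the Index
closed: which endpoints are included: 'left' (closed on the left, open on the right), 'right' (open on the left, closed on the right), or None (closed on both sides). pandas 1.4 deprecated this parameter in favor of inclusive ('both', 'neither', 'left', 'right'), and pandas 2.0 removed it.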

Examples:

import pandas as pd

p1 = pd.Period('20221113')  # Period from a date string; hour and minute (and second) may be included, e.g. pd.Period('202201071245') or pd.Period('20220107124509')
p2 = pd.Period('2004Q2')    # Period from a year + quarter string
p3 = pd.Period(year=2022, month=2, day=20, freq='D')       # Period from explicit date fields and a frequency

p1.year, p1.month, p1.day, p1.hour, p1.minute, p1.second, p1.quarter  # Period attributes (year, month, day, hour, minute, second, quarter)
p1.weekofyear, p1.dayofyear, p1.dayofweek, p1.daysinmonth  # Period attributes (week of year, day of year, day of week with 0~6 for Monday~Sunday, days in the month)
p1.start_time, p1.end_time, p1.is_leap_year                # Period attributes (start time, end time, whether the year is a leap year)

p1.asfreq('M')           # convert to another frequency ('Y'/'y' = year, 'M'/'m' = month, 'D'/'d' = day, 'H'/'h' = hour, 'T'/'min' = minute, 'S'/'s' = second; pandas 2.2+ deprecates 'H', 'T', 'S' in favor of 'h', 'min', 's')
p1.asfreq('T', how='S')  # how='S'/'start' takes the start of the span; how='E'/'end' takes the end (the default)
p1.to_timestamp('H', how='E')  # convert to a Timestamp; 'H' means hour as above; how takes the same values and defaults to 'S'
(pd.Period('20220207') - pd.Period('20220119')).n  # number of periods between two same-frequency Periods, here a count of days

dr = pd.period_range(start='2022-11-13', periods=7, freq='D')  # PeriodIndex of 7 daily periods starting at 2022-11-13
dr = pd.period_range(start='2022-11-13', end='2023-04-15', freq='M')  # PeriodIndex of months from 2022-11 to 2023-04 (both ends included)
dr.to_timestamp()  # convert to a DatetimeIndex

di = pd.date_range(start='2022-11-13', end='2022-12-31', freq='D')  # DatetimeIndex with one timestamp per day from 2022-11-13 to 2022-12-31
di = pd.date_range(start='2022-11-13', periods=6, freq='M')  # DatetimeIndex of 6 month-end timestamps starting at 2022-11-30 (pandas 2.2+ spells this alias 'ME')
di.to_period()     # convert to a PeriodIndex
di.to_pydatetime() # convert to a numpy array of datetime objects
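
The conversions above can be checked with a short sketch. The variable names and dates below are chosen only for illustration; the calls themselves (asfreq, start_time, period subtraction, to_timestamp, to_period) are the ones used in the examples.

```python
import pandas as pd

p = pd.Period("2022-11-13", freq="D")

# Converting a daily period to monthly keeps only the month that contains it.
monthly = p.asfreq("M")

# start_time is the first instant covered by the period.
first_instant = p.start_time

# Subtracting two periods of the same frequency gives an offset; .n is the count.
gap = (pd.Period("2022-02-07", freq="D") - pd.Period("2022-01-19", freq="D")).n

# A PeriodIndex and a DatetimeIndex can be converted into each other.
months = pd.period_range(start="2022-11", end="2023-04", freq="M")
stamps = months.to_timestamp()      # DatetimeIndex at the start of each month
round_trip = stamps.to_period("M")  # back to a monthly PeriodIndex

print(monthly, first_instant, gap)
print(stamps)
```

Here to_timestamp() uses its default how='start', so each month maps to its first day; passing how='end' would map it to the last instant of the month instead.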

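The shift and resample Functions

The shift method of Series/DataFrame performs a plain forward or backward shift.
Resampling (resample) is the process of converting a time series from one frequency to another. It comes in three kinds: downsampling, upsampling, and resampling that only shifts the bin boundaries. Converting high-frequency data to a lower frequency is called downsampling, and converting low-frequency data to a higher frequency is called upsampling. When downsampling, the data being aggregated need not have a fixed frequency. Upsampling involves no aggregation; the new slots have to be filled or interpolated instead, and by default they hold missing values.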
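
As a minimal sketch of the two main directions, assuming a small made-up hourly series (the data and variable names are illustrative only):

```python
import pandas as pd

# Six hourly observations starting at midnight (illustrative data).
idx = pd.date_range("2022-11-13 00:00", periods=6, freq="h")
hourly = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsampling: hourly -> 3-hourly; the values inside each bin are aggregated.
down = hourly.resample("3h").sum()

# Upsampling: hourly -> 30-minute; the new slots have no data and become NaN.
up = hourly.resample("30min").asfreq()

# The gaps can then be filled, for example by carrying the last value forward.
up_filled = hourly.resample("30min").ffill()

print(down)
print(up)
print(up_filled)
```

Forward fill is only one choice; interpolation or a constant fill may suit other data better.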

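Series/DataFrame.shift(periods=1, freq, axis=0, fill_value)
periods: an integer (may be negative), the number of steps to shift. For a time series, the unit is given by freq.
freq: a DateOffset, timedelta, or frequency string giving the unit of the shift. For a PeriodIndex, freq must match the index's own frequency.
axis: 0/'index' shifts along axis 0; 1/'columns' shifts along axis 1
For a time series with freq given, the method shifts the index and builds a new one, while the values of the Series/DataFrame stay unchanged. Without freq (and for any non-time-series index), the index stays unchanged and the values of the Series/DataFrame are moved.
Series/DataFrame.resample(rule, axis=0, closed, label, convention='start', kind, on, level, origin='start_day', offset)
rule: a string giving the target frequency of the resampling
axis: 0/'index' resamples along axis 0; 1/'columns' resamples along axis 1
closed: when downsampling, which side of each bin is closed. 'right' means open on the left and closed on the right; 'left' means closed on the left and open on the right. The default is 'left', except for month-end, quarter-end, year-end, and weekly frequencies, which default to 'right'.
label: when downsampling, which bin edge labels the aggregated value, 'right' or 'left'. The defaults follow the same rule as closed.
convention: when resampling periods (a PeriodIndex), the convention for converting low frequency to high frequency: 's'/'start' (use the first high-frequency period) or 'e'/'end' (use the last one).
kind: a string choosing whether the result is indexed by Period or by Timestamp. By default the result keeps the index type of the input time series.
on: a string; for a DataFrame, the column to resample on, which must be datetime-like.
level: a string or integer; for a MultiIndex, the level to resample on.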

Examples (continuing from the code above):

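df = pd.DataFrame({'fr': [12, 13, 2, 4, 34, 9],
                   'ag': ['s', 'r', 'w', 't', 'q', 'b']}, index=di)
df.shift(periods=1, freq='D')         # move every index timestamp 1 day later; the values stay on their rows
df.shift(periods=-3, freq='h')        # move every index timestamp 3 hours earlier
df2 = df[['fr', 'ag']].reset_index()  # the time index becomes an ordinary column
df2.shift(periods=-2)                 # no freq: the rows move up by 2 and the last 2 rows are filled with NaN/NaT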
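
To make the difference between shifting with and without freq concrete, here is a small sketch on a made-up three-day Series (names and values are illustrative only):

```python
import pandas as pd

idx = pd.date_range("2022-11-13", periods=3, freq="D")
s = pd.Series([10, 20, 30], index=idx)

# Without freq: the index stays put, the values slide down one row,
# and the vacated first slot becomes NaN.
by_rows = s.shift(1)

# With freq: the values stay attached to their rows and every index
# label moves one day later, so no data is lost.
by_time = s.shift(1, freq="D")

print(by_rows)
print(by_time)
```

Because the first form introduces NaN, an integer Series comes back as float; the second form keeps the original dtype.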

index_t = pd.date_range(start='2022-11-13 20:00:00', periods=30, freq='D')
s1 = pd.Series(range(30), index=index_t)
s1.resample(rule='W', closed='left', label='left').sum()                  # downsample to weekly sums
s1.resample(rule='4D', closed='left', label='left', offset='2min').sum()  # downsample to 4-day bins, bin edges shifted by 2 minutes
s1.resample(rule='20T', offset='1s', convention='s').mean()               # upsample to 20-minute bins ('T' means minutes, not hours); empty bins give NaN; convention only affects a PeriodIndex
s1.resample(rule='W-WED', closed='left', label='left').sum()              # downsample to weeks anchored on Wednesday
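
The effect of closed and label is easier to see on a tiny series. The following is a sketch with made-up hourly data; the bin edges in the comments assume the default origin='start_day'.

```python
import pandas as pd

# Four hourly values at 00:00, 01:00, 02:00 and 03:00 (illustrative data).
idx = pd.date_range("2022-11-13 00:00", periods=4, freq="h")
s = pd.Series([1, 2, 3, 4], index=idx)

# Left-closed 2-hour bins [00:00, 02:00) and [02:00, 04:00), labeled by the left edge.
left = s.resample("2h", closed="left", label="left").sum()

# Same bins, labeled by the right edge instead: only the labels change.
left_right_label = s.resample("2h", closed="left", label="right").sum()

# Right-closed bins (22:00, 00:00], (00:00, 02:00], (02:00, 04:00]:
# the 00:00 value now falls into a bin of its own.
right = s.resample("2h", closed="right", label="right").sum()

print(left)
print(left_right_label)
print(right)
```

closed decides which bin a value on a boundary belongs to, while label only decides which edge names the bin in the result.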

That's all.
