Data Analysis Tools: Pandas (2)

Latest recommended article published on 2023-10-04 00:08:10

狄鸠


Column: python数据分析 (Python data analysis)

Article link: https://blog.csdn.net/weixin_44038881/article/details/90343117


1. Time module: datetime

datetime.date(): date objects (year, month, day)

import datetime

today = datetime.date.today()
print(today,type(today))
print(str(today),type(str(today)))
# datetime.date.today() returns today's date
# the result is a datetime.date object

t = datetime.date(2016,6,1)
print(t)
# (year, month, day) → creates that date directly
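A date object also exposes its individual fields and can be formatted in other ways. This is a minimal standard-library sketch; the date and the format string are only illustrative.

```python
import datetime

d = datetime.date(2016, 6, 1)

# Individual components are plain integers
print(d.year, d.month, d.day)      # 2016 6 1

# weekday(): Monday == 0 ... Sunday == 6
print(d.weekday())

# isoformat() gives the same text as str(d); strftime() accepts a custom format
iso = d.isoformat()
custom = d.strftime('%Y/%m/%d')
print(iso, custom)
```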

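datetime.datetime.now(): the current date and time as an object (year, month, day, hour, minute, second); datetime.datetime(...) creates a custom one

# datetime.datetime: the datetime class

now = datetime.datetime.now()
print(now,type(now))
print(str(now),type(str(now))) 
# .now() returns the current date and time
# the result is a datetime.datetime object
# str() converts it to a string

t1 = datetime.datetime(2016,6,1)
t2 = datetime.datetime(2014,1,1,12,44,33)
print(t1,t2)
# (year, month, day, hour, minute, second); at least year, month and day are required

print(t2-t1)
# subtracting two datetimes gives a time difference, a timedelta (negative here, because t2 is earlier than t1)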
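Why does t2 - t1 print a negative day count together with a positive time of day? A timedelta stores only days, seconds and microseconds. It normalizes them so that only the day count carries the sign. This short sketch reuses the two datetimes above.

```python
import datetime

t1 = datetime.datetime(2016, 6, 1)
t2 = datetime.datetime(2014, 1, 1, 12, 44, 33)

diff = t1 - t2                      # positive: t1 is later
print(diff)                         # 881 days, 11:15:27
print(diff.days, diff.seconds)      # 881 40527
print(diff.total_seconds())         # whole length as a float number of seconds

neg = t2 - t1                       # negative difference
# normalization keeps 0 <= seconds < 86400, so only `days` is negative
print(neg.days, neg.seconds)        # -882 45873
print(neg == -diff)                 # True
```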

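datetime.timedelta: time differences

today = datetime.datetime.today()  # datetime.datetime also has a today() method
yesterday = today - datetime.timedelta(1)  # timedelta(1) is one day
print(today)
print(yesterday)
print(today - datetime.timedelta(7))
# timedeltas are mainly used for adding to and subtracting from dates: a "difference" value that datetimes understand
# signature: datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)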
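The keyword arguments in that signature are combined and normalized into days and seconds. The result can then be added to or subtracted from a datetime. This sketch uses illustrative values.

```python
import datetime

# Keyword arguments are combined and normalized into days/seconds/microseconds
delta = datetime.timedelta(weeks=1, days=2, hours=3, minutes=30)
print(delta)                         # 9 days, 3:30:00
print(delta.days, delta.seconds)     # 9 12600

start = datetime.datetime(2017, 1, 1)
print(start + delta)                 # 2017-01-10 03:30:00
print(start - delta)                 # 2016-12-22 20:30:00

# Dividing one timedelta by another gives a plain float ratio
print(delta / datetime.timedelta(hours=1))   # 219.5
```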

dateutil.parser.parse(): converting date strings

from dateutil.parser import parse
date = '12-21-2017'
t = parse(date)
print(t,type(t))
# converts a str directly into a datetime.datetime

print(parse('2000-1-1'),'\n',
     parse('5/1/2014'),'\n',
     parse('5/1/2014', dayfirst = True),'\n',  # the international convention puts the day before the month; dayfirst switches to it
     parse('22/1/2014'),'\n',
     parse('Jan 31, 1997 10:45 PM'))
# many formats can be parsed, but Chinese date strings are not supported
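Because parse() guesses the layout, an ambiguous string like '5/1/2014' depends on dayfirst. When the format is known in advance, the standard library's datetime.strptime is the stricter choice. This small comparison assumes python-dateutil is installed (pandas depends on it).

```python
from datetime import datetime
from dateutil.parser import parse

a = parse('5/1/2014')                  # month first by default → 1 May 2014
b = parse('5/1/2014', dayfirst=True)   # day first → 5 January 2014
c = parse('22/1/2014')                 # 22 cannot be a month, so it is read as the day
print(a, b, c)

# strptime requires an exact format and raises ValueError otherwise
d = datetime.strptime('05/01/2014', '%d/%m/%Y')
print(d)                               # 2014-01-05 00:00:00

try:
    datetime.strptime('2014-01-05', '%d/%m/%Y')
except ValueError as exc:
    print('rejected:', exc)
```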

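2. Pandas point-in-time data: Timestamp

A Timestamp represents a single point in time. It is a pandas data type and the most basic kind of time-series data, associating values with points in time.

pd.Timestamp(): creates a pandas point-in-time value (a Timestamp)

import datetime
import pandas as pd

date1 = datetime.datetime(2016,12,1,12,45,30)  # create a datetime.datetime
date2 = '2017-12-21'  # create a string

t1 = pd.Timestamp(date1)
t2 = pd.Timestamp(date2)
# both strings and datetime objects are converted into pandas Timestamps (t1.timestamp() would give the POSIX timestamp in seconds)
print(t1, type(t1))
print(t2, type(t2))
print(pd.Timestamp('2017-12-21 15:00:22'), type(pd.Timestamp('2017-12-21 15:00:22')))
# a Timestamp can also be created directly from a string
# the data type is pandas Timestamp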

2016-12-01 12:45:30 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-12-21 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-12-21 15:00:22 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
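A Timestamp subclasses datetime.datetime, so it can be used wherever a datetime is expected and converted back. In the sketch below, note that pandas' Timestamp.timestamp() treats a time-zone-naive value as UTC. The standard library's datetime.datetime.timestamp() uses local time instead.

```python
import datetime
import pandas as pd

t = pd.Timestamp('2017-12-21 15:00:22')

print(isinstance(t, datetime.datetime))   # True: Timestamp subclasses datetime
print(t.year, t.month, t.day, t.hour)      # 2017 12 21 15

py = t.to_pydatetime()                      # back to a plain datetime.datetime
print(type(py), py)

# POSIX seconds; a naive Timestamp is interpreted as UTC here
print(pd.Timestamp('1970-01-02').timestamp())   # 86400.0
```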

pd.to_datetime(): converting one or many values into pandas Timestamps

import pandas as pd
from datetime import datetime
date1 = datetime(2019, 12, 1, 12, 12)
date2 = '2018-12-01 12:12:00'

t1 = pd.to_datetime(date1)
t2 = pd.to_datetime(date2)
print(t1, type(t1))
print(t2, type(t2))
# pd.to_datetime(): a single value is converted into a pandas Timestamp

list_date = ['2011/12/1', '2012/12/1', '2013/12/1']
t3 = pd.to_datetime(list_date)
print(t3, type(t3))
# multiple values are converted into a pandas DatetimeIndex (a time index)
# pandas 2.x infers one format from the first element; a list that mixes layouts
# such as '2011/12/1' and '2012-12-1' needs format='mixed' there

2019-12-01 12:12:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-12-01 12:12:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
DatetimeIndex(['2011-12-01', '2012-12-01', '2013-12-01'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
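Passing format= tells pandas the exact layout instead of letting it guess. This also removes day/month ambiguity. The sketch below uses made-up day/month/year strings.

```python
import pandas as pd

raw = ['01/12/2011', '15/12/2011', '03/01/2012']   # hypothetical day/month/year strings

# An explicit format: no guessing, and a mismatching string raises an error
idx = pd.to_datetime(raw, format='%d/%m/%Y')
print(idx)

# dayfirst=True gives the same dates by hinting the order instead of fixing it
idx2 = pd.to_datetime(raw, dayfirst=True)
print(idx.equals(idx2))
```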

pd.to_datetime: more details

# pd.to_datetime → converts multiple values into a timestamp index
import pandas as pd
from datetime import datetime
date1 = [datetime(2015,6,1),datetime(2015,7,1),datetime(2015,8,1),datetime(2015,9,1),datetime(2015,10,1)]
date2 = ['2017-2-1','2017-2-2','2017-2-3','2017-2-4','2017-2-5','2017-2-6']
print(date1)
print(date2)
t1 = pd.to_datetime(date1)
t2 = pd.to_datetime(date2)
print(t1)
print(t2)
print('-------------------------------')
# multiple values are converted into a DatetimeIndex

date3 = ['2017-2-1','2017-2-2','2017-2-3','hello world!','2017-2-5','2017-2-6']
t3 = pd.to_datetime(date3, errors = 'ignore')
print(t3,type(t3))
print('--------------------------------')
# when a list mixes in values that are not dates, the errors parameter decides what is returned
# errors = 'ignore': if any value cannot be parsed, the input is returned unchanged (here an object Index of the original strings)
# note: errors = 'ignore' is deprecated as of pandas 2.2

t4 = pd.to_datetime(date3, errors = 'coerce')
print(t4,type(t4))
# errors = 'coerce': unparseable values become the missing value NaT (Not a Time); the result is still a DatetimeIndex

[datetime.datetime(2015, 6, 1, 0, 0), datetime.datetime(2015, 7, 1, 0, 0), datetime.datetime(2015, 8, 1, 0, 0), datetime.datetime(2015, 9, 1, 0, 0), datetime.datetime(2015, 10, 1, 0, 0)]
['2017-2-1', '2017-2-2', '2017-2-3', '2017-2-4', '2017-2-5', '2017-2-6']
DatetimeIndex(['2015-06-01', '2015-07-01', '2015-08-01', '2015-09-01',
               '2015-10-01'],
              dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2017-02-01', '2017-02-02', '2017-02-03', '2017-02-04',
               '2017-02-05', '2017-02-06'],
              dtype='datetime64[ns]', freq=None)
-------------------------------
Index(['2017-2-1', '2017-2-2', '2017-2-3', 'hello world!', '2017-2-5',
       '2017-2-6'],
      dtype='object') <class 'pandas.core.indexes.base.Index'>
--------------------------------
DatetimeIndex(['2017-02-01', '2017-02-02', '2017-02-03', 'NaT', '2017-02-05',
               '2017-02-06'],
              dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
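After errors='coerce', the failed entries are ordinary missing values, so they can be counted or dropped. This sketch reuses the date3 list.

```python
import pandas as pd

date3 = ['2017-2-1', '2017-2-2', '2017-2-3', 'hello world!', '2017-2-5', '2017-2-6']

idx = pd.to_datetime(date3, errors='coerce')
print(idx.isna().sum())          # number of entries that could not be parsed
clean = idx.dropna()             # keep only the parsed dates
print(clean)
print(len(clean))
```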

3. Pandas timestamp index: DatetimeIndex

pd.DatetimeIndex() and time series

import numpy as np
import pandas as pd

rng = pd.DatetimeIndex(['12/1/2017',
                        '12/2/2017',
                        '12/3/2017',
                        '12/4/2017',
                        '12/5/2017'])
print(rng,type(rng))
print(rng[0], type(rng[0]))  # a single element is a Timestamp
print('---------------------------')
# builds a timestamp index directly; the list may contain strings or datetime.datetime objects
# a single point in time is a Timestamp; several together form a DatetimeIndex

st = pd.Series(np.random.rand(5), index = rng)
print(st,type(st))  # a Series whose index is a DatetimeIndex
print(st.index)
# a Series indexed by a DatetimeIndex is a time series (old pandas versions had a separate TimeSeries class for this)

DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04',
               '2017-12-05'],
              dtype='datetime64[ns]', freq=None) <class 'pandas.tseries.index.DatetimeIndex'>
2017-12-01 00:00:00 <class 'pandas.tslib.Timestamp'>
---------------------------
2017-12-01    0.837612
2017-12-02    0.539392
2017-12-03    0.100238
2017-12-04    0.285519
2017-12-05    0.939607
dtype: float64 <class 'pandas.core.series.Series'>
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04',
               '2017-12-05'],
              dtype='datetime64[ns]', freq=None)
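A DatetimeIndex also exposes vectorized date fields such as .day and .dayofweek, which are handy for filtering a time series. This sketch uses the same five dates with illustrative values (1 December 2017 is a Friday).

```python
import numpy as np
import pandas as pd

rng = pd.DatetimeIndex(['12/1/2017', '12/2/2017', '12/3/2017', '12/4/2017', '12/5/2017'])
st = pd.Series(np.arange(5), index=rng)

print(rng.day)          # day of the month
print(rng.dayofweek)    # Monday == 0 ... Sunday == 6

# Boolean filtering on an index field: keep weekdays only
weekdays = st[rng.dayofweek < 5]
print(weekdays)
```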

pd.date_range(): generating a range of dates

# two ways to generate a range: ① start + end; ② start or end + periods
# default frequency: calendar day
rng1 = pd.date_range('1999-12-1 12:12', '2019-12-1', normalize=True)
rng2 = pd.date_range('19991201', periods=5)
rng3 = pd.date_range(end='21091201 12:12:30', periods=10)
print(rng1)
print(rng2)
print(rng3)
print('---------------------------------')
# each call returns a DatetimeIndex directly
# pd.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, inclusive='both', **kwargs)
# start: start time
# end: end time
# periods: how many points to generate
# freq: frequency, calendar day ('D') by default; pd.bdate_range() defaults to business days
# tz: time zone
# normalize: normalize the start/end times to midnight (the 12:12 above becomes 00:00:00 in the output)
# inclusive: which endpoints to keep ('both', 'neither', 'left', 'right'); it replaced closed in pandas 1.4, and closed was removed in 2.0

print(pd.date_range('20170101','20170104'))
print(pd.date_range('20170101','20170104',inclusive = 'right'))
print(pd.date_range('20170101','20170104',inclusive = 'left'))
# inclusive = 'right': open on the left, closed on the right; 'left': closed on the left, open on the right; the default 'both' keeps both ends
print('---------------------------------')

print(pd.bdate_range('20170101','20170107'))
# pd.bdate_range() uses business days by default

print(list(pd.date_range(start = '1/1/2017', periods = 5)))
# converting to a list gives Timestamp elements
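The two generation styles and the normalize/bdate_range behaviour described above can be checked directly (1 January 2017 is a Sunday). If inclusive is not available in an older pandas, slicing the result is a version-independent way to drop an endpoint.

```python
import pandas as pd

by_end = pd.date_range('2017-01-01', '2017-01-04')
by_periods = pd.date_range('2017-01-01', periods=4)
print(by_end.equals(by_periods))          # True: the same four days

# normalize=True moves the time of day to midnight
norm = pd.date_range('2017-01-01 12:12', periods=2, normalize=True)
print(norm)

# bdate_range skips Sunday 1 Jan and Saturday 7 Jan
biz = pd.bdate_range('2017-01-01', '2017-01-07')
print(biz)

# Dropping an endpoint by slicing works in every pandas version
print(by_end[1:], by_end[:-1])
```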

pd.date_range(): frequencies (1)

# pd.date_range(): frequencies (1)
# (pandas 2.2 deprecates the aliases 'H', 'T', 'S', 'L', 'U' in favour of 'h', 'min', 's', 'ms', 'us')

print(pd.date_range('2017/1/1','2017/1/4'))  # default freq = 'D': every calendar day
print(pd.date_range('2017/1/1','2017/1/4', freq = 'B'))  # B: every business day (same as bdate_range)
print(pd.date_range('2017/1/1','2017/1/2', freq = 'H'))  # H: every hour
print(pd.date_range('2017/1/1 12:00','2017/1/1 12:10', freq = 'T'))  # T (or min): every minute
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'S'))  # S: every second
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'L'))  # L: every millisecond (1/1,000 s)
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'U'))  # U: every microsecond (1/1,000,000 s); this range yields over ten million points

print(pd.date_range('2017/1/1','2017/2/1', freq = 'W-MON'))  
# W-MON: weekly, on the given day of the week (every Monday here)
# day abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN

print(pd.date_range('2017/1/1','2017/5/1', freq = 'WOM-2MON'))  
# WOM-2MON: a given weekday in a given week of each month; here the second Monday of every month
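The anchored weekly and week-of-month aliases can be verified against a calendar for early 2017. The last example uses '30min' because the 'min' spelling is accepted both before and after the pandas 2.2 alias changes.

```python
import pandas as pd

mondays = pd.date_range('2017-01-01', '2017-02-01', freq='W-MON')
print(mondays)          # every Monday in January 2017

second_mondays = pd.date_range('2017-01-01', '2017-05-01', freq='WOM-2MON')
print(second_mondays)   # the second Monday of January to April

half_hours = pd.date_range('2017-01-01 12:00', '2017-01-01 13:00', freq='30min')
print(half_hours)
```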

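pd.date_range(): frequencies (2)

# pd.date_range(): frequencies (2)
# (pandas 2.2 renames 'M', 'Q', 'A', 'BM', 'BQ', 'BA', 'AS', 'BAS' to 'ME', 'QE', 'YE', 'BME', 'BQE', 'BYE', 'YS', 'BYS'; the old names still work there but raise a FutureWarning)

print(pd.date_range('2017','2018', freq = 'M'))  
print(pd.date_range('2017','2020', freq = 'Q-DEC'))  
print(pd.date_range('2017','2020', freq = 'A-DEC')) 
print('------')
# M: last calendar day of each month
# Q-<month>: quarters whose year ends in the given month (DEC here); last calendar day of the final month of each quarter
# A-<month>: last calendar day of the given month, once a year
# month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
# so Q-<month> has only three distinct patterns: 1-4-7-10, 2-5-8-11, 3-6-9-12

print(pd.date_range('2017','2018', freq = 'BM'))  
print(pd.date_range('2017','2020', freq = 'BQ-DEC'))  
print(pd.date_range('2017','2020', freq = 'BA-DEC')) 
print('------')
# BM: last business day of each month
# BQ-<month>: quarters ending in the given month; last business day of the final month of each quarter
# BA-<month>: last business day of the given month, once a year

print(pd.date_range('2017','2018', freq = 'MS'))  
print(pd.date_range('2017','2020', freq = 'QS-DEC'))  
print(pd.date_range('2017','2020', freq = 'AS-DEC')) 
print('------')
# MS: first calendar day of each month
# QS-<month>: quarters ending in the given month; first calendar day of the final month of each quarter
# AS-<month>: first calendar day of the given month, once a year

print(pd.date_range('2017','2018', freq = 'BMS'))  
print(pd.date_range('2017','2020', freq = 'BQS-DEC'))  
print(pd.date_range('2017','2020', freq = 'BAS-DEC')) 
# BMS: first business day of each month
# BQS-<month>: quarters ending in the given month; first business day of the final month of each quarter
# BAS-<month>: first business day of the given month, once a year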
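Several of these strings were renamed in pandas 2.2. One version-independent alternative is to pass offset objects from pd.offsets (MonthEnd, BMonthEnd) instead of alias strings. This sketch uses 2017 as an illustrative year.

```python
import pandas as pd

month_ends = pd.date_range('2017-01-01', '2017-12-31', freq=pd.offsets.MonthEnd())
biz_month_ends = pd.date_range('2017-01-01', '2017-12-31', freq=pd.offsets.BMonthEnd())
month_starts = pd.date_range('2017-01-01', '2017-12-31', freq='MS')

print(month_ends[:3])        # 31 Jan, 28 Feb, 31 Mar
print(biz_month_ends[3])     # 30 April 2017 is a Sunday, so Friday 28 April
print(month_starts[:3])      # 1 Jan, 1 Feb, 1 Mar
```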

pd.date_range(): compound frequencies

print(pd.date_range('2017/1/1','2017/2/1', freq = '7D'))  # every 7 days
print(pd.date_range('2017/1/1','2017/1/2', freq = '2h30min'))  # every 2 hours 30 minutes
print(pd.date_range('2017','2018', freq = '2M'))  # every 2 months, on the last calendar day of the month

DatetimeIndex(['2017-01-01', '2017-01-08', '2017-01-15', '2017-01-22',
               '2017-01-29'],
              dtype='datetime64[ns]', freq='7D')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 02:30:00',
               '2017-01-01 05:00:00', '2017-01-01 07:30:00',
               '2017-01-01 10:00:00', '2017-01-01 12:30:00',
               '2017-01-01 15:00:00', '2017-01-01 17:30:00',
               '2017-01-01 20:00:00', '2017-01-01 22:30:00'],
              dtype='datetime64[ns]', freq='150T')
DatetimeIndex(['2017-01-31', '2017-03-31', '2017-05-31', '2017-07-31',
               '2017-09-30', '2017-11-30'],
              dtype='datetime64[ns]', freq='2M')
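A compound string such as '2h30min' is a single fixed step. The same step can be built by adding offset objects. This sketch compares it with '150min', the same interval written in minutes.

```python
import pandas as pd

step = pd.offsets.Hour(2) + pd.offsets.Minute(30)   # combines into one 150-minute step
print(step)

a = pd.date_range('2017-01-01', periods=4, freq=step)
b = pd.date_range('2017-01-01', periods=4, freq='150min')
print(a)
print(a.equals(b))       # True
```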

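asfreq: frequency conversion

import numpy as np
import pandas as pd

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101','20170104'))
print(ts)
print(ts.asfreq('4H',method = 'ffill'))
# change the frequency, here from D to 4H
# method: fill method; None leaves the new points as NaN, 'ffill' fills with the previous value, 'bfill' with the next value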

2017-01-01    0.945391
2017-01-02    0.656020
2017-01-03    0.295795
2017-01-04    0.318078
Freq: D, dtype: float64
2017-01-01 00:00:00    0.945391
2017-01-01 04:00:00    0.945391
2017-01-01 08:00:00    0.945391
2017-01-01 12:00:00    0.945391
2017-01-01 16:00:00    0.945391
2017-01-01 20:00:00    0.945391
2017-01-02 00:00:00    0.656020
2017-01-02 04:00:00    0.656020
2017-01-02 08:00:00    0.656020
2017-01-02 12:00:00    0.656020
2017-01-02 16:00:00    0.656020
2017-01-02 20:00:00    0.656020
2017-01-03 00:00:00    0.295795
2017-01-03 04:00:00    0.295795
2017-01-03 08:00:00    0.295795
2017-01-03 12:00:00    0.295795
2017-01-03 16:00:00    0.295795
2017-01-03 20:00:00    0.295795
2017-01-04 00:00:00    0.318078
Freq: 4H, dtype: float64
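asfreq only relabels the index at the new frequency; it never aggregates. Points that did not exist before are NaN unless a fill method is given. This sketch is deterministic and uses pd.offsets.Hour(12) instead of an alias string, so it does not depend on the 'H'/'h' rename.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0], index=pd.date_range('2017-01-01', periods=2))

# Every 12 hours: the new 12:00 point has no original value
plain = ts.asfreq(pd.offsets.Hour(12))
filled = ts.asfreq(pd.offsets.Hour(12), method='ffill')
print(plain)     # the 12:00 row is NaN
print(filled)    # the 12:00 row repeats the previous value
```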

Leading and lagging data: .shift()

import numpy as np
import pandas as pd

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101','20170104'))
print(ts)
print('------')

print(ts.shift(2))
print(ts.shift(-2))
print(ts)
print('------')
# positive n: values move later in time (lag); negative n: values move earlier (lead); the original Series is not modified

per = ts/ts.shift(1) - 1
print(per)
print('------')
# percentage change: each timestamp compared with the previous one

print(ts.shift(2, freq = 'D'))
print(ts.shift(2, freq = 'T'))
# with freq, the timestamps are shifted instead of the values
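With fixed values, the effect of shift() and the percentage-change formula is easy to check. Series.pct_change() computes the same ratio. This sketch uses illustrative values.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 4.0], index=pd.date_range('2017-01-01', periods=3))

lagged = ts.shift(1)              # values move one step later; the first becomes NaN
change = ts / ts.shift(1) - 1     # growth relative to the previous day
print(change)
print(change.equals(ts.pct_change()))   # the built-in method gives the same result

moved = ts.shift(1, freq='D')     # the index moves one day later, values untouched
print(moved)
```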

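4. Pandas periods: Period

pd.Period(): creating a period

p = pd.Period('2017', freq = 'M')
print(p, type(p))
# creates a monthly period, 2017-01
# pd.Period() arguments: a timestamp + freq → freq sets the length of the period, and the timestamp sets its position on the time axis

print(p + 1)
print(p - 2)
# adding or subtracting an integer moves the period by that many units of its frequency
# here that means months; a yearly period would move by years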

2017-01 <class 'pandas._libs.tslibs.period.Period'>
2017-02
2016-11
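A Period covers an interval, so it has a start and an end. Any timestamp inside that interval identifies the same period, as this sketch shows.

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

print(p.start_time)          # 2017-01-01 00:00:00
print(p.end_time)            # the last instant of January 2017

# Any timestamp inside the month identifies the same period
print(pd.Period('2017-01-20', freq='M') == p)   # True
print(p + 1 == pd.Period('2017-02', freq='M'))  # True
```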

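pd.period_range(): creating a range of periods

prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
print(prng,type(prng))
print(prng[0],type(prng[0]))
# the result is a PeriodIndex; a single element is a Period

ts = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts,type(ts))
print(ts.index)
# a time series indexed by periods

# Period('2011', freq = 'A-DEC') can be seen as a cursor pointing at one span in a sequence of spans
# a Timestamp marks a single instant (a cross-section of time), whereas a Period is a span of time; used as an index, though, the two behave much alike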

PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='int64', freq='M') <class 'pandas.tseries.period.PeriodIndex'>
2011-01 <class 'pandas._period.Period'>
2011-01    0.342571
2011-02    0.826151
2011-03    0.370505
2011-04    0.137151
2011-05    0.679976
2011-06    0.265928
2011-07    0.416502
2011-08    0.874078
2011-09    0.112801
2011-10    0.112504
2011-11    0.448408
2011-12    0.851046
2012-01    0.370605
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='int64', freq='M')
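The Timestamp/Period distinction can be made concrete. An instant falls inside a period, and Timestamp.to_period() finds the period that contains it. The instant in this sketch is illustrative.

```python
import pandas as pd

prng = pd.period_range('2011-01', '2012-01', freq='M')
print(len(prng))                         # 13: both end months are included

# A Timestamp is an instant; a Period covers an interval
instant = pd.Timestamp('2011-03-15 08:00')
march = prng[2]
print(march.start_time <= instant <= march.end_time)   # True

# Convert the instant to the monthly period containing it
print(instant.to_period('M') == march)   # True
```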

asfreq: frequency conversion

p = pd.Period('2017','A-DEC')
print(p)
print(p.asfreq('M', how = 'start'))  # how = 's' also works
print(p.asfreq('D', how = 'end'))  # how = 'e' also works
# Period.asfreq(freq, how='E') converts to another frequency; how chooses the start or the end of the span
# ('A-DEC' is written 'Y-DEC' in pandas 2.2 and later)

prng = pd.period_range('2017','2018',freq = 'M')
ts1 = pd.Series(np.random.rand(len(prng)), index = prng)
ts2 = pd.Series(np.random.rand(len(prng)), index = prng.asfreq('D', how = 'start'))
print(ts1.head(),len(ts1))
print(ts2.head(),len(ts2))
# asfreq can also convert the PeriodIndex of a time series

2017
2017-01
2017-12-31
2017-01    0.060797
2017-02    0.441994
2017-03    0.971933
2017-04    0.000334
2017-05    0.545191
Freq: M, dtype: float64 13
2017-01-01    0.447614
2017-02-01    0.679438
2017-03-01    0.891729
2017-04-01    0.949993
2017-05-01    0.942548
Freq: D, dtype: float64 13
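The how argument picks which end of the span to keep. Converting back to the coarser frequency recovers the original period. This sketch uses a monthly period, which sidesteps the 'A-DEC'/'Y-DEC' rename.

```python
import pandas as pd

p = pd.Period('2017-03', freq='M')
first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')
print(first_day, last_day)     # 2017-03-01 2017-03-31

# Converting back to the coarser frequency recovers the month
print(last_day.asfreq('M') == p)   # True
```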

Converting between timestamps and periods: .to_period() and .to_timestamp()

import numpy as np
import pandas as pd

rng = pd.date_range('2017/1/1', periods = 10, freq = 'M')
prng = pd.period_range('2017','2018', freq = 'M')

ts1 = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts1.head())
print(ts1.to_period().head())
# month-end dates become monthly periods: only the month is kept, the day within it is dropped

ts2 = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts2.head())
print(ts2.to_timestamp().head())
# monthly periods become timestamps on the first day of each month

2017-01-31    0.125288
2017-02-28    0.497174
2017-03-31    0.573114
2017-04-30    0.665665
2017-05-31    0.263561
Freq: M, dtype: float64
2017-01    0.125288
2017-02    0.497174
2017-03    0.573114
2017-04    0.665665
2017-05    0.263561
Freq: M, dtype: float64
2017-01    0.748661
2017-02    0.095891
2017-03    0.280341
2017-04    0.569813
2017-05    0.067677
Freq: M, dtype: float64
2017-01-01    0.748661
2017-02-01    0.095891
2017-03-01    0.280341
2017-04-01    0.569813
2017-05-01    0.067677
Freq: MS, dtype: float64
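to_period() and to_timestamp() round-trip cleanly when the dates sit on period boundaries. This sketch uses month-end dates and illustrative values.

```python
import pandas as pd

ends = pd.date_range('2017-01-01', periods=3, freq=pd.offsets.MonthEnd())
s = pd.Series([10, 20, 30], index=ends)    # 31 Jan, 28 Feb, 31 Mar

monthly = s.to_period('M')                 # index becomes 2017-01, 2017-02, 2017-03
starts = monthly.to_timestamp()            # back to timestamps: first day of each month
print(monthly)
print(starts)
```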

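5. Time series: indexing and slicing

A time series is simply a Series whose index is a DatetimeIndex (early pandas versions had a dedicated TimeSeries subclass), so the usual Series indexing and selection methods work the same way.

On top of that, a time series can be indexed and sliced more conveniently using dates.

Indexing

import numpy as np
import pandas as pd
from datetime import datetime

rng = pd.date_range('2017/1','2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts.head())

print(ts.iloc[0])
print(ts.iloc[:2])
print('-----')
# basic positional indexing; use .iloc, since ts[0] on a date index is deprecated in recent pandas

print(ts['2017/1/2'])
print(ts['20170103'])
print(ts['1/10/2017'])
print(ts[datetime(2017,1,20)])
print('-----')
# label indexing on a time series accepts many date-string formats as well as datetime.datetime

# because the series is ordered by time, you select by date and never need to work out positions
# the same label-based selection works on a DataFrame through .loc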
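On a DataFrame, the same selections go through .iloc (positions) and .loc (date labels, including partial strings such as a month). This sketch uses illustrative values.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01')
df = pd.DataFrame({'value': np.arange(len(rng))}, index=rng)

print(df.iloc[0])                  # by position
print(df.loc['2017-01-03'])        # by label, written as a string
print(df.loc['2017-02'].shape)     # partial string: every row in February
```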

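Slicing

rng = pd.date_range('2017/1','2017/3',freq = '12H')
ts = pd.Series(np.random.rand(len(rng)), index = rng)

print(ts['2017/1/5':'2017/1/10'])
print('-----')
# as with label-based slicing on any Series, the end point is included (for a date string, the whole of that day, so 10 January 12:00 is included too)

print(ts['2017/2'].head())
# passing only a month returns a slice covering that whole month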
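For a date string, "end point included" means the whole period the string names, so '2017-01-10' as the end bound includes 10 January 12:00. This sketch contrasts that with an exact Timestamp bound.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01', freq=pd.offsets.Hour(12))
ts = pd.Series(np.arange(len(rng)), index=rng)

by_day = ts.loc['2017-01-05':'2017-01-10']        # runs through 10 Jan 12:00
exact = ts.loc[pd.Timestamp('2017-01-05'):pd.Timestamp('2017-01-10')]  # stops at 10 Jan 00:00
print(len(by_day), by_day.index[-1])
print(len(exact), exact.index[-1])
```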

Time series with duplicate index values

dates = pd.DatetimeIndex(['1/1/2015','1/2/2015','1/3/2015','1/4/2015','1/1/2015','1/2/2015'])
ts = pd.Series(np.random.rand(6), index = dates)
print(ts)
print(ts.is_unique,ts.index.is_unique)
print('-----')
# the index contains duplicates: ts.is_unique checks whether the values are unique, ts.index.is_unique whether the index is

print(ts['20150101'],type(ts['20150101']))
print(ts['20150104'],type(ts['20150104']))
print('-----')
# a duplicated index label returns several values (as a Series)

print(ts.groupby(level = 0).mean())
# groupby merges the duplicates; here duplicated dates are replaced by their mean
# level = 0 groups by the first (and here the only) index level

2015-01-01    0.300286
2015-01-02    0.603865
2015-01-03    0.017949
2015-01-04    0.026621
2015-01-01    0.791441
2015-01-02    0.526622
dtype: float64
True False
-----
2015-01-01    0.300286
2015-01-01    0.791441
dtype: float64 <class 'pandas.core.series.Series'>
2015-01-04    0.026621
dtype: float64 <class 'pandas.core.series.Series'>
-----
2015-01-01    0.545863
2015-01-02    0.565244
2015-01-03    0.017949
2015-01-04    0.026621
dtype: float64
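Averaging is only one way to resolve duplicate timestamps. Keeping the first occurrence with Index.duplicated() is another, shown here with fixed values.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-01'])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

mean_per_day = ts.groupby(level=0).mean()                 # 1 Jan → (1 + 3) / 2
first_only = ts[~ts.index.duplicated(keep='first')]       # keep the first occurrence of each date
print(mean_per_day)
print(first_only)
```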

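6. Time series: resampling

Resampling: converting a time series from one frequency to another, which involves combining (aggregating) or filling the data

Downsampling: high-frequency → low-frequency data, e.g. converting daily data to monthly data
Upsampling: low-frequency → high-frequency data, e.g. converting yearly data to monthly data

Resampling: .resample()

import numpy as np
import pandas as pd

rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.random.rand(12), index = rng)  # a Series indexed by a date range
print(ts)
print('-------------')

ts_re = ts.resample('5D')  # resample with a 5-day frequency: returns a Resampler
ts_re2 = ts.resample('5D').sum()  # aggregate each 5-day bin by summing
print(ts_re2)
print(ts_re.ohlc())
# ts.resample('5D'): a resampling object (Resampler) with the frequency changed to 5 days
# ts.resample('5D').sum(): a new aggregated Series, aggregated by summing
# freq: the resampling frequency → ts.resample('5D')
# .sum(): the aggregation method

print(ts.resample('5D').mean(),'→ mean\n')
print(ts.resample('5D').max(),'→ maximum\n')
print(ts.resample('5D').min(),'→ minimum\n')
print(ts.resample('5D').median(),'→ median\n')
print(ts.resample('5D').first(),'→ first value\n')
print(ts.resample('5D').last(),'→ last value\n')
print(ts.resample('5D').ohlc(),'→ OHLC resampling\n')
# OHLC: an aggregation common in finance → open, high (maximum), low (minimum), close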
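With deterministic values (1 to 12, one per day), the 5-day bins and their aggregates are easy to verify. A Resampler can also compute several aggregations at once with agg(), as this sketch shows.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 13), index=pd.date_range('2017-01-01', periods=12))

r = ts.resample('5D')
print(r.sum())                        # bins: 1-5, 6-10, 11-12
print(r.agg(['sum', 'mean', 'max']))  # several aggregations at once
print(r.ohlc())
```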

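Downsampling

rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(1,13), index = rng)
print(ts)

print(ts.resample('5D').sum(),'→ default\n')
print(ts.resample('5D', closed = 'left').sum(),'→ left\n')
print(ts.resample('5D', closed = 'right').sum(),'→ right\n')
print('-----')
# closed: which end of each bin is closed (included); for '5D' the default is 'left'
# (only end-anchored frequencies such as 'M', 'Q', 'A', 'BM', 'BQ', 'BA' and 'W' default to 'right')
# in detail: the values are 1-12, one per day from 2017-01-01, resampled into 5-day bins
# left: each bin includes its left edge → [1,2,3,4,5],[6,7,8,9,10],[11,12]
# right: each bin includes its right edge → [1],[2,3,4,5,6],[7,8,9,10,11],[12]

print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')
print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')
# label: which bin edge is used as the index of each aggregated value; for '5D' the default is the left edge
# label only changes the index labels; the bins themselves still follow closed (the default here)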
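The bin contents listed above can be checked by summing each bin. This sketch uses the same values. It also sets label='right', so each sum is labelled with the right edge of its bin.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 13), index=pd.date_range('2017-01-01', periods=12))

left = ts.resample('5D').sum()                                  # default closed='left'
right = ts.resample('5D', closed='right', label='right').sum()
print(left)     # sums of [1..5], [6..10], [11, 12]
print(right)    # sums of [1], [2..6], [7..11], [12]
```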

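Upsampling

rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
ts = pd.DataFrame(np.arange(15).reshape(5,3),
                  index = rng,
                  columns = ['a','b','c'])
print(ts)

print(ts.resample('15T').asfreq())
print(ts.resample('15T').ffill())
print(ts.resample('15T').bfill())
# going from low to high frequency is mainly a question of how to fill the new points
# .asfreq(): no filling, the new points are NaN
# .ffill(): forward fill (each new point copies the value above it)
# .bfill(): backward fill (each new point copies the value below it)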
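Besides forward and backward filling, the NaN rows produced by asfreq() can be filled by linear interpolation. This sketch uses the same 5 × 3 frame. It uses '15min' and pd.offsets.Hour(1) because they are accepted across pandas versions.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', periods=5, freq=pd.offsets.Hour(1))
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=rng, columns=['a', 'b', 'c'])

up = df.resample('15min')
print(up.ffill().head())
print(up.asfreq().interpolate().head())   # linear interpolation between the hourly rows
```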

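Resampling periods: Period

prng = pd.period_range('2016','2017',freq = 'M')
ts = pd.Series(np.arange(len(prng)), index = prng)
print(ts)

print(ts.resample('3M').sum())  # downsampling
print(ts.resample('15D').ffill())  # upsampling
# note: resampling a PeriodIndex is deprecated in pandas 2.2; converting with .to_timestamp() first avoids that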
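Resampling a PeriodIndex is deprecated in recent pandas releases (2.2 and later). One alternative for downsampling is to map each monthly period to its quarter with PeriodIndex.asfreq() and group by it. This sketch uses the same 13 monthly values.

```python
import numpy as np
import pandas as pd

prng = pd.period_range('2016-01', '2017-01', freq='M')
ts = pd.Series(np.arange(len(prng)), index=prng)

# Map each monthly period to the quarter containing it, then aggregate
quarterly = ts.groupby(ts.index.asfreq('Q')).sum()
print(quarterly)
```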
