Python Data Analysis, Part 5: A Brief Introduction to Time Series Analysis
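1. Date and Time Handling in Python
1.1 Types of time series data
Timestamp: a specific moment in time.
Period: a fixed span, such as a particular month or year.
Interval: a span defined by a start timestamp and an end timestamp.
1.2 The datetime module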
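The three categories above can be sketched with pandas objects. This is only an illustration: the original does not say which classes it has in mind, so the use of `pd.Timestamp`, `pd.Period`, and `pd.Interval` here is an assumption, and the dates are sample values.

```python
import pandas as pd

# Timestamp: a single moment in time
ts = pd.Timestamp('2017-02-18 15:00')

# Period: a fixed span, here the whole month of February 2017
period = pd.Period('2017-02', freq='M')

# Interval: a span bounded by a start and an end timestamp
interval = pd.Interval(pd.Timestamp('2017-02-18'), pd.Timestamp('2017-03-04'))

print(ts)
print(period.start_time, period.end_time)
print(interval.length)
print(ts in interval)
```

A period knows its own boundaries (`start_time`, `end_time`), while an interval is defined entirely by the two endpoints passed in.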
In [1]:
from datetime import datetime
In [2]:
now = datetime.now()
print(now)
out:
2020-02-15 22:10:48.492218
In [3]:
print('Year: {}, Month: {}, Day: {}'.format(now.year, now.month, now.day))
out:
Year: 2020, Month: 2, Day: 15
In [4]:
diff = datetime(2017, 3, 4, 17) - datetime(2017, 2, 18, 15)
print(type(diff))
print(diff)
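# Note: diff.seconds is only the sub-day remainder, not the total duration in seconds
print('Elapsed: {} days, {} seconds.'.format(diff.days, diff.seconds))
out:
<class 'datetime.timedelta'>
14 days, 2:00:00
Elapsed: 14 days, 7200 seconds.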
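One detail worth spelling out: `timedelta.seconds` holds only the part of the difference that is shorter than a day, so it is not the total elapsed time. A minimal sketch, reusing the same two dates, of how `total_seconds()` differs and how a `timedelta` can be added back to a `datetime`:

```python
from datetime import datetime, timedelta

start = datetime(2017, 2, 18, 15)
end = datetime(2017, 3, 4, 17)
diff = end - start

# days and seconds are separate components of the same duration
print(diff.days, diff.seconds)

# total_seconds() folds the days in: 14 * 86400 + 7200
total = diff.total_seconds()
print(total)

# Adding the timedelta to the start reproduces the end
later = start + timedelta(days=14, hours=2)
print(later == end)
```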
1.3 str -> datetime
In [7]:
# strptime
dt_str = '2017-02-18'
dt_obj2 = datetime.strptime(dt_str, '%Y-%m-%d')
print(type(dt_obj2))
print(dt_obj2)
<class 'datetime.datetime'>
2017-02-18 00:00:00
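`strptime` raises `ValueError` when the string does not match the format exactly. The helper below is a hypothetical wrapper (the name `parse_date` is not from the original) that sketches one way to handle a mismatch by returning `None` instead of raising.

```python
from datetime import datetime


def parse_date(text, fmt='%Y-%m-%d'):
    """Parse text with strptime; return None if it does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


print(parse_date('2017-02-18'))                           # matches the default format
print(parse_date('2017/02/18'))                           # wrong separator, so None
print(parse_date('2017/02/18 15:30', '%Y/%m/%d %H:%M'))   # explicit format with a time
```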
In [8]:
# dateutil.parser.parse
from dateutil.parser import parse
dt_str2 = '2017/02/18'
dt_obj3 = parse(dt_str2)
print(type(dt_obj3))
print(dt_obj3)
<class 'datetime.datetime'>
2017-02-18 00:00:00
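Because `parse` guesses the format, ambiguous strings deserve some care. As a small sketch with a made-up sample string, `03/04/2017` is read month-first by default, and the `dayfirst` argument switches the interpretation:

```python
from dateutil.parser import parse

ambiguous = '03/04/2017'

# Default: month first, so this is March 4
month_first = parse(ambiguous)

# dayfirst=True: day first, so this is April 3
day_first = parse(ambiguous, dayfirst=True)

print(month_first)
print(day_first)
```

When the input format is known in advance, `strptime` with an explicit format avoids this guesswork.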
In [9]:
# pd.to_datetime
import pandas as pd
s_obj = pd.Series(['2017/02/18', '2017/02/19', '2017-02-25', '2017-02-26'], name='course_time')
print(s_obj)
0    2017/02/18
1    2017/02/19
2    2017-02-25
3    2017-02-26
Name: course_time, dtype: object
In [10]:
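# Note: on pandas 2.0 and later, mixed formats like these need pd.to_datetime(s_obj, format='mixed')
s_obj2 = pd.to_datetime(s_obj)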
print(s_obj2)
0   2017-02-18
1   2017-02-19
2   2017-02-25
3   2017-02-26
Name: course_time, dtype: datetime64[ns]
In [11]:
# Handling missing values
s_obj3 = pd.Series(['2017/02/18', '2017/02/19', '2017-02-25', '2017-02-26'] + [None],
                   name='course_time')
print(s_obj3)
0    2017/02/18
1    2017/02/19
2    2017-02-25
3    2017-02-26
4          None
Name: course_time, dtype: object
In [12]:
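# Note: on pandas 2.0 and later, mixed formats like these need pd.to_datetime(s_obj3, format='mixed')
s_obj4 = pd.to_datetime(s_obj3)
print(s_obj4)  # NaT -> Not a Time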
0   2017-02-18
1   2017-02-19
2   2017-02-25
3   2017-02-26
4          NaT
Name: course_time, dtype: datetime64[ns]
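Missing values become `NaT` automatically, but a string that cannot be parsed raises an error by default. A minimal sketch, using made-up sample data, of how the `errors='coerce'` option of `pd.to_datetime` turns unparseable entries into `NaT` as well:

```python
import pandas as pd

# Sample data (not from the original): one valid date, one bad string, one missing value
raw = pd.Series(['2017-02-18', 'not a date', None], name='course_time')

# errors='coerce' replaces anything that cannot be parsed with NaT
clean = pd.to_datetime(raw, errors='coerce')
print(clean)

# isna() flags both the bad string and the missing value
print(clean.isna().tolist())
```

Coercing silently hides bad input, so it may be worth counting the `NaT` values afterwards to see how much data was lost.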
2. Date and Time Handling in pandas
2.1 Time series handling in pandas
Creating a time series
from datetime import datetime
import pandas as pd
import numpy as np
# Use a list of datetime objects as the index
date_list = [datetime(2017, 2, 18), datetime(2017, 2, 19),
             datetime(2017, 2, 25), datetime(2017, 2, 26),
             datetime(2017, 3, 4), datetime(2017, 3, 5)]
time_s = pd.Series(np.random.randn(6), index