Packages used in the code
# Importing libraries
import os
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
# Above is a special style template for matplotlib, highly useful for visualizing time series data
%matplotlib inline
from pylab import rcParams
from plotly import tools
import plotly.plotly as py
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
import statsmodels.api as sm
from numpy.random import normal, seed
from scipy.stats import norm
from statsmodels.tsa.arima_model import ARMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima_model import ARIMA
import math
from sklearn.metrics import mean_squared_error
print(os.listdir("../input"))
Output:
['historical-hourly-weather-data', 'stock-time-series-20050101-to-20171231']
Contents
- 1. Introduction to date and time
- 2. Finance and Statistics
- 3. Time series decomposition and Random Walks
- 4. Modelling using statsmodels
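1. Introduction to date and time
1.1 Importing time series data
How to import data?
First, we import all the datasets needed for this kernel. The time series column is parsed as a datetime column with the parse_dates parameter and set as the index of the DataFrame with the index_col parameter.
Data used:
- Google stock data
- Humidity in different cities around the world
- Microsoft stock data
- Pressure in different cities around the world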
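To see what parse_dates and index_col do without needing the Kaggle files, here is a minimal sketch that reads a tiny in-memory CSV. The three rows and their values are made up for illustration and are not taken from the real stock file.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for a stock file (values are invented)
csv_text = """Date,Open,Close
2006-01-03,10.0,11.0
2006-01-04,11.0,11.5
2006-01-05,11.5,12.0
"""

# parse_dates turns the 'Date' strings into datetimes,
# index_col makes that column the index of the DataFrame
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=['Date'])

print(type(sample.index).__name__)                     # the index is a DatetimeIndex
print(sample.loc[pd.Timestamp('2006-01-04'), 'Close'])  # look up a row by date
```

With a DatetimeIndex in place, date-based slicing and the resampling methods used later become available.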
in:
google = pd.read_csv('../input/stock-time-series-20050101-to-20171231/GOOGL_2006-01-01_to_2018-01-01.csv', index_col='Date', parse_dates=['Date'])
google.head()
out:
in:
humidity = pd.read_csv('../input/historical-hourly-weather-data/humidity.csv', index_col='datetime', parse_dates=['datetime'])
humidity.tail()
out:
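1.2 Cleaning and preparing time series data
How to prepare data?
The Google stock data has no missing values, but the humidity data has its fair share of them. It is cleaned by forward filling (ffill), which propagates the last valid observation forward to fill the gaps.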
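The behaviour of forward filling can be sketched on a small series. The readings below are hypothetical. Note that a missing value at the very start has no earlier observation to copy, so it stays missing; dropping the first row before filling, as the cell below does, would handle that case if the first row of the real data is empty (an assumption, not verified here).

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps, including one at the very start
idx = pd.date_range('2017-01-01', periods=5, freq='D')
readings = pd.Series([np.nan, 80.0, np.nan, np.nan, 75.0], index=idx)

# Forward fill: each gap takes the last valid value before it
filled = readings.ffill()
print(filled)

# The leading NaN cannot be filled, so it is dropped explicitly
cleaned = filled.iloc[1:]
print(cleaned)
```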
in:
humidity = humidity.iloc[1:]
humidity = humidity.ffill()  # forward fill; same result as fillna(method='ffill'), which newer pandas versions deprecate
humidity.head()
out:
1.3 Visualizing the datasets
in:
humidity["Kansas City"].asfreq('M').plot() # asfreq method is used to convert a time series to a specified frequency. Here it is monthly frequency.
plt.title('Humidity in Kansas City over time(Monthly frequency)')
plt.show()
out:
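One point worth keeping in mind about asfreq: it only selects the observation that sits exactly on each new timestamp and does not aggregate. The sketch below contrasts it with resample().mean() on made-up hourly data, converting to daily frequency so the numbers are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings covering two full days (values 0..47)
index = pd.date_range('2017-01-01', periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(48, dtype=float), index=index)

# asfreq keeps only the value found at each daily timestamp (midnight)
picked = hourly.asfreq('D')

# resample + mean aggregates all 24 hourly values of each day
averaged = hourly.resample('D').mean()

print(picked)
print(averaged)
```

If a smoothed monthly curve is wanted rather than one sampled point per month, resampling with an aggregate is the usual alternative.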
in:
google['2008':'2010'].plot(subplots=True, figsize=(10,12))
plt.title('Google stock attributes from 2008 to 2010')
plt.savefig('stocks.png')
plt.show()
out:
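1.4 Timestamps and Periods
What are timestamps and periods, and how are they useful?
A Timestamp represents a point in time. A Period represents an interval of time. Periods can be used to check whether a specific event occurred within a given interval. Each can also be converted to the other.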
in:
# Creating a Timestamp
timestamp = pd.Timestamp(2017, 1, 1, 12)
timestamp
out:
Timestamp('2017-01-01 12:00:00')
in:
# Creating a period
period = pd.Period('2017-01-01')
period
out:
Period('2017-01-01', 'D')
in:
# Checking if the given timestamp exists in the given period
period.start_time < timestamp < period.end_time
out:
True
in:
# Converting timestamp to period
new_period = timestamp.to_period(freq='H')
new_period
out:
Period('2017-01-01 12:00', 'H')
in:
# Converting period to timestamp
new_timestamp = period.to_timestamp(freq='H', how='start')
new_timestamp
out:
Timestamp('2017-01-01 00:00:00')
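The same ideas extend from single objects to a whole index. The following is a minimal sketch with made-up timestamps: it converts a DatetimeIndex to monthly periods and checks which timestamps fall inside January 2017. It uses <= rather than < so that timestamps exactly on the period boundaries count as inside, which is a design choice and differs from the strict comparison used above.

```python
import pandas as pd

# Hypothetical event times
stamps = pd.DatetimeIndex(['2017-01-01 12:00', '2017-01-15 08:30', '2017-02-03 00:00'])

# Convert every timestamp to the month that contains it
months = stamps.to_period('M')
print(months)

# Check membership in a given period, boundaries included
january = pd.Period('2017-01', freq='M')
in_january = [january.start_time <= ts <= january.end_time for ts in stamps]
print(in_january)
```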
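1.5 Using date_range
What is date_range and how is it useful?
date_range is a function that returns a fixed-frequency DatetimeIndex. It is useful for creating your own time series index for pre-existing data, or for arranging the whole dataset around an index you created.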
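As a small illustration of attaching a date_range to pre-existing data, the sketch below builds a daily index for a plain list of values. The values and the start date are hypothetical.

```python
import pandas as pd

# Hypothetical measurements that arrived without any dates attached
values = [3, 5, 8, 13]

# Build a daily index with one date per value
index = pd.date_range(start='2018-01-01', periods=len(values), freq='D')
series = pd.Series(values, index=index)

print(series)
print(series.loc[pd.Timestamp('2018-01-03')])  # look up a value by date
```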
in:
# Creating a datetimeindex with daily frequency
dr1 = pd.date_range(start='1/1/18', end='1/9/18')
dr1
out:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
'2018-01-09'],
dtype='datetime64[ns]', freq='D')
in:
# Creating a datetimeindex with monthly frequency
dr2 = pd.date_range(start='1/1/18', end='1/1/19', freq='M')
dr2
out:
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
'2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
'2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31'],
dtype='datetime64[ns]', freq='M')
in:
# Creating a datetimeindex without specifying start date and using periods
dr3 = pd.date_range(end='1/4/2014', periods=8)
dr3
out:
DatetimeIndex(['2013-12-28', '2013-12-29', '2013-12-30', '2013-12-31',
'2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04'],
dtype='datetime64[ns]', freq='D')
in:
# Creating a datetimeindex specifying start date , end date and periods
dr4 = pd.date_range(start='2013-04-24', end='2014-11-27', periods=3)
dr4
out:
DatetimeIndex(['2013-04-24', '2014-02-09', '2014-11-27'], dtype='datetime64[ns]', freq=None)
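When start, end and periods are all given, the dates are spaced evenly between the two endpoints, which is why the result above reports freq=None. A quick sketch to check this: the middle element should equal the midpoint of the interval.

```python
import pandas as pd

start = pd.Timestamp('2013-04-24')
end = pd.Timestamp('2014-11-27')

# Three evenly spaced points: start, midpoint, end
dr = pd.date_range(start=start, end=end, periods=3)

# Midpoint computed by hand for comparison
midpoint = start + (end - start) / 2

print(dr[1], midpoint)
print(dr.freq)  # no calendar frequency is attached to the index
```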
1.6 Using to_datetime
pandas.to_datetime() converts its argument to datetime. Here, a DataFrame is converted to a Series of datetimes.
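Besides DataFrames, to_datetime also accepts lists of strings. The sketch below uses made-up input, with errors='coerce' so that an unparseable entry becomes NaT instead of raising an exception.

```python
import pandas as pd

# Hypothetical raw input; the last entry is deliberately invalid
raw = ['2015-02-04', '2016-03-05', 'not a date']

# errors='coerce' replaces values that cannot be parsed with NaT
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed)
```

Whether to coerce or to let the error surface depends on how much the data can be trusted; coercing silently hides bad rows, so it is worth counting the NaT values afterwards.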
in:
df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})
df
out: