Python Time Series Analysis (Air Passengers): Python Time Series Visualization Examples Compiled by an Expert, Worth Bookmarking (with Code) - CSDN Blog


Preface

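The text and images in this article come from the internet and are for learning and exchange only, with no commercial use. Copyright belongs to the original author; if there is any problem, please contact us promptly and we will deal with it.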

Time Series

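1. Time Series Plot

A time series plot visualizes how a given metric changes over time. Here it shows how monthly air passenger traffic changed between 1949 and 1960.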

# Import libraries
import matplotlib.pyplot as plt
import pandas as pd

# Import data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Draw plot
plt.figure(figsize=(16, 10), dpi=80)
plt.plot('date', 'traffic', data=df, color='tab:red')

# Decoration: one tick per 12 monthly points, labelled with the last 4 characters of the date string (assumed to be the year)
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::12]
xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
plt.yticks(fontsize=12, alpha=.7)
plt.title("Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.grid(axis='both', alpha=.3)

# Remove borders
plt.gca().spines["top"].set_alpha(0.0)
plt.gca().spines["bottom"].set_alpha(0.3)
plt.gca().spines["right"].set_alpha(0.0)
plt.gca().spines["left"].set_alpha(0.3)
plt.show()
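The tick labels above are produced by slicing every 12th date string, which assumes the year is the last four characters of each date. A more robust alternative is to work with real datetimes and let matplotlib place one tick per year. The sketch below uses hypothetical monthly values in place of the AirPassengers file; with the real CSV you would pass `parse_dates=['date']` to `pd.read_csv`, assuming pandas recognizes the file's date format.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly series standing in for the AirPassengers data (144 months from 1949)
dates = pd.date_range("1949-01-01", periods=144, freq="MS")
values = 100 + 2.0 * np.arange(len(dates))

fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
ax.plot(dates, values, color="tab:red")
ax.xaxis.set_major_locator(mdates.YearLocator())          # one major tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))  # label each tick with the year only
ax.set_title("Monthly Series with Date-Aware Ticks", fontsize=22)
ax.grid(axis="both", alpha=0.3)
```

Because the x values are datetimes rather than strings, the tick positions follow the calendar automatically, and no manual slicing of labels is needed.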

2. Time Series Plot with Markers

The time series below marks all peaks and troughs and annotates a selection of them with their dates.

# Import libraries
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Import data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Get the peaks and troughs
data = df['traffic'].values
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1
doublediff2 = np.diff(np.sign(np.diff(-1 * data)))
trough_locations = np.where(doublediff2 == -2)[0] + 1

# Draw plot
plt.figure(figsize=(16, 10), dpi=80)
plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic')
plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks')
plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')

# Annotate a subset: every 3rd peak and every 5th trough (starting from the 2nd), labelled with their dates
for t, p in zip(trough_locations[1::5], peak_locations[::3]):
    plt.text(df.date[p], df.traffic[p] + 15, df.date[p], horizontalalignment='center', color='darkgreen')
    plt.text(df.date[t], df.traffic[t] - 35, df.date[t], horizontalalignment='center', color='darkred')

# Decoration
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::6]
xtick_labels = df.date.tolist()[::6]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
plt.title("Peaks and Troughs of Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.yticks(fontsize=12, alpha=.7)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.3)

plt.legend(loc='upper left')
plt.grid(axis='y', alpha=.3)
plt.show()
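The peak and trough detection above works because `np.sign(np.diff(data))` is +1 where the series rises and -1 where it falls, so its own difference is -2 exactly where a rise turns into a fall (a peak) and +2 where a fall turns into a rise (a trough). Negating the data, as `doublediff2` does, turns troughs into peaks, which is equivalent to looking for +2 in the original double difference. A standalone sketch of that logic (the function name `find_peaks_troughs` is hypothetical):

```python
import numpy as np


def find_peaks_troughs(data):
    """Return indices of strict local maxima and minima (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # +1 while rising, -1 while falling
    slope_sign = np.sign(np.diff(data))
    # -2 at a rise->fall switch (peak), +2 at a fall->rise switch (trough)
    doublediff = np.diff(slope_sign)
    # +1 shifts from the position in the double difference back to the data index
    peaks = np.where(doublediff == -2)[0] + 1
    troughs = np.where(doublediff == 2)[0] + 1
    return peaks, troughs


peaks, troughs = find_peaks_troughs([1, 3, 2, 4, 1, 5])
print(peaks, troughs)  # [1 3] [2 4]
```

This only finds strict local extrema: two equal neighbouring values give a sign of 0, so a flat-topped peak produces steps of -1 rather than -2 and is skipped. `scipy.signal.find_peaks` is a library alternative with extra filters such as a minimum prominence.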

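3. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plots

The ACF plot shows the correlation of a time series with its own lagged values. Each vertical line in the autocorrelation plot represents the correlation between the series and one of its lags, starting from lag 0. The blue shaded region marks the significance level: lags whose lines extend beyond it are significant.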

So how do you interpret it?

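For AirPassengers, we see that up to 14 lags cross the blue band and are therefore significant. Because the data is monthly, this means passenger traffic from up to 14 months earlier is correlated with today's traffic.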
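In practice these charts come from `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots`. To make the bars concrete, here is a minimal NumPy sketch of the sample ACF on a hypothetical trend-plus-seasonality series (not the real AirPassengers data), with the classic ±1.96/√n white-noise band. Note that statsmodels by default draws a band that widens with the lag (Bartlett's formula), so its shaded region will not match this constant band exactly. The function name `sample_acf` is hypothetical.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags using the biased estimator:
    every lag's sum of products is divided by the same lag-0 sum of squares."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:n - k]) / denom for k in range(nlags + 1)])


# Hypothetical monthly series: upward trend plus a 12-month cycle (NOT the real AirPassengers data)
t = np.arange(144)
series = 100 + 2 * t + 40 * np.sin(2 * np.pi * t / 12)

acf = sample_acf(series, nlags=24)
bound = 1.96 / np.sqrt(len(series))  # approximate 95% band for white noise
significant_lags = np.where(np.abs(acf[1:]) > bound)[0] + 1
print(significant_lags)
```

A strong trend on its own keeps the ACF high for many lags, which is likely part of why so many AirPassengers lags look significant; differencing the series before reading the ACF is a common remedy.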

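The PACF, on the other hand, shows the autocorrelation between any given lag of the series and the current series, after removing the contributions of the lags in between.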
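One way to compute the PACF, and the idea behind the `method='ols'` option of statsmodels' `pacf`, is regression: the partial autocorrelation at lag k is the coefficient on x[t-k] when x[t] is regressed on x[t-1], ..., x[t-k] plus an intercept. The sketch below (function name `pacf_ols` is hypothetical) applies this to a simulated AR(1) series, for which the PACF should be close to the AR coefficient at lag 1 and close to zero afterwards.

```python
import numpy as np


def pacf_ols(x, nlags):
    """Partial autocorrelation by OLS regression on the previous k values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = [1.0]  # lag 0 is 1 by definition
    for k in range(1, nlags + 1):
        y = x[k:]
        # Design matrix: intercept, then x[t-1], ..., x[t-k], each aligned with y
        X = np.column_stack([np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])  # coefficient on the farthest lag, x[t-k]
    return np.array(out)


# Simulated AR(1) process: x[t] = 0.7 * x[t-1] + noise
rng = np.random.default_rng(0)
ar1 = np.zeros(2000)
for i in range(1, len(ar1)):
    ar1[i] = 0.7 * ar1[i - 1] + rng.standard_normal()

pacf = pacf_ols(ar1, nlags=5)
print(np.round(pacf, 2))  # lag 1 should be near 0.7, later lags near 0
```

The sharp cut-off after lag 1 is what makes the PACF useful: an AR(1) process has a slowly decaying ACF but only one large partial autocorrelation.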

# Import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Import data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")
x = df['date']
y1 = df['psavert']
y2 = df['unemploy']

# Plot line 1 (left Y axis)
fig, ax1 = plt.subplots(1, 1, figsize=(16, 9), dpi=80)
ax1.plot(x, y1, color='tab:red')

# Plot line 2 (right Y axis)
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2, color='tab:blue')

# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Year', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red')
ax1.grid(alpha=.4)

# ax2 (right Y axis)
ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(0, len(x), 60))
ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize': 10})
ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=22)
fig.tight_layout()
plt.show()

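4. Cross-Correlation Plot

A cross-correlation plot shows the correlation between two time series at a range of lags, which reveals whether, and by how many steps, one series leads or lags the other.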
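A minimal sketch of what such a plot measures: the Pearson correlation between x[t] and y[t + lag] over a range of lags. The data is simulated: `y` is a noisy copy of `x` delayed by three steps, so the correlation should peak at lag 3. The function name `cross_corr` and the sign convention (a positive lag means x leads y) are choices made for this sketch; libraries differ, so check the convention of whichever one you use.

```python
import numpy as np


def cross_corr(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for lag in -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for lag in lags:
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]   # pairs (x[t], y[t + lag])
        else:
            a, b = x[-lag:], y[:len(y) + lag]  # same pairing for negative lags
        values.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(values)


rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = np.roll(x, 3) + 0.3 * rng.standard_normal(500)  # y echoes x three steps later

lags, ccf = cross_corr(x, y, max_lag=10)
print(lags[np.argmax(ccf)])  # 3
```

Plotting `ccf` against `lags` as vertical lines (for example with `plt.stem`) gives the usual cross-correlation chart.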

# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import sem

# Import data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv")
df_mean = df.groupby('order_hour_of_day').quantity.mean()
# Half-width of the 95% band: 1.96 x standard error of the mean
df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.ylabel("# Orders", fontsize=16)
x = df_mean.index
plt.plot(x, df_mean, color="white", lw=2)
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::2], [str(d) for d in x[::2]], fontsize=12)
plt.title("User Orders by Hour of Day (95% confidence)", fontsize=22)
plt.xlabel("Hour of Day")
s, e = plt.gca().get_xlim()
plt.xlim(s, e)

# Draw horizontal tick lines
for y in range(8, 20, 2):
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()

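5. Time Series Decomposition Plot

A time series decomposition plot shows a time series broken down into its trend, seasonal, and residual components.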
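statsmodels offers this directly as `seasonal_decompose` in `statsmodels.tsa.seasonal` (with `model='additive'` or `'multiplicative'` and a `period` argument). The sketch below shows the classical additive version by hand, on a hypothetical series: the trend is a centred moving average (a 2x12 average for monthly data, so an even window stays centred), the seasonal component is the average detrended value at each position in the cycle, and the residual is what remains. The function names are hypothetical.

```python
import numpy as np


def centred_moving_average(x, period):
    """Centred moving average; an even period uses the 2 x period weights."""
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(x), np.nan)  # ends stay NaN where the window does not fit
    trend[half:len(x) - half] = np.convolve(x, weights, mode="valid")
    return trend


def classical_decompose(x, period):
    """Additive decomposition: x = trend + seasonal + resid."""
    x = np.asarray(x, dtype=float)
    trend = centred_moving_average(x, period)
    detrended = x - trend
    position = np.arange(len(x)) % period
    # Average detrended value at each position in the cycle, ignoring NaN ends
    season_means = np.array([np.nanmean(detrended[position == i]) for i in range(period)])
    season_means -= season_means.mean()  # seasonal effects sum to zero over one cycle
    seasonal = season_means[position]
    resid = x - trend - seasonal
    return trend, seasonal, resid


# Hypothetical monthly series: linear trend plus a 12-month cycle
t = np.arange(120)
x = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = classical_decompose(x, period=12)
```

Plotting `x`, `trend`, `seasonal`, and `resid` on four stacked subplots gives the usual decomposition chart. The first and last `period // 2` trend values are NaN because the centred window does not fit there.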

6. Multiple Time Series

You can plot multiple time series that measure the same quantity on the same chart.
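A minimal sketch using hypothetical data: three series that measure the same quantity (monthly units sold by three stores) share one axis, each with its own colour and a legend entry. The store names and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: monthly units sold by three stores (same quantity, same units)
dates = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {name: 100 + np.cumsum(rng.normal(2, 5, len(dates))) for name in ["store_a", "store_b", "store_c"]},
    index=dates,
)

fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
for column, color in zip(df.columns, ["tab:red", "tab:blue", "tab:green"]):
    ax.plot(df.index, df[column], color=color, lw=2, label=column)

ax.set_title("Monthly Units Sold per Store", fontsize=22)
ax.set_ylabel("Units sold")
ax.legend(loc="upper left")
ax.grid(axis="y", alpha=0.3)
# Lighten borders, as in the other examples
for side in ("top", "right"):
    ax.spines[side].set_alpha(0.0)
fig.tight_layout()
```

Call `plt.show()` (and drop the `matplotlib.use("Agg")` line) when running interactively. Sharing one y axis only works when the series have comparable units and magnitudes; otherwise a secondary Y axis is the better choice.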

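7. Plotting with a Secondary Y Axis

If you want to show two time series that measure two different quantities at the same points in time, you can plot the second series against a secondary Y axis on the right.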
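A compact sketch of the idea with `Axes.twinx()`, on hypothetical data (a temperature and a sales figure, which have different units). It also builds a single legend from the lines on both axes, since each axes object otherwise only knows about its own lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two quantities with different units, measured on the same dates
dates = pd.date_range("2021-01-01", periods=24, freq="MS")
months = np.arange(len(dates))
temperature = 15 + 10 * np.sin(2 * np.pi * months / 12)     # degrees C
sales = 2000 + 800 * np.sin(2 * np.pi * months / 12 + 0.5)  # units sold

fig, ax1 = plt.subplots(figsize=(16, 9), dpi=80)
line1, = ax1.plot(dates, temperature, color="tab:red", label="Temperature (deg C)")
ax1.set_ylabel("Temperature (deg C)", color="tab:red")
ax1.tick_params(axis="y", labelcolor="tab:red")

ax2 = ax1.twinx()  # new y axis on the right that shares ax1's x axis
line2, = ax2.plot(dates, sales, color="tab:blue", label="Units sold")
ax2.set_ylabel("Units sold", color="tab:blue")
ax2.tick_params(axis="y", labelcolor="tab:blue")

# Each axes only tracks its own lines, so pass both handles to one legend
ax1.legend(handles=[line1, line2], loc="upper left")
ax1.set_title("Temperature vs Units Sold: Secondary Y Axis", fontsize=22)
fig.tight_layout()
```

Colouring each axis label and its tick labels to match its line, as done here, is what lets a reader tell which scale belongs to which series.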

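8. Time Series with Error Bands

If your time series dataset has multiple observations for each time point (date or timestamp), you can build a time series plot with error bands. Examples include orders placed at different hours of the day, or orders arriving over a 45-day period.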

In this approach, the mean number of orders is drawn as a white line, and a 95% confidence band is computed and drawn around the mean.

# Data source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv
# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import sem

# Import data
df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv',
                     parse_dates=['purchase_time', 'purchase_date'])

# Prepare data: daily mean and 95% band half-width (1.96 x standard error)
df_mean = df_raw.groupby('purchase_date').quantity.mean()
df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.ylabel("# Daily Orders", fontsize=16)
x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index]
plt.plot(x, df_mean, color="white", lw=2)
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::6], [str(d) for d in x[::6]], fontsize=12)
plt.title("Daily Order Quantity of Brazilian Retail with Error Bands (95% confidence)", fontsize=20)

# Axis limits
s, e = plt.gca().get_xlim()
plt.xlim(s, e - 2)
plt.ylim(4, 10)

# Draw horizontal tick lines
for y in range(5, 10, 1):
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()
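The band in the code above is mean ± 1.96 × SE, where SE = s/√n is the standard error of the mean (s is the sample standard deviation of the day's observations and n their count). A minimal sketch of that arithmetic on hypothetical data, using pandas' built-in `GroupBy.sem()`, which, like `scipy.stats.sem`, uses ddof=1:

```python
import pandas as pd

# Hypothetical raw data: several order quantities recorded on each of two days
raw = pd.DataFrame({
    "purchase_date": ["2018-01-01"] * 4 + ["2018-01-02"] * 4,
    "quantity": [4, 6, 5, 5, 8, 6, 7, 7],
})

grouped = raw.groupby("purchase_date")["quantity"]
bands = pd.DataFrame({
    "mean": grouped.mean(),
    "se": grouped.sem(),  # sample std (ddof=1) / sqrt(n), matching scipy.stats.sem
})
bands["lower"] = bands["mean"] - 1.96 * bands["se"]
bands["upper"] = bands["mean"] + 1.96 * bands["se"]
print(bands.round(3))  # both days: se = 0.408, so each band is mean ± 0.8
```

The fixed 1.96 is a normal approximation. With only a handful of observations per time point, a t quantile such as `scipy.stats.t.ppf(0.975, n - 1)` gives a somewhat wider and more accurate band.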