Time Series Forecasting in Python with Prophet

Analyzing worldwide COVID-19 case counts with fbprophet

import pandas as pd
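# The package was renamed to prophet in version 1.0; on newer installs use: from prophet import Prophet
from fbprophet import Prophet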
pred = pd.read_csv("../kaggle3/covid-19-all.csv")
pred.head()
   Country/Region  Province/State  Latitude  Longitude  Confirmed  Recovered  Deaths        Date
0           China           Anhui   31.8257   117.2264        1.0        NaN     NaN  2020-01-22
1           China         Beijing   40.1824   116.4142       14.0        NaN     NaN  2020-01-22
2           China       Chongqing   30.0572   107.8740        6.0        NaN     NaN  2020-01-22
3           China          Fujian   26.0789   117.9874        1.0        NaN     NaN  2020-01-22
4           China           Gansu   37.8099   101.0583        NaN        NaN     NaN  2020-01-22
pred = pred.fillna(0)
predgrp = pred.groupby("Date")[["Confirmed","Recovered","Deaths"]].sum().reset_index()
predgrp.head()
         Date  Confirmed  Recovered  Deaths
0  2020-01-22      555.0       28.0    17.0
1  2020-01-23      653.0       30.0    18.0
2  2020-01-24      941.0       36.0    26.0
3  2020-01-25     1438.0       39.0    42.0
4  2020-01-26     2118.0       52.0    56.0

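In the first five rows the worldwide confirmed total is only a few hundred to a couple of thousand. These figures are cumulative totals, not daily new cases. The most recent rows tell a very different story: more than 2.6 million confirmed cases and over 180,000 deaths.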

predgrp.tail()
          Date  Confirmed  Recovered    Deaths
87  2020-04-18  2317759.0   592319.0  159510.0
88  2020-04-19  2401379.0   623903.0  165044.0
89  2020-04-20  2472259.0   645738.0  169986.0
90  2020-04-21  2549294.0   679819.0  176583.0
91  2020-04-22  2623415.0   709694.0  183027.0
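
The Confirmed column appears to be a running total rather than a daily count, since it only ever grows. A minimal sketch of how daily new cases could be derived from it with `diff()`, using the first five totals from `predgrp.head()`; the frame name `sample` and the column name `NewConfirmed` are illustrative and not part of the original notebook:

```python
import pandas as pd

# First five cumulative totals from predgrp.head() in this post.
sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25", "2020-01-26"]
    ),
    "Confirmed": [555.0, 653.0, 941.0, 1438.0, 2118.0],
})

# diff() subtracts the previous row from each row.
# The first day has no predecessor, so its value is NaN.
sample["NewConfirmed"] = sample["Confirmed"].diff()
print(sample)
```

The forecast in this post is fitted to the cumulative series; the daily series would be a different modelling target.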
pred_cnfrm = predgrp.loc[:,["Date","Confirmed"]]
pred_cnfrm.shape
(92, 2)
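# Prophet expects a dataframe with the columns ds (date) and y (value).
# rename() returns a new dataframe, so pred_cnfrm keeps its "Confirmed"
# column, which the error analysis later in this post still uses.
pr_data = pred_cnfrm.rename(columns={"Date": "ds", "Confirmed": "y"})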
pr_data.head()
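
One pitfall worth knowing at this step: assigning a dataframe to a second name does not copy it, so setting `.columns` through the second name also renames the columns of the first. A minimal sketch with hypothetical two-row frames (`original`, `alias`, `source` and `renamed` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frame standing in for pred_cnfrm.
original = pd.DataFrame(
    {"Date": ["2020-01-22", "2020-01-23"], "Confirmed": [555.0, 653.0]}
)

alias = original              # no copy: both names refer to the same object
alias.columns = ["ds", "y"]   # this also renames the columns of original
print(list(original.columns))

# rename() returns a new frame and leaves its source untouched.
source = pd.DataFrame(
    {"Date": ["2020-01-22", "2020-01-23"], "Confirmed": [555.0, 653.0]}
)
renamed = source.rename(columns={"Date": "ds", "Confirmed": "y"})
print(list(source.columns), list(renamed.columns))
```

This matters here because later cells read `pred_cnfrm["Confirmed"]`, which only works if that column name survives.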
m = Prophet()
m.fit(pr_data)
future = m.make_future_dataframe(periods=15)   # forecast 15 days ahead
forecast = m.predict(future)
forecast.head().T
                                              0                    1                    2                    3                    4
ds                          2020-01-22 00:00:00  2020-01-23 00:00:00  2020-01-24 00:00:00  2020-01-25 00:00:00  2020-01-26 00:00:00
trend                                  -1231.05             -218.877              793.296              1805.47              3092.11
trend_lower                            -1231.05             -218.877              793.296              1805.47              3092.11
trend_upper                            -1231.05             -218.877              793.296              1805.47              3092.11
yhat_lower                             -11158.3             -8236.24             -3166.07              247.023             -982.845
yhat_upper                              1602.52              5214.46              10051.4              12530.2              11597.8
additive_terms                         -3974.05             -571.137              2528.97              4295.44              2357.41
additive_terms_lower                   -3974.05             -571.137              2528.97              4295.44              2357.41
additive_terms_upper                   -3974.05             -571.137              2528.97              4295.44              2357.41
multiplicative_terms                          0                    0                    0                    0                    0
multiplicative_terms_lower                    0                    0                    0                    0                    0
multiplicative_terms_upper                    0                    0                    0                    0                    0
weekly                                 -3974.05             -571.137              2528.97              4295.44              2357.41
weekly_lower                           -3974.05             -571.137              2528.97              4295.44              2357.41
weekly_upper                           -3974.05             -571.137              2528.97              4295.44              2357.41
yhat                                   -5205.09             -790.014              3322.27              6100.91              5449.52
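
The table can be sanity-checked. With Prophet's default additive mode, `yhat` should equal `trend` plus `additive_terms` (here the weekly component alone, since `multiplicative_terms` is 0), up to the rounding of the printed values. A small check using the numbers from this output; the list names and the tolerance are illustrative choices:

```python
# Values copied from the transposed forecast.head() output in this post.
trend = [-1231.05, -218.877, 793.296, 1805.47, 3092.11]
weekly = [-3974.05, -571.137, 2528.97, 4295.44, 2357.41]
yhat = [-5205.09, -790.014, 3322.27, 6100.91, 5449.52]

# Absolute gap between trend + weekly and the reported yhat, per row.
gaps = [abs(t + w - y_hat) for t, w, y_hat in zip(trend, weekly, yhat)]

# The printed values carry about six significant digits,
# so the gaps should be no larger than rounding noise.
for t, w, y_hat, gap in zip(trend, weekly, yhat, gaps):
    print(f"{t:10.3f} + {w:10.3f} vs yhat {y_hat:10.3f}  gap {gap:.3f}")
```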
y_ped = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
y_ped.head()
          ds          yhat    yhat_lower    yhat_upper
0 2020-01-22  -5205.094924 -11158.349533   1602.523589
1 2020-01-23   -790.014418  -8236.237859   5214.464592
2 2020-01-24   3322.270436  -3166.065919  10051.441647
3 2020-01-25   6100.908463    247.022750  12530.151876
4 2020-01-26   5449.521809   -982.844950  11597.844552
m.plot(forecast,xlabel='Date',ylabel='Confirmed Count', uncertainty=True)

[Figure: output of m.plot(forecast), x-axis Date, y-axis Confirmed Count (output_9_0.png)]

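This is an interval forecast. Because the counts are so large, the observed points and the fitted line nearly coincide at this scale, so the error is considered acceptable.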
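
One rough way to back up that visual impression is to count how many observed values fall inside the predicted interval. The sketch below does this for the first five days only, using numbers printed in this post; the array names are illustrative, and the same comparison could be made on the full series with the `yhat_lower` and `yhat_upper` columns of `forecast`.

```python
import numpy as np

# First five observed totals and the matching interval bounds from this post.
observed = np.array([555.0, 653.0, 941.0, 1438.0, 2118.0])
lower = np.array([-11158.349533, -8236.237859, -3166.065919, 247.022750, -982.844950])
upper = np.array([1602.523589, 5214.464592, 10051.441647, 12530.151876, 11597.844552])

# True where the observation lies inside [lower, upper].
inside = (observed >= lower) & (observed <= upper)
coverage = inside.mean()  # fraction of observations inside the interval
print(f"{inside.sum()} of {len(observed)} points inside the interval ({coverage:.0%})")
```

Several of these lower bounds are negative, which is impossible for a case count, so at the start of the series the interval is wide rather than tight.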

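To inspect how the forecast breaks down into components, use the Prophet.plot_components method.
By default it shows the trend plus whichever seasonalities were fitted, typically yearly and weekly. Holidays are shown too if they were included in the model. With only 92 days of data here, the forecast contains just the trend and the weekly component.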

m.plot_components(forecast)

[Figure: output of m.plot_components(forecast) (output_12_0.png)]

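Error analysis: the focus here is on absolute error, which is in the same units as the case counts and is easy to interpret. Mean squared error is reported as well; it weights large errors more heavily.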

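Computing the errors with scikit-learn's built-in functions
from sklearn.metrics import mean_squared_error # mean squared error
from sklearn.metrics import mean_absolute_error # mean absolute error
# Keep only the in-sample predictions (rows 0 to 91); .loc slicing is inclusive
y = y_ped["yhat"].loc[0:91]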

Looking at the first few predictions, the differences from the observed values are fairly large.

(y-pred_cnfrm["Confirmed"]).head()
0   -5760.094924
1   -1443.014418
2    2381.270436
3    4662.908463
4    3331.521809
dtype: float64
MSE (Mean Squared Error)
mean_squared_error(pred_cnfrm["Confirmed"],y)
24838490.129674848
MAE (Mean Absolute Error)
mean_absolute_error(pred_cnfrm["Confirmed"],y)
3423.4048112033311
Computing the errors with numpy
MSE (Mean Squared Error)
import numpy as np
mse_test=np.sum((y-pred_cnfrm["Confirmed"])**2)/len(y) # same as the mathematical formula
mse_test
24838490.129674848
RMSE (Root Mean Squared Error)
rmse_test=mse_test ** 0.5
rmse_test
4983.8228429263863
MAE (Mean Absolute Error)
mae_test=np.sum(abs(y-pred_cnfrm["Confirmed"]))/len(y) # same as the mathematical formula
mae_test
3423.4048112033311
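
As a small cross-check of how these metrics relate, the sketch below recomputes MAE and RMSE on just the five residuals printed earlier in this post. RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest. The variable names are illustrative.

```python
import numpy as np

# First five residuals (prediction minus observed) from this post.
residuals = np.array(
    [-5760.094924, -1443.014418, 2381.270436, 4662.908463, 3331.521809]
)

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error, same units as the data
print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```

On these five days both values are several times the observed counts themselves, while the full-series errors above are small next to the millions of cases at the end of the series.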