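I recently ran into this problem while working on a paper: given the number of publications for each past year (the plot looks roughly like the one below), predict the number of publications for the coming year.
First, analyze the problem: 1. This is a univariate prediction; the independent variable is the year and the dependent variable is the number of publications. 2. The relationship between y and x is not linear.
I happened to remember covering time series prediction in Java in my second-year data structures course, so what follows are my study notes on univariate time series prediction in Python: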
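A time series is a sequence of observations of the same phenomenon arranged in time order. Depending on when the observations are taken, the time unit can be years, quarters, months, or any other form of time.
Main purpose of studying time series: forecasting, that is, predicting future changes from the time series data already available.
Key to time series forecasting: identify the pattern of change in the existing series and assume that this pattern will continue into the future.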
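Characteristics:
- It assumes that the current trend will extend into the future
- The data on which the forecast is based are irregular
- It does not consider causal relationships between things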
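Stationary series: a series with essentially no trend. The observations fluctuate around a fixed level; the size of the fluctuation may differ from one period to another, but it follows no pattern and is random.
Non-stationary series: a series that contains a trend, seasonality, or cyclicity. It may contain only one of these components or a combination of several. Non-stationary series can be divided into: series with a trend, series with a trend and seasonality, and composite series that mix several components.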
For details, see: 时间序列分析和预测(含实例及代码), a CSDN blog post by 我不爱机器学习.
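To make the difference between the two kinds of series concrete, here is a small sketch of my own. The data are synthetic, and comparing the means of the two halves is only a rough illustration I chose, not a formal stationarity test; the helper name `half_mean_gap` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
t = np.arange(40)

# Synthetic data (assumption): noise around a fixed level vs. noise around a rising line
stationary = 100 + rng.normal(0, 5, size=t.size)
trended = 10 * t + rng.normal(0, 5, size=t.size)


def half_mean_gap(series):
    """Absolute difference between the mean of the second half and the first half."""
    half = len(series) // 2
    return abs(series[half:].mean() - series[:half].mean())


# A series with no trend keeps roughly the same level in both halves,
# while a series with a trend shifts its level noticeably.
print("stationary gap:", half_mean_gap(stationary))
print("trended gap:", half_mean_gap(trended))
```

The cumulative publication counts in these notes rise every year, so they behave like the second series.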
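① Exponential Smoothing (for relatively simple time series prediction problems)
Idea: the closer a point is to the time being predicted, the more it matters (if I want to predict the value for 2022, the 2021 data certainly affect the result more than the 2011 data).
Basic idea: the weights decay exponentially with the age of the observation. With a smoothing factor α (0 < α < 1), the most recent observation gets weight α, the one before it α(1-α), then α(1-α)^2, and so on, so the weights of very old data end up close to 0.
Single exponential smoothing is for series with no trend and no seasonality, double exponential smoothing is for series with a trend but no seasonality, and triple exponential smoothing is for series with both a trend and seasonality.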
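A minimal sketch of single exponential smoothing, written by hand so the weighting is visible. The function names, the sample data, and the choice to initialise the level with the first observation are my own assumptions; other initialisations are common too.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: start the level at the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def observation_weights(alpha, n):
    """Weights carried by the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


series = [10.0, 12.0, 11.0, 13.0, 12.0]  # made-up illustrative data
levels = simple_exponential_smoothing(series, alpha=0.5)
forecast = levels[-1]  # the one-step-ahead forecast is the last smoothed level

print(levels)                       # [10.0, 11.0, 11.0, 12.0, 12.0]
print(forecast)                     # 12.0
print(observation_weights(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625]
```

A larger `alpha` reacts faster to recent values, a smaller one smooths more. Because this version has no trend term, it suits only series without trend or seasonality.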
②ARIMA
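③ LSTM neural network: for complex time series prediction problems, LSTM is a good choice.
Here the time series prediction problem is converted into a conventional supervised learning problem (the sliding-window method).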
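The sliding-window conversion can be sketched as follows. This only prepares the data and trains no network; the function name `make_windows` and the window length of 3 are my own choices for illustration.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into supervised samples.

    Each row of X holds `window` consecutive values; the matching entry of y
    is the value that immediately follows that window.
    """
    values = np.asarray(series, dtype=float)
    if window < 1 or window >= len(values):
        raise ValueError("window must be at least 1 and shorter than the series")
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y


series = [26, 51, 88, 123, 144, 170]  # the first six values of y_arr in these notes
X, y = make_windows(series, window=3)
print(X)  # rows: [26 51 88], [51 88 123], [88 123 144]
print(y)  # targets: 123, 144, 170

# Recurrent layers in libraries such as Keras expect the input shape
# (samples, time steps, features), so a trailing feature axis is added.
X_lstm = X[..., np.newaxis]
print(X_lstm.shape)  # (3, 3, 1)
```

With only 40 yearly values, a window of 3 would leave 37 training samples, which is very little for a neural network.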
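For y-x values with no obvious stage-by-stage pattern, polynomial fitting of different orders can also be used (the x and y values can be typed in by hand or read from a file). The code here follows: 数学建模方法—【04】拟合方法之np.polyfit、np.poly1d, a CSDN blog post by 土豆同学.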
import numpy as np
import matplotlib.pyplot as plt
x_arr = [1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,
2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021]
y_arr = [26,51,88,123,144,170,224,282,322,343,382,428,481,527,576,619,668,726,774,824,875,902,943,998,1043,1092,1148,
1193,1259,1322,1393,1462,1540,1624,1714,1823,1903,1992,2060,2134]
# x_arr and y_arr above are the data set to fit
def goodness_of_fit(y_fitting, y_no_fitting):
    """
    Compute the goodness of fit R^2 as SSR / SST.
    For a least-squares fit that includes a constant term this equals 1 - SSE / SST.
    :param y_fitting: List[int] or array[int], the fitted y values
    :param y_no_fitting: List[int] or array[int], the original y values being fitted
    :return: goodness of fit R^2
    """
    SSR = __ssr(y_fitting, y_no_fitting)
    SST = __sst(y_no_fitting)
    rr = SSR / SST
    return rr

def __ssr(y_fitting, y_no_fitting):
    """
    Compute SSR (regression sum of squares)
    :param y_fitting: List[int] or array[int], the fitted y values
    :param y_no_fitting: List[int] or array[int], the original y values being fitted
    :return: regression sum of squares SSR
    """
    y_mean = sum(y_no_fitting) / len(y_no_fitting)
    s_list = [(y - y_mean)**2 for y in y_fitting]
    ssr = sum(s_list)
    return ssr

def __sst(y_no_fitting):
    """
    Compute SST (total sum of squares)
    :param y_no_fitting: List[int] or array[int], the original y values being fitted
    :return: total sum of squares SST
    """
    y_mean = sum(y_no_fitting) / len(y_no_fitting)
    s_list = [(y - y_mean)**2 for y in y_no_fitting]
    sst = sum(s_list)
    return sst

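figure1 = plt.figure(figsize=(8, 6))
# coeff_* holds the coefficients, poly_fit_* is the fitted polynomial function
# 1. Fit first to obtain the coefficients
coeff_1 = np.polyfit(x_arr, y_arr, 1)
print("First-order fit coefficients:", coeff_1)
# 2. Build the polynomial from the coefficients
poly_fit_1 = np.poly1d(coeff_1)
print("First-order polynomial:", poly_fit_1)
# 3. Pass in the variable (a single value or an array) to get the fitted result (an array)
y_fit_1 = poly_fit_1(x_arr)
print("First-order fitted values: ", y_fit_1)
# 4. Plot the result
plt.plot(x_arr, y_fit_1, 'green', label="First-order fit")
# 5. Compute the goodness of fit from the original and the fitted data
rr1 = goodness_of_fit(y_fit_1, y_arr)
print("First-order goodness of fit: %.5f" % rr1)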
# Second-order fit: same five steps as above
coeff_2 = np.polyfit(x_arr, y_arr, 2)
print("Second-order fit coefficients:", coeff_2)
poly_fit_2 = np.poly1d(coeff_2)
print("Second-order polynomial:", poly_fit_2)
y_fit_2 = poly_fit_2(x_arr)
print("Second-order fitted values: ", y_fit_2)
plt.plot(x_arr, y_fit_2, 'orange', label="Second-order fit")
rr2 = goodness_of_fit(y_fit_2, y_arr)
print("Second-order goodness of fit: %.5f" % rr2)
# Third-order fit
coeff_3 = np.polyfit(x_arr, y_arr, 3)
print("Third-order fit coefficients:", coeff_3)
poly_fit_3 = np.poly1d(coeff_3)
print("Third-order polynomial:", poly_fit_3)
y_fit_3 = poly_fit_3(x_arr)
print("Third-order fitted values: ", y_fit_3)
plt.plot(x_arr, y_fit_3, 'skyblue', label="Third-order fit")
rr3 = goodness_of_fit(y_fit_3, y_arr)
print("Third-order goodness of fit: %.5f" % rr3)
# Fourth-order fit
coeff_4 = np.polyfit(x_arr, y_arr, 4)
print("Fourth-order fit coefficients:", coeff_4)
poly_fit_4 = np.poly1d(coeff_4)
print("Fourth-order polynomial:", poly_fit_4)
y_fit_4 = poly_fit_4(x_arr)
print("Fourth-order fitted values: ", y_fit_4)
plt.plot(x_arr, y_fit_4, 'blue', label="Fourth-order fit")
rr4 = goodness_of_fit(y_fit_4, y_arr)
print("Fourth-order goodness of fit: %.5f" % rr4)
# Fifth-order fit
coeff_5 = np.polyfit(x_arr, y_arr, 5)
print("Fifth-order fit coefficients:", coeff_5)
poly_fit_5 = np.poly1d(coeff_5)
print("Fifth-order polynomial:", poly_fit_5)
y_fit_5 = poly_fit_5(x_arr)
print("Fifth-order fitted values: ", y_fit_5)
plt.plot(x_arr, y_fit_5, 'red', label="Fifth-order fit")
rr5 = goodness_of_fit(y_fit_5, y_arr)
print("Fifth-order goodness of fit: %.5f" % rr5)
# Sixth-order fit
coeff_6 = np.polyfit(x_arr, y_arr, 6)
print("Sixth-order fit coefficients:", coeff_6)
poly_fit_6 = np.poly1d(coeff_6)
print("Sixth-order polynomial:", poly_fit_6)
y_fit_6 = poly_fit_6(x_arr)
print("Sixth-order fitted values: ", y_fit_6)
plt.plot(x_arr, y_fit_6, 'pink', label="Sixth-order fit")
rr6 = goodness_of_fit(y_fit_6, y_arr)
print("Sixth-order goodness of fit: %.5f" % rr6)
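plt.scatter(x_arr, y_arr, color='black', label="Original data")
plt.title("Fitted curves, orders 1 to 6")
plt.rcParams['font.sans-serif'] = ['SimHei']  # only needed when labels contain Chinese characters
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly with that font
plt.legend(loc=2)
plt.show()
# y_arr appears to hold cumulative totals, so the number predicted for 2022 is taken as
# the difference between the fitted values at 2022 and at 2021
prediction = poly_fit_6(2022) - poly_fit_6(2021)
print("Number of publications predicted for 2022 by the sixth-order fit: %.5f" % prediction)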
Output:
First-order fit coefficients: [ 5.25639775e+01 -1.04301851e+05]
First-order polynomial: 52.56 x - 1.043e+05
First-order fitted values: [-120.04756098 -67.48358349 -14.919606 37.64437148 90.20834897
142.77232645 195.33630394 247.90028143 300.46425891 353.0282364
405.59221388 458.15619137 510.72016886 563.28414634 615.84812383
668.41210131 720.9760788 773.54005629 826.10403377 878.66801126
931.23198874 983.79596623 1036.35994371 1088.9239212 1141.48789869
1194.05187617 1246.61585366 1299.17983114 1351.74380863 1404.30778612
1456.8717636 1509.43574109 1561.99971857 1614.56369606 1667.12767355
1719.69165103 1772.25562852 1824.819606 1877.38358349 1929.94756098]
First-order goodness of fit: 0.97997
Second-order fit coefficients: [ 6.81121189e-01 -2.67396414e+03 2.62418041e+06]
Second-order polynomial: 0.6811 x^2 - 2674 x + 2.624e+06
Second-order fitted values: [ 48.18937282 74.87074511 102.91435977 132.32021682 163.08831624
195.21865804 228.71124222 263.56606878 299.78313772 337.36244904
376.30400274 416.60779881 458.27383727 501.3021181 545.69264131
591.4454069 638.56041487 687.03766522 736.87715795 788.07889306
840.64287054 894.56909041 949.85755265 1006.50825727 1064.52120428
1123.89639366 1184.63382542 1246.73349956 1310.19541607 1375.01957497
1441.20597624 1508.7546199 1577.66550593 1647.93863434 1719.57400513
1792.5716183 1866.93147385 1942.65357178 2019.73791209 2098.18449477]
Second-order goodness of fit: 0.99748
Third-order fit coefficients: [ 1.98129161e-02 -1.18285534e+02 2.35433047e+05 -1.56226545e+08]
Third-order polynomial: 0.01981 x^3 - 118.3 x^2 + 2.354e+05 x - 1.562e+08
Third-order fitted values: [ -6.13169953 37.26384878 79.82240573 121.66284898 162.9040558
203.66490385 244.06427062 284.22103357 324.2540701 364.28225783
404.4244743 444.79959694 485.52650312 526.72407046 568.51117656
611.00669873 654.3295145 698.5985015 743.93253717 790.45049888
838.2712642 887.51371077 938.29671589 990.73915705 1044.95991194
1101.07785794 1159.21187252 1219.48083317 1282.00361753 1346.89910293
1414.28616691 1484.28368694 1557.01054066 1632.58560538 1711.12775871
1792.75587821 1877.58884126 1965.74552527 2057.34480789 2152.50556663]
Third-order goodness of fit: 0.99899
Fourth-order fit coefficients: [ 9.12416351e-04 -7.28499239e+00 2.18122542e+04 -2.90263846e+07
1.44850306e+10]
Fourth-order polynomial: 0.0009124 x^4 - 7.285 x^3 + 2.181e+04 x^2 - 2.903e+07 x + 1.449e+10
Fourth-order fitted values: [ 19.59877014 49.79920769 82.28782463 116.78931808 153.05028915
190.83926582 229.94665527 270.1847496 311.38776588 353.41180229
396.13485718 439.45681572 483.29950142 527.60658264 572.34366035
617.49823189 663.07965851 709.11925125 755.67019272 802.80756187
850.62832451 899.25137138 948.81747055 999.48929596 1051.45144081
1104.91033363 1160.09437561 1217.25382423 1276.66083717 1338.60948181
1403.41569901 1471.41738892 1542.97425842 1618.46797943 1698.30212593
1782.90210533 1872.71530533 1968.21094704 2069.88017082 2178.23604012]
Fourth-order goodness of fit: 0.99931
Fifth-order fit coefficients: [-7.74229726e-05 7.75722798e-01 -3.10881662e+03 6.22939054e+06
-6.24105435e+09 2.50104990e+12]
Fifth-order polynomial: -7.742e-05 x^5 + 0.7757 x^4 - 3109 x^3 + 6.229e+06 x^2 - 6.241e+09 x + 2.501e+12
Fifth-order fitted values: [ 40.81640625 54.6875 76.86376953 105.76953125 140.00830078
178.37158203 219.80029297 263.40722656 308.44140625 354.29882812
400.49658203 446.67480469 492.58496094 538.07714844 583.09228516
627.65771484 671.86816406 715.88623047 759.92919922 804.25634766
849.1640625 894.97558594 942.03417969 990.68505859 1041.27636719
1094.14404297 1149.60693359 1207.95166016 1269.42529297 1334.23095703
1402.51123047 1474.34765625 1549.73583984 1628.59863281 1710.75634766
1795.92480469 1883.71875 1973.61474609 2064.97070312 2156.99902344]
Fifth-order goodness of fit: 0.99954
Sixth-order fit coefficients: [-1.94414657e-08 1.55849276e-04 -3.90500377e-01 7.13376681e-01
1.56573864e+06 -2.51068360e+09 1.25778866e+12]
Sixth-order polynomial: -1.944e-08 x^6 + 0.0001558 x^5 - 0.3905 x^4 + 0.7134 x^3 + 1.566e+06 x^2 - 2.511e+09 x + 1.258e+12
Sixth-order fitted values: [ 40.82397461 54.70751953 76.88330078 105.77929688 140.01049805
178.36010742 219.78320312 263.38476562 308.42089844 354.2800293
400.484375 446.67211914 492.59228516 538.09375 583.11865234
627.69018555 671.90673828 715.92724609 759.96850586 804.2902832
849.19140625 894.99316406 942.03955078 990.67749023 1041.25830078
1094.11621094 1149.57177734 1207.91088867 1269.38671875 1334.19848633
1402.49121094 1474.34130859 1549.7512207 1628.63427734 1710.81054688
1795.99584961 1883.7878418 1973.66503906 2064.96948242 2156.90185547]
Sixth-order goodness of fit: 0.99955
Number of publications predicted for 2022 by the sixth-order fit: 91.60547
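One thing worth checking in this output: the fifth and sixth-order fitted values appear to carry only a few reliable decimal places, which is what I would expect when raw years around 2000 are raised to the fifth or sixth power. The sketch below is a variant of my own, not the referenced post's method. It subtracts the mean year before fitting, computes R^2 as 1 - SSE / SST, and wraps the six repeated blocks in a loop. The helper names `fit_centered` and `r_squared` are hypothetical, and treating y_arr as cumulative totals is an assumption based on the values always increasing.

```python
import numpy as np

x_arr = np.arange(1982, 2022)  # same years as above
y_arr = np.array([26, 51, 88, 123, 144, 170, 224, 282, 322, 343, 382, 428, 481, 527,
                  576, 619, 668, 726, 774, 824, 875, 902, 943, 998, 1043, 1092, 1148,
                  1193, 1259, 1322, 1393, 1462, 1540, 1624, 1714, 1823, 1903, 1992,
                  2060, 2134], dtype=float)


def fit_centered(x, y, degree):
    """Fit a polynomial to (x - mean(x), y) and return a function of the raw year."""
    center = x.mean()
    poly = np.poly1d(np.polyfit(x - center, y, degree))

    def predict(year):
        return poly(np.asarray(year, dtype=float) - center)

    return predict


def r_squared(y, y_fit):
    """Goodness of fit as 1 - SSE / SST."""
    sse = np.sum((y - y_fit) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst


r2_by_degree = {}
increase_by_degree = {}
for degree in range(1, 7):
    model = fit_centered(x_arr, y_arr, degree)
    r2_by_degree[degree] = r_squared(y_arr, model(x_arr))
    # Assumption: y_arr is cumulative, so the yearly figure is a difference
    increase_by_degree[degree] = float(model(2022) - model(2021))
    print("degree %d: R^2 = %.5f, predicted increase for 2022 = %.2f"
          % (degree, r2_by_degree[degree], increase_by_degree[degree]))
```

R^2 on the training data can only go up as the order increases, so it cannot by itself tell which order extrapolates best. Comparing the predicted increase across orders gives a rough feel for how sensitive the 2022 figure is to that choice.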