Time Series Analysis: Based on R (2nd Edition)
A first look at time series analysis: wheat yield per acre in England and Wales, 1884-1939
Year | yields |
--- | --- |
1884 | 15.2 |
1885 | 16.9 |
1886 | 15.3 |
1887 | 14.9 |
1888 | 15.7 |
1889 | 15.1 |
1890 | 16.7 |
1891 | 16.3 |
1892 | 16.5 |
1893 | 13.3 |
1894 | 16.5 |
1895 | 15 |
1896 | 15.9 |
1897 | 15.5 |
1898 | 16.9 |
1899 | 16.4 |
1900 | 14.9 |
1901 | 14.5 |
1902 | 16.6 |
1903 | 15.1 |
1904 | 14.6 |
1905 | 16 |
1906 | 16.8 |
1907 | 16.8 |
1908 | 15.5 |
1909 | 17.3 |
1910 | 15.5 |
1911 | 15.5 |
1912 | 14.2 |
1913 | 15.8 |
1914 | 15.7 |
1915 | 14.1 |
1916 | 14.8 |
1917 | 14.4 |
1918 | 15.6 |
1919 | 13.9 |
1920 | 14.7 |
1921 | 14.3 |
1922 | 14 |
1923 | 14.5 |
1924 | 15.4 |
1925 | 15.3 |
1926 | 16 |
1927 | 16.4 |
1928 | 17.2 |
1929 | 17.8 |
1930 | 14.4 |
1931 | 15 |
1932 | 16 |
1933 | 16.8 |
1934 | 16.9 |
1935 | 16.6 |
1936 | 16.2 |
1937 | 14 |
1938 | 18.1 |
1939 | 17.5 |
1. Generating the time series data
> # Load packages
> library(readxl)
> library(ggplot2)
> library(reshape2)
> library(xlsx)
> # Read the first sheet of A1_1.xlsx (columns Year and yields)
> a1 <- read_excel(r"(D:\\2022-2023春季学期课程资料\\时间序列分析\\时间序列分析——基于R(第2版)案例数据\\A1_1.xlsx)", sheet = 1);a1
# A tibble: 56 × 2
Year yields
<dbl> <dbl>
1 1884 15.2
2 1885 16.9
3 1886 15.3
4 1887 14.9
5 1888 15.7
6 1889 15.1
7 1890 16.7
8 1891 16.3
9 1892 16.5
10 1893 13.3
# … with 46 more rows
# ℹ Use `print(n = ...)` to see more rows
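> # Attach a1 so that its columns can be referred to directly by name
> attach(a1)
1.1 Summary statistics
> # Summary statistics for each column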
> summary(a1)
Year yields
Min. :1884 Min. :13.30
1st Qu.:1898 1st Qu.:14.88
Median :1912 Median :15.55
Mean :1912 Mean :15.66
3rd Qu.:1925 3rd Qu.:16.52
Max. :1939 Max. :18.10
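summary() takes its quartiles from quantile() with the default type = 7 rule: in the sorted sample, the p-th quantile sits at position h = 1 + (n - 1)p and is interpolated linearly between the two neighbouring order statistics. The sketch below re-derives the yields quartiles by hand and compares them with quantile(). The values are typed in from the table at the top of this post, and the helper q7() is a hypothetical name introduced only for this illustration.

```r
# Wheat yields 1884-1939, typed in from the table above
y <- c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
       16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
       14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
       15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
       15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
       16.9, 16.6, 16.2, 14.0, 18.1, 17.5)

# Hypothetical helper: type-7 sample quantile computed by hand
q7 <- function(x, p) {
  xs <- sort(x)
  h <- 1 + (length(xs) - 1) * p          # fractional position in the sorted data
  lo <- floor(h)
  hi <- min(lo + 1, length(xs))
  xs[lo] + (h - lo) * (xs[hi] - xs[lo])  # linear interpolation between neighbours
}

manual <- sapply(c(0.25, 0.5, 0.75), q7, x = y)
builtin <- quantile(y, c(0.25, 0.5, 0.75))  # default type = 7
print(rbind(manual, builtin))
print(mean(y))
```

With n = 56 the median sits at position 28.5, halfway between the 28th and 29th sorted values (15.5 and 15.6). That is why the median is 15.55 rather than an observed value.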
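1.2 Declaring the time series
Call ts() to turn yields into a time series object.
start: the time of the first observation.
frequency: the number of observations per unit of time (here, per year): 1 for annual, 4 for quarterly, 12 for monthly and 52 for weekly data.
> # Declare yields as an annual time series starting in 1884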
> yields <- ts(yields, start = c(1884), frequency = 1)
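To see how start and frequency determine the time index, here is a minimal sketch that builds the same annual series directly from the table values, so it runs without the Excel file. For contrast it also builds a quarterly series, which is hypothetical (the numbers 1 to 8 are made up) and serves only to illustrate frequency = 4 with a start in the second quarter of 2020.

```r
# Annual series: one observation per year, first observation in 1884
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5),
             start = 1884, frequency = 1)
print(tsp(yields))   # start time, end time, frequency

# Hypothetical quarterly series: 8 observations beginning in 2020 Q2
q <- ts(1:8, start = c(2020, 2), frequency = 4)
print(q)             # printed as a year-by-quarter table
print(time(q))       # time index in fractional years: 2020.25, 2020.50, ...
print(cycle(q))      # position of each observation within its year (quarter)
```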
2. Processing the time series
2.1 Transforming the series
> # Log-transform the time series yields
> log_yields <- log(yields);log_yields
Time Series:
Start = 1884
End = 1939
Frequency = 1
[1] 2.721295 2.827314 2.727853 2.701361 2.753661 2.714695 2.815409 2.791165
[9] 2.803360 2.587764 2.803360 2.708050 2.766319 2.740840 2.827314 2.797281
[17] 2.701361 2.674149 2.809403 2.714695 2.681022 2.772589 2.821379 2.821379
[25] 2.740840 2.850707 2.740840 2.740840 2.653242 2.760010 2.753661 2.646175
[33] 2.694627 2.667228 2.747271 2.631889 2.687847 2.660260 2.639057 2.674149
[41] 2.734368 2.727853 2.772589 2.797281 2.844909 2.879198 2.667228 2.708050
[49] 2.772589 2.821379 2.827314 2.809403 2.785011 2.639057 2.895912 2.862201
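As a quick sanity check on the transform, the sketch below confirms three things. log() works element by element, the result keeps the ts attributes (start 1884, frequency 1), and exp() undoes the transform. The series is rebuilt from the table values so the block runs on its own.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

log_yields <- log(yields)          # still a ts with the same time index
print(tsp(log_yields))
print(round(log_yields[1:3], 6))   # first three log values

back <- exp(log_yields)            # invert the transform
print(all.equal(back, yields))     # equal up to floating-point error
```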
2.2 Extracting a subseries
> # Extract the 1900-1920 subseries
> yields2 <- window(yields, start = c(1900), end = c(1920));yields2
Time Series:
Start = 1900
End = 1920
Frequency = 1
[1] 14.9 14.5 16.6 15.1 14.6 16.0 16.8 16.8 15.5 17.3 15.5 15.5 14.2 15.8
[15] 15.7 14.1 14.8 14.4 15.6 13.9 14.7
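window() differs from plain index subsetting. The sketch below takes the same 1900-1920 stretch (positions 17 to 37, since 1884 is position 1) in both ways. window() returns a ts that still knows it starts in 1900, whereas yields[17:37] returns an ordinary numeric vector with the time index dropped.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

w <- window(yields, start = 1900, end = 1920)  # still a ts, indexed by year
v <- yields[17:37]                             # plain numeric vector: time index lost

print(c(window_is_ts = is.ts(w), index_is_ts = is.ts(v)))
print(tsp(w))                                  # start, end, frequency of the subseries
```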
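2.3 Interpolating missing values
Missing values are coded as NA ("not available"), meaning data that could not be obtained.
> # Set the fifth observation of yields (1888) to missing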
> a <- yields
> a[5] <- NA;a
Time Series:
Start = 1884
End = 1939
Frequency = 1
[1] 15.2 16.9 15.3 14.9 NA 15.1 16.7 16.3 16.5 13.3 16.5 15.0 15.9 15.5
[15] 16.9 16.4 14.9 14.5 16.6 15.1 14.6 16.0 16.8 16.8 15.5 17.3 15.5 15.5
[29] 14.2 15.8 15.7 14.1 14.8 14.4 15.6 13.9 14.7 14.3 14.0 14.5 15.4 15.3
[43] 16.0 16.4 17.2 17.8 14.4 15.0 16.0 16.8 16.9 16.6 16.2 14.0 18.1 17.5
> # Load zoo and fill the missing value by linear interpolation with na.approx()
> library(zoo)
> y1 <- na.approx(a);y1
Time Series:
Start = 1884
End = 1939
Frequency = 1
[1] 15.2 16.9 15.3 14.9 15.0 15.1 16.7 16.3 16.5 13.3 16.5 15.0 15.9 15.5
[15] 16.9 16.4 14.9 14.5 16.6 15.1 14.6 16.0 16.8 16.8 15.5 17.3 15.5 15.5
[29] 14.2 15.8 15.7 14.1 14.8 14.4 15.6 13.9 14.7 14.3 14.0 14.5 15.4 15.3
[43] 16.0 16.4 17.2 17.8 14.4 15.0 16.0 16.8 16.9 16.6 16.2 14.0 18.1 17.5
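For a single interior gap, linear interpolation is the straight line between the two neighbours. With 1887 = 14.9 and 1889 = 15.1, the 1888 value is 14.9 + (1888 - 1887)/(1889 - 1887) × (15.1 - 14.9) = 15.0, which matches the na.approx() output above. The sketch below reproduces this with base R's approx() in place of zoo. That substitution keeps the block dependency-free; it is not the zoo code path.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

a <- yields
a[5] <- NA                          # treat 1888 as missing
tt <- as.numeric(time(a))           # years 1884..1939
ok <- !is.na(a)

# Linear interpolation over time, evaluated at every year
filled <- approx(tt[ok], a[ok], xout = tt)$y
print(filled[3:7])
```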
> # Fill the missing value by cubic spline interpolation with na.spline()
> y2 <- na.spline(a);y2
Time Series:
Start = 1884
End = 1939
Frequency = 1
[1] 15.20000 16.90000 15.30000 14.90000 14.56629 15.10000 16.70000 16.30000
[9] 16.50000 13.30000 16.50000 15.00000 15.90000 15.50000 16.90000 16.40000
[17] 14.90000 14.50000 16.60000 15.10000 14.60000 16.00000 16.80000 16.80000
[25] 15.50000 17.30000 15.50000 15.50000 14.20000 15.80000 15.70000 14.10000
[33] 14.80000 14.40000 15.60000 13.90000 14.70000 14.30000 14.00000 14.50000
[41] 15.40000 15.30000 16.00000 16.40000 17.20000 17.80000 14.40000 15.00000
[49] 16.00000 16.80000 16.90000 16.60000 16.20000 14.00000 18.10000 17.50000
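The spline fill can be sketched in base R with splinefun(), which fits an interpolating cubic spline through the observed points and evaluates it at the gap. We assume, without having verified it here, that zoo's na.spline() relies on splinefun() with its default method = "fmm". If so, the value at 1888 should agree with the 14.56629 above. Unlike the linear fill, that value lies below both neighbours (14.9 and 15.1), because the curve follows the swings in the surrounding years.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

a <- yields
a[5] <- NA                          # treat 1888 as missing
tt <- as.numeric(time(a))
ok <- !is.na(a)

# Interpolating cubic spline through the observed (year, yield) pairs
f <- splinefun(tt[ok], a[ok], method = "fmm")

filled <- as.numeric(a)
filled[!ok] <- f(tt[!ok])           # only the gap receives a spline value
print(round(filled[5], 5))
```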
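3. Time series plots
3.1 Basic time series plot
The vertical axis shows the series values and the horizontal axis shows time. R chooses the spacing of the time-axis ticks automatically, based on the span of the data.
> # Draw the time series plot (type = "o": lines drawn through the points)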
> plot(yields, type = "o")
![](https://i-blog.csdnimg.cn/blog_migrate/ebfdcf54e61878dc392a620d87d99521.png)
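The automatic tick spacing can be made concrete. Base graphics places ticks at "pretty" round numbers, and pretty() exposes that kind of rule. For the span 1884 to 1939 it proposes a step of 10 years, and the axis then keeps the candidates that fall inside the plotting region. The sketch draws to a temporary PDF (an assumption so it runs without a screen) and reads back the ticks with axTicks(). It also shows that plot() on a ts draws a plain line when type is not given.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

print(pretty(range(time(yields))))   # candidate round-number ticks for 1884..1939

tf <- tempfile(fileext = ".pdf")
pdf(tf)                               # draw into a temporary file
plot(yields)                          # ts default: type = "l", automatic ticks
ticks <- axTicks(1)                   # x-axis tick positions actually used
invisible(dev.off())
print(ticks)
```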
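3.2 Point and line types
type value | Description |
--- | --- |
p | points only |
l | lines only |
b | both points and lines (lines stop short of the points) |
o | lines drawn through the points (overplotted) |
h | vertical, histogram-like lines |
s | stair steps |
> # Plot each point/line type, arranged in 3 rows × 2 columns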
> par(mfrow = c(3,2))
> plot(yields, type = "p", main = 'type="p"')
> plot(yields, type = "l", main = 'type="l"')
> plot(yields, type = "b", main = 'type="b"')
> plot(yields, type = "o", main = 'type="o"')
> plot(yields, type = "h", main = 'type="h"')
> plot(yields, type = "s", main = 'type="s"')
![](https://i-blog.csdnimg.cn/blog_migrate/6a61206f818e47f4c6607a40a206ef0e.png)
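The six panels can also be produced in a loop, which keeps the type values in one vector and avoids copy-paste. In this sketch the output goes to a temporary PDF, an assumption that lets it run without a screen device. Saving the old settings with op <- par(...) and restoring them with par(op) is a common base-graphics idiom, so later plots are not left in the 3 × 2 layout.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)
types <- c("p", "l", "b", "o", "h", "s")

tf <- tempfile(fileext = ".pdf")
pdf(tf, width = 7, height = 9)
op <- par(mfrow = c(3, 2))            # 3 x 2 grid; previous settings saved in op
for (tp in types) {
  plot(yields, type = tp, main = sprintf('type = "%s"', tp))
}
par(op)                               # restore the previous layout
invisible(dev.off())
```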
3.3 Point symbol (pch) settings
![](https://i-blog.csdnimg.cn/blog_migrate/53a103b469bada082a5c18877646ae7c.png)
> # Plot with different point symbols, arranged in 2 rows × 2 columns
> par(mfrow = c(2,2))
> plot(yields, type = "o", pch = 1, main = 'pch = 1')
> plot(yields, type = "o", pch = 8, main = 'pch = 8')
> plot(yields, type = "o", pch = 16, main = 'pch = 16')
> plot(yields, type = "o", pch = 24, main = 'pch = 24')
![](https://i-blog.csdnimg.cn/blog_migrate/5da19ea1ec11e6a0f455188a10f1ad20.png)
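A chart like the symbol table above can be regenerated locally to check which code gives which marker. In base R, pch = 0 to 25 are the built-in symbols. Codes 21 to 25 are outlined shapes that can also take a fill colour through bg, which is why bg is set in this sketch. The grid layout and the grey fill are arbitrary choices for illustration.

```r
tf <- tempfile(fileext = ".pdf")
pdf(tf)
sym  <- 0:25
x    <- sym %% 13                     # 13 symbols per row
yrow <- 2 - sym %/% 13                # row 2 holds 0-12, row 1 holds 13-25
plot(x, yrow, pch = sym, bg = "grey", cex = 2,
     xlim = c(-0.5, 12.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch = 0 to 25")
text(x, yrow - 0.3, labels = sym)     # label each marker with its code
invisible(dev.off())
```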
3.4 Line type (lty) settings
![](https://i-blog.csdnimg.cn/blog_migrate/42659bab2ac778e98d757d0f4e6cea37.png)
> # Plot with different line types, arranged in 1 row × 2 columns
> par(mfrow = c(1,2))
> plot(yields, lty = 1, main = 'lty = 1')
> plot(yields, lty = 2, main = 'lty = 2')
![](https://i-blog.csdnimg.cn/blog_migrate/0dba026672fa08e483e42a9b763519ff.png)
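lty also accepts names. Codes 1 to 6 correspond to "solid", "dashed", "dotted", "dotdash", "longdash" and "twodash", and code 0 is "blank". The sketch draws one horizontal segment per code so the patterns can be compared side by side. The layout is an illustrative choice.

```r
tf <- tempfile(fileext = ".pdf")
pdf(tf)
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
plot(NA, xlim = c(0, 1), ylim = c(0.5, 6.5), axes = FALSE,
     xlab = "", ylab = "", main = "lty = 1 to 6")      # empty plotting region
for (k in 1:6) {
  segments(0.4, k, 1, k, lty = k, lwd = 2)              # line drawn with numeric code k
  text(0, k, paste0(k, ' = "', lty_names[k], '"'), adj = 0)
}
invisible(dev.off())
```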
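3.5 Line width (lwd) settings
lwd value | Description |
--- | --- |
lwd = 1 | default width |
lwd = k | k times the default width |
lwd = 1/k (e.g. 0.5) | 1/k of the default width; lwd must be positive, and some devices do not draw lines thinner than the default |
> # Plot with different line widths, arranged in 1 row × 2 columns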
> par(mfrow = c(1,2))
> plot(yields, lwd = 1, main = 'lwd = 1')
> plot(yields, lwd = 4, main = 'lwd = 4')
![](https://i-blog.csdnimg.cn/blog_migrate/43748f9adf13348ca082148bd346a8a5.png)
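Here is a sketch that compares four widths, including the fractional width 0.5, in a 2 × 2 grid. The set of widths is an arbitrary choice, and how thin the 0.5 line actually looks depends on the graphics device (a temporary PDF here).

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)
widths <- c(0.5, 1, 2, 4)

tf <- tempfile(fileext = ".pdf")
pdf(tf)
op <- par(mfrow = c(2, 2))
for (w in widths) {
  plot(yields, lwd = w, main = paste("lwd =", w))   # same series, different width
}
par(op)
invisible(dev.off())
```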
3.6 Color (col) settings
> # Plot in different colors, arranged in 2 rows × 2 columns
> par(mfrow = c(2,2))
> plot(yields, col = 1, main = 'col = 1')
> plot(yields, col = 2, main = 'col = 2')
> plot(yields, col = 3, main = 'col = 3')
> plot(yields, col = 4, main = 'col = 4')
![](https://i-blog.csdnimg.cn/blog_migrate/1072a80ebfcd7fd324a086dca4564f03.png)
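A numeric col is an index into the current colour palette, so col = 1 to 4 depend on palette(). In R 4.0 and later the default palette starts with "black" followed by a red, a green and a blue given as hex codes; older versions used "black", "red", "green3" and "blue". The sketch prints the first four entries and, as an alternative, passes a colour name and a hex string directly. Both specific colours are arbitrary examples.

```r
print(palette()[1:4])                 # colours behind col = 1, 2, 3, 4

yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

tf <- tempfile(fileext = ".pdf")
pdf(tf)
op <- par(mfrow = c(1, 2))
plot(yields, col = "darkgreen", main = 'col = "darkgreen"')  # colour name
plot(yields, col = "#2297E6", main = 'col = "#2297E6"')      # hex RGB string
par(op)
invisible(dev.off())
```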
3.7 Title and axis label settings
> # Plot with a title and axis labels, one plot per page
> par(mfrow = c(1,1))
> plot(yields, type = "o", pch = 4, lty = 1, lwd = 2, col = 4, main = 'Wheat yield per acre in England and Wales, 1884-1939', xlab = "Year", ylab = "Yield")
![](https://i-blog.csdnimg.cn/blog_migrate/f97a995c2ff9c59964b014f69f1e064e.png)
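Titles and labels can also be added after the plot with title(), and text() places arbitrary annotations at data coordinates. As an illustration, the sketch marks the highest yield in the series (18.1, in 1938). Annotating the maximum is our own choice, not something the book does.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)
imax <- which.max(yields)             # position of the largest yield
yr <- time(yields)[imax]              # the corresponding year

tf <- tempfile(fileext = ".pdf")
pdf(tf)
plot(yields, type = "o", pch = 4, col = 4, ann = FALSE)   # suppress default labels
title(main = "Wheat yield per acre, England and Wales, 1884-1939",
      xlab = "Year", ylab = "Yield")
text(yr, yields[imax], labels = paste0(yr, ": ", yields[imax]), pos = 1)  # label below the point
invisible(dev.off())
```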
3.8 Axis range settings
> # Plot with given x-axis and y-axis ranges, arranged in 2 rows × 2 columns
> par(mfrow = c(2,2))
> plot(yields, xlim = c(1885,1900), main = "x-axis range 1885-1900")
> plot(yields, xlim = c(1885,1920), main = "x-axis range 1885-1920")
> plot(yields, ylim = c(13,16), main = "y-axis range 13-16")
> plot(yields, ylim = c(16,19), main = "y-axis range 16-19")
![](https://i-blog.csdnimg.cn/blog_migrate/1b2cd6e3c1f1c3fdb46029c9ab593f30.png)
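xlim and ylim only change the visible region; unlike window(), they do not subset the data. With the default axis style (xaxs = "r"), R also widens the requested range by 4% at each end. The x axis for xlim = c(1885, 1900) therefore actually spans 1884.4 to 1900.6, while xaxs = "i" uses the limits exactly. The sketch checks both through par("usr").

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)

tf <- tempfile(fileext = ".pdf")
pdf(tf)
plot(yields, xlim = c(1885, 1900))              # default xaxs = "r": padded by 4%
usr_r <- par("usr")[1:2]
plot(yields, xlim = c(1885, 1900), xaxs = "i")  # limits used exactly
usr_i <- par("usr")[1:2]
invisible(dev.off())
print(rbind(usr_r, usr_i))
```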
3.9 Adding reference lines
> # Add dashed red reference lines at 1910 and at the median yield (15.55, see 1.1); one plot per page
> par(mfrow = c(1,1))
> plot(yields, type = "o", pch = 4, lty = 1, lwd = 1, col = 4,
+ main = 'Wheat yield per acre in England and Wales, 1884-1939',
+ xlab = "Year", ylab = "Yield")
> abline(v = 1910, h = 15.55, lty = 2, lwd = 2, col = 2)
> # Detach a1
> detach(a1)
![](https://i-blog.csdnimg.cn/blog_migrate/1d8dd9919a17555f237ecd5131a4264a.png)
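The horizontal reference level 15.55 is the median yield reported by summary() in section 1.1. Rather than hard-coding it, the level can be computed from the data, so the line stays correct if the series changes. The vertical line at 1910 is simply the year used in the example. This is a sketch writing to a temporary PDF.

```r
yields <- ts(c(15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3,
               16.5, 15.0, 15.9, 15.5, 16.9, 16.4, 14.9, 14.5, 16.6, 15.1,
               14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5, 14.2, 15.8,
               15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5,
               15.4, 15.3, 16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8,
               16.9, 16.6, 16.2, 14.0, 18.1, 17.5), start = 1884)
med <- median(as.numeric(yields))     # reference level taken from the data

tf <- tempfile(fileext = ".pdf")
pdf(tf)
plot(yields, type = "o", pch = 4, col = 4, xlab = "Year", ylab = "Yield")
abline(v = 1910, h = med, lty = 2, lwd = 2, col = 2)   # dashed red reference lines
invisible(dev.off())
```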
Reference: Wang Yan (ed.). Time Series Analysis: Based on R, 2nd ed. Beijing: China Renmin University Press, June 2020.
(Statistics Series Based on R Applications)
ISBN 978-7-300-27898-8