US Tuition Costs
Average Tuition and Educational Attainment in the United States.
Tidy Tuesday on GitHub:
Thomas Mock (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday
1. Environment setup
# Use a CRAN mirror in mainland China so that packages install quickly
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
2. Set the working directory
wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-02_US_Tuition_Costs/src-d'
setwd(wkdir)
3. Load the R packages
library(tidyverse)
library(geofacet)
library(showtext)
# Tested on Ubuntu: without this call, the Chinese characters in my plot come out garbled
showtext_auto()
4. Load the data
df_input <- readxl::read_excel("../data/us_avg_tuition.xlsx")
# Take a quick look at the data
glimpse(df_input)
## Rows: 50
## Columns: 13
## $ State <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "C…
## $ `2004-05` <dbl> 5682.838, 4328.281, 5138.495, 5772.302, 5285.921, 4703.777, …
## $ `2005-06` <dbl> 5840.550, 4632.623, 5415.516, 6082.379, 5527.881, 5406.967, …
## $ `2006-07` <dbl> 5753.496, 4918.501, 5481.419, 6231.977, 5334.826, 5596.348, …
## $ `2007-08` <dbl> 6008.169, 5069.822, 5681.638, 6414.900, 5672.472, 6227.002, …
## $ `2008-09` <dbl> 6475.092, 5075.482, 6058.464, 6416.503, 5897.888, 6284.137, …
## $ `2009-10` <dbl> 7188.954, 5454.607, 7263.204, 6627.092, 7258.771, 6948.473, …
## $ `2010-11` <dbl> 8071.134, 5759.153, 8839.605, 6900.912, 8193.739, 7748.201, …
## $ `2011-12` <dbl> 8451.902, 5762.421, 9966.716, 7028.991, 9436.426, 8315.632, …
## $ `2012-13` <dbl> 9098.069, 6026.143, 10133.503, 7286.580, 9360.574, 8792.856,…
## $ `2013-14` <dbl> 9358.929, 6012.445, 10296.200, 7408.495, 9274.193, 9292.954,…
## $ `2014-15` <dbl> 9496.084, 6148.808, 10413.844, 7606.410, 9186.824, 9298.599,…
## $ `2015-16` <dbl> 9751.101, 6571.340, 10646.278, 7867.297, 9269.844, 9748.188,…
# Check the column names
colnames(df_input)
## [1] "State" "2004-05" "2005-06" "2006-07" "2007-08" "2008-09" "2009-10"
## [8] "2010-11" "2011-12" "2012-13" "2013-14" "2014-15" "2015-16"
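Before reshaping, it can be worth confirming that every column other than State really is an academic-year label of the form YYYY-YY. The following is a minimal base R sketch, assuming the column names printed above; the names `cols`, `period_cols` and `is_period` are introduced here only for illustration.

```r
# Column names as printed by colnames(df_input) above
cols <- c("State", "2004-05", "2005-06", "2006-07", "2007-08", "2008-09",
          "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15",
          "2015-16")

# Everything except the identifier column should be an academic year
period_cols <- setdiff(cols, "State")

# TRUE where a name is four digits, a hyphen, then two digits
is_period <- grepl("^[0-9]{4}-[0-9]{2}$", period_cols)

# Stop early if any column name does not follow the expected pattern
stopifnot(all(is_period))
length(period_cols)
```

With the real data, `cols` would simply be `colnames(df_input)`.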
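5. Data preprocessing
df_tidy <- df_input %>%
  # pivot_longer() reshapes the data from wide to long format
  pivot_longer(cols = where(is.numeric), names_to = "period", values_to = "tuition") %>%
  # Calling the function as dplyr::mutate() is recommended, because it can clash
  # with the function of the same name in plyr (I ran into this error myself...)
  dplyr::mutate(period_short = str_sub(period, 3, 8))
# Take a quick look at the data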
glimpse(df_tidy)
## Rows: 600
## Columns: 4
## $ State <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "A…
## $ period <chr> "2004-05", "2005-06", "2006-07", "2007-08", "2008-09", "2…
## $ tuition <dbl> 5682.838, 5840.550, 5753.496, 6008.169, 6475.092, 7188.95…
## $ period_short <chr> "04-05", "05-06", "06-07", "07-08", "08-09", "09-10", "10…
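To make the reshaping step easier to follow, here is a small sketch on two states and two year columns, with values copied from the glimpse() output above. It assumes tidyr, stringr and dplyr are installed. The names `df_wide` and `df_long` are illustrative, and `cols = -State` is used in place of `where(is.numeric)`; for this table both select the same columns.

```r
library(tidyr)
library(stringr)

# Two states and the first two year columns, values taken from the output above
df_wide <- tibble::tibble(
  State     = c("Alabama", "Alaska"),
  `2004-05` = c(5682.838, 4328.281),
  `2005-06` = c(5840.550, 4632.623)
)

# Wide to long: each (State, year) pair becomes one row
df_long <- pivot_longer(df_wide, cols = -State,
                        names_to = "period", values_to = "tuition")

# Drop the century: characters 3 to the end of "2004-05" give "04-05"
df_long <- dplyr::mutate(df_long, period_short = str_sub(period, 3, 8))

df_long
```

Each state contributes one row per year column, which is why the full table goes from 50 rows to 50 × 12 = 600 rows.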
6. Plotting with ggplot2
# Note: the steps are split up here to make them easier to explain; in practice they can be chained together
gg <- ggplot(df_tidy, aes(period_short, tuition, group = State))
# geom_area() draws an area chart
gg <- gg + geom_area(fill = "#FFA500")
# facet_geo() arranges the facets according to geographic position
gg <- gg + facet_geo( ~ State, grid = "us_state_grid2", label = "code")
# scale_x_discrete() adjusts the breaks, labels, etc. of a discrete axis
gg <- gg + scale_x_discrete("", breaks = c("04-05", "15-16"), labels = c("'04","'16"))
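# scale_y_continuous() configures the continuous y axis, here its title and label format
# (the title string means "Tuition and related fees")
gg <- gg + scale_y_continuous("学费及相关的费用", labels = scales::label_number(prefix = "$"))
# Title: "Average tuition by US state, 2004-2016"; caption: "Source: onlinembapage.com"
gg <- gg + labs(title = "2004-2016年美国各州平均学费",
                x = NULL,
                y = NULL,
                caption = "资料来源: onlinembapage.com · graph by 萤火之森")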
# theme_minimal() is a minimal theme without axis borders
gg <- gg + theme_minimal()
# theme() adjusts the non-data elements to polish the final look of the plot
gg <- gg + theme(
  # panel.grid.major: major grid lines; this removes them
  panel.grid.major = element_blank(),
  # panel.grid.minor: minor grid lines; this removes them
  panel.grid.minor = element_blank(),
  # axis.text: axis tick labels
  axis.text = element_text(color = "black", size = 10),
  # axis.title: axis titles
  axis.title = element_text(color = "black", size = 10),
  # plot.title: main title
  plot.title = element_text(color = "black", size = 20, face = "bold"),
  # plot.background: background of the whole figure
  plot.background = element_rect(fill = "white"))
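As noted at the start of this step, the individual calls can be chained into one expression. The sketch below shows one way that might look. It assumes ggplot2 and geofacet are installed. So that it runs without the Excel file, it uses a SYNTHETIC data frame `df_demo` built from R's built-in `state.name` vector; the tuition values are placeholders, not real figures. The names `periods`, `df_demo` and `gg_combined` are illustrative.

```r
library(ggplot2)
library(geofacet)

# Academic-year labels in the same short form as period_short
periods <- c("04-05", "05-06", "06-07", "07-08", "08-09", "09-10",
             "10-11", "11-12", "12-13", "13-14", "14-15", "15-16")

# SYNTHETIC placeholder values, not real tuition data: every state gets the
# same straight line, only so that the example is self-contained
df_demo <- expand.grid(State = state.name, period_short = periods,
                       stringsAsFactors = FALSE)
df_demo$tuition <- 5000 + 400 * (match(df_demo$period_short, periods) - 1)

# The same layers as above, written as a single chain
gg_combined <- ggplot(df_demo, aes(period_short, tuition, group = State)) +
  geom_area(fill = "#FFA500") +
  facet_geo(~ State, grid = "us_state_grid2", label = "code") +
  scale_x_discrete("", breaks = c("04-05", "15-16"), labels = c("'04", "'16")) +
  scale_y_continuous("Tuition and fees",
                     labels = scales::label_number(prefix = "$")) +
  labs(title = "Synthetic demo data, not real tuition") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(fill = "white")
  )
```

Printing `gg_combined` draws the figure; replacing `df_demo` with `df_tidy` should give the same chart as the step-by-step version, apart from the shortened labels and theme settings used here.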
7. Save the figure as PDF and PNG
gg
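# Base name shared by both output files
filename <- "20180402-D-01"
# Pass the plot explicitly instead of relying on the last plot displayed
ggsave(filename = paste0(filename, ".pdf"), plot = gg, width = 12.2, height = 7.6, device = cairo_pdf)
ggsave(filename = paste0(filename, ".png"), plot = gg, width = 12.2, height = 7.6, dpi = 100, device = "png")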
8. Session info
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] showtext_0.9-5 showtextdb_3.0 sysfonts_0.8.8 geofacet_0.2.0
## [5] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4
## [9] readr_2.1.2 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.3.6
## [13] tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] rnaturalearth_0.1.0 httr_1.4.4 sass_0.4.2
## [4] jsonlite_1.8.0 modelr_0.1.9 bslib_0.4.0
## [7] assertthat_0.2.1 highr_0.9 sp_1.5-0
## [10] googlesheets4_1.0.1 cellranger_1.1.0 yaml_2.3.5
## [13] ggrepel_0.9.1 pillar_1.8.1 backports_1.4.1
## [16] lattice_0.20-45 glue_1.6.2 digest_0.6.29
## [19] rvest_1.0.3 colorspace_2.0-3 htmltools_0.5.3
## [22] pkgconfig_2.0.3 broom_1.0.1 haven_2.5.1
## [25] scales_1.2.1 jpeg_0.1-9 tzdb_0.3.0
## [28] proxy_0.4-27 googledrive_2.0.0 farver_2.1.1
## [31] generics_0.1.3 ellipsis_0.3.2 cachem_1.0.6
## [34] withr_2.5.0 cli_3.3.0 magrittr_2.0.3
## [37] crayon_1.5.1 readxl_1.4.1 evaluate_0.16
## [40] fs_1.5.2 fansi_1.0.3 xml2_1.3.3
## [43] class_7.3-20 textshaping_0.3.6 tools_4.2.1
## [46] imguR_1.0.3 hms_1.1.2 gargle_1.2.1
## [49] lifecycle_1.0.1 munsell_0.5.0 geogrid_0.1.1
## [52] reprex_2.0.2 compiler_4.2.1 jquerylib_0.1.4
## [55] e1071_1.7-11 systemfonts_1.0.4 rlang_1.0.5
## [58] classInt_0.4-8 units_0.8-0 grid_4.2.1
## [61] rstudioapi_0.14 labeling_0.4.2 rmarkdown_2.16
## [64] gtable_0.3.1 DBI_1.1.3 R6_2.5.1
## [67] gridExtra_2.3 lubridate_1.8.0 knitr_1.40
## [70] fastmap_1.1.0 rgeos_0.5-9 utf8_1.2.2
## [73] ragg_1.2.3 KernSmooth_2.23-20 stringi_1.7.8
## [76] Rcpp_1.0.9 png_0.1-7 vctrs_0.4.1
## [79] sf_1.0-8 dbplyr_2.2.1 tidyselect_1.1.2
## [82] xfun_0.32
Test data
Download the accompanying data: us_avg_tuition.xlsx