20180402-B · US Tuition Costs · ggplot2, 条线图 柱状图 · R 语言数据可视化 案例 源码

所有作品合集传送门: Tidy Tuesday

2018 年合集传送门: 2018

US Tuition Costs

Average Tuition and Educational Attainment in the United States


Tidy Tuesday 在 GitHub 上的传送地址:
Thomas Mock (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday


在这里插入图片描述
在这里插入图片描述



1. 一些环境设置

# 设置为国内镜像, 方便快速安装模块
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

2. 设置工作路径

wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-02_US_Tuition_Costs/src-b'
setwd(wkdir)

3. 加载 R 包

library(tidyverse)
library(lubridate)
library(showtext)
# 在 Ubuntu 系统上测试的, 不加这个我画出来的汉字会乱码 ~
showtext_auto()

4. 加载数据

df_input <- readxl::read_excel("../data/us_avg_tuition.xlsx")

# 简要查看数据内容
glimpse(df_input)
## Rows: 50
## Columns: 13
## $ State     <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "C…albert
## $ `2004-05` <dbl> 5682.838, 4328.281, 5138.495, 5772.302, 5285.921, 4703.777, …
## $ `2005-06` <dbl> 5840.550, 4632.623, 5415.516, 6082.379, 5527.881, 5406.967, …
## $ `2006-07` <dbl> 5753.496, 4918.501, 5481.419, 6231.977, 5334.826, 5596.348, …
## $ `2007-08` <dbl> 6008.169, 5069.822, 5681.638, 6414.900, 5672.472, 6227.002, …
## $ `2008-09` <dbl> 6475.092, 5075.482, 6058.464, 6416.503, 5897.888, 6284.137, …
## $ `2009-10` <dbl> 7188.954, 5454.607, 7263.204, 6627.092, 7258.771, 6948.473, …
## $ `2010-11` <dbl> 8071.134, 5759.153, 8839.605, 6900.912, 8193.739, 7748.201, …
## $ `2011-12` <dbl> 8451.902, 5762.421, 9966.716, 7028.991, 9436.426, 8315.632, …
## $ `2012-13` <dbl> 9098.069, 6026.143, 10133.503, 7286.580, 9360.574, 8792.856,…
## $ `2013-14` <dbl> 9358.929, 6012.445, 10296.200, 7408.495, 9274.193, 9292.954,…
## $ `2014-15` <dbl> 9496.084, 6148.808, 10413.844, 7606.410, 9186.824, 9298.599,…
## $ `2015-16` <dbl> 9751.101, 6571.340, 10646.278, 7867.297, 9269.844, 9748.188,…
# 检查数据的列名
colnames(df_input)
##  [1] "State"   "2004-05" "2005-06" "2006-07" "2007-08" "2008-09" "2009-10"
##  [8] "2010-11" "2011-12" "2012-13" "2013-14" "2014-15" "2015-16"

5. 数据预处理

# 将列名简化, 方便绘图展示
df_tidy <- gather(df_input, key = Year, value = Tuition, -State)

# 调整年份为 YYYY 格式
df_tidy <- 
    df_tidy %>% 
    mutate(Year = str_sub(Year, start = 1, end = 4))

5. 条形图 (一)

5.1 以 state 为分组取平均值

df_plot_1 <-
    df_tidy %>% 
    select(State, Tuition) %>% 
    group_by(State) %>% 
    # 最好使用 dplyr::summarise() 形式调用函数, 不然容易和内置函数冲突
    dplyr::summarise(Mean_State = mean(Tuition)) %>% 
    arrange(desc(Mean_State))

5.2. 用 ggplot2 开始绘图

# PS: 方便讲解, 我这里进行了拆解, 具体使用时可以组合在一起
gg <- ggplot(df_plot_1)
# geom_bar() 绘制条形图
gg <- gg + geom_bar(aes(x = reorder(State, Mean_State), y = Mean_State), 
                    stat = "identity", 
                    position = "dodge", 
                    fill = "cadetblue3")
# coord_flip() 倒置坐标系
gg <- gg + coord_flip()
# theme_minimal() 去坐标轴边框的最小化主题
gg <- gg + theme_minimal()
gg <- gg + labs(title = "美国各个州的大学学费变化",
                x = NULL,
                y = NULL)
# theme() 实现对非数据元素的调整, 对结果进行进一步渲染, 使之更加美观
gg <- gg + theme(
  # panel.grid.major 主网格线, 这一步表示删除主要网格线
  panel.grid.major = element_blank(),
  # panel.grid.minor 次网格线, 这一步表示删除次要网格线
  panel.grid.minor = element_blank(),
  # axis.text 坐标轴刻度文本
  axis.text = element_text(color = "black", size = 12),
  # axis.title 坐标轴标题
  axis.title = element_text(color = "black", size = 10),
  # plot.title 主标题
  plot.title = element_text(color = "black", size = 20, face = "bold"),
  # plot.subtitle 次要标题
  plot.subtitle = element_text(color = "red", size = 12),
  # plot.background 图片背景
  plot.background = element_rect(fill = "white"))

5.3 保存图片到 PDF 和 PNG

gg

在这里插入图片描述

filename = '20180402-B-01'
ggsave(filename = paste0(filename, ".pdf"), width = 8.6, height = 12, device = cairo_pdf)
ggsave(filename = paste0(filename, ".png"), width = 8.6, height = 12, dpi = 100, device = "png")

6. 条形图 (二)

6.1 以 state 为分组取平均值, 并分类

# 以 state 为分组取平均值, 并以总均值划分为两类
df_plot_2 <-
    df_tidy %>% 
    group_by(State) %>% 
    # 建议使用 dplyr::mutate 形式调用函数, 不然容易与 plyr 中的函数冲突 (因为我自己就报错了...)
    dplyr::mutate(Mean_State = mean(Tuition)) %>% 
    ungroup() %>% 
    dplyr::mutate(Classify = Mean_State - mean(Mean_State)) %>% 
    dplyr::mutate(Category = ifelse(Classify < 0, "低于平均值", "高于平均值")) %>% 
    arrange(desc(Classify)) %>% 
    select(State, Classify, Category) %>% 
    distinct()

6.2. 用 ggplot2 开始绘图

# PS: 方便讲解, 我这里进行了拆解, 具体使用时可以组合在一起
gg <- ggplot(df_plot_2)
# geom_bar() 绘制条形图
gg <- gg + geom_bar(aes(x = reorder(State, -Classify), y = Classify, fill = Category),
                    stat = "identity",
                    position = "dodge")
gg <- gg + geom_hline(yintercept = 0, linetype = 2)
# coord_flip() 倒置坐标系
gg <- gg + coord_flip()
gg <- gg + scale_fill_brewer()
# theme_minimal() 去坐标轴边框的最小化主题
gg <- gg + theme_minimal()
gg <- gg + labs(title = "美国各个州的大学学费变化",
                x = NULL,
                y = NULL)
# theme() 实现对非数据元素的调整, 对结果进行进一步渲染, 使之更加美观
gg <- gg + theme(
  # panel.grid.major 主网格线, 这一步表示删除主要网格线
  panel.grid.major = element_blank(),
  # panel.grid.minor 次网格线, 这一步表示删除次要网格线
  panel.grid.minor = element_blank(),
  # axis.text 坐标轴刻度文本
  axis.text = element_text(color = "black", size = 12),
  # axis.title 坐标轴标题
  axis.title = element_text(color = "black", size = 10),
  # plot.title 主标题
  plot.title = element_text(color = "black", size = 20, face = "bold"),
  # plot.subtitle 次要标题
  plot.subtitle = element_text(color = "red", size = 12),
  # plot.background 图片背景
  plot.background = element_rect(fill = "white"))

6.3 保存图片到 PDF 和 PNG

gg

在这里插入图片描述

filename = '20180402-B-02'
ggsave(filename = paste0(filename, ".pdf"), width = 8.6, height = 12, device = cairo_pdf)
ggsave(filename = paste0(filename, ".png"), width = 8.6, height = 12, dpi = 100, device = "png")

7. session-info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] showtext_0.9-5  showtextdb_3.0  sysfonts_0.8.8  lubridate_1.8.0
##  [5] forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.4    
##  [9] readr_2.1.2     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.3.6  
## [13] tidyverse_1.3.2
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.2.1    digest_0.6.29       utf8_1.2.2         
##  [4] R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
##  [7] reprex_2.0.2        evaluate_0.16       highr_0.9          
## [10] httr_1.4.4          pillar_1.8.1        rlang_1.0.5        
## [13] googlesheets4_1.0.1 readxl_1.4.1        rstudioapi_0.14    
## [16] jquerylib_0.1.4     rmarkdown_2.16      textshaping_0.3.6  
## [19] labeling_0.4.2      googledrive_2.0.0   munsell_0.5.0      
## [22] broom_1.0.1         compiler_4.2.1      modelr_0.1.9       
## [25] xfun_0.32           systemfonts_1.0.4   pkgconfig_2.0.3    
## [28] htmltools_0.5.3     tidyselect_1.1.2    fansi_1.0.3        
## [31] crayon_1.5.1        tzdb_0.3.0          dbplyr_2.2.1       
## [34] withr_2.5.0         grid_4.2.1          jsonlite_1.8.0     
## [37] gtable_0.3.1        lifecycle_1.0.1     DBI_1.1.3          
## [40] magrittr_2.0.3      scales_1.2.1        cli_3.3.0          
## [43] stringi_1.7.8       cachem_1.0.6        farver_2.1.1       
## [46] fs_1.5.2            xml2_1.3.3          bslib_0.4.0        
## [49] ragg_1.2.3          ellipsis_0.3.2      generics_0.1.3     
## [52] vctrs_0.4.1         RColorBrewer_1.1-3  tools_4.2.1        
## [55] glue_1.6.2          hms_1.1.2           fastmap_1.1.0      
## [58] yaml_2.3.5          colorspace_2.0-3    gargle_1.2.1       
## [61] rvest_1.0.3         knitr_1.40          haven_2.5.1        
## [64] sass_0.4.2

测试数据

配套数据下载:us_avg_tuition.xlsx

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值