20180416-I · Global Mortality · DT 数据表可视化工具 · datatable 表格 · R 语言数据可视化案例源码

数绘小站

已于 2022-10-16 13:15:53 修改

阅读量417

点赞数 1

分类专栏： # Tidy Tuesday (2018) 文章标签： r语言数据挖掘

于 2022-10-15 00:00:00 首次发布

本文链接：https://blog.csdn.net/Albert_XN/article/details/127276167

版权

Tidy Tuesday (2018) 专栏收录该内容

43 篇文章 20 订阅

订阅专栏

所有作品合集传送门： Tidy Tuesday

2018 年合集传送门： 2018

Global Mortality

What do people die from?

在过去的几个世纪里，世界发生了很大的变化–这就是《我们的世界》的数据所显示的。然而，有一件事在这种转变中一直保持不变：我们都必须在某个时候死亡。然而，随着生活水平的提高、医疗保健的进步和生活方式的改变，死亡的原因正在发生变化。

在这篇博客中，我们试图回答 “人们死于什么？”，首先看一下全球死因的数据，然后选择国家层面的例子。
世界各地的主要死因仍有很大差异，因此，也可以选择了一些国家，以突出这种异质性。

本次示例通过一些可视化方式来展示这些信息。

在这里插入图片描述

在使用 Rmarkdown 或 Shiny 写报告时总会遇到 table 展示的困扰，对此益辉大神发布DT包，完美的解决了这个痛点。这个 DT 包到底是干什么用的呢？DT 包提供了 JavaScript 库 DataTables 在 R 中的一个接口，使得 R 对象（矩阵或者数据框）可以在 HTML 页面上显示为表格，并提供了对数据的筛选、分页和排序等功能。
在没有任何多余设置的情况下，仅仅使用 DT 包中的datatable()函数，就可以直接可视化表格数据，并且自动完成对数据的分页显示，使用户可以对每一列进行升（降）序排列，控制每页显示条目的多少，以及对数据进行检索等一系列功能。
具体使用可以学习帮助文档或检索其他大佬们的学习笔记，这里就不多加累述了，不是本文的重点。

言归正传，开始正式制表。

1. 一些环境设置

# 设置为国内镜像, 方便快速安装模块
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

2. 设置工作路径

wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-16_Global_Mortality/src-i'
setwd(wkdir)

3. 加载 R 包

library(DT)
library(raster)
library(tidyverse)

4. 加载数据

# 读取数据
df_input <- readxl::read_excel("../data/global_mortality.xlsx")

# 简要查看数据内容
glimpse(df_input)

## Rows: 6,156
## Columns: 35
## $ country                                    <chr> "Afghanistan", "Afghanistan…
## $ country_code                               <chr> "AFG", "AFG", "AFG", "AFG",…
## $ year                                       <dbl> 1990, 1991, 1992, 1993, 199…
## $ `Cardiovascular diseases (%)`              <dbl> 17.61040, 17.80181, 18.3868…
## $ `Cancers (%)`                              <dbl> 4.025975, 4.054145, 4.17395…
## $ `Respiratory diseases (%)`                 <dbl> 2.106626, 2.134176, 2.20829…
## $ `Diabetes (%)`                             <dbl> 3.832555, 3.822228, 3.90012…
## $ `Dementia (%)`                             <dbl> 0.5314287, 0.5324973, 0.540…
## $ `Lower respiratory infealbert (%)`         <dbl> 10.886362, 10.356968, 10.09…
## $ `Neonatal deaths (%)`                      <dbl> 9.184653, 8.938897, 8.84138…
## $ `Diarrheal diseases (%)`                   <dbl> 2.497141, 2.572228, 2.70774…
## $ `Road accidents (%)`                       <dbl> 3.715944, 3.729142, 3.81635…
## $ `Liver disease (%)`                        <dbl> 0.8369093, 0.8455159, 0.874…
## $ `Tuberculosis (%)`                         <dbl> 5.877075, 5.891704, 6.03466…
## $ `Kidney disease (%)`                       <dbl> 1.680611, 1.671115, 1.70098…
## $ `Digestive diseases (%)`                   <dbl> 1.058771, 1.049322, 1.06288…
## $ `HIV/AIDS (%)`                             <dbl> 0.01301948, 0.01451458, 0.0…
## $ `Suicide (%)`                              <dbl> 0.4366105, 0.4422802, 0.456…
## $ `Malaria (%)`                              <dbl> 0.4488863, 0.4550191, 0.460…
## $ `Homicide (%)`                             <dbl> 1.287020, 1.290991, 1.32616…
## $ `Nutritional deficiencies (%)`             <dbl> 0.3505045, 0.3432123, 0.345…
## $ `Meningitis (%)`                           <dbl> 3.037603, 2.903202, 2.84064…
## $ `Protein-energy malnutrition (%)`          <dbl> 0.3297599, 0.3221711, 0.323…
## $ `Drowning (%)`                             <dbl> 0.9838624, 0.9545860, 0.951…
## $ `Maternal deaths (%)`                      <dbl> 1.769213, 1.749264, 1.76424…
## $ `Parkinson disease (%)`                    <dbl> 0.02515859, 0.02545063, 0.0…
## $ `Alcohol disorders (%)`                    <dbl> 0.02899828, 0.02917152, 0.0…
## $ `Intestinal infectious diseases (%)`       <dbl> 0.1833303, 0.1781074, 0.176…
## $ `Drug disorders (%)`                       <dbl> 0.04120540, 0.04203340, 0.0…
## $ `Hepatitis (%)`                            <dbl> 0.1387378, 0.1350081, 0.134…
## $ `Fire (%)`                                 <dbl> 0.1741567, 0.1706712, 0.171…
## $ `Heat-related (hot and cold exposure) (%)` <dbl> 0.1378229, 0.1348266, 0.139…
## $ `Natural disasters (%)`                    <dbl> 0.00000000, 0.79760256, 0.3…
## $ `Conflict (%)`                             <dbl> 0.932, 2.044, 2.408, NA, 4.…
## $ `Terrorism (%)`                            <dbl> 0.007, 0.040, 0.027, NA, 0.…

# 检查数据的列名
colnames(df_input)

##  [1] "country"                                 
##  [2] "country_code"                            
##  [3] "year"                                    
##  [4] "Cardiovascular diseases (%)"             
##  [5] "Cancers (%)"                             
##  [6] "Respiratory diseases (%)"                
##  [7] "Diabetes (%)"                            
##  [8] "Dementia (%)"                            
##  [9] "Lower respiratory infections (%)"        
## [10] "Neonatal deaths (%)"                     
## [11] "Diarrheal diseases (%)"                  
## [12] "Road accidents (%)"                      
## [13] "Liver disease (%)"                       
## [14] "Tuberculosis (%)"                        
## [15] "Kidney disease (%)"                      
## [16] "Digestive diseases (%)"                  
## [17] "HIV/AIDS (%)"                            
## [18] "Suicide (%)"                             
## [19] "Malaria (%)"                             
## [20] "Homicide (%)"                            
## [21] "Nutritional deficiencies (%)"            
## [22] "Meningitis (%)"                          
## [23] "Protein-energy malnutrition (%)"         
## [24] "Drowning (%)"                            
## [25] "Maternal deaths (%)"                     
## [26] "Parkinson disease (%)"                   
## [27] "Alcohol disorders (%)"                   
## [28] "Intestinal infectious diseases (%)"      
## [29] "Drug disorders (%)"                      
## [30] "Hepatitis (%)"                           
## [31] "Fire (%)"                                
## [32] "Heat-related (hot and cold exposure) (%)"
## [33] "Natural disasters (%)"                   
## [34] "Conflict (%)"                            
## [35] "Terrorism (%)"

5. 数据预处理

df_tidy <- df_input %>%
  mutate_if(is.numeric, funs(replace(., is.na(.), 0))) %>%
  # group_by() 以指定的列进行分组
  group_by(country) %>%
  # select() 选择需要使用的列
  select(-year) %>%
  # summarise_if() 同时总结多个列
  summarise_if(is.numeric, funs(round(lm(. ~ c(1:27))$coeff[[2]], 3))) %>%
  # gather() 数据收缩、从宽表到窄表
  gather(Stat, Value, -country) %>%
  # 建议使用 dplyr::mutate 形式调用函数, 有可能与 plyr 中的函数冲突
  mutate(Continent = ccodes()[match(country, ccodes()$NAME), 9],
         Stat = gsub(".[[:punct:]]", "", Stat)) %>%
  # select() 选择需要使用的列
  select(1, 2, 4, 3) %>%
  # arrange() 根据 change 列进行排序, 默认是升序
  arrange(desc(Value)) %>%
  {.}

# 选择中国的数据
df_china <- df_tidy %>% dplyr::filter(country == 'China')

# 删除缺失值
df_plot <- na.omit(df_china)

# 简要查看数据内容
glimpse(df_plot)

## Rows: 32
## Columns: 4
## $ country   <chr> "China", "China", "China", "China", "China", "China", "China…
## $ Stat      <chr> "Cardiovascular diseases", "Cancers", "Dementia", "Diabetes"…
## $ Continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asi…
## $ Value     <dbl> 0.515, 0.280, 0.118, 0.041, 0.021, 0.019, 0.013, 0.013, 0.00…

6. 用 DT 生成表格

# 自定义一个用于显示占比情况的条形图函数, 负数从中间向左延伸, 正数从中间向右延伸
color_from_middle <- function (data, color1, color2) {
  max_val = max(abs(data))
  JS(sprintf("isNaN(parseFloat(value)) || value < 0 ? 'linear-gradient(90deg, transparent, transparent ' + (50 + value/%s * 50) + '%%, %s ' + (50 + value/%s * 50) + '%%,%s  50%%,transparent 50%%)': 'linear-gradient(90deg, transparent, transparent 50%%, %s 50%%, %s ' + (50 + value/%s * 50) + '%%, transparent ' + (50 + value/%s * 50) + '%%)'",
             max_val, color2, max_val, color2, color1, color1, max_val, max_val))
}

# 使用 datatable() 创建一个HTML表格
datatable(df_plot,
          # colnames 用于完全替换列名, 或部分替换列名
          colnames = c("Country", "Mortality Statistic", "Continent", "Overall Change"), 
          # caption 参数用于添加表格标题, 它可以是一个字符串向量, 也可以是一个由 htmltools::tag$caption() 创建的标签对象
          caption = "中国地区不同致死因素致死率变化情况",
          # extensions 指定表格拓展内容, Scroller 表示右边的滚动条; Buttons 表示按钮框
          # 具体还有哪些组件可以查看这个网址: https://datatables.net/extensions/index
          extensions = c('Scroller', 'Buttons', 'AutoFill'),
          # 默认不能对表格列进行过滤, 添加 filter 参数实现列的过滤. top 表示置于顶部, bottom 表示置于底部
          filter = 'bottom',
          # 具体插件信息可以参考这个网址: https://rstudio.github.io/DT/plugins.html
          plugins = 'natural',
          # options 是一个非常强大的参数, 可以将这些 options 写在 list 中用于定制自己想要的表格
          options = list(scrollY = 300, 
                         scroller = TRUE, 
                         # 指定各个控件的顺序
                         # l --> length changing input control                  表格显示多少个控件
                         # f --> filtering input                                筛选框 search
                         # t --> The table                                      表格本身
                         # i --> Tabke information summary                      表格统计信息
                         # p --> pagination control                             控制第几页
                         # r --> processing display element                     显示进程元素
                         dom = 'lBfrtip',
                         # 指定各个控件
                         # 注意 extensions 要指定一个 Buttons 才能生效
                         buttons = c('colvis', 'copy', 'csv', 'excel', 'pdf', 'print'),
                         # 一页展示多少个观测
                         pageLength = 20,
                         # 通过 columnsDefs 参数来设定具体的列格式, 如将1-3列按照自然方式排序, 并居中
                         columnDefs = list(list(className = 'dt-center', type = 'natural', targets = 1:3)),
                         # 在鼠标移到某个单元格上时, 单元格的右下角会有个蓝色小方框
                         # 注意 extensions 要指定一个 AutoFill 才能生效
                         autoFill = list(columns = c(1, 2, 3), focus = 'click')),
          # 如果数据对象存在行名, 默认情况下会显示在表格的第一列, 通过设置 rownames = FALSE 可以取消显示行名
          # 也可以用字符串向量来更改行名
          rownames = FALSE, 
          # 配置表自动填充它所包含的元素
          fillContainer = TRUE,
          # class 默认是 display, 其他用法可以看 https://datatables.net/manual/styling/classes
          # 这里用自定义函数 color_from_middle 添加了变化情况的条形图
          class = 'cell-border stripe') %>% formatStyle("Value",
              background = color_from_middle(df_plot$Value, 'red', 'lightblue'),
              backgroundSize = '95% 50%',
              backgroundRepeat = 'no-repeat',
              backgroundPosition = 'left')

在这里插入图片描述

7. session-info

sessionInfo()

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: albert
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.4    
##  [5] readr_2.1.2     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.3.6  
##  [9] tidyverse_1.3.2 raster_3.6-3    sp_1.5-0        DT_0.25        
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9          lubridate_1.8.0     lattice_0.20-45    
##  [4] assertthat_0.2.1    digest_0.6.29       utf8_1.2.2         
##  [7] R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
## [10] reprex_2.0.2        evaluate_0.16       httr_1.4.4         
## [13] pillar_1.8.1        rlang_1.0.5         readxl_1.4.1       
## [16] googlesheets4_1.0.1 rstudioapi_0.14     jquerylib_0.1.4    
## [19] rmarkdown_2.16      googledrive_2.0.0   htmlwidgets_1.5.4  
## [22] munsell_0.5.0       broom_1.0.1         compiler_4.2.1     
## [25] modelr_0.1.9        xfun_0.32           pkgconfig_2.0.3    
## [28] htmltools_0.5.3     tidyselect_1.1.2    codetools_0.2-18   
## [31] fansi_1.0.3         crayon_1.5.1        withr_2.5.0        
## [34] tzdb_0.3.0          dbplyr_2.2.1        grid_4.2.1         
## [37] jsonlite_1.8.0      gtable_0.3.1        lifecycle_1.0.1    
## [40] DBI_1.1.3           magrittr_2.0.3      scales_1.2.1       
## [43] cli_3.3.0           stringi_1.7.8       cachem_1.0.6       
## [46] fs_1.5.2            xml2_1.3.3          bslib_0.4.0        
## [49] ellipsis_0.3.2      generics_0.1.3      vctrs_0.4.1        
## [52] tools_4.2.1         glue_1.6.2          crosstalk_1.2.0    
## [55] hms_1.1.2           fastmap_1.1.0       yaml_2.3.5         
## [58] colorspace_2.0-3    gargle_1.2.1        terra_1.6-17       
## [61] rvest_1.0.3         knitr_1.40          haven_2.5.1        
## [64] sass_0.4.2

测试数据

配套数据下载：global_mortality.xlsx

数绘小站

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
20180416-I · Global Mortality · DT 数据表可视化工具 · datatable 表格 · R 语言数据可视化案例源码

Tidy Tuesday 2018-04-16 的周数据可视化示例 TidyTuesday：Global Mortality，主要利用了 DT 包对数据进行可视化展示。
复制链接

扫一扫