所有作品合集传送门: Tidy Tuesday
2018 年合集传送门: 2018
NFL Positional Salaries
NFL Positional Salaries
Tidy Tuesday 在 GitHub 上的传送地址:
Thomas Mock (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday
1. 一些环境设置
# 设置为国内镜像, 方便快速安装模块
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
2. 设置工作路径
wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-09_NFL_Positional_Salaries/src-e'
setwd(wkdir)
3. 加载 R 包
library(tidyverse)
library(pheatmap)
library(reshape2)
library(RColorBrewer)
library(showtext)
# 在 Ubuntu 系统上测试的, 不加这个我画出来的汉字会乱码 ~
showtext_auto()
4. 加载数据
df_input <- readxl::read_excel("../data/nfl_salary.xlsx")
# 简要查看数据内容
glimpse(df_input)
## Rows: 800
## Columns: 11
## $ year <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20…
## $ Cornerback <dbl> 11265916, 11000000, 10000000, 10000000, 10000000, …
## $ `Defensive Lalbert` <dbl> 17818000, 16200000, 12476000, 11904706, 11762782, …
## $ Linebacker <dbl> 16420000, 15623000, 11825000, 10083333, 10020000, …
## $ `Offensive Lineman` <dbl> 15960000, 12800000, 11767500, 10358200, 10000000, …
## $ Quarterback <dbl> 17228125, 16000000, 14400000, 14100000, 13510000, …
## $ `Running Back` <dbl> 12955000, 10873833, 9479000, 7700000, 7500000, 703…
## $ Safety <dbl> 8871428, 8787500, 8282500, 8000000, 7804333, 76527…
## $ `Special Teamer` <dbl> 4300000, 3725000, 3556176, 3500000, 3250000, 32250…
## $ `Tight End` <dbl> 8734375, 8591000, 8290000, 7723333, 6974666, 61333…
## $ `Wide Receiver` <dbl> 16250000, 14175000, 11424000, 11415000, 10800000, …
# 检查数据的列名
colnames(df_input)
## [1] "year" "Cornerback" "Defensive Lineman"
## [4] "Linebacker" "Offensive Lineman" "Quarterback"
## [7] "Running Back" "Safety" "Special Teamer"
## [10] "Tight End" "Wide Receiver"
5. 数据预处理
# 备份一份数据
df_tidy <- df_input
# 将缺失值替换为 0
df_tidy[is.na(df_tidy)] <- 0
# 将年份列更改为字符类型
df_tidy$year <- as.character(df_tidy$year)
# 根据职位进行分组, 获得每年的平均值
df_tidy <- df_tidy %>% group_by(year) %>% summarise_if(is.numeric, mean) %>% as.data.frame
# 自定义一个函数, 用于创建矩阵但保留行名
mat.convert <- function(i) {
# tibble 转换为 data.frame 格式, 否则会报错
i <- as.data.frame(i)
mat <- as.matrix(i[, -1])
rownames(mat) <- i[, 1]
mat
}
df_plot <- mat.convert(df_tidy)
# 简要查看数据内容
glimpse(df_plot)
## num [1:8, 1:10] 3037766 3132916 2901798 3038278 3758543 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:8] "2011" "2012" "2013" "2014" ...
## ..$ : chr [1:10] "Cornerback" "Defensive Lineman" "Linealbert" "Offensive Lineman" ...
6. 用 pheatmap 绘制热图
# 覆盖热图中的列标签
draw.colnames.45 <- function (coln, gaps, ...) {
coord = pheatmap:::find_coordinates(length(coln), gaps)
x = coord$coord - 0.5 * coord$size
res = grid::textGrob(coln, x = x, y = unit(1, "npc") - unit(3, "bigpts"), vjust = 0.5, hjust = 1, rot = 45, gp = grid::gpar(...))
return(res)}
assignInNamespace(x = "draw_colnames", value = "draw.colnames.45",
ns <- asNamespace("pheatmap"))
# 自定义调色板
palette <- colorRampPalette(brewer.pal(9, "Set1"))(100)
# 使用 pheatmap() 函数绘制热图
figure = pheatmap(df_plot,
color = palette,
border_color = NA,
cluster_cols = TRUE,
cluster_rows = FALSE,
cellwidth = 15,
cellheight = 15)
7 保存图片到 PDF 和 PNG
filename = '20180409-E-01'
ggsave(figure, filename = paste0(filename, ".pdf"), width = 5, height = 5, device = cairo_pdf)
ggsave(figure, filename = paste0(filename, ".png"), width = 5, height = 5, dpi = 120, device = "png")
8. session-info
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: albert
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] showtext_0.9-5 showtextdb_3.0 sysfonts_0.8.8 RColorBrewer_1.1-3
## [5] reshape2_1.4.4 pheatmap_1.0.12 forcats_0.5.2 stringr_1.4.1
## [9] dplyr_1.0.10 purrr_0.3.4 readr_2.1.2 tidyr_1.2.1
## [13] tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 lubridate_1.8.0 assertthat_0.2.1
## [4] digest_0.6.29 utf8_1.2.2 plyr_1.8.7
## [7] R6_2.5.1 cellranger_1.1.0 backports_1.4.1
## [10] reprex_2.0.2 evaluate_0.16 highr_0.9
## [13] httr_1.4.4 pillar_1.8.1 rlang_1.0.5
## [16] googlesheets4_1.0.1 readxl_1.4.1 rstudioapi_0.14
## [19] jquerylib_0.1.4 rmarkdown_2.16 textshaping_0.3.6
## [22] googledrive_2.0.0 munsell_0.5.0 broom_1.0.1
## [25] compiler_4.2.1 modelr_0.1.9 xfun_0.32
## [28] systemfonts_1.0.4 pkgconfig_2.0.3 htmltools_0.5.3
## [31] tidyselect_1.1.2 fansi_1.0.3 crayon_1.5.1
## [34] tzdb_0.3.0 dbplyr_2.2.1 withr_2.5.0
## [37] grid_4.2.1 jsonlite_1.8.0 gtable_0.3.1
## [40] lifecycle_1.0.1 DBI_1.1.3 magrittr_2.0.3
## [43] scales_1.2.1 cli_3.3.0 stringi_1.7.8
## [46] cachem_1.0.6 fs_1.5.2 xml2_1.3.3
## [49] bslib_0.4.0 ragg_1.2.3 ellipsis_0.3.2
## [52] generics_0.1.3 vctrs_0.4.1 tools_4.2.1
## [55] glue_1.6.2 hms_1.1.2 fastmap_1.1.0
## [58] yaml_2.3.5 colorspace_2.0-3 gargle_1.2.1
## [61] rvest_1.0.3 knitr_1.40 haven_2.5.1
## [64] sass_0.4.2
测试数据
配套数据下载:nfl_salary.xlsx