Visualization Series Roundup: Group Relationships

Latest recommended article published 2024-08-05 11:32:07

庄闪闪


Column: R Visualization | Tags: R language, ggplot2

Article link: https://blog.csdn.net/qq_37379316/article/details/127192699


Introduction

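Visualizing results is an unavoidable part of data analysis. So which kind of chart suits your data best? An effective chart should:

convey information correctly, without ambiguity;
be simple in style yet easy to understand;
use aesthetics only where they help the reader understand the information;
contain no redundant or useless information.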

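In this series, I collect the seven major types of charts commonly used in visualization, one post per type, for readers' reference. The main source is Top 50 ggplot2 Visualizations. Other sites with similar material include: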

Series contents

This post covers Part 7: charts for group relationships.

Loading the dataset

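The examples use datasets that ship with R: the shared setup below loads midwest from the ggplot2 package, while the two plots in this part use the base R datasets USArrests and iris.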

library(ggplot2)
library(plotrix)
data("midwest", package = "ggplot2")  # load the dataset

(Figure: the midwest dataset)
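Before plotting, it can help to check what the dataset actually contains. A minimal sketch that prints the size of midwest and a few of its columns; the columns picked here are an arbitrary choice for illustration, taken from the column names documented in `?midwest`.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

# Rows are Midwest counties; columns are demographic variables
dim(midwest)

# Peek at a handful of columns (an arbitrary selection for illustration)
head(midwest[, c("county", "state", "area", "poptotal", "percbelowpoverty")])
```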

Global theme settings

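Set the global color palette and theme. Note that this post uses discrete color scales; if you need continuous color scales, the ggplot() wrapper below has to be rewritten.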

options(scipen=999)  # turn off scientific notation such as 1e+48
# Colorblind-friendly palette with gray
cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Colorblind-friendly palette with black
cbp2 <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Mask ggplot() so every plot gets the cbp1 palette and theme_bw() by default
ggplot <- function(...) ggplot2::ggplot(...) +
  scale_color_manual(values = cbp1) +
  scale_fill_manual(values = cbp1) + # note: rewrite these for continuous scales
  theme_bw()
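The manual scales in the wrapper only accept discrete variables, so mapping a numeric column to colour fails. A minimal sketch of a continuous counterpart, assuming the viridis scales bundled with ggplot2 (3.0.0 or later) are an acceptable substitute for the cbp1 palette; the name `ggplot_cont` is hypothetical.

```r
library(ggplot2)

# Hypothetical companion to the discrete wrapper: same theme, but
# continuous colour/fill scales so numeric variables can be mapped
ggplot_cont <- function(...) ggplot2::ggplot(...) +
  scale_color_viridis_c() +   # continuous colour scale
  scale_fill_viridis_c() +    # continuous fill scale
  theme_bw()

# Usage: colour midwest counties by a numeric poverty rate
data("midwest", package = "ggplot2")
p <- ggplot_cont(midwest, aes(x = area, y = poptotal, colour = percbelowpoverty)) +
  geom_point()
print(p)
```

Calling `ggplot2::ggplot()` explicitly inside the wrapper avoids recursing into the masked `ggplot()` when both wrappers are defined in the same session.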

7 Group relationships

7.1 Dendrogram

library(ggplot2)
library(ggdendro)
theme_set(theme_bw())

hc <- hclust(dist(USArrests), "ave")  # hierarchical clustering, average linkage
# plot the dendrogram, rotated 90 degrees so the state labels read horizontally
ggdendrogram(hc, rotate = TRUE, size = 2)

(Figure: dendrogram)
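A dendrogram shows the whole merge hierarchy; to get actual group labels you cut the tree at some level. A minimal sketch using `cutree()` from base R on the same clustering; k = 4 is an arbitrary illustrative choice, not a value from the original post.

```r
# Same clustering as above: Euclidean distances, average linkage
hc <- hclust(dist(USArrests), method = "average")

# Cut the tree into k groups (k = 4 chosen arbitrarily for illustration)
groups <- cutree(hc, k = 4)

# Number of states in each group
table(groups)

# States assigned to group 1
names(groups)[groups == 1]
```

Because `dist()` works on raw values, the Assault column (values in the hundreds) dominates the distances; passing `scale(USArrests)` instead gives each variable equal weight and can produce a quite different tree.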

7.2 Cluster plot

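You can use geom_encircle() from the ggalt package to show distinct clusters or groups. If the dataset has several features, you can compute its principal components and draw a scatterplot with PC1 and PC2 as the X and Y axes; geom_encircle() then outlines each group of interest.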

# devtools::install_github("hrbrmstr/ggalt")
library(ggplot2)
library(ggalt)
library(ggfortify)
theme_set(theme_classic())  # note: the global ggplot() wrapper adds theme_bw(), which takes precedence

# Compute data with principal components ------------------
df <- iris[c(1, 2, 3, 4)]
pca_mod <- prcomp(df)  # compute principal components

# Data frame of principal components ----------------------
df_pc <- data.frame(pca_mod$x, Species=iris$Species)  # dataframe of principal components
df_pc_vir <- df_pc[df_pc$Species == "virginica", ]  # df for 'virginica'
df_pc_set <- df_pc[df_pc$Species == "setosa", ]  # df for 'setosa'
df_pc_ver <- df_pc[df_pc$Species == "versicolor", ]  # df for 'versicolor'
 
# Plot ----------------------------------------------------
ggplot(df_pc, aes(PC1, PC2, col=Species)) + 
  geom_point(aes(shape=Species), size=2) +   # draw points
  labs(title="Iris Clustering", 
       subtitle="With principal components PC1 and PC2 as X and Y axis",
       caption="Source: Iris") + 
  coord_cartesian(xlim = 1.2 * c(min(df_pc$PC1), max(df_pc$PC1)), 
                  ylim = 1.2 * c(min(df_pc$PC2), max(df_pc$PC2))) +   # change axis limits
  geom_encircle(data = df_pc_vir, aes(x=PC1, y=PC2)) +   # draw circles
  geom_encircle(data = df_pc_set, aes(x=PC1, y=PC2)) + 
  geom_encircle(data = df_pc_ver, aes(x=PC1, y=PC2))

(Figure: cluster plot)
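Plotting only PC1 and PC2 is reasonable when those two components carry most of the variance. A minimal sketch that computes each component's share of the total variance for the same unscaled PCA of the four iris measurements.

```r
# Same PCA as in the cluster plot: four numeric iris columns, centred, not scaled
pca_mod <- prcomp(iris[1:4])

# Proportion of total variance captured by each principal component
prop_var <- pca_mod$sdev^2 / sum(pca_mod$sdev^2)
round(prop_var, 3)

# Cumulative share captured by the PC1/PC2 plane used as the plot axes
sum(prop_var[1:2])
```

Because `prcomp()` is called without `scale. = TRUE`, the components follow the raw variances of the measurements, so the petal columns weigh most; adding `scale. = TRUE` standardizes them first and may change how well the species separate.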