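Yesterday I used CIBERSORT to estimate the abundance of 22 immune cell types, and the next step is to visualize the result. In the earlier data processing I had already split the samples into two groups with consensus clustering, so here I draw a grouped violin plot and add significance marks. The final figure looks like this:
First, prepare the data computed by CIBERSORT.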
rm(list = ls())
# Load the immune cell abundances computed by CIBERSORT and the group information
load("cibersort.Rdata")
load("cluster.Rdata")
# Two variables in total: immu_cell and group
immu_cell[1:4,1:4]
# B cells naive B cells memory Plasma cells T cells CD8
#GSM274895 0.01745695 0.12191198 0 0.07672544
#GSM274896 0.03575428 0.04171772 0 0.09460395
#GSM274897 0.00000000 0.16174924 0 0.07250404
#GSM274898 0.22305092 0.32276996 0 0.03555180
head(group,3)
# sample group
#GSM274895 GSM274895 cluster1
#GSM274896 GSM274896 cluster1
#GSM274897 GSM274897 cluster
library(dplyr)
library(tidyr)
library(tibble)
# Convert to a data frame and turn the row names into a column
immu_cell <- immu_cell %>% as.data.frame() %>% rownames_to_column("sample")
immu_cell[1:4,1:4]
# sample B cells naive B cells memory Plasma cells
#1 GSM274895 0.01745695 0.12191198 0
#2 GSM274896 0.03575428 0.04171772 0
#3 GSM274897 0.00000000 0.16174924 0
#4 GSM274898 0.22305092 0.32276996 0
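# Full-join the two data frames on the sample ID to add the group information
data <- full_join(immu_cell, group, by = "sample")
# Reshape: gather the 22 immune cell columns in the middle into long format
data <- gather(data, Cell_type, Proportion, "B cells naive":"Neutrophils")
# Convert cell type to a factor so the cell order is preserved in the plot
# (drop the "sample" column name so only the 22 cell types become levels)
data$Cell_type <- factor(data$Cell_type, levels = setdiff(colnames(immu_cell), "sample"))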
head(data,3)
# sample group Cell_type Proportion
#1 GSM274895 cluster1 B cells naive 0.01745695
#2 GSM274896 cluster1 B cells naive 0.03575428
#3 GSM274897 cluster1 B cells naive 0.00000000
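The reshaping step can also be written with pivot_longer(), which tidyr recommends over gather() in current versions. The following is a minimal sketch on a made-up two-sample table (toy_wide and its values are hypothetical, not the CIBERSORT output), just to show the shape of the result:

```r
library(tidyr)

# Hypothetical toy table: one row per sample, one column per cell type
toy_wide <- data.frame(
  sample = c("S1", "S2"),
  group = c("cluster1", "cluster2"),
  `B cells naive` = c(0.10, 0.20),
  `T cells CD8` = c(0.30, 0.40),
  check.names = FALSE  # keep the spaces in the column names
)

# Every column except sample and group becomes a (Cell_type, Proportion) pair
toy_long <- pivot_longer(
  toy_wide,
  cols = -c(sample, group),
  names_to = "Cell_type",
  values_to = "Proportion"
)

# Fix the plotting order to the original column order
toy_long$Cell_type <- factor(toy_long$Cell_type,
                             levels = setdiff(colnames(toy_wide), c("sample", "group")))
print(toy_long)
```

Note that pivot_longer() returns the rows sample by sample, whereas gather() returns them cell type by cell type. The content is the same, and the plot is unaffected because the order on the x-axis comes from the factor levels.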
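# Pick the maximum value within each cell type; this is used to position the significance marks.
# with_ties = FALSE keeps exactly one row per cell type even if the maximum is tied.
# Cell_type must already be a factor in plotting order so that x matches the axis positions.
location <- data %>% group_by(Cell_type) %>% slice_max(Proportion, with_ties = FALSE)
location$x <- seq(1, 22, by = 1)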
head(location,3)
# A tibble: 3 × 5
# Groups: Cell_type [3]
# sample group Cell_type Proportion x
# <chr> <fct> <chr> <dbl> <dbl>
#1 GSM274947 cluster1 B cells memory 0.593 1
#2 GSM274902 cluster1 B cells naive 0.676 2
#3 GSM274972 cluster1 Dendritic cells activated 0.109 3
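The data are ready: data, a long table that includes the group information, and location, which is needed later for drawing the significance marks.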
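One detail worth checking in the location table is that it has exactly one row per cell type. By default slice_max() keeps all tied rows, so a tied maximum would give more rows than cell types and the x positions would no longer line up. Here is a small sketch on made-up data (the toy table and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: cell type "A" has a tied maximum (0.5 appears twice)
toy <- data.frame(
  Cell_type = factor(c("A", "A", "A", "B", "B"), levels = c("A", "B")),
  Proportion = c(0.2, 0.5, 0.5, 0.1, 0.3)
)

# Default behaviour: ties are kept, so "A" contributes two rows
with_ties <- toy %>% group_by(Cell_type) %>% slice_max(Proportion)

# with_ties = FALSE returns exactly one row per group
no_ties <- toy %>%
  group_by(Cell_type) %>%
  slice_max(Proportion, with_ties = FALSE) %>%
  ungroup()

# The x position can be taken from the factor level index instead of a fixed sequence
no_ties$x <- as.numeric(no_ties$Cell_type)

nrow(with_ties)
nrow(no_ties)
```

Deriving x from the factor levels only works when the levels are exactly the cell types shown on the axis, in plotting order.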
The violin plot code is a single command, but I spent a long time tuning it, hoping to make it look a bit nicer.
library(ggplot2)
library(ggpubr)   # provides stat_compare_means()
# On ggplot2 >= 3.4.0, `size` for lines is deprecated in favour of `linewidth`.
ggplot(data, aes(Cell_type, Proportion, fill = group)) +
  geom_violin(scale = "width", alpha = 0.8, width = 0.5, size = 0.8) +  # draw the violins
  scale_fill_manual(values = c("#F7903D", "#4D85BD")) +                 # fill colors for the two groups
  stat_compare_means(aes(group = group),      # test between the groups within each cell type
                     method = "t.test",
                     paired = FALSE,          # unpaired t-test
                     symnum.args = list(cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                        symbols = c("***", "**", "*", "ns")),
                     label = "p.signif",
                     label.y = location$Proportion + 0.02,  # y position of the significance symbols
                     size = 4.5) +                          # size of the significance symbols
  geom_segment(data = location,               # short line under each significance symbol
               aes(x = x, y = Proportion,
                   xend = x + 0.2, yend = Proportion),
               size = 1) +
  xlab("") +                                  # x-axis label
  ylab("Fraction") +                          # y-axis label
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(size = 1.2),  # thicker panel border
        axis.text.x = element_text(angle = 60, size = 10, vjust = 1, hjust = 1, color = "black"),
        axis.text.y = element_text(size = 10),
        legend.position = c(0.9, 0.85))
That completes the grouped violin plot.
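To make the significance symbols easier to read, here is a small base R sketch of how the thresholds 0, 0.001, 0.01, 0.05 and 1 translate p-values into "***", "**", "*" and "ns". The p-values are made up for illustration, and cut() is used here as a stand-in: I assume it treats the interval boundaries the same way the plotting function does, but I have not verified that.

```r
# Hypothetical p-values, one per cell type
p_values <- c(0.0005, 0.005, 0.03, 0.2)

# Same cut points and symbols as in the plot; each interval is (lower, upper],
# and include.lowest = TRUE makes the first interval also contain 0
signif_labels <- cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("***", "**", "*", "ns"),
  include.lowest = TRUE
)

as.character(signif_labels)
```

So a cell type is marked "ns" whenever its p-value is above 0.05, and more stars mean a smaller p-value.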
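(My statistics background is not strong, and I am not sure whether an unpaired t-test is the right choice here. Please point out any mistakes.)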
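On that question, one way to get a feel for it is to run both an unpaired t-test and a Wilcoxon rank-sum test on the same cell type and compare the results. The two clusters contain different samples, so an unpaired test seems the natural choice. Whether the t-test itself is appropriate depends on how the fractions are distributed, and cell fractions are often skewed with many zeros. The sketch below uses simulated data (the toy table and the beta-distributed values are assumptions for illustration, not the real fractions):

```r
set.seed(1)

# Hypothetical fractions for one cell type in two independent groups of 20 samples
toy <- data.frame(
  group = rep(c("cluster1", "cluster2"), each = 20),
  Proportion = c(rbeta(20, 2, 8), rbeta(20, 4, 6))
)

# Unpaired t-test; R's default is the Welch version (unequal variances)
t_res <- t.test(Proportion ~ group, data = toy)

# Wilcoxon rank-sum test: unpaired and rank-based, with no normality assumption
w_res <- wilcox.test(Proportion ~ group, data = toy)

c(t_test = t_res$p.value, wilcoxon = w_res$p.value)
```

If the two tests disagree for many cell types, the normality assumption is probably doing a lot of work. In the plot, the rank-based version would be method = "wilcox.test" in stat_compare_means().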