Bioinformatics Learning: 30 R Visualization Exercises (with Detailed Answer Explanations)

Dzfly..

Article link: https://blog.csdn.net/narutodzx/article/details/120184595

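Contents

I. Basic Plotting
II. ggplot Plotting
- 1. Rewrite exercises Q1-5 of the basic plotting section with ggplot
- 2. Rewrite exercises Q6-9 of the basic plotting section with ggplot
III. Bioinformatics Plotting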

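Preface: This is the last exercise set in my R learning series. It focuses on using R for data visualization, which helps us see at a glance what the data mean. Plotting functions come in endless variations and there is no way to memorize all of them. Get comfortable with the help documentation and look through existing code whenever you get stuck; mastering plotting in R takes a long period of accumulated practice.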

Original exercises: http://www.bio-info-trainee.com/4387.html
Reference answers: https://www.jianshu.com/p/fab27c63af94
Reference answers: https://www.jianshu.com/p/8fce9d2ad562

I. Basic Plotting

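# Prepare the data
rm(list = ls())
options(stringsAsFactors = FALSE)

library(airway)
library(SummarizedExperiment)  # provides assay() and colData()
data("airway")
airway

# Raw read-count matrix: rows are genes, columns are samples
RNAseq_expr <- assay(airway)
dim(RNAseq_expr)
colnames(RNAseq_expr)
RNAseq_expr[1:4, 1:4]

# Group labels: column 3 of colData is dex, which marks each sample as trt or untrt
RNAseq_gl <- colData(airway)[, 3]
table(RNAseq_gl)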

1. Draw a boxplot for each column of RNAseq_expr

boxplot(RNAseq_expr)
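For Q1, boxplot() treats each column of a numeric matrix as a separate group, so it draws one box per sample. A minimal sketch to confirm this, assuming a small simulated count matrix `toy_counts` (a hypothetical stand-in for RNAseq_expr, not airway data); with plot = FALSE, boxplot() returns its summary statistics instead of drawing.

```r
# Hypothetical toy count matrix: 100 "genes" x 4 "samples" (simulated, not airway data)
set.seed(1)
toy_counts <- matrix(rnbinom(400, mu = 50, size = 2), nrow = 100,
                     dimnames = list(NULL, paste0("S", 1:4)))

# plot = FALSE returns the summary statistics instead of drawing the plot
b <- boxplot(toy_counts, plot = FALSE)

# b$stats has 5 rows (lower whisker, Q1, median, Q3, upper whisker)
# and one column per box, i.e. one column per sample
print(dim(b$stats))   # 5 4
print(b$names)        # the column names S1 ... S4

# Draw it for real: one box per column
boxplot(toy_counts, main = "One box per sample")
```

With real RNA-seq counts the boxes are typically squashed near zero and dominated by outliers, which is why Q4 redraws the plot after a log2 transform.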

2. Draw a density plot for each column of RNAseq_expr

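# Filter out uninformative genes: keep rows (genes) with a nonzero count in more than one sample
e1 <- RNAseq_expr[apply(RNAseq_expr, 1, function(x) sum(x > 0) > 1), ]

dim(RNAseq_expr)
# [1] 64102     8
dim(e1)
# [1] 28877     8

# Note: density() on a whole matrix pools all values into a single curve;
# Q5 below draws one curve per sample
plot(density(RNAseq_expr))
plot(density(e1))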
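Two details of Q2 can be checked on toy data: what the filter actually keeps, and how to get one density curve per column instead of one pooled curve. This sketch assumes two small hypothetical matrices (`toy` with hand-picked values and `sim` with simulated counts); `rowSums(toy > 0) > 1` is offered as a vectorised alternative to the apply() call, not as the author's code.

```r
# Hypothetical toy matrix: 6 genes x 3 samples with hand-picked counts
toy <- matrix(c( 0,  0, 0,
                 5,  0, 0,
                 3,  2, 0,
                10, 12, 9,
                 0,  7, 1,
                 0,  0, 4),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("g", 1:6), paste0("S", 1:3)))

# The filter from Q2: TRUE for genes with a nonzero count in more than one sample
keep_apply <- apply(toy, 1, function(x) sum(x > 0) > 1)

# Vectorised equivalent: toy > 0 is a logical matrix, rowSums() counts TRUEs per gene
keep_rowsums <- rowSums(toy > 0) > 1

print(which(keep_apply))  # g3, g4 and g5 pass; g1, g2 and g6 are dropped
filtered <- toy[keep_rowsums, ]

# density() on a matrix pools every value into one curve, so compute one
# density per column instead (simulated counts, log2-transformed as in Q4)
set.seed(42)
sim <- matrix(rnbinom(3000, mu = 100, size = 1), ncol = 3,
              dimnames = list(NULL, paste0("S", 1:3)))
sim_log <- log2(sim + 1)
dens <- lapply(seq_len(ncol(sim_log)), function(j) density(sim_log[, j]))
ymax <- max(sapply(dens, function(d) max(d$y)))

# Overlay the curves in one panel: first with plot(), the rest with lines()
plot(dens[[1]], ylim = c(0, ymax), main = "One density curve per sample")
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)
legend("topright", legend = colnames(sim_log), col = seq_along(dens), lty = 1)
```

Overlaying curves in one panel makes samples easier to compare than a grid of separate panels, at the cost of clutter when there are many samples.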

3. Draw a bar plot for each column of RNAseq_expr

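# Plot of the unprocessed data: each bar stacks all 64102 genes of one sample,
# so its height is simply that sample's total read count; worth a quick glance, not much more
barplot(RNAseq_expr)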
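The stacked-bar reading can be checked by hand: with the default beside = FALSE, each bar's height is the sum of its column. A minimal sketch with a made-up 3 x 2 matrix (`toy` and its values are hypothetical):

```r
# Hypothetical toy counts: 3 genes x 2 samples (made-up numbers)
toy <- matrix(c(10, 20,
                 5, 40,
                 1,  2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("g", 1:3), c("S1", "S2")))

# A stacked bar is the sum of its column's segments
lib_size <- colSums(toy)
print(lib_size)   # S1 = 16, S2 = 62

# Same bar heights, drawn directly as one bar per sample
barplot(lib_size, main = "Total counts per sample (library size)")
```

On the real data, `barplot(colSums(RNAseq_expr))` gives the same bar heights as `barplot(RNAseq_expr)` without drawing tens of thousands of segments.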

4. Take log2 of each column of RNAseq_expr and redraw the boxplot, density plot and bar plot

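# log2(x + 1): the pseudocount 1 keeps zero counts finite; e1 is the filtered matrix from Q2
e2 <- log2(e1 + 1)

# The log2-transformed data are much easier to read
# When the raw values are large and vary over a wide range, consider a log transform
boxplot(e2)
plot(density(e2))
barplot(e2)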
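Why log2(x + 1) rather than log2(x): RNA-seq counts contain zeros, and the pseudocount of 1 keeps them finite while compressing large values. A quick check on a hypothetical vector of counts:

```r
# Hypothetical counts spanning several orders of magnitude
counts <- c(0, 1, 3, 15, 255, 4095)

print(log2(counts))        # the zero count becomes -Inf
log_counts <- log2(counts + 1)
print(log_counts)          # 0 1 2 4 8 12

# The largest value shrinks from 4095 to 12, so boxes and bars become comparable
```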

5. In the three plots from Q4, use colors to distinguish the trt and untrt groups

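# Boxplot
# Map each sample's group to a color via a named lookup vector; explicit colors
# avoid black-filled boxes (color code 1) that would hide the black median line
group_col <- c(trt = "tomato", untrt = "steelblue")[as.character(RNAseq_gl)]
boxplot(e2, main = 'Boxplot of RNAseq-expr',
        xlab = 'samples', ylab = 'expression', col = group_col)

# Density plot
# Save a copy of the current settable graphics parameters so they can be restored later
opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 4))   # 2 x 4 grid: one panel per sample
for (i in seq_len(ncol(e2))) {
  plot(density(e2[, i]), col = group_col[i], main = paste("Density", colnames(e2)[i]))
}
# Restore the parameters to their values before the change
par(opar)

# If you accidentally modify par() directly, restarting RStudio restores the defaults
# (closing the current graphics device with dev.off() also works)

# Bar plot
barplot(e2, main = 'Barplot of RNAseq-expr',
        xlab = 'samples', ylab = 'expression', border = NA, col = group_col)

# The plot looks very strange; take a small subset to see what is going on
e3 <- e2[1:10, ]
barplot(e3, main = 'Barplot of RNAseq-expr',
        xlab = 'samples', ylab = 'expression', border = NA, col = group_col)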

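As you can see, we want each bar's color to follow its sample's group, but instead the colors change from segment to segment within every bar.
The reason is that when barplot() is given a matrix and beside is FALSE (the default), each bar is built by stacking the values of one column, so the colors are assigned to the stacked segments (one per row), not to whole bars.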

That explanation is a bit muddled, so here is the original wording from the barplot() help page:
barplot(height, …)
height： either a vector or matrix of values describing the bars which make up the plot. If height is a vector, the plot consists of a sequence of rectangular bars with heights given by the values in the vector. If height is a matrix and beside is FALSE then each bar of the plot corresponds to a column of height, with the values in the column giving the heights of stacked sub-bars making up the bar. If height is a matrix and beside is TRUE, then the values in each column are juxtaposed rather than stacked.
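One way to get the coloring Q5 was after is to give barplot() a vector, one value per sample, so that `col` is applied bar by bar. The sketch below summarises each sample by its mean log2 expression; the simulated matrix `expr`, the labels in `group`, and the colors in `pal` are assumptions standing in for e2 and RNAseq_gl, and plotting column means is a swapped-in summary rather than the exercise's reference answer.

```r
# Hypothetical group labels for 4 samples (stand-in for RNAseq_gl)
group <- factor(c("untrt", "trt", "untrt", "trt"))

# Simulated log2 expression matrix: 50 genes x 4 samples (stand-in for e2)
set.seed(7)
expr <- matrix(log2(rnbinom(200, mu = 80, size = 2) + 1), ncol = 4,
               dimnames = list(NULL, paste0("S", 1:4)))

# Named lookup vector: one color per group level (colors are arbitrary choices)
pal <- c(trt = "tomato", untrt = "steelblue")
bar_col <- pal[as.character(group)]

# One value per sample -> one bar per sample, so col is applied bar by bar
barplot(colMeans(expr), col = bar_col, border = NA,
        main = "Mean log2 expression per sample", ylab = "mean log2(count + 1)")
legend("topright", legend = names(pal), fill = pal, bty = "n")
```

Indexing `pal` with `as.character(group)` matters: indexing with the factor itself uses its integer codes, which only gives the right colors when the levels happen to be in the same order as `pal`. The same `bar_col` vector also works for boxplot(col = ...), since boxplot() colors one box per column.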