Plotting in R | Drawing Heatmaps

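Every time I want to draw a heatmap I end up searching the web all over again, which is partly my own fault for not being more proficient. Since hard-won experience is worth keeping, it is time to write things down properly.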

1. Prepare the data.

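The data is simple: a plain rectangular data.frame of numeric values. A heatmap just encodes the magnitude of each value as color intensity, so that patterns become visible at a glance. That is what makes the picture valuable.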
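As a minimal sketch of what such a table might look like, the snippet below builds a small made-up table and turns it into a numeric matrix, which is the form described for pheatmap's mat argument. The names toy, toy_mat, gene1 and sampleA are hypothetical and do not come from the real result_data.csv.

```r
# Toy data: 4 rows x 3 sample columns; all names and values are made up.
toy <- data.frame(
  id      = c("gene1", "gene2", "gene3", "gene4"),
  sampleA = c(1.2, 0.4, 3.1, 2.2),
  sampleB = c(0.9, 0.5, 2.8, 2.5),
  sampleC = c(4.0, 3.7, 0.2, 0.3)
)

# Move the identifier column into the row names, leaving only numeric columns.
rownames(toy) <- toy$id
toy <- toy[, -1]

# Convert to a matrix and confirm that nothing non-numeric is left.
toy_mat <- as.matrix(toy)
stopifnot(is.numeric(toy_mat))
dim(toy_mat)
```

A stray text column would turn the whole matrix into character values, so the stopifnot() check is a cheap guard before plotting.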

2. Load the pheatmap package.

pheatmap is what I have used for heatmaps for years. I tried ggplot2 before, but it felt like too much work, so I did not stick with it.

library(pheatmap)
pheatmap(data)

This gives a simple, first-pass clustering result.
The next step is to refine the plot.
These are the adjustments I use most often.

3. Refine the heatmap.

(1) Add gaps.

pheatmap(data,cutree_cols=6) # cut the column dendrogram into 6 clusters, separated by gaps, so the groups are easier to see
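To see which samples fall into which of the 6 groups, the same kind of cut can be reproduced with base R. This is a sketch on made-up data, assuming Euclidean distance and complete linkage; the names mat, col_tree and col_groups are hypothetical.

```r
set.seed(42)
# Made-up matrix: 20 features x 12 samples.
mat <- matrix(rnorm(20 * 12), nrow = 20,
              dimnames = list(paste0("f", 1:20), paste0("s", 1:12)))

# Cluster the columns: distances are computed between samples, hence t(mat).
col_tree <- hclust(dist(t(mat)), method = "complete")

# Cut the tree into 6 groups; each sample gets a cluster id from 1 to 6.
col_groups <- cutree(col_tree, k = 6)

# Number of samples per cluster.
table(col_groups)
```

The named vector col_groups can then be joined to sample metadata or exported, which the plot alone does not allow.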

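(2) Add the original class labels.
This is useful when the samples already carry class labels. I want to see whether samples with the same original label still end up together after clustering under a given rule, which is the part I care about most.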

Build the column annotation. The annotation file was prepared beforehand, so it only needs to be read in.

annotation_col<-read.table("annotation.txt",sep = "\t",header = T)
row.names(annotation_col)<-annotation_col[,1]
annotation_col<-annotation_col[,-1]
head(annotation_col) # The result is a data.frame whose row names must match the column names of the data being annotated; otherwise an error is raised.
> head(annotation_col)
          bcr.abl_status      Tumor_Stage
NBMA3             normal       normal_hsc
NBMA5             normal       normal_hsc
NBMB5             normal       normal_hsc
NBMC5             normal       normal_hsc
OX1931A10       negative pre_blast_crisis
OX1931B1        positive pre_blast_crisis
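Because the row names of the annotation must line up with the column names of the data, it can help to check the two sets of names before plotting. The sketch below uses a small hypothetical annotation table (annotation_demo, with a deliberately wrong entry UNKNOWN1); in the real workflow sample_names would be colnames(data).

```r
# Hypothetical sample names; in the post these would be colnames(data).
sample_names <- c("NBMA3", "NBMA5", "OX1931A10", "OX1931B1")

# Hypothetical annotation table: one sample is missing and one row is extra.
annotation_demo <- data.frame(
  bcr.abl_status = c("normal", "normal", "negative", "positive"),
  row.names = c("NBMA3", "NBMA5", "OX1931A10", "UNKNOWN1")
)

# Samples without an annotation row, and annotation rows without a sample.
missing_annotation <- setdiff(sample_names, rownames(annotation_demo))
unused_annotation  <- setdiff(rownames(annotation_demo), sample_names)
missing_annotation
unused_annotation

# Keep only the shared samples, in the same order as the data columns.
shared <- intersect(sample_names, rownames(annotation_demo))
annotation_aligned <- annotation_demo[shared, , drop = FALSE]
```

The drop = FALSE matters here: with a single annotation column, plain subsetting would return a vector and lose the row names.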

Pass it to the pheatmap() function.

pheatmap(data,treeheight_col=50,cutree_cols=6,clustering_method = "ward.D",annotation_col = annotation_col)

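The plot shows that samples with certain labels separate cleanly, while others are mixed together, and those are the cases whose cause needs further investigation.
So this plot largely meets the need. Quite interesting.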
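Reading label agreement off the colored annotation bar works, but a cross-table of original labels against cluster membership makes the same comparison explicit. This is a sketch on made-up data with two hypothetical labels ("normal" and "tumor"), using ward.D as in the call above; it is not computed from the post's real data.

```r
set.seed(7)
# Made-up data: 10 features x 8 samples; "tumor" samples are shifted upward.
labels <- rep(c("normal", "tumor"), each = 4)
mat <- matrix(rnorm(10 * 8), nrow = 10,
              dimnames = list(paste0("f", 1:10), paste0("s", 1:8)))
mat[, labels == "tumor"] <- mat[, labels == "tumor"] + 5

# Cluster the columns with the same method as the pheatmap() call above.
col_tree <- hclust(dist(t(mat)), method = "ward.D")
clusters <- cutree(col_tree, k = 2)

# Rows: original label; columns: cluster id.
# A label that stays together has all of its samples in a single column.
agreement <- table(label = labels, cluster = clusters)
agreement
```

On real data the table will rarely be this clean; rows spread over several columns point to the mixed samples worth a closer look.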
pheatmap() has a great many parameters, so the argument list from its help page is included here as well, for future reference when a new need comes up.

?pheatmap()

Arguments
mat	
numeric matrix of the values to be plotted.

color	
vector of colors used in heatmap.

kmeans_k	
the number of kmeans clusters to make, if we want to aggregate the rows before drawing heatmap. If NA then the rows are not aggregated.

breaks	
a sequence of numbers that covers the range of values in mat and is one element longer than color vector. Used for mapping values to colors. Useful, if needed to map certain values to certain colors, to certain values. If value is NA then the breaks are calculated automatically. When breaks do not cover the range of values, then any value larger than max(breaks) will have the largest color and any value lower than min(breaks) will get the lowest color.

border_color	
color of cell borders on heatmap, use NA if no border should be drawn.

cellwidth	
individual cell width in points. If left as NA, then the values depend on the size of plotting window.

cellheight	
individual cell height in points. If left as NA, then the values depend on the size of plotting window.

scale	
character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Corresponding values are "row", "column" and "none"

cluster_rows	
boolean values determining if rows should be clustered or hclust object.

cluster_cols	
boolean values determining if columns should be clustered or hclust object.

clustering_distance_rows	
distance measure used in clustering rows. Possible values are "correlation" for Pearson correlation and all the distances supported by dist, such as "euclidean", etc. If the value is none of the above it is assumed that a distance matrix is provided.

clustering_distance_cols	
distance measure used in clustering columns. Possible values the same as for clustering_distance_rows.

clustering_method	
clustering method used. Accepts the same values as hclust.

clustering_callback	
callback function to modify the clustering. Is called with two parameters: original hclust object and the matrix used for clustering. Must return a hclust object.

cutree_rows	
number of clusters the rows are divided into, based on the hierarchical clustering (using cutree), if rows are not clustered, the argument is ignored

cutree_cols	
similar to cutree_rows, but for columns

treeheight_row	
the height of a tree for rows, if these are clustered. Default value 50 points.

treeheight_col	
the height of a tree for columns, if these are clustered. Default value 50 points.

legend	
logical to determine if legend should be drawn or not.

legend_breaks	
vector of breakpoints for the legend.

legend_labels	
vector of labels for the legend_breaks.

annotation_row	
data frame that specifies the annotations shown on left side of the heatmap. Each row defines the features for a specific row. The rows in the data and in the annotation are matched using corresponding row names. Note that color schemes takes into account if variable is continuous or discrete.

annotation_col	
similar to annotation_row, but for columns.

annotation	
deprecated parameter that currently sets the annotation_col if it is missing

annotation_colors	
list for specifying annotation_row and annotation_col track colors manually. It is possible to define the colors for only some of the features. Check examples for details.

annotation_legend	
boolean value showing if the legend for annotation tracks should be drawn.

annotation_names_row	
boolean value showing if the names for row annotation tracks should be drawn.

annotation_names_col	
boolean value showing if the names for column annotation tracks should be drawn.

drop_levels	
logical to determine if unused levels are also shown in the legend

show_rownames	
boolean specifying if row names are shown.

show_colnames	
boolean specifying if column names are shown.

main	
the title of the plot

fontsize	
base fontsize for the plot

fontsize_row	
fontsize for rownames (Default: fontsize)

fontsize_col	
fontsize for colnames (Default: fontsize)

angle_col	
angle of the column labels, right now one can choose only from few predefined options (0, 45, 90, 270 and 315)

display_numbers	
logical determining if the numeric values are also printed to the cells. If this is a matrix (with same dimensions as original matrix), the contents of the matrix are shown instead of original values.

number_format	
format strings (C printf style) of the numbers shown in cells. For example "%.2f" shows 2 decimal places and "%.1e" shows exponential notation (see more in sprintf).

number_color	
color of the text

fontsize_number	
fontsize of the numbers displayed in cells

gaps_row	
vector of row indices that show where to put gaps into heatmap. Used only if the rows are not clustered. See cutree_rows to see how to introduce gaps to clustered rows.

gaps_col	
similar to gaps_row, but for columns.

labels_row	
custom labels for rows that are used instead of rownames.

labels_col	
similar to labels_row, but for columns.

filename	
file path where to save the picture. Filetype is decided by the extension in the path. Currently following formats are supported: png, pdf, tiff, bmp, jpeg. Even if the plot does not fit into the plotting window, the file size is calculated so that the plot would fit there, unless specified otherwise.

width	
manual option for determining the output file width in inches.

height	
manual option for determining the output file height in inches.

silent	
do not draw the plot (useful when using the gtable output)

na_col	
specify the color of the NA cell in the matrix.

...	
graphical parameters for the text used in plot. Parameters passed to grid.text, see gpar.
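As a rough illustration of how several of the arguments listed above combine, here is a sketch on made-up data. The matrix, the annotation table, the colors and the output file name toy_heatmap.pdf are all hypothetical; the pheatmap call is skipped if the package is not installed.

```r
set.seed(1)
# Made-up matrix and annotation; all names are hypothetical.
mat <- matrix(rnorm(10 * 6), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
ann <- data.frame(group = rep(c("ctrl", "case"), each = 3),
                  row.names = colnames(mat))

# Manual colors for the annotation track: a named list of named vectors.
ann_colors <- list(group = c(ctrl = "grey60", case = "firebrick"))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    mat,
    scale = "row",                             # center and scale each row
    clustering_distance_cols = "correlation",  # Pearson correlation between samples
    show_rownames = FALSE,                     # hide row labels
    annotation_col = ann,
    annotation_colors = ann_colors,
    main = "Toy example",
    filename = file.path(tempdir(), "toy_heatmap.pdf"),  # write to a file
    width = 6, height = 5                      # output size in inches
  )
}
```

Saving through filename with explicit width and height keeps the output size independent of the current plotting window.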

All the code used in this post:

setwd("F://8//7") # set the working directory
data<-read.csv("result_data.csv",header = T,sep = ",") 

row.names(data)<-data[,1]
data<-data[,-1]

library(pheatmap)
pheatmap(data)
pheatmap(data,treeheight_col=50,cutree_cols=6,clustering_method = "ward.D")
# Build the column annotation
dim(data)[2] # 28 samples
colnames(data)
annotation_col<-read.table("annotation.txt",sep = "\t",header = T)
row.names(annotation_col)<-annotation_col[,1]
annotation_col<-annotation_col[,-1]
head(annotation_col)
pheatmap(data,treeheight_col=50,cutree_cols=6,clustering_method = "ward.D",annotation_col = annotation_col)

Links to the data used in this post:
Annotation file: https://download.csdn.net/download/weixin_40640700/20833555
Data file: https://download.csdn.net/download/weixin_40640700/20833525
