Find a suitable clustering method
In GWAS analysis, traits that are highly correlated, whether positively or negatively (for example a vs b: 0.996, a vs c: -0.976), give us nearly identical results. What we want to do is randomly keep just one trait from such a group (a, b, c).
But imagine a simple situation like the one below, where we cluster data that contain both high positive and high negative correlation coefficients:
 | a | b | c
---|---|---|---
a | 1 | 0.9 | 0.9
b | 0.9 | 1 | 0.9
c | 0.9 | 0.9 | 1
 | d | e | f
---|---|---|---
d | 1 | 0.9 | -0.9
e | 0.9 | 1 | -0.9
f | -0.9 | -0.9 | 1
x <- c(0.9, 0.9, 0.9)    # three correlations with the same sign
dist(x)
  1 2
2 0
3 0 0
y <- c(0.9, 0.9, -0.9)   # the same magnitudes, but one correlation with the opposite sign
dist(y)
    1   2
2 0.0
3 1.8 1.8
Actually, for selecting GWAS traits we want (d, e, f) to cluster together, but a clustering method based on raw distances cannot capture this.
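Taking the absolute value of the correlations removes this artefact. A quick check on the toy vector from above (a minimal sketch):
abs(y)         # 0.9 0.9 0.9
dist(abs(y))   # all pairwise distances are now 0, so the three traits can cluster together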
So the first thing we need to do is work with the absolute values of the correlation matrix. Then, how do we find the most suitable clustering method?
Luckily, this course gives us the solution: https://www.datanovia.com/en/courses/cluster-validation-essentials/
1. Choosing the best clustering algorithm and the optimal number of clusters (internal measures)
library(clValid)
corm <- as.matrix(read.table("pro_alti_rm.cor"))
rcorm <- abs(corm)                                  # cluster on absolute correlations
clmethods <- c("hierarchical", "kmeans", "pam")     # pam stands for k-medoids
intern <- clValid(rcorm, nClust = 2:30, clMethods = clmethods, validation = "internal")  # internal validation measures
summary(intern)
Optimal Scores:
Score Method Clusters
Connectivity 10.3996 hierarchical 2
Dunn 0.2259 hierarchical 29
Silhouette 0.4277 kmeans 2 ## k-medoids also reaches its best silhouette at 2 clusters
So the internal measures suggest k-means with 2 clusters, or hierarchical clustering with 2 or 29 clusters.
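To see how each measure behaves across the whole range of cluster numbers rather than only at the optimum, the clValid object can be inspected further (a sketch; optimalScores() and plot() are methods provided by the clValid package):
optimalScores(intern)   # the optimal-score table above, as a data frame
plot(intern)            # each validation measure against the number of clusters, one line per method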
2. Hierarchical clustering with k = 29
library(pheatmap)
corm <- as.matrix(read.table("pro_alti_rm.cor"))
rcorm <- abs(corm)
t <- pheatmap(rcorm, silent = TRUE)   # run once silently just to get the dendrograms
png(file = "soil_hclu.png", height = 700, width = 700)
pheatmap(rcorm, show_rownames = FALSE, show_colnames = FALSE, cutree_cols = 29,
         legend = FALSE, na_col = NA, border_color = NA,
         annotation_col = data.frame(type = cutree(t$tree_row, k = 29)),   # colour each column by its cluster
         annotation_legend = FALSE, annotation_names_col = FALSE)
dev.off()
As we can see, the clusters along the diagonal are what we want.
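Before sampling, it is worth checking how many traits fall into each of the 29 clusters; a small sketch using the same dendrogram:
table(cutree(t$tree_row, k = 29))   # number of traits per cluster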
3. Randomly select one trait from each cluster
clu <- data.frame(trait = colnames(rcorm), type = cutree(t$tree_col, k = 29))
for (i in 1:29) {   # randomly select one trait from each cluster
  print(as.character(sample(clu[which(clu$type == i), 1], 1)))
}
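The same selection can also be done in a single call; fixing the seed (the value here is arbitrary, purely an assumption for reproducibility) makes the random pick repeatable:
set.seed(1)   # arbitrary seed, only so the random selection is reproducible
picked <- sapply(split(as.character(clu$trait), clu$type), sample, size = 1)
picked        # one randomly chosen trait per cluster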
4. Combine the Manhattan plots
#!/usr/bin/env python3
import math
from PIL import Image

column = 5                    # plots per row in the combined figure
width = 3600                  # width of a single Manhattan plot
height = 2400                 # height of a single Manhattan plot
size = (width, height)
file = "random_k29.trait"     # one Manhattan plot image path per line

with open(file) as f:
    list_im = f.read().splitlines()

imgs = [Image.open(i).resize(size) for i in list_im]   # scale every plot to a common size
row_num = math.ceil(len(imgs) / column)

target = Image.new('RGB', (width * column, height * row_num))
for i, image in enumerate(imgs):
    # place plot i at column (i % column), row (i // column) of the grid
    target.paste(image, (width * (i % column), height * (i // column)))

target.show()
target.save('random_k29_combine.png')
5. Stability measures
stab <- clValid(rcorm, nClust = 2:30, clMethods = clmethods, validation = "stability")
summary(stab)
Optimal Scores:
Score Method Clusters
APN 0.0000 kmeans 2
AD 1.2498 pam 30
ADM 0.0000 kmeans 2
FOM 0.0735 pam 30
The stability measures suggest k-medoids with 30 clusters or k-means with 2 clusters.
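As with the internal measures, the stability measures can also be viewed across the whole range of cluster numbers (a sketch; for all four measures, APN, AD, ADM and FOM, smaller values are better):
plot(stab)   # APN, AD, ADM and FOM against the number of clusters, one line per method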
6. k-medoids analysis
library(cluster)
cor_pam <- pam(rcorm, 30)
rownames(cor_pam$medoids)       # the medoid (representative) trait of each k-medoids cluster
layout(matrix(c(1, 2), 1, 2))   # show the two diagnostic plots side by side on one page
plot(cor_pam)                   # cluster plot and silhouette plot
layout(matrix(1))
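To use the k-medoids result in the same way as step 3, the cluster membership can be taken from the pam object and one trait sampled per cluster (a sketch; the seed is again an arbitrary choice for reproducibility):
clu_pam <- data.frame(trait = rownames(rcorm), type = cor_pam$clustering)
set.seed(1)   # arbitrary seed
sapply(split(as.character(clu_pam$trait), clu_pam$type), sample, size = 1)   # one trait per k-medoids cluster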