Single-cell clustering methods

eynoZzzzc

Last modified 2022-05-31 09:01:25


First published 2022-03-01 17:04:28

Article link: https://blog.csdn.net/qq_44185558/article/details/123209167


Partitioning-based clustering

k-means: K-means clustering
Paper link

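library(mclust)  # adjustedRandIndex()
library(Rtsne)

# data: genes x cells expression matrix; meta$label: reference cell labels
set.seed(1)  # k-means starts from random centers
res <- kmeans(t(data), centers = 9)
adjustedRandIndex(res$cluster, meta$label)
# cluster centers, shown on the first two genes only
plot(res$centers[, 1:2], col = topo.colors(9))

# t-SNE embedding of the cells (rows must be cells), colored by k-means cluster
tsne_out <- Rtsne(t(data))
plot(tsne_out$Y, col = topo.colors(9)[res$cluster])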

SAIC: combines k-means and ANOVA within an iterative clustering procedure

SCUBA: k-means; uses the gap statistic to identify bifurcation events

scVDMC: single-cell variance-driven multi-task clustering

pcaReduce:
Paper link

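library(pcaReduce)
library(mclust)  # adjustedRandIndex()

# nbt: number of runs; q: number of principal components to start from;
# method "S": sampling-based merging
res <- PCAreduce(t(data),
                 nbt = 1,
                 q = 7,
                 method = "S")
res[[1]]  # cluster assignments from the first run, one column per merge level
adjustedRandIndex(res[[1]][, 1], meta$label)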

k-medoids

library(fpc)     # pamk()
library(mclust)  # adjustedRandIndex()

res <- pamk(data = t(data), krange = 7)  # PAM with k fixed at 7
adjustedRandIndex(res$pamobject$clustering, meta$label)

Hierarchical clustering

BackSPIN: a two-way biclustering algorithm

cellTree: builds a minimum spanning tree

CIDR: imputes missing (dropout) values
Paper link

# Rows correspond to features (genes, transcripts, etc.) and columns correspond to cells.
library(cidr)
library(mclust)  # adjustedRandIndex()
load("/Biase.Rdata")  # assumed to provide `data` and `meta`

cellType <- factor(meta$label)

# one plotting color per cell type
scols <-
  c("red",
    "blue",
    "green",
    "brown",
    "pink",
    "purple",
    "darkgreen",
    "grey")
cols <- scols[as.integer(cellType)]

# nPC: number of principal coordinates, 4 by default
# nCluster: number of clusters
sdata <- as.matrix(data)
sdata <- scDataConstructor(sdata)           # create the scData object
sdata <- determineDropoutCandidates(sdata)  # determine dropout candidates
sdata <- wThreshold(sdata)                  # compute the imputation weighting threshold
sdata <- scDissim(sdata)                    # compute the dissimilarity matrix

sdata <- scPCA(sdata)  # principal coordinates analysis (PCoA)
sdata <- nPC(sdata)    # determine the number of principal coordinates
n_pc <- sdata@nPC      # renamed so it does not shadow the nPC() function

nCluster(sdata)        # plot used to help choose the number of clusters

sdata <- scCluster(sdata, nPC = n_pc)  # CIDR hierarchical clustering
adjustedRandIndex(sdata@clusters, meta$label)

sdata@nCluster  # number of clusters
plot(
  sdata@PC[, c(1, 2)],
  col = cols,
  pch = sdata@clusters,
  main = "CIDR",
  xlab = "PC1",
  ylab = "PC2"
)

RCA: reference component analysis
Paper link

Mixture models

GMM: Gaussian mixture model

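library(mclust)  # Mclust(), adjustedRandIndex()

pc_res <- prcomp(t(data))$x              # PCA with cells as rows
tmp_pca_mat <- pc_res[, 1:10]            # keep the first 10 principal components
res <- Mclust(tmp_pca_mat, G = 2:10)     # GMM; number of components chosen among 2..10
clusterid <- apply(res$z, 1, which.max)  # most probable component for each cell
adjustedRandIndex(clusterid, meta$label)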

TSCAN: uses a GMM and a minimum spanning tree (MST) to infer pseudotime ordering
Paper link

Graph-based clustering

TCC: transcript compatibility counts

Builds an affinity matrix;
computes Jensen-Shannon distances

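SIMLR: learns a similarity measure from single-cell RNA-seq data in order to perform dimensionality reduction, clustering, and visualization
Paper link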

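library(SIMLR)
library(Seurat)
library(mclust)  # adjustedRandIndex()

# Elbow plot from a separate Seurat object; ElbowPlot() needs PCA to have been run.
# `data` itself is left untouched because SIMLR() expects a genes x cells matrix.
seu <- CreateSeuratObject(counts = data)
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu)
ElbowPlot(seu)

SIMLR_res <- SIMLR(X = as.matrix(data), c = 3)  # c: number of clusters
adjustedRandIndex(SIMLR_res$y$cluster, meta$label)

# 2-D embedding learned by SIMLR, colored by the reference labels
plot(SIMLR_res$ydata,
     col = topo.colors(7)[as.integer(factor(meta$label))],
     pch = 20)

heatmap(SIMLR_res$S)  # learned cell-cell similarity matrix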

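SNN-Cliq: clique detection

① Compute the similarity between the original data points (Euclidean distance);

② Using the similarity matrix, list the k nearest neighbors (KNN) of each data point;

③ Compute a secondary similarity matrix from the shared nearest neighbors (SNN) of each pair of data points;

④ Build the SNN graph, in which nodes represent data points and edges represent the similarity between them.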
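The four steps above can be sketched in base R on a toy data set. This is a simplified illustration, not SNN-Cliq itself: the toy coordinates and `k` are made up, the secondary similarity is a plain count of shared neighbors (each point is counted in its own neighbor list, which is an assumption of this sketch), and the clique detection and merging that follow are not shown.

```r
# Toy data: 6 cells x 2 features forming two tight groups (made-up values)
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
k <- 2  # assumed neighborhood size

# Step 1: primary similarity, here the Euclidean distance
d <- as.matrix(dist(x))

# Step 2: k nearest neighbors of each point
# (position 1 of the ordering is the point itself, so it is skipped)
knn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))

# Step 3: secondary similarity = number of shared neighbors
n <- nrow(x)
snn <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    shared <- length(intersect(c(i, knn[i, ]), c(j, knn[j, ])))
    snn[i, j] <- snn[j, i] <- shared
  }
}

# Step 4: the SNN graph as a weighted adjacency matrix (weight 0 = no edge)
print(snn)
```

In this toy case, edges only appear inside each group, so the graph splits into two disconnected components.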

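Louvain: clusters cells with a community-detection algorithm. A network is first built from the scRNA-seq data, with nodes representing cells and edges representing cell-cell similarity; the community-detection algorithm then partitions this network. The clustering result depends heavily on how the similarity network is constructed.
Paper link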
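Louvain searches for the partition of the network that maximizes modularity. The sketch below does not implement the Louvain search itself; it only computes the modularity score Q of a given partition on a made-up toy network, to show what the algorithm is trying to maximize. The helper name `modularity_q` and the toy graph are illustrative assumptions.

```r
# Toy similarity network: two triangles (cells 1-3 and 4-6) joined by one edge (3-4)
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
for (e in seq_len(nrow(edges))) {
  A[edges[e, 1], edges[e, 2]] <- 1
  A[edges[e, 2], edges[e, 1]] <- 1
}

# Modularity of a partition (hypothetical helper):
# Q = 1/(2m) * sum_ij (A_ij - k_i * k_j / (2m)) * [c_i == c_j]
modularity_q <- function(A, membership) {
  two_m <- sum(A)        # twice the number of edges
  deg <- rowSums(A)      # node degrees
  same <- outer(membership, membership, "==")
  sum((A - outer(deg, deg) / two_m) * same) / two_m
}

q_good <- modularity_q(A, c(1, 1, 1, 2, 2, 2))  # one community per triangle
q_bad  <- modularity_q(A, c(1, 2, 1, 2, 1, 2))  # communities cut across triangles
print(c(q_good = q_good, q_bad = q_bad))
```

The partition that follows the two triangles scores higher, which is the kind of partition a modularity-maximizing method would prefer. Changing the edges of `A` changes the scores, which illustrates why the result depends on how the similarity network is built.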

Density-based clustering

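DBSCAN:

① Start from an arbitrary unvisited data point x and find all neighboring points within radius eps;

② If the neighborhood of x contains at least minPts points, a new cluster is started and x becomes its first core point. Otherwise, x is labeled as noise. In both cases x is marked as "visited";

③ For every core point x in the new cluster, all points within its eps-neighborhood are assigned to the same cluster; the same procedure is then repeated for every point that has just been added to the cluster;

④ Steps 2 and 3 are repeated until all points of the cluster have been determined, that is, until every point within the eps-neighborhoods of the cluster has been visited and labeled;

⑤ Once this cluster is complete, processing continues with a new unvisited point, which yields either a new cluster or noise. The procedure repeats until every point has been marked as visited, at which point all points have been clustered.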
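The procedure above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the implementation used by the dbscan package: the function name `dbscan_sketch`, the toy data, and the parameter values are made up, the neighborhood count includes the point itself, and a point first labeled as noise may later join a cluster as a border point.

```r
dbscan_sketch <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  cluster <- integer(n)   # 0 = noise or not yet assigned
  visited <- logical(n)
  current <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    neighbors <- which(d[i, ] <= eps)       # includes i itself
    if (length(neighbors) < min_pts) next   # not a core point: stays noise for now
    current <- current + 1L                 # i is the first core point of a new cluster
    cluster[i] <- current
    queue <- setdiff(neighbors, i)
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (cluster[p] == 0L) cluster[p] <- current  # p joins the cluster
      if (visited[p]) next                         # former noise point: border only
      visited[p] <- TRUE
      p_neighbors <- which(d[p, ] <= eps)
      if (length(p_neighbors) >= min_pts) {
        # p is a core point too: its unassigned neighbors are expanded next
        queue <- union(queue, p_neighbors[cluster[p_neighbors] == 0L])
      }
    }
  }
  cluster
}

# Toy data: two squares of four points each and one isolated point (made-up values)
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1),
           c(10, 10), c(10, 11), c(11, 10), c(11, 11),
           c(50, 50))
res_sketch <- dbscan_sketch(x, eps = 1.5, min_pts = 3)
print(res_sketch)
```

Cluster 0 marks noise here, the same convention the dbscan package uses for `res$cluster`.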

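library(dbscan)
library(mclust)  # adjustedRandIndex()

kNNdistplot(t(data), k = 5)  # look for the knee of the curve to choose eps
res <- dbscan::dbscan(t(data), minPts = 5, eps = 340)  # eps = 340 is specific to this data set

res$cluster  # cluster 0 marks noise points
adjustedRandIndex(res$cluster, meta$label)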

GiniClust: discovers rare subpopulations
Paper link

Monocle
Paper link

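Density peak clustering: relies on the distances between data points rather than on a density threshold, and assumes that cluster centers are local maxima of the data-point density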
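As a rough illustration of this idea, the sketch below computes, for each point, a local density `rho` (number of neighbors within a cutoff distance) and `delta` (distance to the nearest point of higher density), and picks the points where both are large as cluster centers. The toy data, the cutoff `dc`, and the choice of two centers are assumptions made for the example.

```r
# Toy data: two groups, each a square with a point in its middle (made-up values)
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5),
           c(10, 10), c(10, 11), c(11, 10), c(11, 11), c(10.5, 10.5))
dc <- 1.2  # assumed cutoff distance

d <- as.matrix(dist(x))

# Local density: number of other points closer than dc
rho <- rowSums(d < dc) - 1

# Distance to the nearest point of higher density;
# for a point of maximal density, the largest distance to any point is used
delta <- sapply(seq_len(nrow(d)), function(i) {
  higher <- which(rho > rho[i])
  if (length(higher) == 0) max(d[i, ]) else min(d[i, higher])
})

# Cluster centers: points with both high rho and high delta
centers <- order(rho * delta, decreasing = TRUE)[1:2]
print(data.frame(rho = rho, delta = round(delta, 2)))
print(sort(centers))
```

In this toy case the middle point of each square is selected. The remaining points would then be assigned to the cluster of their nearest neighbor of higher density, a step not shown here.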

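Neural networks

SOM: competitive learning for clustering; trained with stochastic gradient descent; sensitive to parameter tuning (e.g. the learning rate)

SCRAT: single-cell R-analysis tools; visualizes 2-D heatmaps that represent the correlations between genes across single cells

SOMSC: compresses high-dimensional gene expression data into two dimensions, for cellular state transition identification and pseudotemporal ordering of cells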

Ensemble clustering (consensus clustering)

SC3
Paper link

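library(SC3)
library(SingleCellExperiment)
library(scater)  # runPCA(), plotPCA()
library(mclust)  # adjustedRandIndex()

sce <- SingleCellExperiment(assays = list(counts = as.matrix(data),
                                          logcounts = log2(as.matrix(data) + 1)))

# define feature names in feature_symbol column
rowData(sce)$feature_symbol <- rownames(sce)
# remove features with duplicated names
sce <- sce[!duplicated(rowData(sce)$feature_symbol),]

sce <- runPCA(sce)

# let SC3 estimate the number of clusters and inspect the estimate
est <- sc3(sce, k_estimator = TRUE)
metadata(est)$sc3$k_estimation

# cluster with a fixed k = 3; the plots below must use a k that was computed
res <- sc3(sce, ks = 3)

sc3_plot_consensus(res, k = 3)
sc3_plot_silhouette(res, k = 3)

adjustedRandIndex(res$sc3_3_clusters, meta$label)

plotPCA(res, colour_by = "sc3_3_clusters")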

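Random forest-based

RAFSIL: first performs feature construction on the data, then learns cell-cell similarities. It can be used for typical exploratory data analysis tasks such as dimensionality reduction, visualization, and clustering.
Paper link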

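library(RAFSIL)
library(mclust)  # adjustedRandIndex()

# NumC: number of clusters; method selects the RAFSIL1 or RAFSIL2 variant
cluster_result <- RAFSIL(data = embedding_data,
                         NumC = 6,
                         method = "RAFSIL1")$lab
# the RAFSIL2 result replaces the RAFSIL1 result and is the one evaluated below
cluster_result <- RAFSIL(data = t(embedding_data),
                         NumC = 6,
                         method = "RAFSIL2")$lab
# label: reference cell labels
final_ARI <- adjustedRandIndex(cluster_result, label)
print(final_ARI)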

Others

LAK
Paper link

library(mclust)                # adjustedRandIndex()
library(SingleCellExperiment)  # assays(), colData(); assumes yan.rds holds a SingleCellExperiment

setwd("/LAK-master")
source("LAK.R")

#Biase <-  readRDS("Single Cell Data/biase.rds")
yan <-
  readRDS("/yan.rds")
m <- assays(yan)[[1]][, -(50:56)]  # drop cells 50 to 56
LAK_ann <- LAK(m, 3)

# reference labels for the same cells, recoded as integers
yan_ann <- colData(yan)$cell_type1[-(50:56)]
yan_ann_numeric <- as.integer(factor(yan_ann))
adjustedRandIndex(LAK_ann[[1]]$Cs, yan_ann_numeric)
