Seurat -- 数据集的整合

65 篇文章 0 订阅
20 篇文章 0 订阅

brief

这里主要根据seurat的教程走的,描述了多个单细胞数据集的整合,其中数据集的integration并不是简单的数据集的merge。
前者包括元信息的整合,数据集之间的批次矫正,后者仅仅是对数据表的拼接,后续直接renormalization即可。
同时这里描述的流程仅仅包括同类型的scRNA-seq测序数据,像scRNA-seq与scATAC-seq等多模态数据的整合暂未涉及。
此外像其他的单细胞数据集整合工具,例如harmony此处也没涉及。

library(dplyr)
library(Seurat)
library(patchwork)
library(sctransform)
library(ggplot2)
# devtools::install_github('satijalab/seurat-data')
library(SeuratData)

rm(list=ls())

# 获取测试数据集
# For convenience, we distribute this dataset through our SeuratData package.
# install dataset
InstallData("ifnb")
# load dataset
LoadData("ifnb")

Performing integration on datasets normalized with LogNormalize

# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")

# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# select features that are repeatedly variable across datasets for integration  <--repeatedly仅代表部分细胞在表达
# This function ranks features by the number of datasets they are deemed variable in, breaking ties by the median variable feature rank across datasets.
# It returns the top scoring features by this ranking.
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)

# Returns a Seurat object with a new integrated Assay. 
# this command creates an 'integrated' data assay
# If normalization.method = "LogNormalize", the integrated data is returned to the data slot and can be treated as log-normalized, corrected data.
# If normalization.method = "SCT", the integrated data is returned to the scale.data slot and can be treated as centered, corrected Pearson residuals
immune.combined <- IntegrateData(anchorset = immune.anchors)

str(immune.combined) # 保留了2000个feature在Assays$integrated@data下面
# 确实看到了表达数据被修改了,至于是不是修正我是不敢说的
immune.combined@assays$integrated@data[immune.combined@assays$integrated@var.features,]
immune.combined@assays$RNA@data[immune.combined@assays$integrated@var.features,]


# specify that we will perform downstream analysis on the corrected data note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
str(immune.combined) # 每个基因在所有细胞中进行了 cale

immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
================================================================================
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2

# To identify canonical cell type marker genes that are conserved across conditions, we provide the FindConservedMarkers() function.
# For performing differential expression after integration, we switch back to the original data
DefaultAssay(immune.combined) <- "RNA"
nk.markers <- FindConservedMarkers(immune.combined, ident.1 = 6, grouping.var = "stim", verbose = FALSE)
head(nk.markers)

  • 整合前和整合后 anchors的数值变化
    在这里插入图片描述
    在这里插入图片描述

  • 整合前的数据以及LogNormalization的数据一直存放在RNA@data@x下面

在这里插入图片描述

  • 整合后的数据存放在integration@data@x
    在这里插入图片描述

Performing integration on datasets normalized with SCTransform

# Performing integration on datasets normalized with SCTransform
# install glmGamPoi
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("glmGamPoi")
# install sctransform from Github
install.packages("sctransform")

# load dataset
LoadData("ifnb")

# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# SCTransform只接受单个的seurat object
ctrl <- ifnb.list[["CTRL"]]
stim <- ifnb.list[["STIM"]]

ctrl.sct <- SCTransform(ctrl, vst.flavor = "v2", verbose = FALSE) %>%
  RunPCA(npcs = 30, verbose = FALSE)
stim <- SCTransform(stim, vst.flavor = "v2", verbose = FALSE) %>%
  RunPCA(npcs = 30, verbose = FALSE)

ifnb.list <- list(ctrl = ctrl, stim = stim)
# selecting a list of informative features using SelectIntegrationFeatures()
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)

# To perform integration using the pearson residuals calculated above, we use the PrepSCTIntegration() function
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)

# To integrate the two datasets, we use the FindIntegrationAnchors() to find anchors 
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
                                         anchor.features = features)
                                         
# and use these anchors to integrate the two datasets together with IntegrateData()
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT")

# Perform an integrated analysis
immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30, verbose = FALSE)
immune.combined.sct <- FindNeighbors(immune.combined.sct, reduction = "pca", dims = 1:30)
immune.combined.sct <- FindClusters(immune.combined.sct, resolution = 0.3)
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值