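Data sources
The single-cell data come from the human fetal brain atlas published in Cell in 2023, "Spatiotemporal transcriptome atlas reveals the regional specification of the developing human brain"; only the 14 PCW (post-conception weeks) samples were selected for transfer and integration.
The spatial transcriptomics data are the raw GEM/GEF output files from BGI Stereo-seq sequencing; a single small 1*1 chip was used for testing. All data must be converted to AnnData, that is, the h5ad format used for scanpy analysis. For the conversion, see the official Stereopy tutorial (Input & Output - Stereopy); it is not covered in detail here.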
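As a rough illustration of the conversion step, the sketch below reads a GEF file with Stereopy and writes a scanpy-style h5ad file. The bin size of 50, the output naming scheme, and the helper names `h5ad_path_for` and `gef_to_h5ad` are assumptions made for this example, not settings taken from the original workflow; check the Stereopy tutorial for the options that fit your chip.

```python
from pathlib import Path


def h5ad_path_for(gef_path, bin_size):
    """Build an output path such as chip_bin50.h5ad next to the input file."""
    p = Path(gef_path)
    return str(p.with_name(f"{p.stem}_bin{bin_size}.h5ad"))


def gef_to_h5ad(gef_path, bin_size=50):
    """Read a GEF file with Stereopy and write a scanpy-style h5ad file.

    bin_size=50 is an assumed value, not one taken from the original post.
    """
    import stereo as st  # imported lazily; requires Stereopy to be installed

    data = st.io.read_gef(file_path=gef_path, bin_size=bin_size)
    out_path = h5ad_path_for(gef_path, bin_size)
    # flavor="scanpy" asks Stereopy for an AnnData object laid out for scanpy
    st.io.stereo_to_anndata(data, flavor="scanpy", output=out_path)
    return out_path
```

Calling `gef_to_h5ad("/path/to/chip.gef")` (a hypothetical path) would then produce `/path/to/chip_bin50.h5ad`, which can be loaded with `sc.read`.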
Environment setup
First set up the environment. The package is installed from GitHub - mexchy1000/CellDART: domain adaptation of spatial and single-cell transcriptome
pip install git+https://github.com/mexchy1000/CellDART.git
The GitHub project also provides a tutorial. Applying it directly to Stereo-seq data runs into some problems, but it is still a useful reference:
CellDART/CellDART_example_mousebrain_markers.ipynb at master · mexchy1000/CellDART · GitHub
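After installing, it can help to confirm that the packages used in this walkthrough are visible to the current Python interpreter. The following is a minimal sketch using only the standard library; the helper name `is_installed` is introduced here for illustration, and the module names are the ones imported later in this post.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the packages used below are available in this environment
for name in ("CellDART", "scanpy", "sklearn"):
    print(f"{name}: {'found' if is_installed(name) else 'MISSING'}")
```

A package reported as missing usually means that pip installed into a different environment than the one running the notebook.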
Importing packages and data
With the environment set up, start by importing the required packages:
import scanpy as sc
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from CellDART import da_cellfraction
from CellDART.utils import random_mix
from sklearn.manifold import TSNE
Load the single-cell data as adata and the spatial transcriptomics data as adata_spatial:
adata = sc.read('/data/work/Cell_fetal_brain_singleCell/14.h5ad')
adata
AnnData object with n_obs × n_vars = 35175 × 36601
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'region', 'week', 'celltype'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
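The `celltype` column in `obs` presumably holds the labels that the later steps rely on, so it is worth a quick look at how many cells each label has. The sketch below uses toy labels in place of the real column; the helper name `celltype_counts` and the label values are hypothetical.

```python
from collections import Counter


def celltype_counts(labels):
    """Count cells per label, most frequent first."""
    return Counter(labels).most_common()


# Toy labels standing in for adata.obs["celltype"] (hypothetical values)
toy_labels = ["RG", "Neuron", "RG", "IPC", "Neuron", "RG"]
for celltype, n in celltype_counts(toy_labels):
    print(celltype, n)
```

On the real data the same helper could be called as `celltype_counts(adata.obs["celltype"])`; labels with very few cells may deserve a closer look before integration.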
adata_spatial = sc.read('/data/work/scanpy/D03657A1_scanpy_out.h5ad')
adata_spatial
AnnDat