Compiled by 探序基因肿瘤研究院
As an example, take the cervical cancer single-cell paper "Multiomic analysis of cervical squamous cell carcinoma identifies cellular ecosystems with biological and clinical relevance", available at https://www.nature.com/articles/s41588-023-01570-0.
The Methods section states:
Processed sequencing data are available at Science Data Bank (https://doi.org/10.57760/sciencedb.11624)
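Following the link brings up the file list, which includes an h5ad file.
After downloading the h5ad file to a server, you can read it with Python, export it as plain text files, and then load those files into R to build a Seurat object.
The Python code is as follows: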
import scanpy as sc
import scipy.io as sio

# Read the AnnData object; all_data.X is stored as cells x genes.
all_data = sc.read_h5ad("all_scRNAseq.h5ad")
# Per-cell and per-gene annotation tables.
cellinfo = all_data.obs
geneinfo = all_data.var
# Transpose to genes x cells, the orientation Seurat expects.
mtx = all_data.X.T
# Write the annotations as CSV and the matrix in Matrix Market format.
cellinfo.to_csv("cellinfo.csv")
geneinfo.to_csv("geneinfo.csv")
sio.mmwrite("sparse_matrix.mtx", mtx)
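At this point, sparse_matrix.mtx holds the matrix in sparse (Matrix Market) form.
Entering mtx at the Python prompt shows a 29225 x 163964 (genes x cells) matrix: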
>>> mtx
<29225x163964 sparse matrix of type '<class 'numpy.float32'>'
with 326157024 stored elements in Compressed Sparse Column format>
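The transpose-and-export step can be tried on a small scale before running it on the full dataset. The following is a minimal sketch on a made-up 3 x 4 matrix, not the paper's data; the variable names and the file name toy_matrix.mtx are assumptions for illustration. It shows that transposing a CSR matrix gives a genes x cells matrix in CSC format, and that a write/read round trip through Matrix Market leaves the values unchanged.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio
import scipy.sparse as sparse

# Toy stand-in for all_data.X: 3 cells (rows) x 4 genes (columns), CSR format.
cells_by_genes = sparse.csr_matrix(
    np.array(
        [[1, 0, 0, 2],
         [0, 0, 3, 0],
         [0, 4, 0, 5]],
        dtype=np.float32,
    )
)

# Transposing gives genes x cells; for a CSR input the result is CSC.
genes_by_cells = cells_by_genes.T
print(genes_by_cells.shape)   # (4, 3)
print(genes_by_cells.format)  # csc

# Write to Matrix Market format and read it back to confirm nothing changed.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_matrix.mtx")
    sio.mmwrite(path, genes_by_cells)
    restored = sio.mmread(path).tocsc()

# No differing entries means the round trip preserved every value.
round_trip_ok = (restored != genes_by_cells).nnz == 0
print(round_trip_ok)  # True
```

On the real data the same reasoning explains the printed shape: 29225 rows are genes and 163964 columns are cells.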
Then, in R:
counts=Matrix::readMM(file = "xxx/sparse_matrix.mtx")
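Finally, assign the gene names as the row names and the cell names as the column names of this count matrix, and it is ready to be turned into a Seurat object.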
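Naming the matrix only works if the exported tables line up with it: one row of geneinfo.csv per matrix row and one row of cellinfo.csv per matrix column. As a hedged sketch, the check below reproduces the three exports on made-up toy data (the cell names, gene names, and columns are hypothetical) and verifies the alignment in Python before anything is handed to R. It is one possible sanity check, not part of the original workflow.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io as sio
import scipy.sparse as sparse

# Toy stand-ins for all_data.obs, all_data.var and all_data.X.T (hypothetical values).
cellinfo = pd.DataFrame({"sample": ["S1", "S1", "S2"]},
                        index=["cell_A", "cell_B", "cell_C"])
geneinfo = pd.DataFrame({"highly_variable": [True, False]},
                        index=["GENE1", "GENE2"])
mtx = sparse.csc_matrix(np.array([[1, 0, 2],
                                  [0, 3, 0]], dtype=np.float32))

with tempfile.TemporaryDirectory() as tmpdir:
    # Same three exports as in the Python step above.
    cellinfo.to_csv(os.path.join(tmpdir, "cellinfo.csv"))
    geneinfo.to_csv(os.path.join(tmpdir, "geneinfo.csv"))
    sio.mmwrite(os.path.join(tmpdir, "sparse_matrix.mtx"), mtx)

    # Read everything back the way a downstream consumer would.
    cells = pd.read_csv(os.path.join(tmpdir, "cellinfo.csv"), index_col=0)
    genes = pd.read_csv(os.path.join(tmpdir, "geneinfo.csv"), index_col=0)
    counts = sio.mmread(os.path.join(tmpdir, "sparse_matrix.mtx"))

# Rows must match genes and columns must match cells before names are attached.
assert counts.shape == (len(genes), len(cells))

# A dense view is fine for a toy matrix only; the real matrix should stay sparse.
labelled = pd.DataFrame(counts.toarray(), index=genes.index, columns=cells.index)
print(labelled.loc["GENE1", "cell_C"])  # 2.0
```

The first column of each CSV holds the names (the DataFrame index), so those are the values to use as row and column names on the R side.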