Fixing slow or failed dataset downloads in scanpy

Latest recommended article published 2024-06-02 18:23:38

Author: 我的心永远是冰冰哒

Category: python. Tags: jupyter

Article link: https://blog.csdn.net/qq_45759229/article/details/125049299

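I have had this problem for a long time: calls such as

sc.datasets.paul15()
sc.datasets.pbmc3k_processed()
sc.datasets.pbmc68k_reduced()

almost always fail to download the data for me, especially when run inside Jupyter, where they have basically never succeeded. A workaround is to download the file manually and load it from a local path.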
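When no browser is at hand (for example on a remote server), the one-time manual download could also be scripted outside the notebook. The following is a minimal sketch using only the Python standard library. The helper name `download_once` and its `retries` and `timeout` defaults are illustrative assumptions, not part of scanpy.

```python
import time
import urllib.request
from pathlib import Path


def download_once(url, dest, retries=3, timeout=30):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # partial file, renamed on success
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
                while True:
                    chunk = resp.read(1 << 20)  # read in 1 MiB chunks
                    if not chunk:
                        break
                    out.write(chunk)
            tmp.replace(dest)
            return dest
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt < retries:
                time.sleep(attempt)  # simple linear back-off before retrying
    raise RuntimeError(f"download failed after {retries} attempts") from last_error
```

For instance, `download_once("http://falexwolf.de/data/paul15.h5", "scanpy_dataset/paul15/paul15.h5")` would fetch the paul15 file once and return immediately on later calls. Writing to a `.part` file first means an interrupted download is never mistaken for a complete one.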

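paul15 dataset
Copy the following URL into your browser:
http://falexwolf.de/data/paul15.h5
Download the file and save it to a local folder; downloading it in the browser is actually very fast. Then load it with the following code: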

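import scanpy as sc
import h5py
import numpy as np  # needed for np.intersect1d below
import anndata as ad
# path to the manually downloaded file; adjust to your own location
filename="/Users/xiaokangyu/scanpy_dataset/paul15/paul15.h5"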
with h5py.File(filename, 'r') as f:
    X = f['data.debatched'][()]
    gene_names = f['data.debatched_rownames'][()].astype(str)
    cell_names = f['data.debatched_colnames'][()].astype(str)
    clusters = f['cluster.id'][()].flatten().astype(int)
    infogenes_names = f['info.genes_strings'][()].astype(str)
# each row has to correspond to an observation, therefore transpose
# (the dtype argument is deprecated in recent anndata releases, so it is omitted)
adata = ad.AnnData(X.transpose())
adata.var_names = gene_names
adata.obs_names = cell_names
# names reflecting the cell type identifications from the paper
cell_type = 6 * ['Ery']
cell_type += 'MEP Mk GMP GMP DC Baso Baso Mo Mo Neu Neu Eos Lymph'.split()
adata.obs['paul15_clusters'] = [f'{i}{cell_type[i-1]}' for i in clusters]
# make string annotations categorical (optional)
#_utils.sanitize_anndata(adata)
# just keep the first of the two equivalent names per gene
adata.var_names = [gn.split(';')[0] for gn in adata.var_names]
# remove 10 corrupted gene names
infogenes_names = np.intersect1d(infogenes_names, adata.var_names)
# restrict data array to the 3461 informative genes
adata = adata[:, infogenes_names]
# usually we'd set the root cell to an arbitrary cell in the MEP cluster
# adata.uns['iroot'] = np.flatnonzero(adata.obs['paul15_clusters'] == '7MEP')[0]
# here, set the root cell as in Haghverdi et al. (2016)
# note that, unlike in Matlab/R, counting starts at 0
adata.uns['iroot'] = 840
print(adata)

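The adata obtained above holds the same data as the one returned by
adata=sc.datasets.paul15(), except that the string annotations are not converted to categoricals because the optional sanitize step is commented out. The same approach works for the other datasets.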
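An alternative to rebuilding the AnnData by hand may be to put the downloaded file where scanpy itself looks before downloading. The sketch below only copies a file into a dataset directory. It assumes that scanpy checks for `paul15/paul15.h5` under `sc.settings.datasetdir` and skips the download when the file is present, which is worth verifying against your installed version. The helper name `place_in_dataset_dir` is hypothetical.

```python
import shutil
from pathlib import Path


def place_in_dataset_dir(downloaded_file, dataset_dir, relative_name):
    """Copy a manually downloaded file to dataset_dir/relative_name."""
    src = Path(downloaded_file)
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")
    target = Path(dataset_dir) / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copy2(src, target)  # keep an existing copy untouched
    return target
```

Under that assumption, calling `place_in_dataset_dir("/Users/xiaokangyu/scanpy_dataset/paul15/paul15.h5", sc.settings.datasetdir, "paul15/paul15.h5")` once should let a later `sc.datasets.paul15()` read the local copy instead of downloading it.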
