Fixing slow or failed dataset downloads in scanpy

I have always had this problem with datasets such as

sc.datasets.paul15()
sc.datasets.pbmc3k_processed()
sc.datasets.pbmc68k_reduced()

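For me, downloading these datasets always fails. This is especially true when running in Jupyter, where it has essentially never succeeded. The workaround is to download the file yourself and load it from disk.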

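paul15 dataset
Copy the following URL into your browser:
http://falexwolf.de/data/paul15.h5
Download the file and save it to a local folder. The download is actually very fast in a browser. Then load it with the following code: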
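If you would rather script the download than use the browser, the sketch below uses only the Python standard library. It sets an explicit timeout and writes to a temporary `.part` file that is renamed only after the transfer completes, so an interrupted download never leaves a truncated `paul15.h5` behind. The function name `download_file` and the 60-second default timeout are illustrative choices, not part of scanpy.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, timeout: float = 60.0) -> Path:
    """Download `url` to `dest` (hypothetical helper, not a scanpy API).

    Data is streamed into `<dest>.part` first and only renamed to `dest`
    once the whole response has been written.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    # `timeout` bounds how long a stalled connection may block
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    # atomic rename on the same filesystem
    tmp_path.replace(dest_path)
    return dest_path
```

For example, `download_file("http://falexwolf.de/data/paul15.h5", "scanpy_dataset/paul15/paul15.h5")` saves the file to a hypothetical local folder, which you then pass as `filename` to the loading code.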

import scanpy as sc
import h5py
import numpy as np
import anndata as ad

# path where paul15.h5 was saved; replace with your own location
filename = "/Users/xiaokangyu/scanpy_dataset/paul15/paul15.h5"
with h5py.File(filename, 'r') as f:
    X = f['data.debatched'][()]
    gene_names = f['data.debatched_rownames'][()].astype(str)
    cell_names = f['data.debatched_colnames'][()].astype(str)
    clusters = f['cluster.id'][()].flatten().astype(int)
    infogenes_names = f['info.genes_strings'][()].astype(str)
# each row has to correspond to an observation, therefore transpose
# (the dtype= argument is deprecated in recent anndata versions, so it is omitted)
adata = ad.AnnData(X.transpose())
adata.var_names = gene_names
# `adata.row_names = cell_names` does not set any AnnData index;
# uncomment the next line to use the cell names as obs_names instead
# adata.obs_names = cell_names
# names reflecting the cell type identifications from the paper
cell_type = 6 * ['Ery']
cell_type += 'MEP Mk GMP GMP DC Baso Baso Mo Mo Neu Neu Eos Lymph'.split()
adata.obs['paul15_clusters'] = [f'{i}{cell_type[i-1]}' for i in clusters]
# make string annotations categorical (optional)
adata.strings_to_categoricals()
# just keep the first of the two equivalent names per gene
adata.var_names = [gn.split(';')[0] for gn in adata.var_names]
# remove 10 corrupted gene names
infogenes_names = np.intersect1d(infogenes_names, adata.var_names)
# restrict data array to the 3461 informative genes; .copy() turns the view
# into a real AnnData so that setting .uns below does not trigger a warning
adata = adata[:, infogenes_names].copy()
# usually we'd set the root cell to an arbitrary cell in the MEP cluster
# adata.uns['iroot'] = np.flatnonzero(adata.obs['paul15_clusters'] == '7MEP')[0]
# here, set the root cell as in Haghverdi et al. (2016)
# note that unlike Matlab/R, counting starts at 0
adata.uns['iroot'] = 840
print(adata)
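Two steps in the loader are plain string processing and can be checked without h5py or anndata. These steps are turning a 1-based cluster id into a label such as `7MEP`, and cleaning up gene names and intersecting them with the informative-gene list. The sketch below reproduces them in pure Python. The helper names are hypothetical, and `informative_genes` is a stand-in for `np.intersect1d`, which also returns the sorted, de-duplicated common values.

```python
# Label list copied from the loader: 19 entries for cluster ids 1..19
cell_type = 6 * ["Ery"]
cell_type += "MEP Mk GMP GMP DC Baso Baso Mo Mo Neu Neu Eos Lymph".split()


def cluster_label(cluster_id: int) -> str:
    """Map a 1-based cluster id to '<id><type>', e.g. 7 -> '7MEP'."""
    return f"{cluster_id}{cell_type[cluster_id - 1]}"


def clean_gene_name(name: str) -> str:
    """Keep only the first of the ';'-separated equivalent gene names."""
    return name.split(";")[0]


def informative_genes(info_genes, var_names):
    """Pure-Python analogue of np.intersect1d: sorted, unique common names."""
    return sorted(set(info_genes) & set(var_names))
```

Because the intersection comes back sorted, `adata[:, infogenes_names]` orders the genes alphabetically rather than in the order they appear in the file.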

The adata obtained above is the same as the one returned by
adata = sc.datasets.paul15(). The other datasets can be loaded locally in the same way.
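Another option is to place the manually downloaded file where the built-in loader expects it, so that `sc.datasets.paul15()` itself works offline. This is a sketch under an assumption you should verify for your scanpy version. The assumption is that the loader looks for `paul15/paul15.h5` under `sc.settings.datasetdir` (by default `./data`) and skips the download when that file already exists. The helper `install_into_cache` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def install_into_cache(local_file: str, datasetdir: str, relative_path: str) -> Path:
    """Copy a manually downloaded file to <datasetdir>/<relative_path>.

    ASSUMPTION: the scanpy dataset loader checks this exact location before
    trying to download; an existing file there is left untouched.
    """
    target = Path(datasetdir) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copy2(local_file, target)
    return target
```

Usage, under that assumption: `install_into_cache("/Users/xiaokangyu/scanpy_dataset/paul15/paul15.h5", sc.settings.datasetdir, "paul15/paul15.h5")`, then call `sc.datasets.paul15()` as usual.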