Cell Ranger: installation and usage

Author: m0_59974475

Original article: https://blog.csdn.net/m0_59974475/article/details/138108362

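Installing Cell Ranger

1.1 Create an isolated environment for the analysis so other environments are not polluted

# Activate conda (~/.conda_init is this server's site-specific init script; adjust to your setup)
source ~/.conda_init

# Create and activate the 10X environment
conda create -n 10X
conda activate 10X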

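1.2 Download Cell Ranger

Register at Downloads - Software - Single Cell Gene Expression - Official 10x Genomics Support to obtain a download link.

# Download Cell Ranger (the signed link expires; get a fresh one from the download page)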
curl -o cellranger-8.0.0.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-8.0.0.tar.gz?Expires=1713879449&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=W63p465reGvT7V3RlwtmCwKKFj4rlFx13BqglWz9plAPjMqLvgdGvTKcYAvrsBxS2tCLReOOB9hbXJESsG2TjOTIBcu~7WRLXhK5TphIkE64gsGjd22QZgVhc61rVlXXet4Yfs0z9DD5gtENpylidafi4jRi-NR4oemh11eUH-WIs9BNkNK2IUxo8bfr3qGBN-BHkYOy5cnGXDQPbUp8TObiihnNs7ebTMGH6m6ALTD9yd0AMcZaftKu7JawGG9RdDgdUrwp4WdrDqJ~PEY7Hj2FyA9SPicLYCTFnGMj~pQHt66KB-gDFIJCiuxrKI2dkhDM-HkvA8OMbE0vc5JbBA__"
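The signed link above carries an `Expires=` query parameter holding a Unix timestamp. Here it is 1713879449, which falls in April 2024, so a link copied from this post will no longer work. The helper below checks this before you download. It is a hypothetical convenience function and not part of Cell Ranger:

```bash
# Return 0 if a signed download URL has expired, judging from its
# Expires=<unix seconds> query parameter; return 2 if there is no such parameter.
url_expired() {
  local expires
  # Pull the digits after "?Expires=" or "&Expires=" out of the URL
  expires=$(printf '%s\n' "$1" | sed -n 's/.*[?&]Expires=\([0-9]*\).*/\1/p')
  [ -n "$expires" ] || return 2
  # Expired when the current time has reached the Expires timestamp
  [ "$(date +%s)" -ge "$expires" ]
}
```

Usage: `url_expired "$URL" && echo "link expired, request a new one from the download page"`.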

1.3 Check the archive's integrity with md5sum

# Compare the printed hash with the md5sum listed on the download page
md5sum cellranger-8.0.0.tar.gz
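You can let `md5sum -c` compare the hash instead of checking it by eye. This is a minimal sketch. `verify_md5` is a hypothetical helper name, and you must copy the expected hash from the 10x download page yourself:

```bash
# Compare a file's MD5 with an expected value.
# Returns 0 on a match and non-zero on a mismatch or missing file.
verify_md5() {
  local file="$1" expected="$2"
  # md5sum -c reads "<hash>  <filename>" lines from stdin ("-");
  # --status suppresses output so only the exit code reports the result
  echo "${expected}  ${file}" | md5sum -c --status -
}
```

Usage: `verify_md5 cellranger-8.0.0.tar.gz <md5-from-download-page> && echo "archive OK"`.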

1.4 Install Cell Ranger

# Extract the archive
tar -zxvf cellranger-8.0.0.tar.gz

Extracting the archive completes the installation. There is nothing to compile.

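1.5 Running Cell Ranger

Cell Ranger is now installed, but running it still requires typing the full path to the cellranger executable.

Adding the Cell Ranger directory to $PATH lets you run it by typing just cellranger.

There are two ways to add it to $PATH, temporary and permanent. Each is described below.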

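1.5.1 Temporarily adding it to $PATH:

Prepend the Cell Ranger directory to $PATH:

# export PATH=/path/to/cellranger-8.0.0:$PATH
export PATH=/public1/home/qinql/jwy/biosoft/cellranger-8.0.0:$PATH

You can now run cellranger directly from this terminal.

This lasts only for the current shell session. After you close the terminal, log out, or reboot, you have to add the path again.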
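Running the export line repeatedly keeps prepending the same directory to $PATH. The helper below adds the directory only if it exists and is not already listed. `prepend_path` is a hypothetical name:

```bash
# Prepend a directory to PATH only if it exists and is not already present.
prepend_path() {
  local dir="$1"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  case ":${PATH}:" in
    *":${dir}:"*) ;;                    # already on PATH: leave it unchanged
    *) export PATH="${dir}:${PATH}" ;;
  esac
}
```

Usage: `prepend_path /public1/home/qinql/jwy/biosoft/cellranger-8.0.0 && command -v cellranger`. The second command should print the full path of the executable.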

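1.5.2 Permanently adding it to $PATH:

To have this command run automatically in every new shell, add the line "export PATH=/path/to/cellranger-8.0.0:$PATH" to ~/.bashrc.

Open ~/.bashrc in vim:

vim ~/.bashrc

Append "export PATH=/path/to/cellranger-8.0.0:$PATH" to the end of the file.

Reload ~/.bashrc so the change takes effect in the current shell:

source ~/.bashrc

For practical reasons, however, I chose to add it to $PATH temporarily.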
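You can make the same edit without opening vim. This sketch assumes bash and ~/.bashrc. `persist_path` is a hypothetical helper that appends the export line only once, so re-running it does not pile up duplicate entries:

```bash
# Append "export PATH=<dir>:$PATH" to an rc file unless that exact line exists.
persist_path() {
  local dir="$1" rc="${2:-$HOME/.bashrc}"
  local line="export PATH=${dir}:\$PATH"   # \$PATH stays literal in the file
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

Usage: `persist_path /public1/home/qinql/jwy/biosoft/cellranger-8.0.0 && source ~/.bashrc`.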

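Building the reference genome

Reference: Build a Custom Reference (cellranger mkref) - Software - Single Cell Gene Expression - Official 10x Genomics Support

You need the reference genome FASTA and a GTF annotation file. Ensembl is preferred, but the AcMNPV genome is only available from NCBI. Note that mkref requires a GTF file; GFF files are not supported.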
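Since mkref accepts only GTF, it can help to check which format a downloaded annotation uses. This is a rough heuristic, not an official validator. GTF attribute columns look like `gene_id "x";`, while GFF3 uses `ID=x;`. `annotation_format` is a hypothetical helper:

```bash
# Guess whether an annotation is GTF or GFF3 from column 9 of the first
# data line whose attribute column matches either style.
annotation_format() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }                    # skip comments and short lines
    $9 ~ /^[A-Za-z_]+ "/ { print "gtf";  exit }
    $9 ~ /^[A-Za-z_]+=/  { print "gff3"; exit }
  ' "$1"
}
```

Usage: `annotation_format genomic.gtf` should print `gtf` before you pass the file to mkgtf or mkref.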

Filtering the GTF file

# mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
cellranger mkgtf genomic.gtf AcMNPV.filtered.gtf --attribute=gene_biotype:protein_coding

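The output file AcMNPV.filtered.gtf is used in the next step. This step runs quickly.

However, the viral genome is very small and its annotation contains only CDS features and no exon features. mkref builds gene models from exon records, so as a workaround I replaced every CDS with exon, skipped the filtering step, and went straight to the next step.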
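The post does not show the exact command used to swap CDS for exon. One way to do it is a single-pass awk rewrite of column 3. This sketch makes three assumptions: the input is genomic.gtf, it contains no exon lines of its own, and the output is exongenomic.gtf (the file name later passed to mkref). Setting the frame column to "." on the new exon lines is a choice made in this sketch, since GTF frame applies only to CDS-type features. `cds_to_exon` and `feature_counts` are hypothetical helper names:

```bash
# Relabel every CDS feature (column 3) as exon; all other lines pass through.
cds_to_exon() {
  awk 'BEGIN { FS = OFS = "\t" }
       !/^#/ && $3 == "CDS" { $3 = "exon"; $8 = "." }   # exons carry no frame
       { print }' "$1" > "$2"
}

# Count features by type (column 3) to confirm the conversion.
feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}
```

Usage: `cds_to_exon genomic.gtf exongenomic.gtf && feature_counts exongenomic.gtf`. The output should list exon and no CDS.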

Building the reference (cellranger mkref)

Usage:
    mkref
        --genome=NAME ...
        --fasta=PATH ...
        --genes=PATH ...
        [options]
    mkref -h | --help | --version
 
Arguments:
    genome  # output folder name    Unique genome name(s), used to name output folder
                            [a-zA-Z0-9_-]+. Specify multiple genomes by
                            specifying the --genome argument multiple times; the
                            output folder will be <name1>_and_<name2>.
    fasta   # path to the reference genome FASTA
                            Path(s) to FASTA file containing your genome reference.
                            Specify multiple genomes by specifying the --fasta
                            argument multiple times.
    genes   # path to the .filtered.gtf annotation file
                            Path(s) to genes GTF file(S) containing annotated genes
                            for your genome reference. Specify multiple genomes
                            by specifying the --genes argument multiple times.
 
Options:
    --nthreads=<num>    This option is currently ignored due to a bug, and will be re-enabled
                          in the next Cell Ranger release.
    --memgb=<num>       Maximum memory (GB) used when aligning reads with STAR.
                            Defaults to 16.
    --ref-version=<str> Optional reference version string to include with
                            reference.
    -h --help           Show this message.
    --version           Show version.

cellranger mkref --genome=AcMNPV_genome --fasta=GCA_000838485.1_ViralProj14023_genomic.fna --genes=exongenomic.gtf

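Structure of the generated files

# Example output layout (this listing comes from an Ovis aries reference; the AcMNPV run writes the same structure to AcMNPV_genome/)
tree ovis_aries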
ovis_aries/
├── fasta
│   ├── genome.fa
│   └── genome.fa.fai
├── genes
│   └── genes.gtf.gz
├── reference.json
└── star
    ├── chrLength.txt
    ├── chrNameLength.txt
    ├── chrName.txt
    ├── chrStart.txt
    ├── exonGeTrInfo.tab
    ├── exonInfo.tab
    ├── geneInfo.tab
    ├── Genome
    ├── genomeParameters.txt
    ├── SA
    ├── SAindex
    ├── sjdbInfo.txt
    ├── sjdbList.fromGTF.out.tab
    ├── sjdbList.out.tab
    └── transcriptInfo.tab
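Before starting the long count runs, a quick check that mkref finished can save time. The sketch below looks for the top-level items in a listing like the one above. `check_reference` is a hypothetical helper, and the list of required items is an assumption based on that listing:

```bash
# Report any expected top-level item missing from a mkref output folder.
check_reference() {
  local ref="$1" missing=0 item
  for item in reference.json fasta/genome.fa genes/genes.gtf.gz star; do
    if [ ! -e "$ref/$item" ]; then
      echo "missing: $ref/$item" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_reference AcMNPV_genome && echo "reference looks complete"`.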

Quantification with cellranger count

Reference: Running Cell Ranger count - Official 10x Genomics Support

cellranger-count
Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well

USAGE:
    cellranger count [FLAGS] [OPTIONS] --id <ID> --transcriptome <PATH>

FLAGS:
          --no-bam                  Do not generate a bam file
          --nosecondary             Disable secondary analysis, e.g. clustering. Optional
          --include-introns         Include intronic reads in count
          --no-libraries            Proceed with processing using a --feature-ref but no Feature Barcode libraries
                                    specified with the 'libraries' flag
          --no-target-umi-filter    Turn off the target UMI filtering subpipeline. Only applies when --target-panel is
                            used
          --dry                     Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
          --disable-ui              Do not serve the web UI
          --noexit                  Keep web UI running after pipestance completes or fails
          --nopreflight             Skip preflight checks
      -h, --help                    Prints help information

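Since there are four samples, I wrote a for loop to process them (the array below lists three sample folders). Note that Cell Ranger 8.0 replaced the --no-bam flag shown in the help text above with a required --create-bam true|false option, which the loop uses.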

x=("24h_CK" "Ac_NPV_0h" "CK_0h")   # sample folder names under ~/jwy/snrna-seq/
for sample_id in "${x[@]}"; do
  fastq_path="$HOME/jwy/snrna-seq/$sample_id"   # "~" does not expand inside quotes
  cellranger count --id "$sample_id" \
    --fastqs "$fastq_path" \
    --transcriptome AcMNPV_genome \
    --create-bam false \
    --nosecondary
done

This step takes a long time.
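Because each run takes a long time, it helps to skip samples that have already finished and to preview the commands first. This sketch assumes it runs from the directory where cellranger writes its per-sample output folders, and that a finished run contains outs/web_summary.html. `run_counts` and `DRY_RUN` are hypothetical names:

```bash
# Run cellranger count for each sample ID given after the first two arguments.
# Skips finished samples and samples without a FASTQ folder.
# With DRY_RUN=1, prints the commands instead of running them.
run_counts() {
  local fastq_root="$1" transcriptome="$2"; shift 2
  local sample_id
  local -a cmd
  for sample_id in "$@"; do
    if [ -e "$sample_id/outs/web_summary.html" ]; then
      echo "skip $sample_id: already finished" >&2; continue
    fi
    if [ ! -d "$fastq_root/$sample_id" ]; then
      echo "skip $sample_id: no FASTQ folder" >&2; continue
    fi
    cmd=(cellranger count --id "$sample_id" --fastqs "$fastq_root/$sample_id"
         --transcriptome "$transcriptome" --create-bam false --nosecondary)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || echo "cellranger count failed for $sample_id" >&2
    fi
  done
}
```

Usage: `DRY_RUN=1 run_counts "$HOME/jwy/snrna-seq" AcMNPV_genome 24h_CK Ac_NPV_0h CK_0h`. Drop `DRY_RUN=1` to actually run the samples.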

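Output, written to <sample_id>/outs/ for each sample (possorted_genome_bam.bam and its .bai index are produced only with --create-bam true)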

├── analysis
├── cloupe.cloupe
├── filtered_feature_bc_matrix
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── possorted_genome_bam.bam
├── possorted_genome_bam.bam.bai
├── raw_feature_bc_matrix
├── raw_feature_bc_matrix.h5
└── web_summary.html
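With several samples, the reports end up in separate folders. The sketch below copies each sample's web_summary.html and metrics_summary.csv into one folder, renamed by sample. It assumes the <sample_id>/outs/ layout in the current directory, and `collect_summaries` is a hypothetical name:

```bash
# Copy each sample's QC report and metrics table into one folder, prefixing
# file names with the sample ID so they do not overwrite each other.
collect_summaries() {
  local dest="$1"; shift
  local sample_id f
  mkdir -p "$dest"
  for sample_id in "$@"; do
    for f in web_summary.html metrics_summary.csv; do
      if [ -f "$sample_id/outs/$f" ]; then
        cp "$sample_id/outs/$f" "$dest/${sample_id}_$f"
      else
        echo "missing $sample_id/outs/$f" >&2
      fi
    done
  done
}
```

Usage: `collect_summaries reports 24h_CK Ac_NPV_0h CK_0h`.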