Basic Cell Ranger Workflow for Single-Cell Gene Expression Analysis

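Cell Ranger is software developed by 10x Genomics for analyzing single-cell sequencing data. The software and reference genome data can be downloaded from the 10x Genomics website. The workflow differs somewhat depending on how the samples were prepared and sequenced. This article covers the basic commands for gene expression analysis and applies to the one sample, one GEM well, one flowcell case.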

1. cellranger mkfastq: generating FASTQ files

cellranger mkfastq --id=tiny-bcl \
--run=/home/zheng/test/single_cell/cellranger-tiny-bcl-1.2.0 \
--csv=cellranger-tiny-bcl-simple-1.2.0.csv

# Alternatively, use the --samplesheet parameter
cellranger mkfastq --id=tiny-bcl \
--run=/home/zheng/test/single_cell/cellranger-tiny-bcl-1.2.0 \
--samplesheet=/home/zheng/test/single_cell/cellranger-tiny-bcl-samplesheet-1.2.0.csv

Main parameters:

--run
(Required) The path of the Illumina BCL run folder.

--id
(Optional; defaults to the name of the flowcell referred to by --run) Name of the folder created by mkfastq.

--samplesheet
(Optional) Path to an Illumina Experiment Manager-compatible sample sheet which contains 10x sample index names (e.g., SI-GA-A1 or SI-TT-A12) in the sample index column. All other information, such as sample names and lanes, should be in the sample sheet.

--csv
(Optional) Path to a simple CSV with lane, sample, and index columns, which describe the way to demultiplex the flowcell. The index column should contain a 10x sample dual-index name (e.g., SI-TT-A12). This is an alternative to the Illumina IEM sample sheet, and will be ignored if --samplesheet is specified.
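
As a rough sketch of the simple CSV layout that --csv expects, the snippet below writes a three-column file and prints it. The file name, the lane number, and the sample name are hypothetical placeholders, and the header spelling is assumed from the 10x example files. Only the column roles (lane, sample, index) and the index name SI-TT-A12 come from the parameter description above.

```shell
# Hypothetical file name; pass whatever path you choose to --csv.
layout_csv="simple-layout.csv"

# Columns: lane, sample, index (header spelling is an assumption).
# Lane 1 and the sample name "test_sample" are placeholders.
cat > "$layout_csv" <<'EOF'
Lane,Sample,Index
1,test_sample,SI-TT-A12
EOF

cat "$layout_csv"
```

The resulting file would then be supplied as --csv=simple-layout.csv in place of the CSV used in the command above.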

2. cellranger count: gene expression quantification

cellranger count --id=sample345 \
                   --transcriptome=/opt/refdata-gex-GRCh38-2020-A \
                   --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                   --sample=mysample \
                   --expect-cells=1000 \
                   --localcores=8 \
                   --localmem=64

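This step requires a reference that has already been indexed and includes gene annotation, passed with --transcriptome. For a one sample, one GEM well, one flowcell experiment, the --fastqs argument is simply the folder that holds the FASTQ files, which mainly consist of: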

  • I1: Sample index read (optional)
  • R1: Read 1
  • R2: Read 2
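
Before launching cellranger count, it can help to confirm that the folder really contains the read types listed above. The following is a minimal sketch, assuming the files follow the usual bcl2fastq-style naming (for example mysample_S1_L001_R1_001.fastq.gz). The function name count_read_files and the FASTQ_DIR variable are made up for this illustration.

```shell
# Count FASTQ files of one read type (I1, R1 or R2) in a folder.
# Assumes bcl2fastq-style names such as mysample_S1_L001_R1_001.fastq.gz.
count_read_files() {
    dir=$1
    read_type=$2
    n=0
    for f in "$dir"/*_"$read_type"_*.fastq.gz; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# FASTQ_DIR is a placeholder; it defaults to the current directory.
fastq_dir=${FASTQ_DIR:-.}
for rt in I1 R1 R2; do
    echo "$rt: $(count_read_files "$fastq_dir" "$rt") file(s)"
done
```

A count of zero for R1 or R2 suggests that --fastqs points at the wrong folder. A zero for I1 is expected when the index read was not written out, since I1 is optional.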

Detailed parameter descriptions:

--id
A unique run ID string, e.g. sample345.

--fastqs
Either:
  • The path of the fastq_path folder generated by cellranger mkfastq, e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path. This contains a directory hierarchy that cellranger count will automatically traverse.
  • Any folder containing FASTQ files, for example if the FASTQ files were generated by a service provider and delivered outside the context of the mkfastq output directory structure.
Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flowcells. Doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. This argument cannot be used when performing Feature Barcode analysis; use --libraries instead.

--libraries
Path to a libraries.csv file declaring FASTQ paths and library types of input libraries. Required for gene expression + Feature Barcode analysis. See Feature Barcode Analysis for details. When using this argument, --fastqs and --sample must not be passed. This argument should not be used when performing gene expression-only analysis; use --fastqs instead.

--sample
Sample name as specified in the sample sheet supplied to cellranger mkfastq. Can take multiple comma-separated values, which is helpful if the same library was sequenced on multiple flowcells and the sample name used (and therefore the FASTQ file prefix) is not identical between them. Doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. Allowable characters in sample names are letters, numbers, hyphens, and underscores.

--transcriptome
Path to the Cell Ranger compatible transcriptome reference, e.g.
  • For a human-only sample, use /opt/refdata-gex-GRCh38-2020-A
  • For a human and mouse mixture sample, use /opt/refdata-gex-GRCh38-and-mm10-2020-A

--feature-ref
Path to a Feature Reference CSV file declaring the Feature Barcode reagents in use in the experiment. Required for Feature Barcode analysis. See Feature Barcode Reference for details on how to construct the feature reference.

--target-panel
Path to a Target Panel CSV file declaring the target panel used, if any. Required for Targeted Gene Expression analysis. See Targeted Gene Expression Analysis for details.

--no-target-umi-filter
(Optional) Add this flag to disable targeted UMI filtering. See Targeted Algorithms for details.

--expect-cells
(Optional, recommended) Expected number of recovered cells. Default: 3,000 cells.

--force-cells
(Optional) Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.

--include-introns
(Optional) Add this flag to count reads mapping to intronic regions. This may improve sensitivity for samples with a significant amount of pre-mRNA molecules, such as nuclei. This flag should be used instead of the deprecated pre-mRNA reference.

--nosecondary
(Optional) Add this flag to skip secondary analysis of the feature-barcode matrix (dimensionality reduction, clustering and visualization). Set this if you plan to use cellranger reanalyze or your own custom analysis.

--no-bam
(Optional) Do not generate a BAM file. Default: false.

--no-libraries
Proceed with processing using a --feature-ref but no Feature Barcode data specified with the --libraries flag.

--chemistry
(Optional) Assay configuration. NOTE: by default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection. Select one of:
  • auto for auto-detection (default),
  • threeprime for Single Cell 3′,
  • fiveprime for Single Cell 5′,
  • SC3Pv2 for Single Cell 3′ v2,
  • SC3Pv3 for Single Cell 3′ v3,
  • SC3Pv3LT for Single Cell 3′ v3 LT,
  • SC3Pv3HT for Single Cell 3′ v3 HT,
  • SC5P-PE for Single Cell 5′ paired-end (both R1 and R2 are used for alignment),
  • SC5P-R2 for Single Cell 5′ R2-only (where only R2 is used for alignment),
  • SC3Pv1 for Single Cell 3′ v1. NOTE: this mode cannot be auto-detected. It must be set explicitly with this option.

--r1-length
(Optional) Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences, so do not set this below 26 for Single Cell 3′ v2 or Single Cell 5′. This and --r2-length are useful for determining the optimal read length for sequencing.

--r2-length
(Optional) Hard-trim the input R2 sequence to this length.

--lanes
(Optional) Lanes associated with this sample.

--localcores
Restricts cellranger to use the specified number of cores to execute pipeline stages. By default, cellranger will use all of the cores available on your system.

--localmem
Restricts cellranger to use the specified amount of memory (in GB) to execute pipeline stages. By default, cellranger will use 90% of the memory available on your system.
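
The --fastqs description notes that several comma-separated paths can be given when one library was sequenced on more than one flowcell. The sketch below shows one way to assemble such an argument. The helper join_with_commas and the two flowcell folder names (FLOWCELL_A, FLOWCELL_B) are hypothetical, and the command is only printed, not executed.

```shell
# Join any number of paths with commas, the form --fastqs accepts.
join_with_commas() {
    joined=""
    for p in "$@"; do
        if [ -z "$joined" ]; then
            joined=$p
        else
            joined="$joined,$p"
        fi
    done
    printf '%s\n' "$joined"
}

# Placeholder flowcell folders; replace with the real mkfastq outputs.
fastqs_arg=$(join_with_commas \
    /home/jdoe/runs/FLOWCELL_A/outs/fastq_path \
    /home/jdoe/runs/FLOWCELL_B/outs/fastq_path)

# Print the command instead of running it, so it can be reviewed first.
echo "cellranger count --id=sample345 --transcriptome=/opt/refdata-gex-GRCh38-2020-A --fastqs=$fastqs_arg --sample=mysample"
```

Reads from both folders would then be treated as a single sample, which matches the behaviour described for comma-separated paths.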

For one sample, one GEM well, and one flowcell, only the two steps above are needed.

3. cellranger aggr

When an experiment spans several GEM wells, demultiplex the data from the sequencing run with cellranger mkfastq, then run the libraries from each GEM well through a separate instance of cellranger count. You can then perform a combined analysis using cellranger aggr.

cellranger aggr --id=AGG123 \
                  --csv=AGG123_libraries.csv \
                  --normalize=mapped

Detailed parameters:

--id=ID
A unique run ID string, e.g. AGG123.

--csv=CSV
Path of a CSV file containing a list of cellranger count outputs (see Setting up a CSV).

--normalize=MODE
(Optional) String specifying how to normalize depth across the input libraries. Valid values: mapped (default), or none (see Depth Normalization).

--nosecondary
(Optional) Add this flag to skip secondary analysis, which includes dimensionality reduction, clustering and visualization. This is applicable if you plan to use cellranger reanalyze or your own custom analysis.
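
The aggregation CSV passed to --csv lists one cellranger count output per row. Below is a minimal sketch of such a file. It assumes the column names sample_id and molecule_h5 used by recent Cell Ranger releases (older releases name the first column library_id), and the two run folders are hypothetical.

```shell
# Same file name as in the cellranger aggr command above.
agg_csv="AGG123_libraries.csv"

# One row per cellranger count run. Column names are an assumption
# (recent releases); the molecule_info.h5 paths are placeholders.
cat > "$agg_csv" <<'EOF'
sample_id,molecule_h5
sample345,/home/jdoe/runs/sample345/outs/molecule_info.h5
sample346,/home/jdoe/runs/sample346/outs/molecule_info.h5
EOF

cat "$agg_csv"
```

Each molecule_h5 entry should point at the molecule_info.h5 file in the outs folder of the corresponding cellranger count run.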

4. cellranger multi

cellranger multi is used to analyze Cell Multiplexing data. It inputs FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger multi pipeline also supports the analysis of Feature Barcode data.
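
Unlike cellranger count, cellranger multi reads its inputs from a single configuration CSV. The sketch below writes a minimal config for a Cell Multiplexing run and prints the command that would use it. Everything in it is an assumption to be checked against the 10x documentation for your version: the section and column names follow the layout described there, and the file name, run ID, FASTQ IDs, sample names and CMO IDs are hypothetical placeholders.

```shell
# Hypothetical config file name.
multi_csv="multi_config.csv"

# Sections: reference for gene expression, the input libraries,
# and the mapping from samples to Cell Multiplexing Oligo (CMO) IDs.
cat > "$multi_csv" <<'EOF'
[gene-expression]
reference,/opt/refdata-gex-GRCh38-2020-A

[libraries]
fastq_id,fastqs,feature_types
gex_lib,/home/jdoe/runs/HAWT7ADXX/outs/fastq_path,Gene Expression
cmo_lib,/home/jdoe/runs/HAWT7ADXX/outs/fastq_path,Multiplexing Capture

[samples]
sample_id,cmo_ids
sampleA,CMO301
sampleB,CMO302
EOF

# Print the command instead of running it; the run ID is a placeholder.
echo "cellranger multi --id=multi_run --csv=$multi_csv"
```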

Reference:

Cell Multiplexing with cellranger multi, Software, Single Cell Gene Expression, Official 10x Genomics Support
