Cell Ranger: A Basic Workflow for Single-Cell Gene Expression Analysis

Latest recommended article published on 2025-03-06 21:28:19

qq_27390023

Tags: bioinformatics

Article link: https://blog.csdn.net/qq_27390023/article/details/121359585

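Cell Ranger is software developed by 10x Genomics for analyzing single-cell sequencing data. The software and the reference genome data can both be downloaded from the 10x Genomics website. The workflow varies somewhat with how the samples were processed and sequenced. This article covers the basic commands for gene expression analysis and applies to the one sample, one GEM well, one flowcell case.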

1. cellranger mkfastq: generating FASTQ files

cellranger mkfastq --id=tiny-bcl \
--run=/home/zheng/test/single_cell/cellranger-tiny-bcl-1.2.0 \
--csv=cellranger-tiny-bcl-simple-1.2.0.csv

# Alternatively, use the --samplesheet parameter
cellranger mkfastq --id=tiny-bcl \
--run=/home/zheng/test/single_cell/cellranger-tiny-bcl-1.2.0 \
--samplesheet=/home/zheng/test/single_cell/cellranger-tiny-bcl-samplesheet-1.2.0.csv

Main parameters:

--run	(Required) The path of Illumina BCL run folder.
--id	(Optional; defaults to the name of the flowcell referred to by `--run`) Name of the folder created by mkfastq.
--samplesheet	(Optional) Path to an Illumina Experiment Manager-compatible sample sheet which contains 10x sample index names (e.g., SI-GA-A1 or SI-TT-A12) in the sample index column. All other information, such as sample names and lanes, should be in the sample sheet.
--csv	(Optional) Path to a simple CSV with lane, sample, and index columns, which describe the way to demultiplex the flowcell. The index column should contain a 10x sample dual-index name (e.g., SI-TT-A12). This is an alternative to the Illumina IEM sample sheet, and will be ignored if `--samplesheet` is specified.
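
As a rough illustration of the simple CSV layout described for `--csv`, the sketch below writes a three-column file and prints the matching command. The file name `mkfastq_simple.csv`, the sample name `test_sample`, and the run folder path are placeholders chosen for this example, not values from a real run; the index name is the one quoted in the parameter description above.

```shell
# Write a simple layout CSV for `cellranger mkfastq --csv`.
# Sample name and file name are placeholders for illustration only.
cat > mkfastq_simple.csv <<'EOF'
Lane,Sample,Index
1,test_sample,SI-TT-A12
EOF

# Print the command instead of running it, so the sketch works
# without Cell Ranger or a BCL run folder being present.
echo "cellranger mkfastq --id=tiny-bcl --run=/path/to/bcl_run_folder --csv=mkfastq_simple.csv"
```

Each row after the header assigns one sample and its index to a lane of the flowcell.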

2. cellranger count: gene expression quantification

cellranger count --id=sample345 \
                   --transcriptome=/opt/refdata-gex-GRCh38-2020-A \
                   --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                   --sample=mysample \
                   --expect-cells=1000 \
                   --localcores=8 \
                   --localmem=64

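This step requires a reference genome that has already been indexed and includes gene annotation. For a one sample, one GEM well, one flowcell dataset, the `--fastqs` argument is simply the folder that holds the FASTQ files, which mainly consist of: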

I1: Sample index read (optional)
R1: Read 1
R2: Read 2
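
Before running `cellranger count`, it can help to confirm that the folder really contains an R1 and an R2 file for the sample. The function below is a minimal sketch of such a check, assuming the files follow the bcl2fastq-style naming `[Sample]_S1_L00[Lane]_[Read]_001.fastq.gz`. The function name `check_fastqs` is made up for this example and is not part of Cell Ranger.

```shell
# check_fastqs DIR SAMPLE
# Succeeds only if DIR holds at least one R1 and one R2 file for SAMPLE
# named like SAMPLE_S1_L001_R1_001.fastq.gz (hypothetical helper).
check_fastqs() {
    cf_dir=$1
    cf_sample=$2
    for cf_read in R1 R2; do
        cf_found=0
        for cf_file in "$cf_dir/$cf_sample"_S*_L[0-9][0-9][0-9]_"$cf_read"_001.fastq.gz; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$cf_file" ]; then
                cf_found=1
            fi
        done
        if [ "$cf_found" -eq 0 ]; then
            echo "missing $cf_read FASTQ for sample '$cf_sample' in $cf_dir" >&2
            return 1
        fi
    done
    echo "found R1 and R2 FASTQ files for sample '$cf_sample'"
}
```

It would be called as `check_fastqs /home/jdoe/runs/HAWT7ADXX/outs/fastq_path mysample`. The I1 file is not checked because it is optional.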

Detailed parameters:

Argument	Description
`--id`	A unique run ID string: e.g. `sample345`
`--fastqs`	Either: Path of the fastq_path folder generated by cellranger mkfastq e.g. `/home/jdoe/runs/HAWT7ADXX/outs/fastq_path`. This contains a directory hierarchy that cellranger count will automatically traverse. - OR - Any folder containing fastq files, for example if the fastq files were generated by a service provider and delivered outside the context of the mkfastq output directory structure. Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flowcells. Doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. This argument cannot be used when performing Feature Barcode analysis; use `--libraries` instead.
`--libraries`	Path to a `libraries.csv` file declaring FASTQ paths and library types of input libraries. Required for gene expression + feature barcode analysis. See Feature Barcode Analysis for details. When using this argument, `--fastqs` and `--sample` must not be passed. This argument should not be used when performing gene expression-only analysis; use `--fastqs` instead.
`--sample`	Sample name as specified in the sample sheet supplied to cellranger mkfastq. Can take multiple comma-separated values, which is helpful if the same library was sequenced on multiple flowcells and the sample name used (and therefore fastq file prefix) is not identical between them. Doing this will treat all reads from the library, across flowcells, as one sample. If you have multiple libraries for the sample, you will need to run cellranger count on them individually, and then combine them with cellranger aggr. Allowable characters in sample names are letters, numbers, hyphens, and underscores.
`--transcriptome`	Path to the Cell Ranger compatible transcriptome reference. For a human-only sample, use e.g. `/opt/refdata-gex-GRCh38-2020-A`; for a human and mouse mixture sample, use e.g. `/opt/refdata-gex-GRCh38-and-mm10-2020-A`.
`--feature-ref`	Path to a Feature Reference CSV file declaring the Feature Barcode reagents in use in the experiment. Required for Feature Barcode analysis. See Feature Barcode Reference for details on how to construct the feature reference.
`--target-panel`	Path to a Target Panel CSV file declaring the target panel used, if any. Required for Targeted Gene Expression analysis. See Targeted Gene Expression Analysis for details.
`--no-target-umi-filter`	(optional) Add this flag to disable targeted UMI filtering. See Targeted Algorithms for details.
`--expect-cells`	(optional, recommended) Expected number of recovered cells. Default: 3,000 cells.
`--force-cells`	(optional) Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.
`--include-introns`	(optional) Add this flag to count reads mapping to intronic regions. This may improve sensitivity for samples with a significant amount of pre-mRNA molecules, such as nuclei. This flag should be used instead of the deprecated pre-mRNA reference.
`--nosecondary`	(optional) Add this flag to skip secondary analysis of the feature-barcode matrix (dimensionality reduction, clustering and visualization). Set this if you plan to use cellranger reanalyze or your own custom analysis.
`--no-bam`	(optional) Do not generate a BAM file. Default: false.
`--no-libraries`	Proceed with processing using a --feature-ref but no feature-barcode data specified with the --libraries flag.
`--chemistry`	(optional) Assay configuration. NOTE: by default the assay configuration is detected automatically, which is the recommended mode. You should only specify chemistry if there is an error in automatic detection. Select one of: `auto` for auto-detection (default), `threeprime` for Single Cell 3′, `fiveprime` for Single Cell 5′, `SC3Pv2` for Single Cell 3′ v2, `SC3Pv3` for Single Cell 3′ v3, `SC3Pv3LT` for Single Cell 3′ v3 LT, `SC3Pv3HT` for Single Cell 3′ v3 HT, `SC5P-PE` for Single Cell 5′ paired-end (both R1 and R2 are used for alignment), `SC5P-R2` for Single Cell 5′ R2-only (where only R2 is used for alignment). `SC3Pv1` for Single Cell 3′ v1. NOTE: this mode cannot be auto-detected. It must be set explicitly with this option.
`--r1-length`	(optional) Hard-trim the input R1 sequence to this length. Note that the length includes the Barcode and UMI sequences so do not set this below 26 for Single Cell 3′ v2 or Single Cell 5′. This and `--r2-length` are useful for determining the optimal read length for sequencing.
`--r2-length`	(optional) Hard-trim the input R2 sequence to this length.
`--lanes`	(optional) Lanes associated with this sample
`--localcores`	Restricts cellranger to use specified number of cores to execute pipeline stages. By default, cellranger will use all of the cores available on your system.
`--localmem`	Restricts cellranger to use specified amount of memory (in GB) to execute pipeline stages. By default, cellranger will use 90% of the memory available on your system.
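
The `--fastqs` description above mentions passing several comma-separated paths when one library was sequenced on more than one flowcell. A small sketch of building that argument follows. The helper `join_by_comma` and both flowcell paths are hypothetical, and the command is printed rather than executed.

```shell
# Join all arguments with commas (hypothetical helper, not part of Cell Ranger).
# The subshell keeps the IFS change from leaking into the calling shell.
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder FASTQ folders from two flowcells of the same library.
fastq_dirs=$(join_by_comma \
    /path/to/flowcell1/outs/fastq_path \
    /path/to/flowcell2/outs/fastq_path)

# Print the resulting command instead of running it.
echo "cellranger count --id=sample345 --transcriptome=/opt/refdata-gex-GRCh38-2020-A --fastqs=$fastq_dirs --sample=mysample"
```

All reads from both folders would then be treated as one sample, as the parameter description states.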

For the one sample, one GEM well, one flowcell case, the two steps above are all that is needed.

3. cellranger aggr

When the libraries come from more than one GEM well, first demultiplex the sequencing run with cellranger mkfastq, then run the libraries from each GEM well through a separate instance of cellranger count. You can then perform a combined analysis with cellranger aggr.

cellranger aggr --id=AGG123 \
                  --csv=AGG123_libraries.csv \
                  --normalize=mapped

Detailed parameters:

Argument	Description
`--id=ID`	A unique run ID string: e.g. `AGG123`
`--csv=CSV`	Path of a CSV file containing a list of cellranger count outputs (see Setting up a CSV).
`--normalize=MODE`	(Optional) String specifying how to normalize depth across the input libraries. Valid values: `mapped` (default), or `none` (see Depth Normalization).
`--nosecondary`	(Optional) Add this flag to skip secondary analysis which includes dimensionality reduction, clustering and visualization. This is applicable if you plan to use cellranger reanalyze or your own custom analysis.
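
For orientation, here is a sketch of what the aggregation CSV passed to `--csv` might look like. It assumes the two-column layout with a `sample_id` header, which is my understanding of recent Cell Ranger releases (older releases used `library_id` for the first column), so check it against the documentation for your version. The sample IDs and run paths are placeholders; `molecule_info.h5` is the per-run file found in the `outs` folder of each cellranger count run.

```shell
# Write an aggregation CSV listing one cellranger count output per row.
# Header name `sample_id` is an assumption; IDs and paths are placeholders.
cat > AGG123_libraries.csv <<'EOF'
sample_id,molecule_h5
LV123,/opt/runs/LV123/outs/molecule_info.h5
LB456,/opt/runs/LB456/outs/molecule_info.h5
EOF

# Print the command instead of running it.
echo "cellranger aggr --id=AGG123 --csv=AGG123_libraries.csv --normalize=mapped"
```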

4. cellranger multi

cellranger multi is used to analyze Cell Multiplexing data. It inputs FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger multi pipeline also supports the analysis of Feature Barcode data.
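
cellranger multi takes its inputs from a sectioned config CSV rather than from individual flags. The sketch below shows roughly what such a file looks like for a Cell Multiplexing run. The section and column names are given as I understand them from the 10x Genomics documentation and should be verified for your version. The file name, run ID, FASTQ paths, sample names, and CMO assignments are placeholders.

```shell
# Write a multi config CSV with reference, library, and sample sections.
# Section and column names are assumptions; all values are placeholders.
cat > multi_config.csv <<'EOF'
[gene-expression]
reference,/opt/refdata-gex-GRCh38-2020-A
[libraries]
fastq_id,fastqs,feature_types
gex1,/path/to/fastqs,Gene Expression
mux1,/path/to/fastqs,Multiplexing Capture
[samples]
sample_id,cmo_ids
sample1,CMO301
sample2,CMO302
EOF

# Print the command instead of running it.
echo "cellranger multi --id=multi_run --csv=multi_config.csv"
```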