count

璐璐璐璐璐952

Article link: https://blog.csdn.net/weixin_63884580/article/details/121544474

Renaming the FASTQ files

cellranger recognizes file names of the form

SRR****_S1_L001_I1/R1/R2_001.fastq.gz
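Before running count, it can help to list the files that do not yet follow this pattern. A minimal bash sketch, assuming the pattern above (a sample name, _S plus a number, _L plus a three-digit lane, the read type I1/I2/R1/R2, then _001.fastq.gz); the function names is_cellranger_name and check_fastq_dir are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: flag FASTQ files whose names do not match the pattern
#   <sample>_S<n>_L<lane>_<I1|I2|R1|R2>_001.fastq.gz
# The regex is an assumption modeled on the naming shown in the text.

is_cellranger_name() {
  # Succeeds if the file name part of $1 matches the expected pattern.
  local name=${1##*/}
  [[ $name =~ ^.+_S[0-9]+_L[0-9]{3}_(I1|I2|R1|R2)_001\.fastq\.gz$ ]]
}

check_fastq_dir() {
  # Prints every *.fastq.gz in directory $1 that fails the check.
  # Returns 1 if at least one file failed, 0 otherwise.
  local dir=$1 f bad=0
  for f in "$dir"/*.fastq.gz; do
    [[ -e $f ]] || continue   # the glob matched nothing
    if ! is_cellranger_name "$f"; then
      echo "bad name: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, check_fastq_dir /mnt/sra/14 prints each misnamed file and returns a non-zero status if any were found.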

The files downloaded from the database are named

SRR11903614_1.fastq.gz

SRR11903614_2.fastq.gz

_1 is R1 and _2 is R2, so rename them in batch:

while read -r i; do mv "${i}"_1*.gz "${i}_S1_L001_R1_001.fastq.gz"; mv "${i}"_2*.gz "${i}_S1_L001_R2_001.fastq.gz"; done < srrlist.txt
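The one-liner above moves files without checking that they exist or that the target is still free. A sketch of the same rename with those two checks, assuming the downloaded files are named exactly <accession>_1.fastq.gz and <accession>_2.fastq.gz as shown; rename_srr_pair, rename_from_list and the DRY_RUN variable are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: batch-rename <acc>_1/_2.fastq.gz to the cellranger pattern.
# Skips missing files, never overwrites (mv -n), and with DRY_RUN=1
# only prints the planned moves.

rename_srr_pair() {
  # $1 = SRR accession, e.g. SRR11903614
  local srr=$1 mate src dst
  for mate in 1 2; do
    src="${srr}_${mate}.fastq.gz"
    dst="${srr}_S1_L001_R${mate}_001.fastq.gz"
    if [[ ! -e $src ]]; then
      echo "missing: $src" >&2
      continue
    fi
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "mv $src $dst"
    else
      mv -n -- "$src" "$dst"
    fi
  done
}

rename_from_list() {
  # $1 = file with one accession per line (srrlist.txt in the text)
  local srr
  while IFS= read -r srr || [[ -n $srr ]]; do
    srr=${srr%$'\r'}            # tolerate Windows line endings
    [[ -z $srr ]] && continue   # skip blank lines
    rename_srr_pair "$srr"
  done < "$1"
}
```

Running DRY_RUN=1 rename_from_list srrlist.txt first prints the planned mv commands; running rename_from_list srrlist.txt without DRY_RUN performs them.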

Because each sample consists of 4 runs, each run should be processed with count separately, and the results then merged with aggr.

First rearrange the directory layout: put each sample's R1 and R2 files into a separate folder, and rename that folder.
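A sketch of this step for one run, assuming the files were already renamed to the <prefix>_S1_L001_R1/R2_001.fastq.gz form; organize_run is a hypothetical helper and the folder name is whatever you choose. One thing worth checking: cellranger's --sample must equal the file-name prefix before _S1, so if only the folder is renamed, --sample still has to be the prefix the files carry.

```shell
#!/usr/bin/env bash
# Sketch: move one run's R1/R2 files into a folder of their own, so the
# folder can be passed to `cellranger count --fastqs=<folder>`.
# Reminder: --sample must match the file prefix, not the folder name.

organize_run() {
  # $1 = file prefix (e.g. an SRR accession), $2 = destination folder
  local prefix=$1 dest=$2 f moved=0
  mkdir -p -- "$dest" || return 1
  for f in "${prefix}"_S1_L001_R[12]_001.fastq.gz; do
    [[ -e $f ]] || continue   # the glob matched nothing
    mv -n -- "$f" "$dest"/ && moved=$((moved + 1))
  done
  echo "$prefix: moved $moved file(s) to $dest"
}
```

For instance, organize_run SRR11903614 14 would move that accession's two files into a folder named 14; the mapping from accession to folder name is not given in the original and is up to you.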

Because there are many samples, process them in batch with a loop.

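# Run count for several samples with a for loop. Each run takes a long time; once the
# first one has started successfully you don't need to watch it, a look every hour or two
# is enough. Good for running overnight; make sure the computer does not go to sleep.

# First define two variables for the loop: the reference genome path and the FASTQ path
genomedir=/mnt/yard/ref/three/h38_ebv_gfp
datadir=/mnt/sra

sample='14 15 16 17 22 23 24 25 32 33 34 35'
date
for s in $sample
do
  date
  # --id: name of the output folder
  # --sample: must equal the FASTQ file-name prefix (the part before _S1)
  # --transcriptome: the custom reference folder built earlier (the parent of the STAR folder)
  # --nosecondary: skip per-run dimensionality reduction and clustering, since the four runs are aggregated next
  # --localcores: number of threads; set it to the maximum the machine has
  # --localmem: memory cap in GB; set it high, but not above what is actually available
  cellranger count --id="${s}_count_out" \
    --fastqs="$datadir/$s" \
    --sample="$s" \
    --transcriptome="$genomedir" \
    --nosecondary \
    --localcores=16 \
    --localmem=128
  date
done
# If the loop fails, check how the shell variables are named and used.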
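If an overnight loop is interrupted, rerunning it from the start would repeat finished runs. A sketch that keeps only the samples without a finished output, where "finished" is approximated (an assumption) by the presence of outs/molecule_info.h5; pending_samples is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: print the sample IDs that still lack a finished count output.
# Completion is approximated by <base>/<sample>_count_out/outs/molecule_info.h5.

pending_samples() {
  # $1 = directory holding the <sample>_count_out folders; rest = sample IDs
  local base=$1 s
  shift
  for s in "$@"; do
    if [[ ! -f "$base/${s}_count_out/outs/molecule_info.h5" ]]; then
      echo "$s"
    fi
  done
}
```

Since count writes its --id folder into the current directory (/mnt/sra in the log below), the loop header could become for s in $(pending_samples "$PWD" $sample) when run from that directory.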

Each sample takes roughly 30-60 minutes to complete.
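The loop above prints date before and after each run, so durations have to be read off by eye. A small sketch that reports the elapsed minutes directly, using bash's built-in SECONDS counter; run_timed is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: run a command and report how long it took, in whole minutes.
# SECONDS is a bash built-in that counts seconds since the shell started.

run_timed() {
  # $1 = label, remaining args = the command to run
  local label=$1 start rc
  shift
  start=$SECONDS
  "$@"
  rc=$?
  echo "$label finished (exit $rc) in $(( (SECONDS - start) / 60 )) min"
  return "$rc"
}
```

Inside the loop it would be used as run_timed "$s" cellranger count --id="${s}_count_out" ... with the same options as before.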

2021-11-25 20:56:08 [jobmngr] WARNING: configured to use 128GB of local memory, but only 124.3GB is currently available.
Serving UI at http://ecs-3615:34763?auth=lCuixrAKarDVtrFxm2F2GkiNPJ1ViorXdh-YHhh7XjE

Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/mnt/yard/ref/three/h38_ebv_gfp) on ecs-3615...
Checking optional arguments...
mrc: v4.0.6
mrp: v4.0.6
Anaconda: Python 3.8.2
numpy: 1.19.2
scipy: 1.6.2
pysam: 0.16.0.1
h5py: 3.2.1
pandas: 1.2.4
STAR: 2.7.2a
samtools: samtools 1.10
Using htslib 1.10.2
Copyright (C) 2019 Genome Research Ltd.

2021-11-25 20:56:10 [runtime] (ready)           ID.22_count_out.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
...
...
2021-11-26 03:10:29 [runtime] (chunks_complete) ID.25_count_out.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CHOOSE_CLOUPE

Outputs:
- Run summary HTML:                         /mnt/sra/25_count_out/outs/web_summary.html
- Run summary CSV:                          /mnt/sra/25_count_out/outs/metrics_summary.csv
- BAM:                                      /mnt/sra/25_count_out/outs/possorted_genome_bam.bam
- BAM index:                                /mnt/sra/25_count_out/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX:    /mnt/sra/25_count_out/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:   /mnt/sra/25_count_out/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:  /mnt/sra/25_count_out/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /mnt/sra/25_count_out/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV:            null
- Per-molecule read information:            /mnt/sra/25_count_out/outs/molecule_info.h5
- CRISPR-specific analysis:                 null
- CSP-specific analysis:                    null
- Loupe Browser file:                       null
- Feature Reference:                        null
- Target Panel File:                        null
- Probe Set File:                           null

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
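The first line of the log warns that --localmem=128 exceeds the 124.3GB that was free. A sketch that derives a value from the Linux MemAvailable field (reported in kB) instead; the 4 GB headroom and the name suggest_localmem are assumptions, and the result is simply kB / 1024 / 1024 rounded down:

```shell
#!/usr/bin/env bash
# Sketch: suggest a --localmem value (GB) from MemAvailable in a meminfo file.
# The default 4 GB headroom is an arbitrary assumption.

suggest_localmem() {
  # $1 = meminfo path (default /proc/meminfo), $2 = headroom in GB (default 4)
  local meminfo=${1:-/proc/meminfo} headroom=${2:-4} kb gb
  kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo") || return 1
  [[ -n $kb ]] || return 1   # field not found
  gb=$(( kb / 1024 / 1024 - headroom ))
  (( gb > 0 )) || gb=1
  echo "$gb"
}
```

It would be passed as --localmem="$(suggest_localmem)" in the count command.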

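Each sample now has its own output folder; the molecule_info.h5 file in its outs/ directory is what the next step, aggr, uses to merge the runs.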
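cellranger aggr takes a CSV that lists one molecule_info.h5 per input. A sketch that writes such a CSV for the runs of one sample; grouping the runs as 14 15 16 17 is an assumption based on the four-runs-per-sample setup, and the first column is named library_id in older Cell Ranger documentation but sample_id in newer releases, so check the aggr docs for your version (the hypothetical AGGR_ID_COL variable overrides it):

```shell
#!/usr/bin/env bash
# Sketch: write an aggr CSV (<id column>,molecule_h5) for a set of runs.
# Fails if any run lacks outs/molecule_info.h5.

write_aggr_csv() {
  # $1 = output CSV, $2 = directory with <run>_count_out folders, rest = run IDs
  local out=$1 base=$2 run h5
  shift 2
  printf '%s,%s\n' "${AGGR_ID_COL:-library_id}" molecule_h5 > "$out"
  for run in "$@"; do
    h5="$base/${run}_count_out/outs/molecule_info.h5"
    if [[ ! -f $h5 ]]; then
      echo "missing: $h5" >&2
      return 1
    fi
    printf '%s,%s\n' "$run" "$h5" >> "$out"
  done
}
```

For example, write_aggr_csv group1.csv /mnt/sra 14 15 16 17 followed by cellranger aggr --id=group1_aggr --csv=group1.csv, where group1 is a placeholder name.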