Renaming the FASTQ files
cellranger recognizes file names of the form
SRR****_S1_L001_I1/R1/R2_001.fastq.gz
Files downloaded from the database are named
SRR11903614_1.fastq.gz
SRR11903614_2.fastq.gz
_1 is R1 and _2 is R2; rename them in batch:
while read -r i; do mv "${i}"_1*.gz "${i}_S1_L001_R1_001.fastq.gz"; mv "${i}"_2*.gz "${i}_S1_L001_R2_001.fastq.gz"; done < srrlist.txt
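A batch rename is hard to undo, so it can help to preview it first. The following is a minimal sketch of the same rename with an optional dry run; the function name rename_fastqs and its second argument are hypothetical, and it assumes the files sit in the current directory and are named exactly <accession>_1.fastq.gz and <accession>_2.fastq.gz.

```shell
#!/usr/bin/env bash
# Sketch: rename SRA-style FASTQs to the Cell Ranger pattern, with an optional dry run.
# rename_fastqs and its arguments are hypothetical names used for illustration.

rename_fastqs() {
    local list=$1            # text file with one run accession per line, e.g. srrlist.txt
    local dry_run=${2:-0}    # pass 1 to print the planned renames without touching files
    local id n src dst
    while IFS= read -r id || [ -n "$id" ]; do
        if [ -z "$id" ]; then continue; fi    # skip blank lines
        for n in 1 2; do
            src="${id}_${n}.fastq.gz"
            dst="${id}_S1_L001_R${n}_001.fastq.gz"
            if [ ! -e "$src" ]; then
                echo "skip: $src not found" >&2
                continue
            fi
            if [ -e "$dst" ]; then
                echo "skip: $dst already exists" >&2
                continue
            fi
            if [ "$dry_run" = 1 ]; then
                echo "mv $src $dst"           # preview only
            else
                mv -- "$src" "$dst"
            fi
        done
    done < "$list"
}
```

Running rename_fastqs srrlist.txt 1 prints the planned mv commands; running it again without the trailing 1 performs them.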
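Because each sample has 4 runs, each run should be counted separately and the results then merged with aggr
First reorganize the directories: put the R1 and R2 of each run into a separate folder, and rename the folder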
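The reorganization can also be scripted. The sketch below is one possible way to do it, not necessarily how the folders here were prepared: it reads a hypothetical two-column mapping file (run accession, target folder name) and moves each run's R1/R2 into its folder. Since cellranger count selects FASTQs whose file names start with the value given to --sample, the sketch also changes the file-name prefix to the folder name; that renaming step, the function name organize_runs and the mapping file are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: move each run's R1/R2 into its own folder.
# organize_runs, the mapping file and the folder names are hypothetical.

organize_runs() {
    local map=$1        # two whitespace-separated columns: run accession, folder name
    local datadir=$2    # directory that holds the already renamed FASTQs
    local id name r src
    while read -r id name || [ -n "$id" ]; do
        if [ -z "$id" ] || [ -z "$name" ]; then continue; fi   # skip incomplete lines
        mkdir -p "$datadir/$name"
        for r in R1 R2; do
            src="$datadir/${id}_S1_L001_${r}_001.fastq.gz"
            if [ -e "$src" ]; then
                # Assumption: the prefix becomes the folder name so that
                # --sample=<folder name> matches the file.
                mv -- "$src" "$datadir/$name/${name}_S1_L001_${r}_001.fastq.gz"
            else
                echo "missing: $src" >&2
            fi
        done
    done < "$map"
}
```

A call would look like organize_runs run_map.txt /mnt/sra, where run_map.txt is the hypothetical mapping file.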
Because there are many samples, use a loop to process them in batch
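# Use a for loop to run count on several samples. Each one takes a long time, so once the first has started successfully there is no need to watch it; checking every hour or two is enough. This suits an overnight run, but make sure the computer does not go to sleep
# First define two variables for the loop: the reference genome path and the FASTQ path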
genomedir=/mnt/yard/ref/three/h38_ebv_gfp
datadir=/mnt/sra
sample='14 15 16 17 22 23 24 25 32 33 34 35'
date
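for s in $sample
do
    date
    # --id: name of the output folder
    # --transcriptome: reference genome path, i.e. the custom reference folder built earlier (the parent of the STAR directory)
    # --nosecondary: skip per-run dimensionality reduction and clustering, because the four runs are aggregated in the next step
    # --localcores: number of threads to use; set it as high as the machine allows
    # --localmem: memory limit in GB; set it as high as the machine allows
    cellranger count --id="${s}_count_out" \
        --fastqs="$datadir/$s" \
        --sample="$s" \
        --transcriptome="$genomedir" \
        --nosecondary \
        --localcores=16 \
        --localmem=128
    date
done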
# If the loop fails, review how shell variables are named and used
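Since the loop is meant to run unattended, a quick check of the input folders before starting can catch a misplaced file early. This is a minimal sketch, assuming each sample folder under the FASTQ directory should hold at least one *_R1_001.fastq.gz and one *_R2_001.fastq.gz; check_fastq_dirs is a hypothetical helper, not part of cellranger.

```shell
#!/usr/bin/env bash
# Sketch: confirm every sample folder holds an R1 and an R2 file before the long loop starts.
# check_fastq_dirs is a hypothetical helper, not a cellranger command.

check_fastq_dirs() {
    local datadir=$1; shift    # remaining arguments are the sample folder names
    local s r missing=0
    for s in "$@"; do
        for r in R1 R2; do
            # ls fails when the glob matches nothing
            if ! ls "$datadir/$s"/*_"${r}"_001.fastq.gz > /dev/null 2>&1; then
                echo "missing $r FASTQ for sample $s in $datadir/$s" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Placed before the loop, check_fastq_dirs "$datadir" $sample || exit 1 stops the script if any folder is incomplete.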
Each sample takes roughly 30-60 minutes to finish
2021-11-25 20:56:08 [jobmngr] WARNING: configured to use 128GB of local memory, but only 124.3GB is currently available.
Serving UI at http://ecs-3615:34763?auth=lCuixrAKarDVtrFxm2F2GkiNPJ1ViorXdh-YHhh7XjE
Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/mnt/yard/ref/three/h38_ebv_gfp) on ecs-3615...
Checking optional arguments...
mrc: v4.0.6
mrp: v4.0.6
Anaconda: Python 3.8.2
numpy: 1.19.2
scipy: 1.6.2
pysam: 0.16.0.1
h5py: 3.2.1
pandas: 1.2.4
STAR: 2.7.2a
samtools: samtools 1.10
Using htslib 1.10.2
Copyright (C) 2019 Genome Research Ltd.
2021-11-25 20:56:10 [runtime] (ready) ID.22_count_out.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
...
...
2021-11-26 03:10:29 [runtime] (chunks_complete) ID.25_count_out.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CHOOSE_CLOUPE
Outputs:
- Run summary HTML: /mnt/sra/25_count_out/outs/web_summary.html
- Run summary CSV: /mnt/sra/25_count_out/outs/metrics_summary.csv
- BAM: /mnt/sra/25_count_out/outs/possorted_genome_bam.bam
- BAM index: /mnt/sra/25_count_out/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX: /mnt/sra/25_count_out/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5: /mnt/sra/25_count_out/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX: /mnt/sra/25_count_out/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /mnt/sra/25_count_out/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV: null
- Per-molecule read information: /mnt/sra/25_count_out/outs/molecule_info.h5
- CRISPR-specific analysis: null
- CSP-specific analysis: null
- Loupe Browser file: null
- Feature Reference: null
- Target Panel File: null
- Probe Set File: null
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
Each sample now has its own outs folder, and the h5 files inside it can be used in the next step, merging with aggr
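cellranger aggr takes a CSV that lists the molecule_info.h5 of each run. The sketch below builds such a file from the output folders above. The function name write_aggr_csv is hypothetical, and the header sample_id,molecule_h5 is an assumption that matches recent Cell Ranger releases; older releases used library_id for the first column, so check the documentation for the installed version.

```shell
#!/usr/bin/env bash
# Sketch: collect each run's molecule_info.h5 into a CSV for cellranger aggr.
# write_aggr_csv is a hypothetical helper; the header is an assumption (see above).

write_aggr_csv() {
    local outdir=$1; shift    # directory that contains the <id>_count_out folders
    local s h5
    echo "sample_id,molecule_h5"
    for s in "$@"; do
        h5="$outdir/${s}_count_out/outs/molecule_info.h5"
        if [ -e "$h5" ]; then
            echo "$s,$h5"
        else
            echo "no molecule_info.h5 for $s, skipped" >&2
        fi
    done
}
```

For example, write_aggr_csv /mnt/sra $sample > aggr.csv writes one row per finished run; aggr.csv is a hypothetical file name, and which runs belong in the same CSV depends on how they are grouped into samples.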