https://zhuanlan.zhihu.com/p/20702684 (background on how Illumina sequencing works)
So, my data are paired-end reads, and the sample was sequenced across several lanes. The original file names are:
larvae1_S1_L001_R1_001.fastq.gz ,
larvae1_S1_L001_R2_001.fastq.gz ,
larvae1_S1_L002_R1_001.fastq.gz ,
larvae1_S1_L002_R2_001.fastq.gz ,
larvae1_S1_L003_R1_001.fastq.gz ,
larvae1_S1_L003_R2_001.fastq.gz ,
larvae1_S1_L004_R1_001.fastq.gz ,
larvae1_S1_L004_R2_001.fastq.gz ,
# First, concatenate the reads from the different lanes:
cat larvae1_S1_L001_R1_001.fastq.gz larvae1_S1_L002_R1_001.fastq.gz larvae1_S1_L003_R1_001.fastq.gz larvae1_S1_L004_R1_001.fastq.gz > larvae1_R1.fastq.gz
cat larvae1_S1_L001_R2_001.fastq.gz larvae1_S1_L002_R2_001.fastq.gz larvae1_S1_L003_R2_001.fastq.gz larvae1_S1_L004_R2_001.fastq.gz > larvae1_R2.fastq.gz
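The two cat commands above can also be written once as a small function, which helps when there are more samples. This is only a sketch: merge_lanes is a hypothetical helper name, and it assumes every file follows the SAMPLE_S*_L00N_R1/R2_001.fastq.gz pattern shown above. The shell expands the glob in sorted order (L001, L002, ...), so R1 and R2 are joined in the same lane order and the pairs stay in sync.

```shell
# merge_lanes SAMPLE MATE
# Concatenate all lanes of one mate (R1 or R2) into SAMPLE_MATE.fastq.gz.
# Hypothetical helper; assumes the Illumina-style names listed above.
merge_lanes() {
    local sample="$1" mate="$2"
    # Gzip files can be concatenated directly; the result is still valid gzip.
    cat "${sample}"_S*_L[0-9][0-9][0-9]_"${mate}"_001.fastq.gz > "${sample}_${mate}.fastq.gz"
}
```

Usage would be `merge_lanes larvae1 R1` followed by `merge_lanes larvae1 R2`, run in the directory that holds the per-lane files.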
# Next, quality control with FastQC, still on the Linux machine:
module load fastqc
fastqc larvae1_R1.fastq.gz
fastqc larvae1_R2.fastq.gz
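# Next, remove the adapters
There are two options: cutadapt is for when the adapter sequence is known; NGmerge is for when it is not known, and it only works with paired-end reads. NGmerge seems to depend on GCC, zlib, and OpenMP. Installing these was a headache and I didn't know how to do it properly, so I just installed them with conda (conda install zlib/GCC/OpenMP/NGmerge).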
module load NGmerge
NGmerge -1 larvae1_R1.fastq.gz -2 larvae1_R2.fastq.gz -o larvae1 -a
# Or, with -v (verbose) added:
NGmerge -a -1 larvae1_R1.fastq.gz -2 larvae1_R2.fastq.gz -o larvae1 -v
The output files are larvae1_1.fastq.gz and larvae1_2.fastq.gz.
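Before aligning, it may be worth checking that the two trimmed files still hold the same number of reads, since bowtie2 expects mate 1 and mate 2 to line up. The sketch below is not part of the original workflow: count_reads and check_pairs are hypothetical helper names, and it assumes standard FASTQ with exactly four lines per read.

```shell
# count_reads FILE: print the number of reads in a gzipped FASTQ file.
# Assumes 4 lines per record (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# check_pairs R1 R2: print both counts; returns non-zero if they differ.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: ${n1} reads, R2: ${n2} reads"
    [ "$n1" = "$n2" ]
}
```

For example: `check_pairs larvae1_1.fastq.gz larvae1_2.fastq.gz`.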
Next, align the reads to the genome:
One important point here: the genome sequence must be indexed first.
module load bowtie2
bowtie2-build Amphimedon_queenslandica.Aqu1.dna.toplevel.fa Amphimedon_queenslandica.Aqu1.dna.toplevel
Method 1:
bowtie2 -p 15 -x /opsin/u/huifang/ATAC-Seq/larvae1/ATAC-seq1017/Amphimedon_queenslandica.Aqu1.dna.toplevel -1 larvae1_1.fastq.gz -2 larvae1_2.fastq.gz -S larvae1.sam
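This produced a very large SAM file.
The alignment output is a SAM file, which holds the alignment information. The SAM file should then be compressed into a BAM file, which is done with samtools.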
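A minimal sketch of that SAM-to-BAM conversion with samtools follows. The function name sam_to_sorted_bam and the output names (PREFIX.bam, PREFIX.sorted.bam) are my own assumptions, not something fixed by the notes above; the sort and index steps are included because most downstream tools expect a coordinate-sorted, indexed BAM.

```shell
# sam_to_sorted_bam PREFIX
# Convert PREFIX.sam to BAM, sort it by coordinate, and index the result.
# Hypothetical helper; output file names are illustrative.
sam_to_sorted_bam() {
    local prefix="$1"
    samtools view -b -o "${prefix}.bam" "${prefix}.sam"       # SAM -> BAM
    samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam"   # coordinate sort
    samtools index "${prefix}.sorted.bam"                     # writes the .bai index
}
```

On the cluster this would be `module load samtools` and then `sam_to_sorted_bam larvae1`. Once the sorted BAM exists, the large SAM file can be deleted to save space.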
grep -v "XS