xindi data analysis notes

1. Quality control of the data with FastQC

fastqc -t 16 -o ${dir}/fastqc_report/ ${dir}/clean_data/*.fq.gz
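
FastQC does not create the directory passed to -o, so it has to exist before the command above is run. A minimal sketch, assuming ${dir} is the project directory (these notes do not record where it was set for this step; the fallback to a temporary directory is only there so the sketch can run on its own):

```shell
# Assumption: ${dir} is the project directory. If it is unset, fall back to a
# temporary directory so that this sketch runs anywhere.
dir="${dir:-$(mktemp -d)}"
report_dir="${dir}/fastqc_report"

# fastqc -o expects an existing directory, so create it first.
mkdir -p "${report_dir}"

# Put the input files into the positional parameters; the pattern stays
# unexpanded when nothing matches, which the -e test below detects.
set -- "${dir}"/clean_data/*.fq.gz

if command -v fastqc >/dev/null 2>&1 && [ -e "$1" ]; then
    fastqc -t 16 -o "${report_dir}" "$@"
else
    echo "fastqc or input files not found; nothing was run"
fi
```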

2. Trim the three datasets with Trim Galore, discarding reads shorter than 20 bp

1. Process the HeLa cell data

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HeLa_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HeLa_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HeLa_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HeLa_Clean_Data2.fq.gz

2. Process the HCT116 cell data

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HCT116_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HCT116_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HCT116_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HCT116_Clean_Data2.fq.gz

3. Process the 293T cell data

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_293T_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_293T_Clean_Data2.fq.gz

trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_293T_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_293T_Clean_Data2.fq.gz
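
The nine commands above differ only in the sample name and input directory, so they can be generated by a loop. The sketch below is a dry run: it prints the commands instead of executing them. The variable names new and old and the function print_trim_cmds are hypothetical; the paths and options are copied from the commands above, with -j 16 used for every sample.

```shell
# Input directories as used in the commands above.
new=/Data/lizexing/projects/xindi/Data/new/Data/CleanData
old=/Data/lizexing/projects/xindi/2021_11_16/CleanData

# Print one trim_galore command per sample (3 cell lines x 3 libraries).
print_trim_cmds() {
    for cell in HeLa HCT116 293T; do
        for sample in "${new}/T_${cell}_T_PAR_CLIP" "${old}/GFP_${cell}" "${old}/Input_${cell}"; do
            echo trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired \
                "${sample}_Clean_Data1.fq.gz" "${sample}_Clean_Data2.fq.gz"
        done
    done
}

print_trim_cmds
```

Removing the echo turns the dry run into the real run once the printed commands look right.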

3. Convert the GFF3 annotation files for 45S and 5S rRNA to GTF with gffread 0.12.1

Reference article: gffcompare and gffread

Usage: gffread <input_gff> [-g <genomic_seqs_fasta> | <dir>][-s <seq_info.fsize>]
 [-o <outfile>] [-t <trackname>] [-r [[<strand>]<chr>:]<start>..<end> [-R]]
 [-CTVNJMKQAFPGUBHZWTOLE] [-w <exons.fa>] [-x <cds.fa>] [-y <tr_cds.fa>]
 [-i <maxintron>] [--stream] [--bed] [--table <attrlist>] [--sort-by <ref.lst>]

(base) lizexing@bio:~/reference/h_45S_rDNA$ gffread U13369.1.gff3 -T -o U13369.1.gtf
(base) lizexing@bio:~/reference/h_5S_rDNA$ gffread NR_023363.1.gff3 -T -o NR_023363.1.gtf

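4. Build STAR indexes for 45S and 5S rRNA, and for GRCh38.dna.primary_assembly, GRCh38.ncRNA, and GRCh38.cds.all

Reference article: Using the STAR aligner

# Parameter notes
--runThreadN: number of threads to run with;
--genomeDir: directory the index files are written to;
--genomeFastaFiles: path to the genome FASTA file(s);
--limitGenomeGenerateRAM 43749387189: maximum RAM, in bytes, that genome generation may use. STAR needs a lot of memory for this step, so set the limit explicitly to avoid errors. Thanks to Sun Xiaoyu for the help.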

(base) lizexing@bio:~$ STAR  --runMode genomeGenerate --runThreadN 16 --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_45S_rDNA/U13369.1.fasta
Sep 05 14:14:23 ..... started STAR run
Sep 05 14:14:23 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=42999, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 6
Sep 05 14:14:23 ... starting to sort Suffix Array. This may take a long time...
Sep 05 14:14:23 ... sorting Suffix Array chunks and saving them to disk...
Sep 05 14:14:23 ... loading chunks from disk, packing SA...
Sep 05 14:14:23 ... finished generating suffix array
Sep 05 14:14:23 ... generating Suffix Array index
Sep 05 14:14:26 ... completed Suffix Array index
Sep 05 14:14:26 ... writing Genome to disk ...
Sep 05 14:14:26 ... writing Suffix Array to disk ...
Sep 05 14:14:26 ... writing SAindex to disk
Sep 05 14:14:28 ..... finished successfully

(base) lizexing@bio:~$ STAR  --runMode genomeGenerate --runThreadN 16 --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.fasta
Dec 15 19:47:24 ..... started STAR run
Dec 15 19:47:24 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=121, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 2
Dec 15 19:47:24 ... starting to sort Suffix Array. This may take a long time...
Dec 15 19:47:24 ... sorting Suffix Array chunks and saving them to disk...
Dec 15 19:47:24 ... loading chunks from disk, packing SA...
Dec 15 19:47:24 ... finished generating suffix array
Dec 15 19:47:24 ... generating Suffix Array index
Dec 15 19:47:27 ... completed Suffix Array index
Dec 15 19:47:27 ... writing Genome to disk ...
Dec 15 19:47:27 ... writing Suffix Array to disk ...
Dec 15 19:47:27 ... writing SAindex to disk
Dec 15 19:47:31 ..... finished successfully

(base) lizexing@bio:~/reference/Ensembl_GRCh38$ STAR  --runMode genomeGenerate --runThreadN 40 --limitGenomeGenerateRAM 82424365322 --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_dna_primary_assembly_index --genomeFastaFiles /Data/lizexing/reference/Ensembl_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Mar 06 14:29:42 ..... started STAR run
Mar 06 14:29:42 ... starting to generate Genome files
Mar 06 14:30:58 ... starting to sort Suffix Array. This may take a long time...
Mar 06 14:31:18 ... sorting Suffix Array chunks and saving them to disk...
Mar 06 14:44:13 ... loading chunks from disk, packing SA...
Mar 06 14:45:46 ... finished generating suffix array
Mar 06 14:45:46 ... generating Suffix Array index
Mar 06 14:49:53 ... completed Suffix Array index
Mar 06 14:49:53 ... writing Genome to disk ...
Mar 06 14:49:55 ... writing Suffix Array to disk ...
Mar 06 14:50:18 ... writing SAindex to disk
Mar 06 14:50:20 ..... finished successfully

(base) lizexing@bio:~/reference/Ensembl_GRCh38$ STAR  --runMode genomeGenerate --runThreadN 16 --limitGenomeGenerateRAM 82424365322 --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index --genomeFastaFiles /Data/lizexing/reference/Ensembl_GRCh38/Homo_sapiens.GRCh38.cds.all.fa
Mar 05 10:59:02 ..... started STAR run
Mar 05 10:59:03 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=137654284, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 12
Mar 05 11:00:53 ... starting to sort Suffix Array. This may take a long time...
Mar 05 11:02:49 ... sorting Suffix Array chunks and saving them to disk...
Mar 05 11:04:45 ... loading chunks from disk, packing SA...
Mar 05 11:05:50 ... finished generating suffix array
Mar 05 11:05:50 ... generating Suffix Array index
Mar 05 11:06:41 ... completed Suffix Array index
Mar 05 11:06:41 ... writing Genome to disk ...
Mar 05 11:07:17 ... writing Suffix Array to disk ...
Mar 05 11:07:18 ... writing SAindex to disk
Mar 05 11:07:19 ..... finished successfully

(base) lizexing@bio:~/reference/Ensembl_GRCh38$ STAR  --runMode genomeGenerate --runThreadN 16 --limitGenomeGenerateRAM 82424365322 --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index --genomeFastaFiles /Data/lizexing/reference/Ensembl_GRCh38/Homo_sapiens.GRCh38.ncrna.fa
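
Three of the logs above warn that the default --genomeSAindexNbases 14 is too large for a small reference and may cause a seg-fault at the mapping step. The STAR manual gives the recommended value as min(14, log2(GenomeLength)/2 - 1). The sketch below computes it for the genome sizes reported in the warnings; the function name sa_index_nbases is hypothetical, and the lower bound of 1 is an added safeguard rather than part of the manual's formula.

```shell
# Recommended --genomeSAindexNbases for a genome of the given length (bases):
# min(14, log2(length)/2 - 1), rounded down.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)   # awk log() is the natural log
        if (n > 14) n = 14
        if (n < 1) n = 1                      # safeguard, not from the manual
        print n
    }'
}

sa_index_nbases 42999       # 45S rDNA (U13369.1)
sa_index_nbases 121         # 5S rRNA (NR_023363.1)
sa_index_nbases 137654284   # GRCh38.cds.all
```

The three results (6, 2, and 12) match the values recommended in the warnings, so the indexes could be rebuilt with, for example, --genomeSAindexNbases 6 added to the 45S command.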

5. STAR alignment usage and output files

Usage: STAR  [options]... --genomeDir /path/to/genome/index/   --readFilesIn R1.fq R2.fq
Note: the # annotations below are explanatory only. Remove them before running, because a backslash followed by a comment does not continue the line.
--runThreadN 40 \ # number of threads
--runMode alignReads \ # alignment mode
--readFilesCommand zcat \ # the FASTQ files are compressed (names end in .gz); omitting this causes an error
--quantMode TranscriptomeSAM GeneCounts \ # also write alignments in transcript coordinates and count reads per gene
--sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf \ # the matching annotation file
--twopassMode Basic \ # first align against the index, then add the novel splice junctions found in the first pass to the index and align a second time. This gives more accurate alignments but costs more time and memory.
--outSAMtype BAM Unsorted \ # write an unsorted BAM file; without this line only a SAM file is written
--outSAMunmapped None \
--genomeDir /gpfs/home/fangy04/downloads/STAR_index/GRCh38/ \ # index directory
--readFilesIn /gpfs/home/fangy04/downloads/SRR8112732_1.fastq.gz /gpfs/home/fangy04/downloads/SRR8112732_2.fastq.gz \ # the two FASTQ files
--outFileNamePrefix DRB_TT_seq_SRR8112732 # prefix for the output files
--outReadsUnmapped # output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s). Fastx   ... output in separate fasta/fastq files, Unmapped.out.mate1/2
--outSAMunmapped # output of unmapped reads in the SAM format
9216920116 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.out.bam # the most important file; used later for duplicate removal and sorting
1166235552 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.toTranscriptome.out.bam # BAM file of the reads aligned to the transcriptome
2034 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.final.out
20188 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.out
2571 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.progress.out
1585521 Jun 28 17:06 DRB_TT_seq_SRR8112732ReadsPerGene.out.tab
6732305 Jun 28 17:06 DRB_TT_seq_SRR8112732SJ.out.tab # splice junction information
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARgenome
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARpass1
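
Of the files listed above, ReadsPerGene.out.tab (written because of --quantMode GeneCounts) is a tab-separated table: gene ID, counts for unstranded data, then counts for each of the two strand conventions. Its first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summary counters, not genes. A small sketch that totals the unstranded column; the function name sum_gene_counts is hypothetical, and the toy file holds invented gene names and numbers, not results from this analysis.

```shell
# Sum column 2 (unstranded counts) over gene rows, skipping the N_* summary rows.
sum_gene_counts() {
    awk -F '\t' '$1 !~ /^N_/ { total += $2 } END { print total + 0 }' "$1"
}

# Toy input for illustration only (all values are invented).
toy=$(mktemp)
printf 'N_unmapped\t5\t5\t5\nN_multimapping\t3\t3\t3\nN_noFeature\t2\t4\t6\nN_ambiguous\t1\t0\t0\ngeneA\t10\t7\t3\ngeneB\t20\t15\t5\n' > "${toy}"

sum_gene_counts "${toy}"
```

For a strand-specific library, column 3 or 4 would be summed instead of column 2.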

6. Align the three datasets to 45S rRNA with STAR

1. Align the HeLa sequencing data

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HeLa-val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HeLa_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HeLa_Clean_Data2_val_2.fq.gz --outFileNamePrefix GFP_HeLa_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HeLa_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_HeLa_Clean_Data2_val_2.fq.gz --outFileNamePrefix Input_HeLa_val --outReadsUnmapped Fastx

2. Align the HCT116 sequencing data

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HCT116-val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HCT116_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_HCT116_Clean_Data2_val_2.fq.gz --outFileNamePrefix GFP_HCT116_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/CleanData/Input_HCT116_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/CleanData/Input_HCT116_Clean_Data2_val_2.fq.gz --outFileNamePrefix Input_HCT116_val --outReadsUnmapped Fastx

3. Align the 293T sequencing data

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix 293T-val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_293T_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/GFP_293T_Clean_Data2_val_2.fq.gz --outFileNamePrefix GFP_293T_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_293T_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/2021_11_16/CleanData/Input_293T_Clean_Data2_val_2.fq.gz --outFileNamePrefix Input_293T_val --outReadsUnmapped Fastx
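
Each STAR command above reads the pair of files that Trim Galore wrote (*_Clean_Data1_val_1.fq.gz and *_Clean_Data2_val_2.fq.gz). Before launching a batch of long alignments, it can help to confirm that both mates exist for every sample. A minimal sketch; check_trimmed_pairs is a hypothetical helper, and the demo directory is created only so the example can run on its own.

```shell
# Report samples whose trimmed mate files are missing or empty.
# Usage: check_trimmed_pairs <directory> <sample prefix>...
# Returns 0 when every file is present, 1 otherwise.
check_trimmed_pairs() {
    dir_to_check=$1
    shift
    missing=0
    for sample in "$@"; do
        for f in "${dir_to_check}/${sample}_Clean_Data1_val_1.fq.gz" \
                 "${dir_to_check}/${sample}_Clean_Data2_val_2.fq.gz"; do
            if [ ! -s "${f}" ]; then
                echo "missing or empty: ${f}" >&2
                missing=1
            fi
        done
    done
    return "${missing}"
}

# Demo with placeholder files (not real sequencing data).
demo=$(mktemp -d)
printf 'x' > "${demo}/GFP_HeLa_Clean_Data1_val_1.fq.gz"
printf 'x' > "${demo}/GFP_HeLa_Clean_Data2_val_2.fq.gz"
check_trimmed_pairs "${demo}" GFP_HeLa && echo "GFP_HeLa: both mates present"
check_trimmed_pairs "${demo}" Input_HeLa || echo "Input_HeLa: not ready"
```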

8. Align the reads from the three datasets that did not map to 45S rRNA against GRCh38.ncrna with STAR

1. Align the HeLa sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HeLa/45SRNA/HeLa-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HeLa/45SRNA/HeLa-valUnmapped.out.mate2 --outFileNamePrefix HeLa_ncrna_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HeLa/45SRNA/GFP_HeLa_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HeLa/45SRNA/GFP_HeLa_valUnmapped.out.mate2 --outFileNamePrefix HeLa_ncrna_val --outReadsUnmapped Fastx

2. Align the HCT116 sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HCT116/45SRNA/HCT116-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HCT116/45SRNA/HCT116-valUnmapped.out.mate2 --outFileNamePrefix HCT116_ncrna_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HCT116/45SRNA/GFP_HCT116_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HCT116/45SRNA/GFP_HCT116_valUnmapped.out.mate2 --outFileNamePrefix HCT116_ncrna_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/HCT116/45SRNA/Input_HCT116_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/HCT116/45SRNA/Input_HCT116_valUnmapped.out.mate2 --outFileNamePrefix HCT116_ncrna_val --outReadsUnmapped Fastx

3. Align the 293T sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/293T/45SRNA/293T-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/293T/45SRNA/293T-valUnmapped.out.mate2 --outFileNamePrefix 293T_ncrna_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/293T/45SRNA/GFP_293T_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/293T/45SRNA/GFP_293T_valUnmapped.out.mate2 --outFileNamePrefix 293T_ncrna_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_ncrna_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/293T/45SRNA/Input_293T_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/293T/45SRNA/Input_293T_valUnmapped.out.mate2 --outFileNamePrefix 293T_ncrna_val --outReadsUnmapped Fastx
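
Several commands in this step share one --outFileNamePrefix; both HeLa commands, for example, use HeLa_ncrna_val. If they are run from the same working directory, the later run overwrites the earlier output (these notes do not record which directory each command was run from). One way to rule that out is to derive the prefix from the input file name. A sketch; unique_prefix is a hypothetical helper and not part of STAR.

```shell
# Build an output prefix from an Unmapped.out.mate1 path and a target label.
# Usage: unique_prefix <path to ...Unmapped.out.mate1> <label>
unique_prefix() {
    base=$(basename "$1")               # e.g. GFP_HeLa_valUnmapped.out.mate1
    sample=${base%Unmapped.out.mate1}   # e.g. GFP_HeLa_val
    sample=${sample%val}                # e.g. GFP_HeLa_  (or HeLa-)
    sample=${sample%[-_]}               # e.g. GFP_HeLa   (or HeLa)
    echo "${sample}_$2_val"
}

unique_prefix /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HeLa/45SRNA/HeLa-valUnmapped.out.mate1 ncrna
unique_prefix /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HeLa/45SRNA/GFP_HeLa_valUnmapped.out.mate1 ncrna
```

The two calls print HeLa_ncrna_val and GFP_HeLa_ncrna_val, so the TopBP and GFP libraries no longer share a prefix; the result can be passed as --outFileNamePrefix "$(unique_prefix ...)".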

9. Align the reads from the three datasets that did not map to 45S rRNA against GRCh38.cds.all with STAR

1. Align the HeLa sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HeLa/45SRNA/HeLa-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HeLa/45SRNA/HeLa-valUnmapped.out.mate2 --outFileNamePrefix HeLa_cds_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HeLa/45SRNA/GFP_HeLa_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HeLa/45SRNA/GFP_HeLa_valUnmapped.out.mate2 --outFileNamePrefix HeLa_cds_val --outReadsUnmapped Fastx

2. Align the HCT116 sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HCT116/45SRNA/HCT116-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/HCT116/45SRNA/HCT116-valUnmapped.out.mate2 --outFileNamePrefix HCT116_cds_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HCT116/45SRNA/GFP_HCT116_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/HCT116/45SRNA/GFP_HCT116_valUnmapped.out.mate2 --outFileNamePrefix HCT116_cds_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/HCT116/45SRNA/Input_HCT116_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/HCT116/45SRNA/Input_HCT116_valUnmapped.out.mate2 --outFileNamePrefix HCT116_cds_val --outReadsUnmapped Fastx

3. Align the 293T sequencing data

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/293T/45SRNA/293T-valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/TopBP/293T/45SRNA/293T-valUnmapped.out.mate2 --outFileNamePrefix 293T_cds_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/293T/45SRNA/GFP_293T_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/GFP/293T/45SRNA/GFP_293T_valUnmapped.out.mate2 --outFileNamePrefix 293T_cds_val --outReadsUnmapped Fastx

STAR --runThreadN 40 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/Ensembl_GRCh38/star_cds_all_index/ --readFilesIn /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/293T/45SRNA/Input_293T_valUnmapped.out.mate1 /Data/lizexing/projects/xindi/2022_03_05/TreatData/Input/293T/45SRNA/Input_293T_valUnmapped.out.mate2 --outFileNamePrefix 293T_cds_val --outReadsUnmapped Fastx

10. Quality control of the data with FastQC (2022-05-15/2022-05-16)

fastqc -t 16 -o ${dir}/fastqc_report/ ${dir}/clean_data/*.fq.gz

11. Trim the two batches of data with Trim Galore, discarding reads shorter than 20 bp

Process the 2022_05_15 cell data
Create RNA_seq_script in vim to combine quality control, alignment, format conversion, sorting, assembly, and quantification in one script.

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/18       zexing            First release
# Set ${dir} to the working directory
dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_15

# Run quality control on the data
# fastqc -t 16 -o ${dir}/fastqc_report/ ${dir}/raw_data/*.fq.gz

# Loop over the samples for the following steps
for i in G1 G2 G3 T4 T5 T6
do
# Trim adapters with Trim Galore and discard reads shorter than 20 bp
trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 8 --paired \
${dir}/"$i"_Clean_Data1.fq.gz \
${dir}/"$i"_Clean_Data2.fq.gz
done

Run RNA_seq_script in the background:

nohup bash RNA_seq_script > RNA_seq_script_log &

Process the 2022_05_16 cell data
Create RNA_seq_script in vim to combine quality control, alignment, format conversion, sorting, assembly, and quantification in one script.

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/18       zexing            First release
# Set ${dir} to the working directory
dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Run quality control on the data
# fastqc -t 16 -o ${dir}/fastqc_report/ ${dir}/raw_data/*.fq.gz

# Loop over the samples for the following steps
for i in C_a C_b C_c T_d T_e T_f
do
# Trim adapters with Trim Galore and discard reads shorter than 20 bp
trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 8 --paired \
${dir}/"$i"_Clean_Data1.fq.gz \
${dir}/"$i"_Clean_Data2.fq.gz
done

Run RNA_seq_script in the background:

nohup bash RNA_seq_script > RNA_seq_script_log &

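12. Build STAR indexes for 45S rRNA, and for GRCh38.dna.primary_assembly, GRCh38.ncRNA, and GRCh38.cds.all

# Parameter notes
--runThreadN: number of threads to run with;
--genomeDir: directory the index files are written to;
--genomeFastaFiles: path to the genome FASTA file(s);
--limitGenomeGenerateRAM 43749387189: maximum RAM, in bytes, that genome generation may use. STAR needs a lot of memory for this step, so set the limit explicitly to avoid errors. Thanks to Sun Xiaoyu for the help.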

STAR  --runMode genomeGenerate --runThreadN 16 --limitGenomeGenerateRAM 43749387189 --genomeDir /home/customer/lizexing/references/Human_45S/star_index --genomeFastaFiles /home/customer/lizexing/references/Human_45S/U13369.1.fasta

STAR  --runMode genomeGenerate --runThreadN 16 --genomeDir /home/customer/lizexing/references/Ensembl/Human  \
--genomeFastaFiles /home/customer/lizexing/references/Ensembl/Human/Homo_sapiens.GRCh38.dna.primary_assembly.fa

STAR  --runMode genomeGenerate --runThreadN 16 --limitGenomeGenerateRAM 43749387189 \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_ncrna_index/  \
--genomeFastaFiles /home/customer/lizexing/references/Ensembl/Human/Homo_sapiens.GRCh38.ncrna.fa

STAR  --runMode genomeGenerate --runThreadN 8 --limitGenomeGenerateRAM 82424365322 \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_cds_index/  \
--genomeFastaFiles /home/customer/lizexing/references/Ensembl/Human/Homo_sapiens.GRCh38.cds.all.fa

13. Align the two batches of data to 45S rRNA with STAR

Create RNA_seq_script_2 in vim to process the 2022_05_15 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in G1 G2 G3 T4 T5 T6
do
STAR --runThreadN 8 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --outSAMtype BAM Unsorted \
--sjdbGTFfile /home/customer/lizexing/references/Human_45S/U13369.1.gtf \
--genomeDir /home/customer/lizexing/references/Human_45S/star_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"_Clean_Data1_val_1.fq.gz \
/home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"_Clean_Data2_val_2.fq.gz \
--outFileNamePrefix "$i"-val --outReadsUnmapped Fastx
done

Run RNA_seq_script_2 in the background:

nohup bash RNA_seq_script_2 > RNA_seq_script_2_log &

Create RNA_seq_script_2 in vim to process the 2022_05_16 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in C_a C_b C_c T_d T_e T_f
do
STAR --runThreadN 8 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --twopassMode Basic --outSAMtype BAM Unsorted \
--sjdbGTFfile /home/customer/lizexing/references/Human_45S/U13369.1.gtf \
--genomeDir /home/customer/lizexing/references/Human_45S/star_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"_Clean_Data1_val_1.fq.gz \
/home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"_Clean_Data2_val_2.fq.gz \
--outFileNamePrefix "$i"-val --outReadsUnmapped Fastx
done

Run RNA_seq_script_2 in the background:

nohup bash RNA_seq_script_2 > RNA_seq_script_2_log &

14. Align the reads from the two batches that did not map to 45S rRNA against GRCh38.ncRNA with STAR

Create RNA_seq_script_3 in vim to process the 2022_05_15 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in G1 G2 G3 T4 T5 T6
do
STAR --runThreadN 8 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_ncrna_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"-valUnmapped.out.mate1 /home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"-valUnmapped.out.mate2 \
--outFileNamePrefix "$i"_ncrna_val --outReadsUnmapped Fastx
done

Run RNA_seq_script_3 in the background:

nohup bash RNA_seq_script_3 > RNA_seq_script_3_log &

Create RNA_seq_script_3 in vim to process the 2022_05_16 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in C_a C_b C_c T_d T_e T_f
do
STAR --runThreadN 8 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_ncrna_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"-valUnmapped.out.mate1 /home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"-valUnmapped.out.mate2 \
--outFileNamePrefix "$i"_ncrna_val --outReadsUnmapped Fastx
done

Run RNA_seq_script_3 in the background:

nohup bash RNA_seq_script_3 > RNA_seq_script_3_log &

15. Align the reads from the two batches that did not map to 45S rRNA against GRCh38.cds.all with STAR

Create RNA_seq_script_4 in vim to process the 2022_05_15 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in G1 G2 G3 T4 T5 T6
do
STAR --runThreadN 8 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_cds_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"-valUnmapped.out.mate1 /home/customer/lizexing/projects/xindi/TreatData/2022_05_15/"$i"-valUnmapped.out.mate2 \
--outFileNamePrefix "$i"_cds_val --outReadsUnmapped Fastx
done

Run RNA_seq_script_4 in the background:

nohup bash RNA_seq_script_4 > RNA_seq_script_4_log &

Create RNA_seq_script_4 in vim to process the 2022_05_16 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/19       zexing            First release
# Set ${dir} to the working directory
# dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_16

# Loop over the samples for the following steps
for i in C_a C_b C_c T_d T_e T_f
do
STAR --runThreadN 8 --runMode alignReads --twopassMode Basic --outSAMtype BAM Unsorted \
--genomeDir /home/customer/lizexing/references/Ensembl/Human/star_cds_index/ \
--readFilesIn /home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"-valUnmapped.out.mate1 /home/customer/lizexing/projects/xindi/TreatData/2022_05_16/"$i"-valUnmapped.out.mate2 \
--outFileNamePrefix "$i"_cds_val --outReadsUnmapped Fastx
done

Run RNA_seq_script_4 in the background:

nohup bash RNA_seq_script_4 > RNA_seq_script_4_log &

16. Sort the three sets of alignments with Samtools

Create RNA_seq_script_5 in vim to process the 2022_05_15 cell data

#!/bin/bash
# The line above declares that this script is run with bash.
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2022/05/31       zexing            First release
# Set ${dir} to the working directory
dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_15

# Loop over the samples for the following steps
for i in G1 G2 G3 T4 T5 T6
do
samtools sort -@ 8 -l 5 -o ${dir}/${i}-valAligned.out.bam.sort ${dir}/${i}-valAligned.out.bam
samtools sort -@ 8 -l 5 -o ${dir}/${i}_ncrna_valAligned.out.bam.sort ${dir}/${i}_ncrna_valAligned.out.bam
samtools sort -@ 8 -l 5 -o ${dir}/${i}_cds_valAligned.out.bam.sort ${dir}/${i}_cds_valAligned.out.bam
done

Run RNA_seq_script_5 in the background:

nohup bash RNA_seq_script_5 
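
The script above writes its sorted output to files ending in .bam.sort. A possible variant keeps the .bam extension, so that the format is obvious from the file name, and indexes each coordinate-sorted BAM with samtools index. The sketch below is a dry run that only prints the commands; the function print_sort_cmds and the .sorted.bam naming are assumptions, not what was run for these notes.

```shell
dir=/home/customer/lizexing/projects/xindi/TreatData/2022_05_15

# Print a sort and an index command for each sample and each alignment target
# (45S rRNA, ncRNA, CDS), using the prefixes from the alignment steps.
print_sort_cmds() {
    for i in G1 G2 G3 T4 T5 T6; do
        for tag in -val _ncrna_val _cds_val; do
            in_bam="${dir}/${i}${tag}Aligned.out.bam"
            out_bam="${dir}/${i}${tag}Aligned.sorted.bam"
            echo "samtools sort -@ 8 -l 5 -o ${out_bam} ${in_bam}"
            echo "samtools index ${out_bam}"
        done
    done
}

print_sort_cmds
```

Piping the output to bash (print_sort_cmds | bash) would execute the commands once they look right.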