Bioinformatics Transcriptomics Study Notes

冬、不眠的夜

Tags: bioinformatics

Article link: https://blog.csdn.net/zsx2541577860/article/details/117672868

2021-6-2
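1. Use aspera to download sequencing data (fastq files) from public databases
    https://zhuanlan.zhihu.com/p/91675934
    -i  absolute path to the private key file
2. Use fastqc to assess the quality of fastq (or SAM/BAM) data

3. Install conda
   Add channels to conda
4. Ensembl database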
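
A minimal sketch of item 1, assuming the ascp client is installed. It only assembles and prints the download command so it can be checked before running. The host, the example remote path and the key location are illustrative assumptions, and build_ascp_cmd is a hypothetical helper name, not part of aspera.

```shell
#!/usr/bin/env bash
# Sketch: assemble (but do not run) an ascp download command.
# ASCP_HOST and the example paths below are assumptions for illustration.
ASCP_HOST="era-fasp@fasp.sra.ebi.ac.uk"

build_ascp_cmd() {
    local key="$1" remote="$2" dest="$3"
    # -i must be given the absolute path of the private key file
    case "$key" in
        /*) ;;
        *) echo "key path must be absolute: $key" >&2; return 1 ;;
    esac
    # -Q fair transfer policy, -T no encryption, -l rate cap, -P ssh port
    echo "ascp -QT -l 300m -P33001 -i $key $ASCP_HOST:$remote $dest"
}

# Print the command for inspection; run it by hand once it looks right.
build_ascp_cmd "/home/train/.aspera/connect/etc/asperaweb_id_dsa.openssh" \
    "vol1/fastq/SRR103/008/SRR1039508/SRR1039508_1.fastq.gz" "./"
```

Printing first makes it easy to catch a relative key path, which is the usual reason -i fails.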

2021-6-3
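1. Huawei Cloud  public IP: 114.115.136.123
2. lsb_release -a   show the Linux distribution and release
3. High-throughput sequencing (next-generation / second-generation sequencing)   PacBio (third-generation sequencing)
4. https://www.ncbi.nlm.nih.gov/sra/?term=GSM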
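
lsb_release is not installed on every system, so a small fallback can help with item 2. This is only a sketch: show_distro is a hypothetical helper name, and it assumes /etc/os-release exists when lsb_release is missing (true on most current distributions).

```shell
#!/usr/bin/env bash
# Sketch: report the distribution, with fallbacks when lsb_release is absent.
show_distro() {
    if command -v lsb_release >/dev/null 2>&1; then
        lsb_release -ds                      # short description line
    elif [ -r /etc/os-release ]; then
        # read the variables in a subshell so they do not leak into this shell
        ( . /etc/os-release && echo "${PRETTY_NAME:-${NAME:-unknown}}" )
    else
        uname -sr                            # last resort: kernel name and release
    fi
}

show_distro
```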

2021-6-4
1. Add a user to sudoers
https://blog.csdn.net/u014686180/article/details/44701961
2. Add environment variables on CentOS  https://www.cnblogs.com/longyejiadao/archive/2012/06/28/2567885.html
3. Ways to download data from NCBI:
   https://mp.weixin.qq.com/s?src=11&timestamp=1622795722&ver=3109&signature=JXGFR4GgEWAajuMq52fuUo*Es4kTI*zzJg08YYeem5bog6ikNE6JQkpfP2dWdgtuTwehUbTUIY2udGNChGenuouXj50Sa*E4AEstP8aa*LTozfuxnEaXG7TpTLC0V*gE&new=1
4. Zhiyun (知云)

5. Install sratoolkit (through conda, or by downloading the package yourself)
   prefetch
6. Keep a process running in the background  nohup
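
Items 5 and 6 are usually combined: start prefetch under nohup and keep its output in a log. A minimal sketch, where run_bg is a hypothetical helper name and the log file names are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: run any command in the background so it survives logout.
run_bg() {
    local log="$1"
    shift
    # nohup ignores the hangup signal; stdout and stderr both go to the log
    nohup "$@" > "$log" 2>&1 &
    echo "$!"    # PID of the background job
}

# Example (needs sratoolkit on PATH):
#   pid=$(run_bg prefetch_SRR1039508.log prefetch SRR1039508)
#   tail -f prefetch_SRR1039508.log     # follow the progress
#   kill "$pid"                         # stop the download if needed
```

Keeping the PID avoids having to search for the process with ps later.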

2021-6-5
1. How to read a fastqc report:
  https://www.jianshu.com/p/c81c7110eed4?from=singlemessage

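2. The main goal of de novo sequencing is to assemble these short reads and ultimately reconstruct the genome of the species.

3. Quality control, assessment, trimming and filtering, assembly, transcription

4. ls -lh   list the current directory with human-readable file sizes

5. Install bwa (into a new conda environment)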
    conda create -n rna python=2 bwa
    #conda create -n env_name python=2.7

6. An RNA-seq hands-on example:
   http://www.bio-info-trainee.com/2218.html

7. top   awk   history (show the command history)

8. Batch download:
   cat >id
   SRR1039508
   SRR1039509
   SRR1039510
   ....
   cat id |while read id;do (prefetch $id &);done

   top  to monitor the downloads
   
   ps -ef |grep prefetch   list the prefetch processes

   ps -ef |grep prefetch|grep -v grep|awk '{print $2}'|while read id;do kill $id;done   kill them all at once

   fastq-dump --gzip --split-3 -O ./ SRR1039508.sra
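
The while loop in item 8 starts every download at the same moment. As an alternative sketch (a different technique from the loop, not a replacement the notes prescribe), xargs -P can cap the number of simultaneous downloads. PREFETCH_BIN and JOBS are hypothetical knobs added here so the function can be tried with echo first.

```shell
#!/usr/bin/env bash
# Sketch: download the accessions listed in a file, a few at a time.
PREFETCH_BIN="${PREFETCH_BIN:-prefetch}"   # assumption: overridable for dry runs
JOBS="${JOBS:-4}"                          # assumption: 4 parallel downloads

batch_prefetch() {
    local id_file="$1"
    # drop blank lines and stray whitespace, then hand one ID to each job
    grep -v '^[[:space:]]*$' "$id_file" | tr -d ' \t\r' \
        | xargs -n 1 -P "$JOBS" "$PREFETCH_BIN"
}

# Dry run:   PREFETCH_BIN=echo; batch_prefetch id
# Real run:  batch_prefetch id
```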
   

2021-6-6
1.  export PATH=/bin:/usr/bin:$PATH

     While top is running, its interactive commands control how processes are displayed

2. fastq-dump --split-spot --gzip SRR1553610.sra

3. Batch QC  ls *fastq |xargs fastqc -t 10
             multiqc ./

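4. Create a conda environment
   conda create -n rna python=2 
   source activate rna    # activate the environment (conda activate rna on conda 4.4 and later)
   List the environments conda has created
   conda info -e
   Enter the conda base environment
   conda activate
   Leave the current conda environment
   conda deactivate


   Add conda channels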
   conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
   conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
   conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
   conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
   conda config --set show_channel_urls yes

5. cat SRR1553610.fastq |grep ACACGT
   zcat SRR1553610.fastq.gz |grep ACACGT
   ##filter the bad quality reads and remove adaptors   (filtering and trimming)
   
   Single run
   bin_trim_galore=trim_galore
   dir='clean'       # run from the parent directory of clean
   fq1='/home/train/ncbi/public/sra/SRR1039516.fastq'
   fq2='/home/train/ncbi/public/sra/SRR1553610.fastq'
   $bin_trim_galore -q 25 --phred33 --length 36 -e 0.1 --stringency 3 --paired -o $dir $fq1 $fq2


   Batch trimming and filtering in a loop:
   ls *_1.fastq.gz > 1
   ls *_2.fastq.gz > 2
   paste 1 2 > config       # build the config file for batch processing


   bin_trim_galore=trim_galore
   dir='/home/train/project/airway/clean1'       # output directory
   config_file='config'       # the file written by paste above
   cat $config_file |while read id
   do
            arr=($id)
            fq1=${arr[0]}
            fq2=${arr[1]}
   nohup $bin_trim_galore -q 25 --phred33 --length 36 -e 0.1 --stringency 3 --paired -o $dir $fq1 $fq2 &
   done

   Example 1: qc.sh
   source activate rna
   bin_trim_galore=trim_galore
   dir='/home/train/project/airway/clean1'       # output directory
   cat /home/train/ncbi/public/sra/config |while read id
   do
            arr=($id)
            fq1=${arr[0]}
            fq2=${arr[1]}
   nohup $bin_trim_galore -q 25 --phred33 --length 36 -e 0.1 --stringency 3 --paired -o $dir $fq1 $fq2 &
   done
   conda deactivate

   Example 2:
   source activate rna
   bin_trim_galore=trim_galore
   dir='/home/train/project/airway/clean1'       # output directory
   cat $1 |while read id
   do
            arr=($id)
            fq1=${arr[0]}
            fq2=${arr[1]}
   nohup $bin_trim_galore -q 25 --phred33 --length 36 -e 0.1 --stringency 3 --paired -o $dir $fq1 $fq2 &
   done
   conda deactivate
   

   Run:  bash qc.sh config
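
A slightly more defensive variant of the trimming loop, as a sketch only. It reads the two columns directly instead of building an array, creates the output directory, and waits for all jobs before returning. trim_pairs and TRIM_BIN are hypothetical names introduced here; the trim_galore options are the ones already used in these notes.

```shell
#!/usr/bin/env bash
# Sketch: trim every read pair listed in a two-column config file.
TRIM_BIN="${TRIM_BIN:-trim_galore}"   # assumption: overridable for dry runs

trim_pairs() {
    local config="$1" outdir="$2" fq1 fq2
    mkdir -p "$outdir"
    # each config line holds: <read 1 fastq> <read 2 fastq>
    while read -r fq1 fq2; do
        # skip blank or incomplete lines
        [ -n "$fq1" ] && [ -n "$fq2" ] || continue
        "$TRIM_BIN" -q 25 --phred33 --length 36 -e 0.1 --stringency 3 \
            --paired -o "$outdir" "$fq1" "$fq2" &
    done < "$config"
    wait    # return only after every background job has finished
}

# Dry run:   TRIM_BIN=echo; trim_pairs config /tmp/clean_test
# Real run:  trim_pairs config /home/train/project/airway/clean1
```

Because the function waits for its jobs, the whole script can be started once under nohup instead of wrapping each trim_galore call.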

2021-6-7
1. Five aligners
   hisat2,subjunc,star,bwa,bowtie2
2. Build the index
   ~/biosoft/STAR/STAR-2.5.3a/bin/Linux_x86_64/STAR --runMode genomeGenerate \
   --genomeDir ~/reference/index/star/hg19 \
   --genomeFastaFiles ~/reference/genome/hg19/hg19.fa \
   --sjdbGTFfile ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf \
   --sjdbOverhang 149 --runThreadN 4

3. Learn awk: https://www.runoob.com/linux/linux-comm-awk.html
   Using the dirname and basename commands
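
A short sketch of how dirname, basename and awk fit into this workflow, for example to derive a sample name from a fastq path. The file name below is an illustrative assumption following the *_1.fastq.gz pattern used earlier in these notes.

```shell
#!/usr/bin/env bash
# Sketch: split a path into directory, file name and sample name.
fq='/home/train/ncbi/public/sra/SRR1039508_1.fastq.gz'   # illustrative path

dir=$(dirname "$fq")                    # /home/train/ncbi/public/sra
file=$(basename "$fq")                  # SRR1039508_1.fastq.gz
sample=$(basename "$fq" _1.fastq.gz)    # SRR1039508 (suffix stripped)

echo "$dir $file $sample"

# awk: print the second whitespace-separated column of each line,
# here the read 2 file from a config-style line
printf 'SRR1039508_1.fastq.gz SRR1039508_2.fastq.gz\n' | awk '{print $2}'
```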