Chip-seq数据寻找Indel

最新推荐文章于 2022-08-17 10:32:15 发布

wangyunpeng_bio

最新推荐文章于 2022-08-17 10:32:15 发布

阅读量1.9k

点赞数 3

分类专栏：分析流程文章标签：分析流程

本文链接：https://blog.csdn.net/qq_29300341/article/details/76165286

版权

分析流程专栏收录该内容

16 篇文章 25 订阅

订阅专栏

背景介绍：

癌症病人基因组内有许许多多的Indel，这些Indel与发病机制有着密切关系，而通常的做法是对癌症病人进行基因组测序或者外显子测序，找出相关Indel。

而Chip-seq数据可以找出Enhancer，对Chip-seq数据进行处理，可以将Indel与Enhancer进行结合分析

方法介绍：

简单地说，就是follow最近的高分文章《 Small genomic insertions form enhancers that misregulate oncogenes》PMID：28181482

用大牛的方法，跑自己的数据，最后如果搜集足够多的癌症病人Chip-seq数据，能找出更多新的enhancer区域内的插入缺失（Indel）

大牛的脚本在：

https://bitbucket.org/young_computation/indelsfromchipseq

Chip-Seq display

先做Chip-seq display一段

ChIP-Seq display.Reads were aligned to the hg19 revision of the human reference

genome using bowtie with parameters –best –k 2 –m 2 –sam and –l set to read

length. Read pileup in 50 bp bins was determined using MACS with parameters –

w –S –space=50 –shiftsize=200 –nomodel. WIG file output from MACS was

visualized in the UCSC genome browser.

首先，下载数据

   LY4_H3K27AC Chromatin immunoprecipitation against H3K27Ac 
 
   ascp -QT -l 100M -i ~/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR155/005/SRR1554605/SRR1554605.fastq.gz . 
 
   LY4_WCE Whole cell extract input control for ChIP 
 
   ascp -QT -l 100M -i ~/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR155/006/SRR1554606/SRR1554606.fastq.gz . 
 
   SUDHL6_H3K27AC     Chromatin immunoprecipitation against H3K27Ac 
 
   ascp -QT -l 100M -i ~/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR155/009/SRR1554599/SRR1554599.fastq.gz . 
 
   SUDHL6_WCE     Whole cell extract input control for ChIP 
 
   ascp -QT -l 100M -i ~/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR155/000/SRR1554600/SRR1554600.fastq.gz .

接着，fastqc质量检测

   fastqc -o . -f fastq -t 4 SRR1554605.fastq & 
 
   fastqc -o . -f fastq -t 4 SRR1554606.fastq & 
 
   fastqc -o . -f fastq -t 4 SRR1554599.fastq & 
 
   fastqc -o . -f fastq -t 4 SRR1554600.fastq &

下载bowtie hg19索引

   基因组索引从bowtie官网下载 
 
   Pre-built indexes H. sapiens, UCSC hg19    2.7 GB

bowtie序列比对

   bowtie genome/hg19 -q SRR1554605.fastq -k 2 --best -m 2 --threads 4 -l 40 -S 2> SRR1554605.out > SRR1554605.sam 
 
   bowtie genome/hg19 -q SRR1554606.fastq -k 2 --best -m 2 --threads 4 -l 40 -S 2> SRR1554606.out > SRR1554606.sam 
 
   bowtie genome/hg19 -q SRR1554599.fastq -k 2 --best -m 2 --threads 4 -l 40 -S 2> SRR1554599.out > SRR1554599.sam 
 
   bowtie genome/hg19 -q SRR1554600.fastq -k 2 --best -m 2 --threads 4 -l 40 -S 2> SRR1554600.out > SRR1554600.sam 
 
   -m 2 Suppress all alignments for a particular read or pair if more than 2 reportable alignments exist for it. 
 
   --threads 4 4个线程 
 
   --best hits guaranteed best stratum; ties broken by quality 
 
   -k 2 report up to 2 good alignments per read (default: 1) 
 
   -l 40 seed length for -n (default: 28) 文献中-l set to read length 
 
   -S/--sam write hits in SAM format

macs建双峰模型

   macs14 -t SRR1554605.sam  -c SRR1554606.sam --format SAM --name "LY4" --wig --single-profile --space=50 --shiftsize=200 --nomodel 
 
   macs14 -t SRR1554599.sam  -c SRR1554600.sam --format SAM --name "SUDHL6" --wig --single-profile --space=50 --shiftsize=200 --nomodel 
 
   -w, --wig             Whether or not to save extended fragment pileup at 
 
                           every WIGEXTEND bps into a wiggle file. When --single- 
 
                           profile is on, only one file for the whole genome is 
 
                           saved. WARNING: this process is time/space consuming!! 
 
   -S, --single-profile  When set, a single wiggle file will be saved for 
 
                           treatment and input. 
 
   --space=SPACE         The resoluation for saving wiggle files, by default, 
 
                           MACS will save the raw tag count every 10 bps. Usable 
 
                           only with '--wig' option. 
 
   --shiftsize=SHIFTSIZE 
 
                           The arbitrary shift size in bp. When nomodel is true, 
 
                           MACS will use this value as 1/2 of fragment size. 
 
                           DEFAULT: 100 
 
   --nomodel             Whether or not to build the shifting model. If True, 
 
                           MACS will not build model. by default it means 
 
                           shifting size = 100, try to set shiftsize to change 
 
                           it. DEFAULT: False

放到IGV中对ASNS基因进行可视化。

至此，文章中Chip-Seq display一段算是完成了

数据准备

接下来就是这篇文章的精髓了，从chip-seq数据里面寻找indel

首先是数据准备

下载bowtie hg19索引和 bowtie2 hg19索引

   基因组从bowtie和bowtie2官网下载 
 
   H. sapiens, UCSC hg19     3.5 GB 
 
   or: part 1 (1.5 GB), part 2 (650 MB), part 3 (1.5 GB) 
 
   H. sapiens, UCSC hg19     2.7 GB 
 
    or: part 1 - 1.7 GB, part 2 - 1.0 GB 
 
    colorspace: full, or part 1, part 2

blat程序所需的染色体序列，同样下载UCSC的

http://hgdownload.soe.ucsc.edu/downloads.html#human

进入http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/

下载chromFa.tar.gz 20-Mar-2009 09:21 905M

寻找Indel

主要还是根据大牛的脚本走

https://bitbucket.org/young_computation/indelsfromchipseq

顺序是：首先CANCERS/run_everything.sh ，然后是Scaffold_Dev/run_everything.sh

这两步，基本上把文章的精髓给做完了，把所有的INDEL都找出来了。

两者方法差不多，都是基于一个思路：先用bowtie1比对，把没比对上的reads调整参数用bowtie2再比对，然后通过perl脚本以及一系列awk，sed命令进行处理，得到Indel

而Scaffold_Dev在处理时候，将第一步没比对上的reads用Edena进行了一定的拼接，所以最后找出来的Indel会长一些（其实测序reads就40bp，也不会太长）

接下来做dbSNP和Enhancers文件夹里面的内容

这两步会将找出来的Indel与“SNP以及Chip-seq分析出来的enhancer”进行overlap，整合分析

结果

由于涉及有关科研课题，具体结果就没有啦~等着以后看PubMed吧

不过也用文章的数据，将这篇文章的结果重复出来啦，特别是Jurkat细胞系里那个GTTAGGAAACGG 12bp的插入找出来了。

脚本

最后也写了一个脚本，理论上搜集好一堆Chip-seq数据，安装配置好相关软件，就能自动运行分析。有些Indel地方因为reads coverage比较高，IGV可视化的时候也能看出来有很多reads能证明这个Indel是真实可靠的，不过最有力的证据还是要sanger测序进行相关实验，结合相关gene解释生物学问题。

wangyunpeng_bio

关注

3
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
Chip-seq数据寻找Indel

背景介绍：癌症病人基因组内有许许多多的Indel，这些Indel与发病机制有着密切关系，而通常的做法是对癌症病人进行基因组测序或者外显子测序，找出相关Indel。而Chip-seq数据可以找出Enhancer，对Chip-seq数据进行处理，可以将Indel与Enhancer进行结合分析方法介绍：简单地说，就是follow最近的高分文章《Small genomic insertio
复制链接

扫一扫

专栏目录