Spike-in

https://github.com/descostesn/BiocNYC-ChIPSeqSpike
https://rdrr.io/bioc/ChIPSeqSpike/f/inst/doc/ChIPSeqSpike.pdf
https://rdrr.io/bioc/ChIPSeqSpike/f/vignettes/ChIPSeqSpike.Rmd

https://mp.weixin.qq.com/s/La_AwYYf0atmFvsHh7AL6Q

I-2 The data
In this workshop we will make use of one published dataset. This dataset comes from the article:

Orlando, David A, Mei Wei Chen, Victoria E Brown, Snehakumari Solanki, Yoon J Choi, Eric R Olson, Christian C Fritz, James E Bradner, and Matthew G Guenther. 2014. “Quantitative ChIP-Seq Normalization Reveals Global Modulation of the Epigenome.” Cell Reports 9 (3). Cell Press: 1163–70.
GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60104 (see the manuscript for details)

Quantitative ChIP-Seq normalization reveals global modulation of the epigenome
The whole dataset is accessible at GSE60104. Specifically, the data used are H3K79me2 0% (GSM1465004), H3K79me2 50% (GSM1465006), H3K79me2 100% (GSM1465008), input DNA 0% (GSM1511465), input DNA 50% (GSM1511467) and input DNA 100% (GSM1511469).

You can download the material for this workshop or use the commands indicated below:

H3K79me2_0: hg19.bw, dm3.bam, hg19.bam

H3K79me2_50: hg19.bw, dm3.bam, hg19.bam

H3K79me2_100: hg19.bw, dm3.bam, hg19.bam

input_0: hg19.bw, dm3.bam, hg19.bam

input_50: hg19.bw, dm3.bam, hg19.bam

input_100: hg19.bw, dm3.bam, hg19.bam

Info table: info.csv

Gene annotations: refseq_hg19.gff

Processed data: treated_data.Rdat

Create a folder ‘workshop_files’ containing all the downloaded files. You can run the following code:

# Download every workshop file that is not already present in 'workshop_files'
base.url <- "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/"
needed.files <- paste0(base.url, c(
    "H3K79me2_0.bw",   "H3K79me2_0_dm3.bam",   "H3K79me2_0_hg19.bam",
    "H3K79me2_50.bw",  "H3K79me2_50_dm3.bam",  "H3K79me2_50_hg19.bam",
    "H3K79me2_100.bw", "H3K79me2_100_dm3.bam", "H3K79me2_100_hg19.bam",
    "input_0.bw",      "input_0_dm3.bam",      "input_0_hg19.bam",
    "input_50.bw",     "input_50_dm3.bam",     "input_50_hg19.bam",
    "input_100.bw",    "input_100_dm3.bam",    "input_100_hg19.bam",
    "info.csv", "refseq_hg19.gff", "treatedData.Rdat"))
dir.create("workshop_files", showWarnings = FALSE)
# Keep only the files that have not been downloaded yet
get.files <- needed.files[!basename(needed.files) %in% list.files("workshop_files")]
for (f in get.files) {
    # mode = "wb" keeps binary files (bam, bw, Rdat) intact on Windows
    download.file(f, destfile = file.path("workshop_files", basename(f)), mode = "wb")
}

Data processing: Sample 20140404_1216 was aligned to hg19 using Bowtie2 with default parameters.
Sample 20131220_425 was aligned to dm3 using Bowtie2 with default parameters.
All other samples were aligned to a combined hg19/dm3 genome and reads were separated into each organism post-alignment. See manuscript for details.
Supplementary_files_format_and_content: IGV-compatible TDF files represent spike-in normalized (.spikein) counts for reads aligning to human (.hg19) or Drosophila (.dm3). See manuscript for details. The exceptions are 20140404_1216 and 20131220_425, which did not contain spike-in and were normalized using RPM.
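To make the RPM versus spike-in distinction concrete: as I understand Orlando et al., their reference-adjusted RPM (RRPM) scales the human signal by α = 1/N_d, where N_d is the number of reads (in millions) mapped to the dm3 spike-in, whereas plain RPM uses the read count of the target genome itself. The bash sketch below only computes the two factors; the helper name `scale_factor` and the read counts are hypothetical placeholders, not numbers from GSE60104.

```bash
#!/usr/bin/env bash
# scale_factor N: print 1e6 / N, rounded to 6 decimals.
# N = reads mapped to hg19       -> usual RPM factor
# N = reads mapped to dm3 spike-in -> RRPM factor alpha = 1 / N_d (N_d in millions)
scale_factor() {
  awk -v n="$1" 'BEGIN { printf "%.6f\n", 1e6 / n }'
}

# Hypothetical read counts for one sample (placeholders, not real data)
hg19_reads=42000000
dm3_reads=1500000

echo "RPM factor:  $(scale_factor "$hg19_reads")"   # 0.023810
echo "RRPM factor: $(scale_factor "$dm3_reads")"    # 0.666667
```

With these placeholder counts, a bin holding 30 human reads would become roughly 20 RRPM. The point of the spike-in factor is that it stays tied to the fixed amount of exogenous chromatin, so a global loss of the mark (as in the 0%/50%/100% H3K79me2 series) is not rescaled away the way it is with RPM.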

https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR1536548/SRR1536548
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR153/008/SRR1536548/ (single-end data; no single-end alignment code is provided here yet)

Library Construction, Sequencing, and Data Collection
Libraries were constructed with the Illumina Tru-Seq library preparation kit using a target fragment size of 200–400 bp and multiplexing barcodes. Libraries were sequenced using Illumina HiSeq 2000 with single-end reads for 40 cycles. Sequences were demultiplexed and aligned using Bowtie2 against a “genome” that combines the human hg19 genome and the Drosophila dm3 genome (see the Supplemental Experimental Procedures, https://www.sciencedirect.com/science/article/pii/S2211124714008729#app3). Individual accession numbers and read statistics are available in Table S2.
https://ars.els-cdn.com/content/image/1-s2.0-S2211124714008729-mmc1.pdf
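Reads aligned to a combined hg19/dm3 reference can be separated afterwards by the reference sequence they hit. A minimal bash/awk sketch, assuming (hypothetically) that the dm3 chromosomes were renamed with a `dm3_` prefix before the combined index was built; the function name `split_by_genome` is also made up. It works on SAM text, so a BAM would first go through `samtools view -h`.

```bash
#!/usr/bin/env bash
# split_by_genome IN.sam HG_OUT.sam DM_OUT.sam
# Assumption: spike-in contigs are named dm3_* (e.g. dm3_chr2L);
# every other contig is treated as human (hg19).
split_by_genome() {
  awk -F'\t' -v hg="$2" -v dm="$3" '
    /^@/                 { print > hg; print > dm; next }  # copy header to both files
    int($2 / 4) % 2 == 1 { next }                           # FLAG 0x4: read unmapped
    $3 ~ /^dm3_/         { print > dm; next }               # spike-in (Drosophila)
                         { print > hg }                     # target (human)
  ' "$1"
}

# Example: samtools view -h combined.bam > combined.sam
#          split_by_genome combined.sam sample.hg19.sam sample.dm3.sam
```

The header copied into both outputs still lists the contigs of both genomes, which keeps the files valid SAM. Reads that fit hg19 and dm3 equally well get a low MAPQ from the aligner, so a MAPQ filter after splitting also removes reads of ambiguous origin.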

ChIP-seq spike-in: paired-end alignment
# Assumed to be defined beforehand: $config (sample sheet; sample name in the
# 2nd whitespace-separated column), $wkd (working directory), $mm_index and
# $hg_index (bowtie2 index prefixes for mouse and human).
set -o pipefail   # a bowtie2 failure fails the pipe, not only samtools sort
mkdir -p log align

while read -r i
do
# echo $i
conf=($i)
sample=${conf[1]}

fq1=$wkd/clean/${sample}.fq1.gz
fq2=$wkd/clean/${sample}.fq2.gz

## step1: align mouse
if [ ! -f log/${sample}.mm.bowtie2.done ]; then
      bowtie2 -p 4 --local --very-sensitive-local \
          --no-mixed --no-unal --no-discordant \
          -q --phred33 \
          -x $mm_index -1 $fq1 -2 $fq2 2> log/log.bowtie2.mm.${sample}.txt | \
      samtools view -bhS -q 25 - | \
      samtools sort -O bam -@ 4 -o - > align/${sample}.mm.bam
      if [ $? -eq 0 ]; then
            touch log/${sample}.mm.bowtie2.done
      else
            touch log/${sample}.mm.bowtie2.failed
      fi
fi

# remove duplicates
if [ ! -f log/${sample}.mm.sambamba.done ]; then
      sambamba markdup -r align/${sample}.mm.bam align/${sample}_rm.mm.bam && \
      samtools index align/${sample}_rm.mm.bam
      if [ $? -eq 0 ]; then
            touch log/${sample}.mm.sambamba.done
      else
            touch log/${sample}.mm.sambamba.failed
      fi
fi

## step2: align human
if [ ! -f log/${sample}.hg.bowtie2.done ]; then
      bowtie2 -p 4 --local --very-sensitive-local \
          --no-mixed --no-unal --no-discordant \
          -q --phred33 \
          -x $hg_index -1 $fq1 -2 $fq2 2> log/log.bowtie2.hg.${sample}.txt | \
      samtools view -bhS -q 25 - | \
      samtools sort -O bam -@ 4 -o - > align/${sample}.hg.bam
      if [ $? -eq 0 ]; then
            touch log/${sample}.hg.bowtie2.done
      else
            touch log/${sample}.hg.bowtie2.failed
      fi
fi

# remove duplicates
if [ ! -f log/${sample}.hg.sambamba.done ]; then
      sambamba markdup -r align/${sample}.hg.bam align/${sample}_rm.hg.bam && \
      samtools index align/${sample}_rm.hg.bam
      if [ $? -eq 0 ]; then
            touch log/${sample}.hg.sambamba.done
      else
            touch log/${sample}.hg.sambamba.failed
      fi
fi

done < "$config"
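Each bowtie2 run in the loop writes its summary to log/log.bowtie2.mm.${sample}.txt or log/log.bowtie2.hg.${sample}.txt. Comparing the spike-in and target alignment rates per sample is a quick sanity check before estimating scaling factors. A small bash sketch, assuming the standard bowtie2 summary whose last line reads like `85.50% overall alignment rate`; the helper name `summarize_rates` is hypothetical.

```bash
#!/usr/bin/env bash
# summarize_rates LOG...: print "<log file name>\t<overall alignment rate>"
# ("NA" if the log has no summary line, e.g. because the run crashed).
summarize_rates() {
  local f rate
  for f in "$@"; do
    rate=$(awk '/overall alignment rate/ { print $1 }' "$f")
    printf '%s\t%s\n' "$(basename "$f")" "${rate:-NA}"
  done
}

# Example: summarize_rates log/log.bowtie2.mm.*.txt log/log.bowtie2.hg.*.txt
```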

Note: estimateScalingFactors assumes single-end data by default (paired = F). If your data are paired-end, it throws an error; setting paired = T fixes this.
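If you are unsure whether a BAM is paired-end (and therefore whether paired = T is needed), the SAM FLAG tells you: bit 0x1 marks a read that belongs to a pair. A bash sketch with a hypothetical helper `is_paired`; it reads SAM text (for example from `samtools view`) and only inspects the first alignment record.

```bash
#!/usr/bin/env bash
# is_paired FILE.sam: print TRUE if the first alignment record has FLAG bit 0x1
# (read paired) set, FALSE otherwise; header lines (@...) are skipped.
is_paired() {
  awk -F'\t' '!/^@/ {
    paired = (int($2) % 2 == 1) ? "TRUE" : "FALSE"
    print paired
    exit
  }' "$1"
}

# Example: is_paired <(samtools view sample.bam | head -n 1)
```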

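The supplied bw files must have identical start and end coordinates, because the inputSubtraction step subtracts the input signal from the treatment signal. bw files converted with bamCoverage can end up with inconsistent coordinates: to save file space, deepTools merges adjacent bins that have the same coverage into a single interval, so after merging the coordinates no longer match between files [even setting --binSize yourself does not guarantee identical coordinates] (https://github.com/deeptools/deepTools/issues/907).
For how to deal with this, see the earlier post: "Have you run into the binsize problem when converting bam to bigwig?"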
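One way to recover identical coordinates is to split merged intervals back into fixed-size bins. A bash/awk sketch on bedGraph text; a bigWig can be dumped with the UCSC tool `bigWigToBedGraph` and rebuilt with `bedGraphToBigWig`. The helper name `expand_bins` is hypothetical, it assumes every interval starts on a bin boundary (true when bins were merged but not shifted), and it is not necessarily the approach of the post mentioned above.

```bash
#!/usr/bin/env bash
# expand_bins BIN FILE.bedGraph: split every bedGraph interval into BIN-sized
# pieces that keep the interval's value; the last piece is clipped at the
# interval end. Track/browser/comment lines are dropped.
expand_bins() {
  awk -v bin="$1" 'BEGIN { OFS = "\t" }
    /^(track|browser|#)/ { next }
    {
      for (s = $2; s < $3; s += bin) {
        e = s + bin
        if (e > $3) e = $3
        print $1, s, e, $4
      }
    }' "$2"
}

# Example: expand_bins 50 H3K79me2_0.bedGraph > H3K79me2_0.bin50.bedGraph
```

After expansion, treatment and input files built with the same bin size share the same coordinates, so a bin-by-bin subtraction lines up.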

The bw files used here must not have been normalized in any way, so do not set --normalizeUsing when running bamCoverage.

Single-end data
Methods of "An Alternative Approach to ChIP-Seq Normalization Enables Detection of Genome-Wide Changes in Histone H3 Lysine 27 Trimethylation upon EZH2 Inhibition": first align the reads to each genome separately, then remove duplicates and keep only high-quality alignments (e.g. mapping quality > 25).
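The filtering step (mapped, non-duplicate, MAPQ above a threshold such as 25) and the read count that spike-in scaling needs can be sketched in bash/awk on SAM text. The helper `count_hq` is hypothetical; it also drops secondary (0x100) and supplementary (0x800) records so each alignment is counted once, which is my assumption rather than something the paper states. Duplicates are only excluded if they were marked with FLAG 0x400 (for example by `sambamba markdup` without `-r`), and for paired-end data each mate is counted separately.

```bash
#!/usr/bin/env bash
# count_hq FILE.sam MINQ: count alignment records that are mapped, primary,
# not marked as duplicates and have MAPQ > MINQ.
count_hq() {
  awk -F'\t' -v q="$2" '
    /^@/                    { next }  # header
    int($2 / 4) % 2 == 1    { next }  # 0x4   unmapped
    int($2 / 256) % 2 == 1  { next }  # 0x100 secondary
    int($2 / 1024) % 2 == 1 { next }  # 0x400 duplicate
    int($2 / 2048) % 2 == 1 { next }  # 0x800 supplementary
    $5 > q                  { n++ }
    END                     { print n + 0 }
  ' "$1"
}

# Example: count_hq <(samtools view sample.dm3.bam) 25
```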
Data Availability Statement: ChIP-Seq data from this study have been submitted to the NCBI Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE63494 Homo sapiens and GSE64243 Homo sapiens).
50-nucleotide sequence reads were aligned to the hg19 genome using the BWA algorithm with default settings. Only reads that passed Illumina’s purity filter, aligned with no more than 2 mismatches and mapped uniquely to the genome, were used in subsequent analyses.
The aligned read files were sorted and duplicates removed using sort and rmdup functions from samtools version 0.1.18 (Li et al., 2009).
Peak calling was done with the SICER-rb script (SICER v1.1; without Input file; using gap=600).
Genome_build: hg19
Supplementary_files_format_and_content: SICER scoreisland interval data for each sample in BED format.
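The BWA filtering described in the GEO text above (no more than 2 mismatches, unique mapping) could be approximated on SAM text as follows. Assumptions: the reads were aligned with BWA-backtrack (`bwa aln`/`samse`), which marks unique hits with the `XT:A:U` tag, and the `NM` tag (edit distance, so indels count too) stands in for the mismatch count. The helper name `keep_unique_le2mm` is hypothetical, and this is not necessarily how the authors filtered.

```bash
#!/usr/bin/env bash
# keep_unique_le2mm FILE.sam: print the header plus records that carry
# XT:A:U (unique BWA-backtrack hit) and NM <= 2 (edit distance).
keep_unique_le2mm() {
  awk -F'\t' '
    /^@/ { print; next }
    {
      nm = -1; uniq = 0
      for (i = 12; i <= NF; i++) {           # optional tags start at column 12
        if ($i ~ /^NM:i:/) nm = substr($i, 6) + 0
        if ($i == "XT:A:U") uniq = 1
      }
      if (uniq && nm >= 0 && nm <= 2) print
    }' "$1"
}

# Example: samtools view -h sample.bam > sample.sam
#          keep_unique_le2mm sample.sam > sample.filtered.sam
```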

nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/002/SRR1658032/SRR1658032.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/003/SRR1658033/SRR1658033.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/004/SRR1658034/SRR1658034.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/005/SRR1658035/SRR1658035.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/006/SRR1658036/SRR1658036.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/007/SRR1658037/SRR1658037.fastq.gz &

nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/002/SRR2443142/SRR2443142.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/003/SRR2443143/SRR2443143.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/004/SRR2443144/SRR2443144.fastq.gz &

nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/006/SRR1721586/SRR1721586.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/007/SRR1721587/SRR1721587.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/008/SRR1721588/SRR1721588.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/009/SRR1721589/SRR1721589.fastq.gz &

nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/007/SRR2532667/SRR2532667.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/008/SRR2532668/SRR2532668.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/009/SRR2532669/SRR2532669.fastq.gz &
nohup wget -c -t 0 -P ./ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/000/SRR2532670/SRR2532670.fastq.gz &
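ENA FTP paths follow a fixed layout, so the download URLs can be generated from the run accessions instead of typed by hand, which avoids copy-paste mistakes in the directory part. A bash sketch of that layout as I understand ENA's convention (first 6 characters of the accession, then an extra level that depends on how many digits follow the 3-letter prefix, then the accession); the helper name `ena_fastq_url` is hypothetical, and it only covers single-end runs (paired-end runs use `_1`/`_2` file names).

```bash
#!/usr/bin/env bash
# ena_fastq_url ACC: print the ENA FTP URL of a single-end run's fastq.gz.
ena_fastq_url() {
  local acc=$1 dir2
  case $(( ${#acc} - 3 )) in
    6) dir2="" ;;                    # 6 digits: no extra level
    7) dir2="/00${acc: -1}" ;;       # 7 digits: "00" + last digit
    8) dir2="/0${acc: -2}" ;;        # 8 digits: "0" + last two digits
    9) dir2="/${acc: -3}" ;;         # 9 digits: last three digits
    *) echo "unsupported accession: $acc" >&2; return 1 ;;
  esac
  printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s%s/%s/%s.fastq.gz\n' \
    "${acc:0:6}" "$dir2" "$acc" "$acc"
}

# Example (background downloads, like the commands above):
# for acc in SRR1658032 SRR2443143 SRR2532667; do
#   nohup wget -c -t 0 "$(ena_fastq_url "$acc")" &
# done
```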
