https://github.com/descostesn/BiocNYC-ChIPSeqSpike
https://rdrr.io/bioc/ChIPSeqSpike/f/inst/doc/ChIPSeqSpike.pdf
https://rdrr.io/bioc/ChIPSeqSpike/f/vignettes/ChIPSeqSpike.Rmd
https://mp.weixin.qq.com/s/La_AwYYf0atmFvsHh7AL6Q
I-2 The data
This workshop uses one published dataset, taken from the following article:
Orlando, David A, Mei Wei Chen, Victoria E Brown, Snehakumari Solanki, Yoon J Choi, Eric R Olson, Christian C Fritz, James E Bradner, and Matthew G Guenther. 2014. “Quantitative ChIP-Seq Normalization Reveals Global Modulation of the Epigenome.” Cell Reports 9 (3). Cell Press: 1163–70.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60104 See manuscript for details
The whole dataset is accessible at GSE60104. Specifically, the data used are H3K79me2 0% (GSM1465004), H3K79me2 50% (GSM1465006), H3K79me2 100% (GSM1465008), input DNA 0% (GSM1511465), input DNA 50% (GSM1511467) and input DNA 100% (GSM1511469).
You can download the material for this workshop or use the commands indicated below:
H3K79me2_0: hg19.bw, dm3.bam, hg19.bam
H3K79me2_50: hg19.bw, dm3.bam, hg19.bam
H3K79me2_100: hg19.bw, dm3.bam, hg19.bam
input_0: hg19.bw, dm3.bam, hg19.bam
input_50: hg19.bw, dm3.bam, hg19.bam
input_100: hg19.bw, dm3.bam, hg19.bam
Info table: info.csv
Gene annotations: refseq_hg19.gff
Processed data: treated_data.Rdat
Create a folder ‘workshop_files’ containing all the downloaded files. You can run the following code:
# URLs of all workshop files (bigWig tracks, BAM alignments, annotations)
needed.files <- c("http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_0.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_0_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_0_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_50.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_50_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_50_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_100.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_100_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/H3K79me2_100_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_0.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_0_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_0_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_50.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_50_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_50_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_100.bw",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_100_dm3.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/input_100_hg19.bam",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/info.csv",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/refseq_hg19.gff",
    "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/treatedData.Rdat")
# Create the target folder (no warning if it already exists)
dir.create("workshop_files", showWarnings = FALSE)
# Keep only the files that are not already in the folder
get.files <- needed.files[!basename(needed.files) %in% list.files("workshop_files")]
for (f in get.files) {
    # mode = "wb" keeps the binary .bw and .bam files intact on Windows
    download.file(f, destfile = file.path("workshop_files", basename(f)), mode = "wb")
}
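For those who prefer to fetch the files from the command line, the same "download only what is missing" logic can be sketched in shell. The function name missing_urls is made up for this note; it only prints the URLs that still need downloading and leaves the actual download to a tool such as wget.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every URL whose file name is not yet present
# in the target folder.
# Usage: missing_urls <folder> <url>...
missing_urls() {
    local dir=$1
    shift
    local url
    for url in "$@"; do
        if [ ! -e "$dir/$(basename "$url")" ]; then
            printf '%s\n' "$url"
        fi
    done
}

# Example (commented out because it needs network access):
# mkdir -p workshop_files
# missing_urls workshop_files \
#     "http://www.hpc.med.nyu.edu/~descon01/biocnycworkshop/info.csv" |
#     while read -r u; do wget -c -P workshop_files "$u"; done
```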
Data processing:
Sample 20140404_1216 was aligned to hg19 using Bowtie2 with default parameters.
Sample 20131220_425 was aligned to dm3 using Bowtie2 with default parameters.
All other samples were aligned to a combined hg19/dm3 genome and reads were separated into each organism post-alignment. See manuscript for details.
Supplementary files format and content: IGV-compatible TDF files represent spike-in normalized (.spikein) counts for reads aligning to human (.hg19) or Drosophila (.dm3), except for 20140404_1216 and 20131220_425, which did not contain spike-in and were normalized using RPM. See manuscript for details.
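The "separated into each organism post-alignment" step can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes the spike-in chromosomes were renamed with a prefix (here dm3_, a made-up convention) when the combined genome was built, and it works on plain SAM text. The function name split_by_organism is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a SAM file aligned to a combined genome into
# one file per organism, based on a chromosome-name prefix.
# Usage: split_by_organism <in.sam> <human_out.sam> <fly_out.sam> [prefix]
split_by_organism() {
    local in=$1 human=$2 fly=$3 prefix=${4:-dm3_}
    # Make sure both outputs exist even if one organism has no reads
    : > "$human"
    : > "$fly"
    awk -v p="$prefix" -v h="$human" -v f="$fly" '
        BEGIN { FS = OFS = "\t" }
        # @SQ header lines: route by the reference name in the SN: field
        /^@SQ/ {
            name = ""
            for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) name = substr($i, 4)
            print > (index(name, p) == 1 ? f : h)
            next
        }
        # All other header lines are copied to both outputs
        /^@/ { print > h; print > f; next }
        # Unmapped records (reference name "*") are dropped
        $3 == "*" { next }
        # Alignment records: column 3 is the reference name
        { print > (index($3, p) == 1 ? f : h) }
    ' "$in"
}
```

A BAM file can be passed through this by converting it first, for example with samtools view -h, and converting the two outputs back with samtools view -b.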
https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR1536548/SRR1536548
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR153/008/SRR1536548/ (single-end data; no single-end alignment code is available yet)
Library Construction, Sequencing, and Data Collection
Libraries were constructed with the Illumina Tru-Seq library preparation kit using a target fragment size of 200–400 bp and multiplexing barcodes. Libraries were sequenced using Illumina HiSeq 2000 with single-end reads for 40 cycles. Sequences were demultiplexed and aligned using Bowtie2 against a “genome” that combines the human hg19 genome and the Drosophila dm3 genome (see the Supplemental Experimental Procedures: https://www.sciencedirect.com/science/article/pii/S2211124714008729#app3). Individual accession numbers and read statistics are available in Table S2.
https://ars.els-cdn.com/content/image/1-s2.0-S2211124714008729-mmc1.pdf
ChIP spike-in: paired-end alignment
# Assumes $config (sample sheet), $wkd (working directory), $mm_index and
# $hg_index (Bowtie2 index prefixes) are already set, and that the log/ and
# align/ folders exist.
cat $config | while read i
do
    # echo $i
    conf=($i)
    sample=${conf[1]}
    fq1=$wkd/clean/${sample}.fq1.gz
    fq2=$wkd/clean/${sample}.fq2.gz

    ## step1: align mouse
    if [ ! -f log/${sample}.mm.bowtie2.done ]; then
        bowtie2 -p 4 --local --very-sensitive-local \
            --no-mixed --no-unal --no-discordant \
            -q --phred33 \
            -x $mm_index -1 $fq1 -2 $fq2 2> log/log.bowtie2.mm.${sample}.txt | \
            samtools view -bhS -q 25 - | \
            samtools sort -O bam -@ 4 -o - > align/${sample}.mm.bam
    fi
    # $? is the exit status of the last command run in the block above
    # (0 if the block was skipped because the .done file already exists)
    if [ $? -eq 0 ]; then
        touch log/${sample}.mm.bowtie2.done
    else
        touch log/${sample}.mm.bowtie2.failed
    fi

    # remove duplicate
    if [ ! -f log/${sample}.mm.sambamba.done ]; then
        sambamba markdup -r align/${sample}.mm.bam align/${sample}_rm.mm.bam && \
            samtools index align/${sample}_rm.mm.bam
    fi
    if [ $? -eq 0 ]; then
        touch log/${sample}.mm.sambamba.done
    else
        touch log/${sample}.mm.sambamba.failed
    fi

    ## step2: align human
    if [ ! -f log/${sample}.hg.bowtie2.done ]; then
        bowtie2 -p 4 --local --very-sensitive-local \
            --no-mixed --no-unal --no-discordant \
            -q --phred33 \
            -x $hg_index -1 $fq1 -2 $fq2 2> log/log.bowtie2.hg.${sample}.txt | \
            samtools view -bhS -q 25 - | \
            samtools sort -O bam -@ 4 -o - > align/${sample}.hg.bam
    fi
    if [ $? -eq 0 ]; then
        touch log/${sample}.hg.bowtie2.done
    else
        touch log/${sample}.hg.bowtie2.failed
    fi

    # remove duplicate
    if [ ! -f log/${sample}.hg.sambamba.done ]; then
        sambamba markdup -r align/${sample}.hg.bam align/${sample}_rm.hg.bam && \
            samtools index align/${sample}_rm.hg.bam
    fi
    if [ $? -eq 0 ]; then
        touch log/${sample}.hg.sambamba.done
    else
        touch log/${sample}.hg.sambamba.failed
    fi
done
Note: estimateScalingFactors assumes single-end data by default (paired=F). With paired-end data it raises an error; setting paired=T fixes this.
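To get a feel for what a spike-in scaling factor looks like, here is a minimal sketch. It assumes the convention described by Orlando et al. for ChIP-Rx, namely one divided by the number of exogenous (spike-in) reads in millions; whether estimateScalingFactors computes exactly this value should be checked against the ChIPSeqSpike vignette. The function name scaling_factor is made up for this note.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scaling factor = 1 / (spike-in read count / 1e6).
# This formula is an assumption based on the ChIP-Rx convention.
# Usage: scaling_factor <number_of_spike_in_reads>
scaling_factor() {
    local n=$1
    # Accept only positive integers
    case $n in
        ''|*[!0-9]*)
            echo "read count must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$n" -eq 0 ]; then
        echo "read count must be a positive integer" >&2
        return 1
    fi
    awk -v n="$n" 'BEGIN { printf "%.6f\n", 1e6 / n }'
}

# Example: two million spike-in reads give a factor of 0.5
# scaling_factor 2000000
```

The read count itself could come from samtools view -c on the spike-in BAM file; note that this counts alignment records, so each mate of a paired-end fragment is counted separately.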
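The bw files you provide must have identical start and end coordinates, because the inputSubtraction step subtracts the input track from the treated (ChIP) track. bw files produced by bamCoverage often have mismatched coordinates: to save file space, deepTools merges adjacent bins that have the same coverage, so the coordinates no longer line up between files [even setting --binSize yourself does not guarantee identical coordinates] (https://github.com/deeptools/deepTools/issues/907).
For details, see the earlier post: "The binsize problem when converting bam to bigwig: have you run into it?"
The bw files used here must not have been normalized in any way, so do not set the --normalizeUsing parameter when running bamCoverage.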
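A quick way to check whether two tracks share the same bins before running inputSubtraction is to compare their interval columns. The sketch below assumes both tracks have been exported to bedGraph text (chromosome, start, end, value, tab-separated, in the same order); the function name same_intervals is made up for this note.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if two bedGraph files list
# exactly the same chrom/start/end intervals in the same order.
# Usage: same_intervals <chip.bedGraph> <input.bedGraph>
same_intervals() {
    local tmp status
    tmp=$(mktemp) || return 2
    # Keep only the three coordinate columns of the second file
    cut -f 1-3 "$2" > "$tmp"
    # Compare them byte by byte with the coordinates of the first file
    if cut -f 1-3 "$1" | cmp -s - "$tmp"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example:
# same_intervals chip.bedGraph input.bedGraph && echo "bins match"
```

The bedGraph files can be produced from bigWig with the UCSC bigWigToBedGraph tool.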
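Single-end data
Following the method described in "An Alternative Approach to ChIP-Seq Normalization Enables Detection of Genome-Wide Changes in Histone H3 Lysine 27 Trimethylation upon EZH2 Inhibition": reads are first aligned to each genome separately, then duplicates are removed and only high-quality alignments are kept (for example, mapping quality > 25).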
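The quality filter in the last step can be illustrated on plain SAM text. This is only a sketch of the "mapping quality > 25" rule as worded above; the function name filter_mapq is made up. Note that samtools view -q 25 keeps alignments with MAPQ of 25 or more, so it is slightly more permissive than a strict "greater than 25".

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep header lines and alignments whose MAPQ
# (SAM column 5) is strictly greater than the given threshold.
# Usage: filter_mapq <in.sam> <threshold>
filter_mapq() {
    awk -v min="$2" 'BEGIN { FS = "\t" } /^@/ || $5 + 0 > min' "$1"
}

# Example:
# filter_mapq sample.sam 25 > sample.q25.sam
```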
Data Availability Statement: ChIP-Seq data from this study have been submitted to the NCBI Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ (accession numbers GSE63494 Homo sapiens and GSE64243 Homo sapiens).
50-nucleotide sequence reads were aligned to the hg19 genome using the BWA algorithm with default settings. Only reads that passed Illumina’s purity filter, aligned with no more than 2 mismatches and mapped uniquely to the genome, were used in subsequent analyses.
The aligned read files were sorted and duplicates removed using sort and rmdup functions from samtools version 0.1.18 (Li et al., 2009).
Peak calling was done with the SICER-rb script (SICER v1.1; without Input file; using gap=600).
Genome_build: hg19
Supplementary_files_format_and_content: SICER scoreisland interval data for each sample in BED format.
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/002/SRR1658032/SRR1658032.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/003/SRR1658033/SRR1658033.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/004/SRR1658034/SRR1658034.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/005/SRR1658035/SRR1658035.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/006/SRR1658036/SRR1658036.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR165/007/SRR1658037/SRR1658037.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/002/SRR2443142/SRR2443142.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/003/SRR2443143/SRR2443143.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/004/SRR2443144/SRR2443144.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/006/SRR1721586/SRR1721586.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/007/SRR1721587/SRR1721587.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/008/SRR1721588/SRR1721588.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR172/009/SRR1721589/SRR1721589.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/007/SRR2532667/SRR2532667.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/008/SRR2532668/SRR2532668.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/009/SRR2532669/SRR2532669.fastq.gz &
nohup wget -c -t 0 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR253/000/SRR2532670/SRR2532670.fastq.gz &
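Typing these URLs by hand is error-prone, so the path can be derived from the run accession instead. The sketch below follows the ENA directory layout for 10-character run accessions (first six characters, then 00 plus the last digit, then the accession itself) and assumes single-end runs stored as a single .fastq.gz file. The function name ena_fastq_url is made up for this note, and accessions of other lengths are rejected rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ENA FTP URL of a single-end FASTQ file
# from a 10-character run accession such as SRR1658032.
# Usage: ena_fastq_url <accession>
ena_fastq_url() {
    local acc=$1 prefix last
    case $acc in
        [SED]RR[0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *)
            echo "expected a 10-character run accession, got: $acc" >&2
            return 1
            ;;
    esac
    prefix=$(printf '%s' "$acc" | cut -c 1-6)   # e.g. SRR165
    last=$(printf '%s' "$acc" | cut -c 10)      # last digit, e.g. 2
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s.fastq.gz\n' \
        "$prefix" "$last" "$acc" "$acc"
}

# Example (commented out because it needs network access):
# for acc in SRR1658032 SRR1658033; do
#     wget -c -t 0 "$(ena_fastq_url "$acc")"
# done
```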