bedtools deeptool

for i in cgas_HA_1_S2_R1_001 cgas_HA_2_S3_R1_001 cgas_control_S1_R1_001; do bedtools map -a i . b e d − b . / c h e n a l i g n / i.bed -b ./chenalign/ i.bedb./chenalign/{i}_trimmed.fq.gz.sort.bam -c 10,10 -o count,concat | awk -v OFS="\t" '{n=length($5);gc=gsub("[gcGC]","", $5); print $1,$2,$3,KaTeX parse error: Expected 'EOF', got '}' at position 7: 4,gc/n}̲'>i.txt; done

bedtools deeptool
for i in ls *.bam
do
echo $i >> ./rmdup
samtools view -c ${i} >> ./rmdup
done

multiBamSummary BED-file --BED selection.bed --bamfiles file1.bam file2.bam -o results.npz
若想要从sam或bam文件中提取指定区域内的reads,可以使用samtools和bedtools来实现。

首先准备一个区域信息文件。region.bed #第一列为染色体ID,第二三列分别为起始终止位置
例: 12 21100 41200 #取12号染色体21100-41200区间的序列

之后如果是sam文件,先转成bam文件:
samtools view -Sb reads.sam > reads.bam
然后就可以使用bedtools来提取reads啦,命令如下:
bedtools intersect -a reads.bam -b region.bed > target_reads.bam

用BED文件和基因组序列fasta文件提取特定的序列集。
http://www.gigazone.me/ngs/bedtools_bed_from_fasta.html (转)
记住bedtools getfasta就行。

https://www.jianshu.com/p/3e6deaf83da9

第二个功能 multicov
然后看看第二个功能,对RNA-seq的比对文件中的比对到各个基因的reads进行计数。
实现这个功能,用的bedtools的multicov 这个小命令

例子:

bedtools multicov -bams aln1.bam aln2.bam aln3.bam -bed ivls-of-interest.bed

ivls-of-interest.bed这个文件是必须的,可能需要自己制作,其实用gtf文件也可以的,如下:

chr1 0 10000 ivl1
chr1 10000 20000 ivl2
chr1 20000 30000 ivl3
chr1 30000 40000 ivl4
输出结果前三列是坐标,第四列是基因名,跟我们的bed文件一样,只是最后三列是三个样本的计数,是添加上来的!
chr1 0 10000 ivl1 100 2234 0
chr1 10000 20000 ivl2 123 3245 1000
chr1 20000 30000 ivl3 213 2332 2034
chr1 30000 40000 ivl4 335 7654 0
可以看到,它实现的需求,跟htseq这个软件差不多。各种区别,大家可以自己取探索。

第三个功能 getfasta
接着第三个功能,根据坐标区域来从基因组里面提取fasta序列

参考:# BED/GFF/VCF +reference --> fasta

bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa -bed …/macs14_results/highQuality_summits.bed -fo highQuality.fa
bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa -bed …/macs14_results/highQuality_peaks.bed -fo highQuality.fa
我的例子脚本里面用的是bed格式来记录坐标区域,参考基因组用-fi参数指定具体位置,输出的fasta序列文件用-fo参数指定

测序reads在染色体上的测序深度分布图绘制
比对软件:bwa
sam格式处理软件:samtools
samtools depth perl
01_countDepthByWindon.pl

生信脚本练习(10)找出fasta文件中最长的转录本https://blog.csdn.net/hxoxh/article/details/77169414?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control
[笔记]python对FASTA文件的处理
https://blog.csdn.net/Cccrush/article/details/80112198?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control

https://blog.csdn.net/msw521sg/article/details/52562187?utm_medium=distribute.pc_relevant.none-task-blog-title-2&spm=1001.2101.3001.4242从gff3文件获得fasta序列

可以从fasta中提取基因序列的4款软件https://www.jianshu.com/p/11b8804e570d

CUT&Tag for efficient epigenomic profiling of small samples and single cells

CUT&Tag for efficient epigenomic profiling of small samples and single cells

CUT&Tag for efficient epigenomic profiling of small samples and single cells
CUT&Tag for efficient epigenomic profiling of small samples and single cellsCUT&Tag for efficient epigenomic profiling of small samples and single cells

bedtools getfasta -fi mm10.fa -bed -fo ${i}_peakseq.fa

liftOver yangDM24h-NC-1_FKDL202607366_peaks.narrowPeak mm10ToMm9.over.chain.gz mm9.bed ummap #失败

sed -i ‘s/\t/:/g’ human.bed
sed -i ‘s/ /#/g’ human.bed

liftOver nc1.bed mm10ToMm9.over.chain.gz nc1mm9.bed nc1ummap
liftOver nc2.bed mm10ToMm9.over.chain.gz nc2mm9.bed nc2ummap

cp yangDM24h-NC-1_FKDL202607366_peaks.narrowPeak ce10.bed
cd /data/zhangyong/yyp/yypold/ref/liftover
(rna) [zhangyong@cluster liftover]$ conda deactivate
(base) [zhangyong@cluster liftover]$ liftOver
liftOver - Move annotations from one assembly to another
#liftOver ce.bed mm10ToMm9.over.chain.gz mm9.bed ummap

liftOver yangDM24h-NC-1_FKDL202607366_peaks.narrowPeak mm10ToMm9.over.chain.gz mm9.bed ummap #失败
sed -i ‘s/\t/:/g’ human.bed
sed -i ‘s/ /#/g’ human.bed

liftOver nc1.bed mm10ToMm9.over.chain.gz nc1mm9.bed nc1ummap
liftOver nc2.bed mm10ToMm9.over.chain.gz nc2mm9.bed nc2ummap

cd /data/zhangyong/yyp/yypold/chips/lh11_3/allbed/
for i in ls *Peak
do
i=${i/_peaks.narrowPeak/}

BED-file mode

multiBamSummary BED-file --BED ${i}_peaks.narrowPeak --bamfiles ${i}.sort.bam -out ${i}_results.npz

done

cd /data/zhangyong/yyp/yypold/chips/lh11_3/allbed/
plotEnrichment -b DM24h-16HD-1_FKDL202607369.sort.bam
–BED ls *Peak
–regionLabels “up regulated” “down regulated”
-o enrichment.png

https://www.jianshu.com/p/6932c72aba63

高通量测序数据处理学习记录(二):Read Counts的提取
https://www.jianshu.com/p/7cc5df9f7900

https://www.jianshu.com/p/1c66cfb423af高通量测序数据处理学习记录(七):使用ChIPQC包检查ChIP-seq的质量

ChIP-Seq分析之ChIPQC结果含义https://mp.weixin.qq.com/s?src=11&timestamp=1607082672&ver=2746&signature=5ImXBAlK9kcTKUY7huOQXQ8G5N8Ksdf3sEku4pnfsM5yWAXGmjJLhmr17ZwdevKzvfbmy0tw0tG*iimlMkA0AXS33iDiNIZhXGtZA5qI2s1uuvUB4iXPXpukAxuWHV1O&new=1

https://hbctraining.github.io/Intro-to-ChIPseq/lessons/06_combine_chipQC_and_metrics.html

测序深度的计算公式为LN/G,L就是读段长度,N是读段数目,G是基因组大小。
举例来说,人类基因组3.1G,一个RNA-seq的reads数据为20M,数据为paired-end,读长150bp,那么测序深度就是20M2150/3.1G= 2 X

那我们再来讨论一下sequencing depth 和 sequencing coverage的区别:
事实上没有区别,若是真要讲个说法,那就是coverage可以理解为检测到全基因组的多少区域(百分比,最大值为100%),但是sequencing covergae指的就是depth of sequencing coverage也就是sequencing depth,反映了一个区域被平均多少个reads检测到。
另外breadth of coverage需要区别一下,就是将全基因组大小换成target region作为分母计算上面那个公司,

https://mp.weixin.qq.com/s?src=11&timestamp=1607082672&ver=2746&signature=5ImXBAlK9kcTKUY7huOQXQ8G5N8Ksdf3sEku4pnfsM75IEeVr5rfDQAKLFlymvLs3glGw2Hlj272vbqcKu38y3ojgCVdcj82LdS6hFkZptOjYZC4wAudS49qLcFo4qwt&new=1ChIP-Seq数据质控之ChIPQC使用

https://www.plob.org/article/16026.html

https://www.jianshu.com/p/7027251a98fc GeneOverlap: 用于基因Overlap的检验及可视化R包

https://blog.csdn.net/u012110870/article/details/107907582

https://blog.csdn.net/weixin_43569478/article/details/108079450?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522160709138519724838592357%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=160709138519724838592357&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2allfirst_rank_v2~rank_v29-1-108079450.nonecase&utm_term=chipseq%20overlap&spm=1018.2118.3001.4449

  1. peak 在染色体上的分布
  2. peak间的overlap分析

http://bioconductor.org/packages/release/bioc/vignettes/ChIPseeker/inst/doc/ChIPseeker.html#overlap-of-peaks-and-annotated-genes
Overlap of peaks and annotated genes
Statistical testing of ChIP seq overlap
enrichPeakOverlap(queryPeak = files[[5]],
targetPeak = unlist(files[1:4]),
TxDb = txdb,
pAdjustMethod = “BH”,
nShuffle = 50,
chainFile = NULL,
verbose = FALSE)

第6篇:重复样本的处理——IDR
https://www.jianshu.com/p/d8a7056b4294

如何得到两个重复样本间一致性的peaks? 一种简单粗暴的方法就是用bedtools计算peaks的overlaps。

bedtools intersect
-a pseudo_WT-DM_MyoD_peaks.bed
-b new
-wo > newdNanog-overlaps.bed

bedtools intersect
-a pseudo_WT-DM_MyoD_peaks.bed
-b pseudo_WT-GM_MyoD_peaks.bed
-wo > DMGM_overlaps.bed

source activate rna
cd /data/zhangyong/yyp/yypold/chips/chendata/overlap

for k in ls *narrowPeak
do
for i in …/pseudo_WT-DM_MyoD_peaks.bed …/pseudo_WT-GM_MyoD_peaks.bed
do
echo $i

bedtools intersect
-a ${i}
-b ${k}
-wo|wc -l >>coun

done
done

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值