Original Only use good-quality or uniquely mapped reads from your BAM files
java -jar picard.jar MarkDuplicates I=input.bam O=rmdup.bam M=rmdup_metrics.txt CREATE_INDEX=true ASSUME_SORTED=false REMOVE_DUPLICATES=true
samtools view -hb -q 1 rmdup.bam > rmdupq1.…
2021-02-25 11:26:46
2
Original invalid multibyte string, element 1
options("download.file.method" = "wininet")
getOption("download.file.method")
[1] "wininet"
install_github('stjude/ChIPseqSpikeInFree', host = "api.github.com")
Error: Failed to install 'unknown package' from GitHub:
inval…
2021-02-25 10:39:01
2
Original bdg2bw (MACS2 peak coordinates need to be corrected)
Run MACS2 combining replicates. You don't need to use "samtools merge", just:
$ macs2 callpeak -t H3K36me1_EE_rep1.bam H3K36me1_EE_rep2.bam -c Input_EE_rep1.bam Input_EE_rep2.bam -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE
2021-02-24 15:37:08
6
Original Combining replicates (BAM)
Run MACS2 combining replicates. You don't need to use "samtools merge", just:
$ macs2 callpeak -t H3K36me1_EE_rep1.bam H3K36me1_EE_rep2.bam -c Input_EE_rep1.bam Input_EE_rep2.bam -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE
https://www.sogou.com/l…
2021-02-23 14:38:51
5
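Original Quantification with htseq-count
1. Input files for htseq-count
The input is a SAM file. Paired-end data must be sorted by read name. The official documentation recommends msort, but I found it inconvenient (possibly because I was using it incorrectly), so I sorted the BAM file with samtools (TopHat2 outputs BAM) and then converted it to SAM. Commands:
samtools sort -n file.bam  # sort bam by name
samtools view -h bamfile…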
2021-02-23 10:48:47
18
Original Building commands to batch-rename files
(base) yyp@DESKTOP-LRUCLJJ:~$ echo CRR126377.MKO_DM_TF_input_ChIPseq | awk -F"." '{print "mv "$1".fq.gz "$2".fq.gz"}'
mv CRR126377.fq.gz MKO_DM_TF_input_ChIPseq.fq.gz
cat ee | awk -F"." '{print "mv "$1".fq.gz "$2".fq.gz"}' | bash
#### Build the commands, then batch-rename
echo CRR126377…
2021-02-22 16:18:07
4
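The rename commands in the entry above can also be generated in Python rather than awk. This is a minimal sketch, assuming every input line has the form `<accession>.<sample_name>` as in the snippet; the function name `build_mv_commands` is hypothetical and not part of the original post. Unlike `awk -F"."`, it splits on the first dot only, so a sample name that itself contains dots is kept whole.

```python
import shlex


def build_mv_commands(lines, suffix=".fq.gz"):
    """Turn 'OLD.NEW' lines into 'mv OLD.fq.gz NEW.fq.gz' command strings.

    Assumption: each non-blank line contains at least one dot.
    """
    commands = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        old, new = line.split(".", 1)  # split on the first dot only
        # shlex.quote protects names that contain shell-special characters
        commands.append(
            f"mv {shlex.quote(old + suffix)} {shlex.quote(new + suffix)}"
        )
    return commands


if __name__ == "__main__":
    for cmd in build_mv_commands(["CRR126377.MKO_DM_TF_input_ChIPseq"]):
        print(cmd)  # review the printed commands before piping them to a shell
```

Printing the commands first and only then piping them to `bash` mirrors the two-step approach in the post.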
Original PCA
Reproducing the PCA analysis of the proteomics data from this Nature-family journal article took me two whole days | full code attached…
https://blog.csdn.net/qazplm12_3/article/details/106089137?ops_request_misc=&request_id=&biz_id=102&utm_term=%E8%9B%8B%E7%99%BD%E7%BB%84%E5%AD%A6%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90&utm_medium=dis…
2021-02-03 11:36:09
22
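Original Removing duplicate reads
The annoying yet fascinating problem of read deduplication
https://www.jianshu.com/p/5781e7d74c40
TL;DR
- RNA-seq: duplicates are generally not removed.
- ChIP-seq: duplicates are generally removed.
- SNP calling: duplicates are generally removed.
- Nothing is absolute. Also consider the amount of starting material and the number of PCR cycles when deciding whether to deduplicate.
- How evenly the mapped reads cover the genome can indicate whether deduplication is needed.
- Picard is the first-choice tool for removing PCR duplicates.
- To solve the duplication problem at its root: use more starting material and fewer cycles, prefer longer reads over shorter ones, and prefer paired-end over single-end.
The harm of PCR duplicates
In theory, different sequences should be amplified by the same factor during PCR…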
2021-01-30 20:51:23
12
Original R: after reading a CSV file, the data turned into factors? A note on transposing in R
R: after reading a CSV file, the data turned into factors?
https://www.jianshu.com/p/06d873b82d49
2021-01-29 14:52:51
16
Original NGS alternative splicing: using STAR + rMATS
https://cloud.tencent.com/developer/article/1587437
2021-01-18 11:08:07
16
Original Changing the registry mirrors for Docker Desktop Installer
{
  "experimental": false,
  "debug": true,
  "registry-mirrors": [
    "https://xxxxxx.mirror.aliyuncs.com",
    "http://hub-mirror.c.163.com"
  ]
}
2021-01-17 21:31:10
27
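Original Installing Docker on WSL2
Option 1: run Docker with Docker Desktop + WSL2
Docker Desktop bundles Docker CE, Docker Compose, Kubernetes, and related software into a single installer, which saves installing each one separately. Because the Docker daemon is installed on the host, containers can be reached directly through the host's network interface.
Download and install Docker Desktop to run Docker. It lets you manage and configure Docker conveniently from Windows.
Docker for Windows: http…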
2021-01-17 15:48:42
48
Original Python: converting between strings (str) and lists (list)
缥缈之力, 2016-12-02 12:12:04
1. str >>> list
str1 = "12345"
list1 = list(str1)
print(list1)
str2 = "123 sjhid dhi"
list2 = str2.split()  # or list2 = str2.split(" ")
prin…
2021-01-16 14:28:44
21
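Since the snippet above is cut off before the list-to-string direction, here is a minimal Python 3 sketch of both directions. The variable names `joined_chars` and `joined_words` are mine, not from the original post, and `str.join` is used as the usual way back from a list of strings.

```python
# str -> list
str1 = "12345"
list1 = list(str1)      # one list element per character

str2 = "123 sjhid dhi"
list2 = str2.split()    # split on runs of whitespace

# list -> str (every element must already be a string)
joined_chars = "".join(list1)   # concatenate with no separator
joined_words = " ".join(list2)  # concatenate with single spaces

print(list1)
print(list2)
print(joined_chars)
print(joined_words)
```

Note that `str2.split()` and `str2.split(" ")` differ when the text contains consecutive spaces: the latter yields empty strings for the gaps.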
Original straw (extra reads)
Python
nchernia edited this page on 15 May 2019 · 19 revisions
Install
The easiest way to install is via pip:
python3 -m pip install hic-straw
Straw uses the requests library for support of URLs. Be sure it is installed.
After importing…
2021-01-15 10:55:47
20
Original Understanding BAM files
https://zhuanlan.zhihu.com/p/31405418?from_voters_page=true
https://www.jianshu.com/p/705330b383c3
$ less -SN in.sam  # open a SAM file
$ samtools view -h in.bam  # open a BAM file
Next, we focus on the role each column plays in our analysis.
- reads name: the query name of each read, taken from the fastq file;
- …
2021-01-14 16:30:29
28
Original Basic statistics on BAM files
https://www.jianshu.com/p/4bc060bc6785
一只烟酒僧, 2020.09.12 23:16:06
Reference link: https://www.jianshu.com/p/82ed6e27f571
1. How many reads map to the reference genome
Software: samtools flagstat or idxstats
See the detailed guide to samtools commands: http://www.cnblogs.com/emanlee/p/4316581.html
Command: sam…
2021-01-14 16:05:47
27
Original Hi-C data analysis tools and papers
Full text: https://github.com/mdozmorov/HiC_tools
Tools are sorted by publication date, newest on top. Unpublished tools are listed at the end of each section. Related repositories: HiC_data, scHiC_notes. Please, open an is…
2021-01-10 20:17:36
127
Original Downsampling
Downsampling vali
https://www.jianshu.com/p/e08f5ace5e3a
my_data <- datamatrix
dim(datamatrix)
install.packages("tidyverse")
library("tidyverse")
set.seed(1234)
# randomly pick five rows without replacement
my_data %>% sample_n(5, replace = FALSE)
# randomly pick 5% of the rows without replacement
my_data %>% sample_frac(0.05, rep…
2021-01-08 17:10:23
8
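The downsampling entry above draws rows without replacement, either a fixed number or a fraction. The same idea can be sketched with Python's standard library. The function name `sample_rows` is hypothetical, and rounding the fraction to a whole number of rows is my assumption about how to size the sample, not a statement about how `sample_frac` does it.

```python
import random


def sample_rows(rows, n=None, frac=None, seed=1234):
    """Return rows drawn at random without replacement.

    Pass exactly one of `n` (row count) or `frac` (fraction of all rows).
    """
    if (n is None) == (frac is None):
        raise ValueError("pass exactly one of n or frac")
    if frac is not None:
        n = round(len(rows) * frac)  # assumption: round to a whole row count
    rng = random.Random(seed)        # fixed seed makes the draw reproducible
    return rng.sample(rows, n)       # sampling without replacement


if __name__ == "__main__":
    data = [[i, i * 2] for i in range(100)]  # stand-in for a data matrix
    print(sample_rows(data, n=5))
    print(len(sample_rows(data, frac=0.05)))
```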
Original Downloading fastq files directly by SRR accession for analysis
SRR12246717 2,984,630
H3K27me3: SH_Hs_K27m3_NX_0918 as replicate 1: GEO accession: GSE145187, SRA entry: SRX8754646
SRR12246717 2,984,630
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/017/SRR12246717/
ftp://ftp.sra.ebi.ac.uk/vol1/fastq…
2021-01-05 09:11:14
38
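The FTP path in the entry above follows a pattern that can be built from the run accession. The sketch below reproduces the one path shown in the post (SRR12246717). The zero-padding rule for accessions of other lengths is my understanding of the ENA directory layout and should be checked against the ENA documentation before relying on it. The function name `ena_fastq_dir` is hypothetical.

```python
ENA_FASTQ_ROOT = "ftp://ftp.sra.ebi.ac.uk/vol1/fastq"


def ena_fastq_dir(run_accession):
    """Build the ENA fastq directory URL for a run accession such as SRR12246717.

    Assumption: the first six characters form the top-level directory, and any
    digits beyond the first six are zero-padded to three characters to form a
    subdirectory (verified here only for the 8-digit example in the post).
    """
    prefix = run_accession[:6]     # e.g. "SRR122"
    digits = run_accession[3:]     # e.g. "12246717"
    extra = digits[6:]             # digits beyond the first six, e.g. "17"
    parts = [ENA_FASTQ_ROOT, prefix]
    if extra:
        parts.append(extra.zfill(3))  # e.g. "017"
    parts.append(run_accession)
    return "/".join(parts) + "/"


if __name__ == "__main__":
    print(ena_fastq_dir("SRR12246717"))
```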
Original prcomp error
prcomp error:
Error in prcomp.default(pca_data, center = T, scale = T) :
cannot rescale a constant/zero column to unit variance
How to solve prcomp.default(): cannot rescale a constant/zero column to unit variance
I have a data set of 9 samples (rows) with…
2021-01-02 23:27:22
57
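The prcomp error above arises because scaling a column to unit variance divides by its standard deviation, which is zero for a constant column. A common workaround is to drop such columns before running PCA. The sketch below shows that filtering step on a plain list-of-rows matrix. The function name `drop_constant_columns` is hypothetical, and this is not necessarily the fix the truncated post goes on to describe.

```python
def drop_constant_columns(rows):
    """Remove columns whose values are identical in every row.

    rows: list of equal-length lists of numbers (samples x features).
    Returns (kept_column_indices, filtered_rows).
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # a column is kept only if it contains at least two distinct values
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    filtered = [[row[j] for j in keep] for row in rows]
    return keep, filtered


if __name__ == "__main__":
    data = [
        [1.0, 0.0, 3.0],
        [2.0, 0.0, 3.5],
        [4.0, 0.0, 1.0],
    ]
    kept, cleaned = drop_constant_columns(data)
    print(kept)     # indices of the columns that vary
    print(cleaned)  # matrix that can safely be scaled to unit variance
```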
Original Problem with the ice command for Hi-C
Replacing the ice script with the code below fixes it:
#!/home/s312/miniconda3/envs/rna/bin/python
import sys
import argparse
import numpy as np
from scipy import sparse
import iced
from iced.io import loadtxt, savetxt
parser = argparse.ArgumentParser("ICE normalization")
parse…
2021-01-02 18:39:03
57
1
Original Picard downsampling version issue
I have been using Picard's DownsampleSam to extract random alignments from a .bam file. The tool documentation for DownsampleSam states that "Reads marked as not primary alignments are all discarded". However, when running DownsampleSam (Picard…
2021-01-02 16:13:22
21
Original Step by step: computing the FRiP score
https://blog.csdn.net/weixin_43569478/article/details/108079775
Count the total number of reads mapped to the reference genome:
wc -l sample.tn5.shift.tagalign
Count the reads that fall in peak regions:
bedtools intersect -a sample.tn5.shift.tagalign -b sample.narrowpeak -wa -u | wc -l
Dividing the second number by the first gives the FRiP score.
2020-12-16 23:36:24
31
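The last step of the FRiP entry above is a single division: reads in peaks over total mapped reads. A minimal sketch, where the function name `frip_score` and the example counts are hypothetical stand-ins for the two `wc -l` results:

```python
def frip_score(reads_in_peaks, total_reads):
    """Fraction of Reads in Peaks: reads overlapping peaks / all mapped reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must be between 0 and total_reads")
    return reads_in_peaks / total_reads


if __name__ == "__main__":
    # hypothetical counts, standing in for the output of the two commands
    total = 1_000_000
    in_peaks = 250_000
    print(f"FRiP = {frip_score(in_peaks, total):.3f}")
```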
Original bedtools deeptool
for i in *.bam
do
echo "$i" >> ./rmdup
samtools view -c "$i" >> ./rmdup
done
multiBamSummary BED-file --BED selection.bed --bamfiles file1.bam file2.bam -o results.npz
To extract the reads within specified regions from a SAM or BAM file, you can use samtools and bedtools. First, prep…
2020-12-16 15:53:03
51
Original Extracting a chromosome-specific BAM file with samtools
/data/zhangyong/yyp/yypold/chips/lh11_3/data/X101SC20070420-Z01-J003/allcleandata/sam/chenalign/NC/chenpk
awk '{ $1="chr18"; print $0 }' DM24h-NC-1_FKDL202607366_peaks.narrowPeak > NC1_chr18.narrowPeak
awk '{ $1="chr18"; print $0 }' D…
2020-12-14 16:13:05
230
Original ChIPQC runs successfully on a PC
#SampleID Tissue Factor Condition Treatment Replicate bamReads ControlID bamControl Peaks PeakCaller
#BT4741 BT474 ER Resistant Full-Media 1 reads/Chr18_BT474_ER_1.bam reads/Chr18_BT474_input.bam reads/Chr18_BT474_input.bam peaks/BT474_ER_1.bed.gz bed
#BT.…
2020-12-13 23:29:28
16
Original Counting the reads within BED regions
[bedtools map / bedtools coverage / deeptools coverage] Counting reads and GC content within BED regions
https://blog.csdn.net/Cassiel60/article/details/89310811
Input file: BAM format
Example: compute the read count and GC content for the regions in a BED file
bedtools map -a target.bed -b realigned.bam -c 10,10 -o count,concat | awk -v OFS="\t"…
2020-12-11 21:39:59
193
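Original The reference genomes we download from UCSC, NCBI, or Ensembl are all plus-strand sequences.
In other words, the reference genomes we download from UCSC, NCBI, or Ensembl are all plus-strand base sequences. Genes, however, are distributed across both strands. Some lie on the plus strand, meaning the gene's transcript sequence matches the 5' to 3' sequence of the plus strand. Others lie on the minus strand, so the transcript sequence (and the amino acid sequence derived from it) matches the 5' to 3' sequence of the minus strand. Therefore, to extract the sequence of a minus-strand gene from the reference genome:
1. first extract the sequence as given in the reference genome;
2. then reverse-complement that sequence.
UCSC, Ensembl, and others define a gene as being on the plus strand when its transcript sequence matches the plus-strand sequence at that gene.
https://…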
2020-12-09 11:24:07
142
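The two steps in the entry above (slice the plus-strand reference, then reverse-complement for minus-strand genes) can be sketched as follows. The function names and the 0-based, half-open coordinate convention are assumptions of this example, not something the post specifies.

```python
# translation table for complementing bases; N stays N, case is preserved
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_sequence(chrom_seq, start, end, strand):
    """Return a gene's 5'->3' sequence from a plus-strand chromosome sequence.

    Assumption: start/end are 0-based and half-open, as in BED files.
    """
    sub = chrom_seq[start:end]  # step 1: take the reference (plus-strand) bases
    if strand == "-":
        return reverse_complement(sub)  # step 2: only for minus-strand genes
    return sub


if __name__ == "__main__":
    chrom = "AAATGCCC"  # toy plus-strand sequence
    print(gene_sequence(chrom, 2, 6, "+"))
    print(gene_sequence(chrom, 2, 6, "-"))
```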
Original Turning a FASTA file into a dictionary
https://www.jianshu.com/p/b10af6b7da1f
Assign to the dictionary only after a whole FASTA record has been read:
fl = open('Spenn.fasta')
dic1 = {}
chro = fl.readline().strip().split('-')[1]
seq = ''
for i in fl:
    if '>' in i:
        dic1[chro] = seq
        chro = i.strip().split('-')[1]
        seq = ''
    else:
        seq += i.stri…
2020-12-09 10:21:37
55
1
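The FASTA snippet above is cut off before the loop ends. A self-contained sketch of the same "store a record once it is complete" idea follows. Two things differ from the post and are assumptions of this example: the record name is taken as the first whitespace-delimited token after `>` (the post splits the header on `-`, which is specific to its file), and the final record is stored explicitly after the loop, which this pattern requires. The function name `read_fasta` is hypothetical.

```python
import io


def read_fasta(handle):
    """Read FASTA text from an open file handle into {name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # previous record is complete
            name = line[1:].split()[0]  # assumption: name is the first token
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # store the final record
    return records


if __name__ == "__main__":
    demo = io.StringIO(">chr1 demo\nACGT\nGG\n>chr2\nTTAA\n")
    print(read_fasta(demo))
```

Collecting lines in a list and joining once avoids repeated string concatenation on long chromosomes.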
Original computeMatrix: Optional arguments
Only for scale-regions mode:
| Command | Expected Input | Explanation |
|:----:|:----:|:----|
| --regionBodyLength | INTEGER | Distance in bp to which all regions are going to be fitted. (default: 1000) |
| --startLabe…
2020-12-08 15:52:06
15
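Original liftOver: converting genomic coordinates between genome annotation versions
ChIP-Seq data mining series, part 4: liftOver, converting genomic coordinates between genome annotation versions
https://www.jianshu.com/p/75758684b9cf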
2020-12-07 22:26:08
50
Original Basics of ChIP-seq data analysis
https://www.bioconductor.org/help/course-materials/2016/CSAMA/lab-5-chipseq/Epigenetics.html
Basics of ChIP-seq data analysis
夏目沉吟: https://rpkgs.datanovia.com/ggpubr/reference/ggpie.html
夏目沉吟: https://stackoverflow.com/questions/3397885/how-do-you-read-in…
2020-12-03 09:18:44
20
Original readcount_summaryoverlap
https://bioconductor.org/packages/release/bioc/vignettes/GenomicAlignments/inst/doc/summarizeOverlaps.pdf
https://www.rdocumentation.org/packages/GenomicAlignments/versions/1.8.4/topics/summarizeOverlaps-methods
2020-12-01 17:10:24
13
Original A 32-bit system cannot load a 64-bit DLL... "maybe not installed for this architecture?"
Error: package or namespace load failed for 'DiffBind' in library.dynam(lib, p
Error: package or namespace load failed for 'openxlsx' in library.dynam(lib, package, package.lib):
DLL 'zip' not found: maybe not installed for this architecture?
Source: 人大经济论坛, R language forum, 版…
2020-11-22 17:34:39
107
Original Motif bubble plot
y <- read.table("C:/Users/Administrator/Desktop/motif气泡图.csv", sep = ",", header = T)
head(y)
#  motif  logPvalue  type
#1 RBPJ   -1281.00   Female-ko
#2 LEF1   -1472.00   Female-ko
#3 PRDM1  -683.00    Female-ko
#4 NKX2.1 -992.10    Female-ko
#5 TCF3…
2020-11-22 17:19:59
44
Original Finding overlapping differential peaks with deeptools
$ cat A.bed
chr1 100 200
$ cat B.bed
chr1 130 201
chr1 180 220
$ bedtools intersect -a A.bed -b B.bed -f 0.50 -wa -wb
chr1 100 200 chr1 130 201
1. bedtools intersect -a DMNCvsGMNCdeseq2_sig.bed -b DMdeseq2_sig.bed -f 0.50 -wa -wb > overlap0.5
$ cat A.bed…
2020-11-13 16:30:56
21
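In the bedtools example above, `-f 0.50` requires the overlap to cover at least half of the A interval, which is why only the first B interval is reported (70 of 100 bases overlap, versus 20 of 100 for the second). The sketch below reproduces that arithmetic in Python. It is an illustration of the overlap-fraction rule, not a replacement for bedtools, and the function names are hypothetical.

```python
def overlap_fraction_of_a(a, b):
    """Fraction of interval `a` covered by interval `b`.

    Intervals are (chrom, start, end) tuples in BED style (0-based, half-open).
    """
    if a[0] != b[0]:
        return 0.0  # different chromosomes never overlap
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    length = a[2] - a[1]
    if overlap <= 0 or length <= 0:
        return 0.0
    return overlap / length


def intersect_min_fraction(a_intervals, b_intervals, min_frac=0.5):
    """Return (a, b) pairs where b covers at least `min_frac` of a."""
    return [
        (a, b)
        for a in a_intervals
        for b in b_intervals
        if overlap_fraction_of_a(a, b) >= min_frac
    ]


if __name__ == "__main__":
    a_bed = [("chr1", 100, 200)]
    b_bed = [("chr1", 130, 201), ("chr1", 180, 220)]
    for a, b in intersect_min_fraction(a_bed, b_bed):
        print(*a, *b)
```

The nested loop is quadratic, which is fine for a toy example but not for genome-scale peak sets.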