PART III Practice: Bioinformatics Data Skills
Chapter 7 Unix Data Tools
Inspecting and Manipulating Text Data with Unix Tools
In this chapter, we’ll work with very simple genomic feature formats: BED (three-column) and GTF files.
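Files commonly used for genome annotation, an introduction to BED and GFF files:
http://blog.sina.com.cn/s/blog_80572f5d0102x5m7.html
Interpreting the BED file format:
http://www.mamicode.com/info-detail-2417885.html
Downloading BED files for transcripts:
https://www.jianshu.com/p/298fe5bd6d45
Downloading common bioinformatics data, including genomes, GTF, BED, and annotation files:
http://www.biotrainee.com/thread-857-1-1.html
Downloading genomes from Ensembl and NCBI, and downloading and viewing gene sequences:
http://www.omicsclass.com/article/58
Biotrainee (生信技能树) forum:
http://www.biotrainee.com/forum.php
bedtools usage summary, computing the overlapping regions of two genomes with BEDTools:
https://www.plob.org/article/10690.html
BED files and how to download them correctly from UCSC:
https://www.jianshu.com/p/2c6b8fb03c58
Differences between Ensembl, NCBI Map Viewer, and UCSC:
https://www.cnblogs.com/RyannBio/p/9561216.html
NGS basics, reference genomes and gene annotation files:
https://mp.weixin.qq.com/s/2OoXy4f1t0hE8OUqsAt1kw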
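To make a BioMart query reusable, run wget -O result.txt 'http://www.ensembl.org/biomart/martservice?query= followed by the contents of the XML (collapsed onto a single line, with a closing single quote added at the end of the line). To switch to another species, just change the corresponding Dataset name.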
http://www.ensembl.org/biomart/martview/a56aa65813382caa8cf3218721f16a0f
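Batch decompression:
https://www.cnblogs.com/lansor/archive/2012/07/03/2574214.html
Compressing and decompressing various file types:
http://ask.zol.com.cn/x/4228195.html
A complete guide to Linux compression and decompression commands: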
https://www.cnblogs.com/lanqingzhou/p/8058571.html
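As a minimal sketch of the batch decompression covered by the links above: the loop below decompresses every .gz file in a directory and keeps the compressed originals. The temporary directory and the file names a.bed.gz and b.bed.gz are hypothetical, created here only so the example runs on its own. With real downloads, point the loop at the directory that holds them.

```shell
# Hypothetical scratch directory and toy files, for illustration only.
workdir=$(mktemp -d)
printf 'chr1\t100\t200\n' | gzip > "$workdir/a.bed.gz"
printf 'chr2\t300\t400\n' | gzip > "$workdir/b.bed.gz"

# Decompress each .gz file to a file of the same name without the suffix.
# gzip -dc writes to stdout, so the compressed original is left untouched.
for f in "$workdir"/*.gz; do
    gzip -dc "$f" > "${f%.gz}"
done

ls "$workdir"
```

Writing through gzip -dc and a redirect, rather than decompressing in place, means a failed or interrupted run never destroys the only copy of the data.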
Inspecting Data with Head and Tail
- I could not find BED-format annotation files in Ensembl, so I downloaded the mouse BED file from UCSC.
$ head GRCm38.mm10.bed
chr1 134199214 134234856 NM_001291928.1 0 - 13420295 0 134234733 0 2 4376,194, 0,35448,
chr1 134199214 134235457 NM_001008533.3 0 - 13420295 0 134234355 0 2 4376,1443, 0,34800,
chr1 134199214 134235457 NM_001282945.1 0 - 13420295 0 134234355 0 3 4376,432,230, 0,34800,36013,
chr1 134199214 134235457 NM_001039510.2 0 - 13420295 0 134234355 0 3 4376,398,230, 0,34800,36013,
chr1 134199214 134235457 NM_001291930.1 0 - 13420295 0 134203505 0 2 4376,230, 0,36013,
chr1 134199218 134235052 XM_006529079.2 0 - 13420295