Bioinformatics Data Skills (O'Reilly) Study Notes 7-1

weixin_42953727

Category: bioinformatics  Tags: Bioinformatics

Article link: https://blog.csdn.net/weixin_42953727/article/details/100084710

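This post covers how to process text data with Unix tools in bioinformatics, focusing on BED and GTF files. It discusses the head and tail commands for inspecting data, and wc, ls, and awk for getting summary information about text data. It also looks at cut and column for working with columnar data.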

PART III Practice: Bioinformatics Data Skills

Chapter 7 Unix Data Tools

Inspecting and Manipulating Text Data with Unix Tools

In this chapter, we’ll work with very simple genomic feature formats: BED (three column) and GTF files.
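
To make the three-column BED layout concrete, here is a minimal sketch. The file name example.bed and its intervals are made up for illustration; the only thing assumed about the format is that each line holds a tab-separated chromosome, start, and end.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
bed_dir=$(mktemp -d)
bed="$bed_dir/example.bed"   # hypothetical file name

# Write three made-up intervals: chromosome, start, end (tab-separated).
printf 'chr1\t100\t200\n' >  "$bed"
printf 'chr1\t500\t800\n' >> "$bed"
printf 'chr2\t50\t150\n'  >> "$bed"

# Count tab-separated fields per line; a three-column BED yields only "3".
ncols=$(awk -F'\t' '{ print NF }' "$bed" | sort -u)
echo "distinct column counts: $ncols"

# BED starts are 0-based and ends are exclusive, so length = end - start.
lengths=$(awk -F'\t' '{ print $3 - $2 }' "$bed")
echo "$lengths"
```

Checking the distinct field counts this way is a quick sanity test that every line has the same number of columns before doing anything else with the file.
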
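Introduction to BED and GFF files, commonly used for genome annotation:
http://blog.sina.com.cn/s/blog_80572f5d0102x5m7.html
How to read the BED file format:
http://www.mamicode.com/info-detail-2417885.html
Downloading a BED file of transcripts:
https://www.jianshu.com/p/298fe5bd6d45
Common bioinformatics data downloads, including genomes, GTF, BED, and annotations:
http://www.biotrainee.com/thread-857-1-1.html
Downloading genomes from Ensembl and NCBI, and downloading and viewing gene sequences:
http://www.omicsclass.com/article/58
Biotrainee (生信技能树) forum:
http://www.biotrainee.com/forum.php
bedtools usage summary: computing the overlap regions between two genomes with BedTools:
https://www.plob.org/article/10690.html
BED files and how to download them correctly from UCSC:
https://www.jianshu.com/p/2c6b8fb03c58
Differences between Ensembl, NCBI Map Viewer, and UCSC:
https://www.cnblogs.com/RyannBio/p/9561216.html
NGS basics: reference genomes and gene annotation files:
https://mp.weixin.qq.com/s/2OoXy4f1t0hE8OUqsAt1kw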
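Run wget -O result.txt 'http://www.ensembl.org/biomart/martservice?query= followed by the query XML (collapsed onto one line, with a closing single quote at the end), and the command can be reused as often as needed. To switch species, change only the corresponding Dataset name.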
http://www.ensembl.org/biomart/martview/a56aa65813382caa8cf3218721f16a0f
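
The wget note above can be sketched as a small script. The dataset name (mmusculus_gene_ensembl) and the two attribute names are assumptions chosen for illustration; in practice, paste the XML that BioMart generates for your own query. The script only prints the command so that it runs without network access.

```shell
# Dataset and attribute names below are assumptions; replace them with the
# XML that BioMart's XML export produces for your own query.
dataset="mmusculus_gene_ensembl"

# The query XML collapsed onto a single line, as the note describes.
query='<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="'"$dataset"'" interface="default"><Attribute name="ensembl_gene_id"/><Attribute name="external_gene_name"/></Dataset></Query>'

url="http://www.ensembl.org/biomart/martservice?query=$query"

# Print the command instead of running it, so the sketch works offline.
echo "wget -O result.txt '$url'"

# To actually download, uncomment the next line:
# wget -O result.txt "$url"
```

Keeping the dataset name in a variable means that switching species is a one-line change, which matches the note's point about editing only the Dataset name.
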
Batch decompression:
https://www.cnblogs.com/lansor/archive/2012/07/03/2574214.html
Compressing and decompressing various file types:
http://ask.zol.com.cn/x/4228195.html
Complete guide to Linux compression and decompression commands:
https://www.cnblogs.com/lanqingzhou/p/8058571.html
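
As a minimal sketch of the batch decompression these links discuss, the loop below unpacks every .gz file in a directory while keeping the original archives. The file names a.bed and b.bed are placeholders created only so the example is self-contained; the linked articles may use different commands.

```shell
# Scratch directory with two small gzip files standing in for real downloads.
gz_dir=$(mktemp -d)
printf 'chr1\t0\t10\n' > "$gz_dir/a.bed"   # placeholder data
printf 'chr2\t5\t20\n' > "$gz_dir/b.bed"   # placeholder data
gzip "$gz_dir/a.bed" "$gz_dir/b.bed"       # leaves a.bed.gz and b.bed.gz

# Decompress each archive. gzip -dc writes to stdout, so the .gz file is kept
# and the output name is the archive name with the .gz suffix removed.
for f in "$gz_dir"/*.gz; do
    gzip -dc "$f" > "${f%.gz}"
done

ls "$gz_dir"
```

Writing to stdout and redirecting avoids deleting the compressed copies, which is often convenient for large annotation files that are slow to download again.
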

Inspecting Data with Head and Tail

I could not find a .bed annotation file at Ensembl, so I downloaded the mouse BED file from UCSC.

$ head GRCm38.mm10.bed
chr1    134199214       134234856       NM_001291928.1  0       -       134202950       134234733       0       2       4376,194,       0,35448,
chr1    134199214       134235457       NM_001008533.3  0       -       134202950       134234355       0       2       4376,1443,      0,34800,
chr1    134199214       134235457       NM_001282945.1  0       -       134202950       134234355       0       3       4376,432,230,   0,34800,36013,
chr1    134199214       134235457       NM_001039510.2  0       -       134202950       134234355       0       3       4376,398,230,   0,34800,36013,
chr1    134199214       134235457       NM_001291930.1  0       -       134202950       134203505       0       2       4376,230,       0,36013,
chr1    134199218       134235052       XM_006529079.2  0       -       13420295