Bioinformatics Data Skills (O'Reilly) Study Notes 7-1

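This post covers how to process text data with Unix tools in bioinformatics, focusing on BED and GTF files. It discusses the head and tail commands for inspecting data, and the use of wc, ls, and awk to get summary information about text data. It also looks at how cut and column are applied to columnar data.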

PART III Practice: Bioinformatics Data Skills

Chapter 7 Unix Data Tools

Inspecting and Manipulating Text Data with Unix Tools

In this chapter, we’ll work with very simple genomic feature formats: BED (three column) and GTF files.
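Introduction to BED and GFF files, the formats commonly used for genome annotation: http://blog.sina.com.cn/s/blog_80572f5d0102x5m7.html
Explanation of the BED file format:
http://www.mamicode.com/info-detail-2417885.html
Downloading BED files of transcripts:
https://www.jianshu.com/p/298fe5bd6d45
Downloading common bioinformatics data, including genomes, GTF, BED, and annotations:
http://www.biotrainee.com/thread-857-1-1.html
Downloading genomes from Ensembl and NCBI, and downloading and viewing gene sequences:
http://www.omicsclass.com/article/58
Biotrainee (生信技能树) forum:
http://www.biotrainee.com/forum.php
Notes on using bedtools: computing the overlapping regions of two genomes with BedTools:
https://www.plob.org/article/10690.html
BED files and how to download them correctly from UCSC:
https://www.jianshu.com/p/2c6b8fb03c58
Differences between Ensembl, NCBI Map Viewer, and UCSC:
https://www.cnblogs.com/RyannBio/p/9561216.html
NGS basics: reference genomes and gene annotation files:
https://mp.weixin.qq.com/s/2OoXy4f1t0hE8OUqsAt1kw
Running wget -O result.txt 'http://www.ensembl.org/biomart/martservice?query= followed by the contents of the XML (collapsed onto a single line, with a closing single quote added at the end) gives a command that can be reused as often as needed. To switch to another species, only the corresponding Dataset name has to be changed.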
http://www.ensembl.org/biomart/martview/a56aa65813382caa8cf3218721f16a0f
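The wget recipe above can be sketched roughly as follows. This is only an illustration: the query XML, including the dataset name mmusculus_gene_ensembl and the two attribute names, is an assumed example and should be replaced with the XML exported from the BioMart page for your own query. The script prints the command instead of running it, so it works without network access.

```shell
# Illustrative BioMart query XML. The dataset and attribute names here are
# assumptions for this sketch; use the XML exported from BioMart instead.
query_xml=$(cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
  <Dataset name="mmusculus_gene_ensembl" interface="default">
    <Attribute name="ensembl_gene_id" />
    <Attribute name="external_gene_name" />
  </Dataset>
</Query>
EOF
)

# Collapse the XML onto one line: drop newlines, then the indentation between tags.
one_line=$(printf '%s' "$query_xml" | tr -d '\n' | sed 's/>  *</></g')

# Append the one-line XML to the martservice URL.
url="http://www.ensembl.org/biomart/martservice?query=${one_line}"

# Print the command (wrapped in single quotes) rather than executing it.
printf "wget -O result.txt '%s'\n" "$url"

# Switching species only requires changing the Dataset name.
human_line=$(printf '%s' "$one_line" | sed 's/mmusculus_gene_ensembl/hsapiens_gene_ensembl/')
printf "wget -O result_human.txt 'http://www.ensembl.org/biomart/martservice?query=%s'\n" "$human_line"
```

Copying one of the printed lines into a terminal runs the actual download.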
Batch decompression:
https://www.cnblogs.com/lansor/archive/2012/07/03/2574214.html
Compressing and decompressing various file types:
http://ask.zol.com.cn/x/4228195.html
Complete guide to Linux compression and decompression commands:
https://www.cnblogs.com/lanqingzhou/p/8058571.html
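As a minimal sketch of batch decompression (not taken from the linked pages), a shell loop over all .gz files is usually enough for downloaded annotation files. The file names and contents below are made up so that the example is self-contained.

```shell
# Create two small gzip-compressed files in a temporary directory (made-up data).
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\n' > "$tmpdir/a.bed"
printf 'chr2\t300\t400\n' > "$tmpdir/b.bed"
gzip "$tmpdir/a.bed" "$tmpdir/b.bed"

# Decompress every .gz file in the directory, keeping the compressed originals.
for gz in "$tmpdir"/*.gz; do
    # gzip -dc writes the decompressed data to stdout; ${gz%.gz} strips the suffix.
    gzip -dc "$gz" > "${gz%.gz}"
done

# Record what is in the directory now, then clean up.
listing=$(ls "$tmpdir")
restored=$(cat "$tmpdir/b.bed")
printf '%s\n' "$listing"
rm -rf "$tmpdir"
```

Writing through gzip -dc rather than running plain gunzip leaves the .gz files in place, which is convenient when the download is slow to repeat.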

Inspecting Data with Head and Tail
  1. I could not find a .bed annotation file on Ensembl, so I downloaded the mouse BED file from UCSC.
$ head GRCm38.mm10.bed
chr1    134199214       134234856       NM_001291928.1  0       -       134202950       134234733       0       2       4376,194,       0,35448,
chr1    134199214       134235457       NM_001008533.3  0       -       134202950       134234355       0       2       4376,1443,      0,34800,
chr1    134199214       134235457       NM_001282945.1  0       -       134202950       134234355       0       3       4376,432,230,   0,34800,36013,
chr1    134199214       134235457       NM_001039510.2  0       -       134202950       134234355       0       3       4376,398,230,   0,34800,36013,
chr1    134199214       134235457       NM_001291930.1  0       -       134202950       134203505       0       2       4376,230,       0,36013,
chr1    134199218       134235052       XM_006529079.2  0       -       13420295           
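head shows the first 10 lines by default, and tail is its counterpart for the end of a file. The sketch below shows the most common options on a tiny three-column BED file. The file name example.bed and its coordinates are invented for illustration and are not the UCSC file used above.

```shell
# Build a tiny three-column BED file; the coordinates are made up for illustration.
tmpdir=$(mktemp -d)
bed="$tmpdir/example.bed"
printf 'chr1\t%s\t%s\n' 100 200 300 400 500 600 700 800 900 1000 > "$bed"

# head prints the first 10 lines by default; -n sets how many to show.
first_two=$(head -n 2 "$bed")

# tail does the same from the end of the file.
last_two=$(tail -n 2 "$bed")

# tail -n +2 starts at line 2, a common way to skip a one-line header.
without_first=$(tail -n +2 "$bed")

# Look at both ends of the file to check that the format is consistent throughout.
both_ends=$(head -n 1 "$bed"; tail -n 1 "$bed")

printf 'first two:\n%s\nlast two:\n%s\nboth ends:\n%s\n' "$first_two" "$last_two" "$both_ends"
rm -rf "$tmpdir"
```

Checking both ends is worthwhile because a file that was truncated during download often looks fine at the top.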
This practical book teaches the skills that scientists need for turning large sequencing datasets into reproducible and robust biological findings. Many biologists begin their bioinformatics training by learning scripting languages like Python and R alongside the Unix command line. But there's a huge gap between knowing a few programming languages and being prepared to analyze large amounts of biological data. Rather than teach bioinformatics as a set of workflows that are likely to change with this rapidly evolving field, this book demonstrates the practice of bioinformatics through data skills. Rigorous assessment of data quality and of the effectiveness of tools is the foundation of reproducible and robust bioinformatics analysis. Through open source and freely available tools, you'll learn not only how to do bioinformatics, but how to approach problems as a bioinformatician.

Go from handling small problems with messy scripts to tackling large problems with clever methods and tools
Focus on high-throughput (or "next generation") sequencing data
Learn data analysis with modern methods, versus covering older theoretical concepts
Understand how to choose and implement the best tool for the job
Delve into methods that lead to easier, more reproducible, and robust bioinformatics analysis

Table of Contents
Part I. Ideology: Data Skills for Robust and Reproducible Bioinformatics
Chapter 1. How to Learn Bioinformatics
Part II. Prerequisites: Essential Skills for Getting Started with a Bioinformatics Project
Chapter 2. Setting Up and Managing a Bioinformatics Project
Chapter 3. Remedial Unix Shell
Chapter 4. Working with Remote Machines
Chapter 5. Git for Scientists
Chapter 6. Bioinformatics Data
Part III. Practice: Bioinformatics Data Skills
Chapter 7. Unix Data Tools
Chapter 8. A Rapid Introduction to the R Language
Chapter 9. Working with Range Data
Chapter 10. Working with Sequence Data
Chapter 11. Working with Alignment Data
Chapter 12. Bioinformatics Shell Scripting, Writing Pipelines, and Parallelizing Tasks
Chapter 13. Out-of-Memory Approaches: Tabix and SQLite
Chapter 14. Conclusion