Bioinformatics Data Skills (O'Reilly) Study Notes, Part 9

Chapter 9 Working with Range Data

A Crash Course in Genomic Ranges and Coordinate Systems

CrossMap is a command-line tool that converts many data formats (BED, GFF/GTF, SAM/BAM, Wiggle, VCF) between the coordinate systems of different assembly versions.
NCBI Genome Remapping Service is a web-based tool supporting a variety of genomes and formats.
LiftOver is also a web-based tool for converting between genomes hosted on the UCSC Genome Browser’s site.

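0-based coordinate system with half-open intervals [start, end): the start position is included and the end position is excluded, so width = end - start. BED uses this system.
1-based coordinate system with closed intervals [start, end]: both endpoints are included, so width = end - start + 1. GFF/GTF uses this system.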
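
The two conventions can be converted mechanically. Below is a minimal base R sketch, assuming hypothetical helper names (zero_to_one_based, one_to_zero_based, and the two width functions) that are not part of any package. Only the start shifts by one, because the excluded end of a half-open interval coincides with the included end of the closed one.

```r
# Hypothetical helpers for illustration; not part of IRanges or GenomicRanges.

# 0-based, half-open [start, end)  ->  1-based, closed [start, end]
zero_to_one_based <- function(start, end) {
  list(start = start + 1, end = end)
}

# 1-based, closed [start, end]  ->  0-based, half-open [start, end)
one_to_zero_based <- function(start, end) {
  list(start = start - 1, end = end)
}

# Width of a range under each convention
width_zero_based <- function(start, end) end - start
width_one_based  <- function(start, end) end - start + 1

bed_like <- list(start = 3, end = 13)  # made-up 0-based, half-open range
closed   <- zero_to_one_based(bed_like$start, bed_like$end)

# Both representations describe the same positions, so the widths agree
width_zero_based(bed_like$start, bed_like$end)
width_one_based(closed$start, closed$end)
```

Forgetting this shift when moving data between formats is a typical source of off-by-one errors, so it may help to keep the conversion in one small, tested function.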

An Interactive Introduction to Range Data with GenomicRanges
Installing and Working with Bioconductor Packages

Bioconductor is an open source software project that creates R bioinformatics packages and serves as a repository for them. The packages used in this chapter:
GenomicRanges
Used to represent and work with genomic ranges
GenomicFeatures
Used to represent and work with ranges that represent gene models and other features of a genome (genes, exons, UTRs, transcripts, etc.)
Biostrings and BSgenome
Used for manipulating genomic sequence data in R (we’ll cover the subset of these packages used for extracting sequences from ranges)
rtracklayer
Used for reading in common bioinformatics formats like BED, GTF/GFF, and WIG

1. biocLite(): Installing Bioconductor packages
Install Bioconductor’s primary packages (be sure your R version is up to date first):

> source("http://bioconductor.org/biocLite.R")
> biocLite()

2. Install the GenomicRanges package

> biocLite("GenomicRanges")

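biocLite() will notify you when some of your packages are out of date and need to be upgraded (which it can do automatically for you).
If you run into an unexpected error with a Bioconductor package, it’s a good idea to run biocUpdatePackages() and biocValid() before debugging. Both require loading the BiocInstaller package with library(BiocInstaller) first.
Note: on Bioconductor 3.8 and later, biocLite() has been replaced by the BiocManager package. Run install.packages("BiocManager"), then BiocManager::install("GenomicRanges"); BiocManager::valid() takes the place of biocValid().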
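
The installation steps above can be wrapped so that a package is only installed when it is missing. This is a rough sketch, assuming a current R setup where BiocManager (rather than biocLite()) is the installer; ensure_bioc_package is a hypothetical helper name, not a Bioconductor function.

```r
# Hypothetical helper: install a Bioconductor package only when it is missing.
# Assumes BiocManager is the installer in use (Bioconductor 3.8 or later).
ensure_bioc_package <- function(pkg) {
  # Already installed: nothing to do
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  # BiocManager itself comes from CRAN
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
  # Report whether the package can be loaded after the install attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Example call (needs network access, so it is left commented out here):
# ensure_bioc_package("GenomicRanges")
```

Checking with requireNamespace() first keeps the call cheap to repeat at the top of a script, since nothing is downloaded when the package is already present.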

See the GenomicRanges reference manual and vignettes (from within R: browseVignettes("GenomicRanges")).

Storing Generic Ranges with IRanges
> library(IRanges)
> rng <- IRanges(start=4, end=13)
> rng
IRanges of length 1
    start end width
[1]     4  13    10

The most important fact to note: IRanges (and GenomicRanges) is 1-based, and uses closed intervals. The 1-based system was adopted to be consistent with R’s 1-based system (recall the first element in an R vector has index 1).
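
Because R indexes from 1 and treats both ends as inclusive, a 1-based closed range can be used directly as an index. A small base R illustration, using a made-up sequence string rather than real data:

```r
# Made-up 18-base sequence, for illustration only
seq_str <- "ACGTTGCAAGGCTTAACG"

rng_start <- 4
rng_end   <- 13

# substr() takes 1-based, inclusive start and stop positions,
# so the range endpoints can be passed unchanged
sub_seq <- substr(seq_str, rng_start, rng_end)

# The extracted length matches the closed-interval width: end - start + 1
nchar(sub_seq) == rng_end - rng_start + 1
```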

> IRanges(start=4, width=3)
IRanges of length 1
    start end width
[1]     4   6     3
> IRanges(end=5, width=5)
IRanges of length 1
    start end width
[1]     1   5     5
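
The two calls above show that any two of start, end, and width determine the third. The arithmetic behind that can be sketched in base R as follows; make_range is a hypothetical helper that only mimics the relationship width = end - start + 1 and is not how IRanges is implemented.

```r
# Hypothetical helper; mimics the start/end/width relationship only.
make_range <- function(start = NULL, end = NULL, width = NULL) {
  given <- c(!is.null(start), !is.null(end), !is.null(width))
  if (sum(given) < 2) stop("supply at least two of start, end, width")

  # Fill in whichever value is missing from the other two
  if (is.null(start)) start <- end - width + 1
  if (is.null(end))   end   <- start + width - 1
  if (is.null(width)) width <- end - start + 1

  # When all three were given, make sure they agree
  if (width != end - start + 1) stop("start, end, and width are inconsistent")

  c(start = start, end = end, width = width)
}

make_range(start = 4, width = 3)   # end is derived
make_range(end = 5, width = 5)     # start is derived
make_range(start = 4, end = 13)    # width is derived
```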

An IRanges object containing many ranges:

> x <- IRanges(start=c(4, 7, 2, 20), end=c(13, 7, 5, 23))
> x
IRanges of length 4
    start end width
[1]     4  13    10
[2]     7   7     1
[3]     2   5     4
[4]    20  23     4

Each range can be given a name:

> names(x) <- letters[1:4]
> x
IRanges of length 4
    start end width names
[1]     4  13    10     a
[2]     7   7     1     b
[3]     2   5     4     c
[4]    20  23     4     d
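
To see what the named, vectorized ranges above amount to without loading any package, the same four ranges can be held in a plain data frame. This is a base R stand-in for illustration, assuming nothing about IRanges internals; the variable name ranges is arbitrary.

```r
# Base R stand-in for the four named ranges; not an IRanges object
ranges <- data.frame(
  start = c(4, 7, 2, 20),
  end   = c(13, 7, 5, 23),
  row.names = letters[1:4]
)

# Widths are computed for all ranges at once (1-based, closed intervals)
ranges$width <- ranges$end - ranges$start + 1

# Look up a single range by its name
ranges["b", ]

# Keep only ranges wider than one position
ranges[ranges$width > 1, ]
```

Names make it possible to refer to a range by label instead of by position, which stays stable if the ranges are later reordered or filtered.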

The rest of Chapter 9 relies heavily on R, so I am skipping it for now.
