Bioinformatics Data Skills (O'Reilly) Study Notes, Part 9

weixin_42953727

Category: bioinformatics. Tags: Bioinformatics

Article link: https://blog.csdn.net/weixin_42953727/article/details/100184953

Chapter 9: Working with Range Data

A Crash Course in Genomic Ranges and Coordinate Systems

CrossMap is a command-line tool that converts many data formats (BED, GFF/GTF, SAM/BAM, Wiggle, VCF) between coordinate systems of different assembly versions.
NCBI Genome Remapping Service is a web-based tool supporting a variety of genomes and formats.
LiftOver is also a web-based tool for converting between genomes hosted on the UCSC Genome Browser’s site.

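Two coordinate systems are in common use:
• 0-based coordinate system, with half-closed, half-open intervals, written [start, end). BED files use this system.
• 1-based coordinate system, with closed intervals, written [start, end]. GFF/GTF files use this system.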
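
The two conventions differ only in where counting starts and whether the end position is included, so converting between them is simple arithmetic. Below is a minimal base R sketch of that conversion; the function names are made up for illustration and do not come from any package.

```r
# Hypothetical helpers for illustration only (not part of any package).

# 0-based half-open [start, end)  ->  1-based closed [start, end]
zero_to_one_based <- function(start, end) {
  list(start = start + 1, end = end)
}

# 1-based closed [start, end]  ->  0-based half-open [start, end)
one_to_zero_based <- function(start, end) {
  list(start = start - 1, end = end)
}

# The first 10 bases of a sequence, written in each system.
bed_like <- list(start = 0, end = 10)                          # 0-based, half-open
gff_like <- zero_to_one_based(bed_like$start, bed_like$end)    # 1-based, closed

# Width is computed differently in each system but must agree.
width_zero_based <- bed_like$end - bed_like$start        # end - start
width_one_based  <- gff_like$end - gff_like$start + 1    # end - start + 1
```

Only the start coordinate shifts during conversion; the end coordinate is numerically the same in both systems because the excluded end of a half-open interval coincides with the included end of the closed one.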

An Interactive Introduction to Range Data with GenomicRanges

Installing and Working with Bioconductor Packages

Bioconductor is an open source software project that creates R bioinformatics packages and serves as a repository for them. The packages used in this chapter:
• GenomicRanges: used to represent and work with genomic ranges.
• GenomicFeatures: used to represent and work with ranges that represent gene models and other features of a genome (genes, exons, UTRs, transcripts, etc.).
• Biostrings and BSgenome: used for manipulating genomic sequence data in R (we’ll cover the subset of these packages used for extracting sequences from ranges).
• rtracklayer: used for reading in common bioinformatics formats like BED, GTF/GFF, and WIG.

1. biocLite(): installing Bioconductor packages
Install Bioconductor’s primary packages: (be sure your R version is up to date first)

> source("http://bioconductor.org/biocLite.R")
> biocLite()

2. Install the GenomicRanges package

> biocLite("GenomicRanges")

In later R sessions, load the BiocInstaller package with library(BiocInstaller) before calling biocLite(). biocLite() notifies you when some of your packages are out of date and need to be upgraded (which it can do automatically for you).
If you run into an unexpected error with a Bioconductor package, it’s a good idea to run biocUpdatePackages() and biocValid() before debugging.
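
Note that biocLite() and the BiocInstaller package belong to older Bioconductor releases; since Bioconductor 3.8 the project recommends the BiocManager package from CRAN instead. A sketch of the equivalent steps, assuming a recent R installation with internet access:

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the GenomicRanges package (takes the place of biocLite("GenomicRanges")).
BiocManager::install("GenomicRanges")

# Report out-of-date or mismatched packages (takes the place of biocValid()).
BiocManager::valid()
```

On an old R version where biocLite() still works, the commands in these notes can be used as written.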

See the GenomicRanges reference manual and vignettes for details.

Storing Generic Ranges with IRanges

> library(IRanges)
> rng <- IRanges(start=4, end=13)
> rng
IRanges of length 1
    start end width
[1]     4  13    10

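The most important fact to note: IRanges (and GenomicRanges) is 1-based and uses closed intervals. The 1-based system was adopted to be consistent with R’s 1-based system (recall the first element in an R vector has index 1). Because both endpoints are included, width = end - start + 1, which is why the range from 4 to 13 has width 10. A range can be specified by any two of start, end, and width: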

> IRanges(start=4, width=3)
IRanges of length 1
    start end width
[1]     4   6     3
> IRanges(end=5, width=5)
IRanges of length 1
    start end width
[1]     1   5     5

An IRanges object containing many ranges:

> x <- IRanges(start=c(4, 7, 2, 20), end=c(13, 7, 5, 23))
> x
IRanges of length 4
    start end width
[1]     4  13    10
[2]     7   7     1
[3]     2   5     4
[4]    20  23     4

Each range can be given a name:

> names(x) <- letters[1:4]
> x
IRanges of length 4
    start end width names
[1]     4  13    10     a
[2]     7   7     1     b
[3]     2   5     4     c
[4]    20  23     4     d
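
As a small step beyond what these notes cover, the pieces of an IRanges object can be pulled out with accessor functions and the object can be subset like an ordinary R vector. The sketch below assumes the IRanges package is installed and rebuilds the same x as above so that it runs on its own.

```r
library(IRanges)

# Rebuild the named ranges from the example above.
x <- IRanges(start = c(4, 7, 2, 20), end = c(13, 7, 5, 23))
names(x) <- letters[1:4]

# Accessor functions return plain vectors, one element per range.
starts <- start(x)
ends   <- end(x)
widths <- width(x)

# For 1-based closed ranges, width = end - start + 1 holds for every range.
widths_consistent <- all(widths == ends - starts + 1)

# Subset by name or by a logical condition, as with an R vector.
by_name <- x["c"]            # the range named "c" (2 to 5)
wide    <- x[width(x) > 3]   # ranges wider than 3 positions: a, c, d
```

Here widths_consistent is TRUE, and wide keeps the ranges named a, c, and d, since range b (7 to 7) has width 1.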

The rest of Chapter 9 relies heavily on R, so I am skipping it for now.
