Open-source bioinformatics library in Python: Seq, a high-performance Pythonic language for bioinformatics

Seq — a language for bioinformatics

Introduction

A strongly-typed and statically-compiled high-performance Pythonic language!

Seq is a programming language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++.

Think of Seq as a strongly-typed and statically-compiled Python: all the bells and whistles of Python, boosted with a strong type system, without any performance overhead.

Seq is able to outperform Python code by up to 160x. Seq can further beat equivalent C/C++ code by up to 2x without any manual interventions, and also natively supports parallelism out of the box. Implementation details and benchmarks are discussed in our paper.

Learn more by following the tutorial or from the cookbook.

Example

Seq is a Python-compatible language, and the vast majority of Python programs should work without any modifications:

def check_prime(n):
    if n > 1:
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    else:
        return False

n = 1009
print n, 'is', 'a' if check_prime(n) else 'not a', 'prime'

Here is an example that showcases Seq's bioinformatics features: a seeding application in Seq using a hypothetical genome index, like what is typically found in seed-and-extend alignment algorithms:

from sys import argv
from genomeindex import *
type K = Kmer[20]

# index and process 20-mers
@prefetch
def process(kmer: K,
            index: GenomeIndex[K]):
    hits_fwd = index[kmer]
    hits_rev = index[~kmer]
    ...

# index over 20-mers
index = GenomeIndex[K](argv[1])

# stride for k-merization
stride = 10

# sequence-processing pipeline
(FASTQ(argv[2])
  |> seqs
  |> kmers[K](stride)
  |> process(index))

A few notable aspects of this code:

Seq provides native k-mer types, e.g. a 20-mer is represented by Kmer[20] as above.

k-mers can be reverse-complemented with ~.

Seq provides easy iteration over common formats like FASTQ (FASTQ above).

Complex pipelines are easily expressible in Seq (via the |> syntax).

Seq can perform pipeline transformations to make genomic index lookups faster via @prefetch.
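
To make the k-mer operations listed above concrete, here is a minimal plain-Python sketch of what strided k-merization and reverse complementation compute. It is an illustration only, not how Seq implements Kmer[20], kmers or ~. The helper names revcomp and kmers are hypothetical, and ordinary strings stand in for Seq's native k-mer type.

```python
# Illustrative plain-Python equivalents; helper names are hypothetical, not Seq's API.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(kmer: str) -> str:
    """Reverse complement of a DNA string (what `~kmer` denotes in the Seq example)."""
    return kmer.translate(COMPLEMENT)[::-1]

def kmers(seq: str, k: int, stride: int = 1):
    """Yield every k-length substring of seq, advancing `stride` bases each step."""
    for start in range(0, len(seq) - k + 1, stride):
        yield seq[start:start + k]

read = "ACGTACGTTAGC"
for kmer in kmers(read, k=4, stride=4):
    print(kmer, revcomp(kmer))
```

With a stride equal to k the k-mers do not overlap. The Seq example uses a stride of 10 with 20-mers, so consecutive k-mers there overlap by 10 bases.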

For a concrete example of genomeindex, check out our re-implementation of SNAP's index.
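
Since the genomeindex module in the example is hypothetical, a toy plain-Python sketch may help picture what such an index does: it maps each k-mer of a reference to the positions where it occurs, and a seeding step queries it with both a k-mer and its reverse complement, as process does in the example. The class ToyGenomeIndex and its layout are assumptions made for illustration. This is not Seq's or SNAP's implementation, and a practical index would be far more compact.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(kmer: str) -> str:
    """Reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]

class ToyGenomeIndex:
    """Hypothetical in-memory index: k-mer -> list of 0-based reference positions."""

    def __init__(self, reference: str, k: int):
        self.k = k
        self.positions = defaultdict(list)
        for pos in range(len(reference) - k + 1):
            self.positions[reference[pos:pos + k]].append(pos)

    def __getitem__(self, kmer: str):
        # .get avoids creating empty entries for k-mers that are absent
        return self.positions.get(kmer, [])

def process(kmer: str, index: ToyGenomeIndex):
    """Look up seed hits on both strands, mirroring `process` in the Seq example."""
    hits_fwd = index[kmer]
    hits_rev = index[revcomp(kmer)]
    return hits_fwd, hits_rev

index = ToyGenomeIndex("ACGTTGCAAC", k=3)
print(process("GTT", index))
```

In a seed-and-extend aligner, the positions returned here would be the candidate locations that a later extension step verifies.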

Install

Pre-built binaries

Pre-built binaries for Linux and macOS on x86_64 are available alongside each release. We also have a script for downloading and installing pre-built versions:

wget -O - https://raw.githubusercontent.com/seq-lang/seq/master/install.sh | bash

This will install Seq in a new .seq directory within your home directory. Be sure to update ~/.bash_profile as the script indicates afterwards!

Seq binaries require libomp to be present on your machine. brew install libomp (macOS) or apt install libomp5 (Debian/Ubuntu) should do the trick.

Build from source

Documentation

Please check seq-lang.org for in-depth documentation.

Citing Seq

If you use Seq in your research, please cite:

Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Bonnie Berger, and Saman Amarasinghe. 2019. Seq: a high-performance language for bioinformatics. Proc. ACM Program. Lang. 3, OOPSLA, Article 125 (October 2019), 29 pages. DOI: https://doi.org/10.1145/3360551

BibTeX:

@article{Shajii:2019:SHL:3366395.3360551,
  author = {Shajii, Ariya and Numanagi\'{c}, Ibrahim and Baghdadi, Riyadh and Berger, Bonnie and Amarasinghe, Saman},
  title = {Seq: A High-performance Language for Bioinformatics},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {October 2019},
  volume = {3},
  number = {OOPSLA},
  month = oct,
  year = {2019},
  issn = {2475-1421},
  pages = {125:1--125:29},
  articleno = {125},
  numpages = {29},
  url = {http://doi.acm.org/10.1145/3360551},
  doi = {10.1145/3360551},
  acmid = {3360551},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {Python, bioinformatics, computational biology, domain-specific language, optimization, programming language},
}
