Advance articles alert 09 July 2019

 

1. Cooler: scalable storage for Hi-C data and other genomically-labeled arrays

Nezar Abdennur, Leonid Mirny

Bioinformatics, btz540, https://doi.org/10.1093/bioinformatics/btz540

Published: 10 July 2019

Abstract

Motivation

Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis.

Results

We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium.
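To make the idea of a sparse data model for a genomically-labeled matrix concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the cooler schema or its library API: the function names (`make_bins`, `to_sparse_pixels`, `fetch_rows`), the toy chromosome sizes, and the toy contact counts are all hypothetical. It keeps a table of genomic bins plus a sorted list of nonzero upper-triangle entries, and uses binary search for random access by row range.

```python
from bisect import bisect_left


def make_bins(chrom_sizes, binsize):
    """Tile each chromosome with fixed-size bins: a list of (chrom, start, end)."""
    bins = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, binsize):
            bins.append((chrom, start, min(start + binsize, size)))
    return bins


def to_sparse_pixels(dense):
    """Keep only nonzero upper-triangle cells of a symmetric matrix.

    Returns (bin1_id, bin2_id, count) tuples sorted by (bin1_id, bin2_id).
    """
    n = len(dense)
    return [
        (i, j, dense[i][j])
        for i in range(n)
        for j in range(i, n)
        if dense[i][j] != 0
    ]


def fetch_rows(pixels, lo, hi):
    """Return pixels whose first bin index lies in [lo, hi), via binary search."""
    first = bisect_left(pixels, (lo,))
    last = bisect_left(pixels, (hi,))
    return pixels[first:last]


# Hypothetical toy genome and symmetric contact matrix (assumed values).
chrom_sizes = {"chr1": 250, "chr2": 120}
bins = make_bins(chrom_sizes, binsize=100)

dense = [
    [4, 1, 0, 0, 0],
    [1, 3, 2, 0, 0],
    [0, 2, 5, 0, 1],
    [0, 0, 0, 2, 0],
    [0, 0, 1, 0, 6],
]
pixels = to_sparse_pixels(dense)
subset = fetch_rows(pixels, 1, 3)

print(len(bins), "bins;", len(pixels), "stored pixels out of", len(dense) ** 2, "dense cells")
print(subset)
```

In this toy case 8 entries are stored instead of 25 dense cells, and the saving grows as the matrix gets larger and sparser. Because the pixel list is sorted, a row-range query touches only the slice it needs rather than the whole matrix.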

Availability

Cooler is cross-platform, BSD-licensed, and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.

Supplementary information

Supplementary data are available at Bioinformatics online.

 

 

2. MemBlob database and server for identifying transmembrane regions using cryo-EM maps

Bianka Farkas, Georgina Csizmadia, Eszter Katona, Gábor E Tusnády, Tamás Hegedűs

Bioinformatics, btz539, https://doi.org/10.1093/bioinformatics/btz539

Published: 10 July 2019

Abstract

The identification of transmembrane helices in transmembrane proteins is crucial, not only to understand their mechanism of action, but also to develop new therapies. While experimental data on the boundaries of membrane-embedded regions are sparse, this information is present in cryo-electron microscopy (cryo-EM) density maps and has not yet been utilized for determining membrane regions. We developed a computational pipeline in which the inputs for a given protein (a cryo-EM map, the corresponding atomistic structure, and the potential bilayer orientation determined by the TMDET algorithm) result in an output defining the residues assigned to the bulk water phase, the lipid interface, and the lipid hydrophobic core. Based on this method, we built a database of published cryo-EM protein structures and a server to compute this data for newly obtained structures.

Availability

http://memblob.hegelab.org

Supplementary information

Supplementary data are available at Bioinformatics online.

 

3. Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Robert S Harris, Monika Cechova, Kateryna D Makova

Bioinformatics, btz484, https://doi.org/10.1093/bioinformatics/btz484

Published: 10 July 2019

Abstract

Summary

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools that take the high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.
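To illustrate why error tolerance matters when searching for a specified motif, here is a deliberately naive Python sketch. It is not NCRF's algorithm: it tolerates only substitutions (via Hamming distance) and ignores insertions and deletions, which a real tool for long reads must handle. The function names, the thresholds, and the toy read are hypothetical; only the AATGG motif comes from the abstract.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_noisy_tandem_repeats(read, motif, max_mismatches=1, min_copies=3):
    """Find runs of consecutive near-copies of `motif` in `read`.

    A window counts as a copy if it differs from the motif at no more than
    `max_mismatches` positions. Returns (start, end, copies) tuples for runs
    of at least `min_copies` copies. Substitutions only; indels are ignored.
    """
    k = len(motif)
    hits = []
    i = 0
    while i + k <= len(read):
        j = i
        copies = 0
        # Extend the run one motif-length window at a time.
        while j + k <= len(read) and hamming(read[j:j + k], motif) <= max_mismatches:
            copies += 1
            j += k
        if copies >= min_copies:
            hits.append((i, j, copies))
            i = j  # continue scanning after the run
        else:
            i += 1  # slide by one base and try again
    return hits


# Toy read (assumed): four copies of AATGG, the third carrying one substitution.
read = "CCTT" + "AATGG" * 2 + "AATCG" + "AATGG" + "TTCA"

tolerant = find_noisy_tandem_repeats(read, "AATGG", max_mismatches=1)
exact = find_noisy_tandem_repeats(read, "AATGG", max_mismatches=0)
print(tolerant)
print(exact)
```

With one mismatch allowed, the scan reports a single four-copy array; with exact matching, the one substituted base splits the array into fragments too short to report. This is the failure mode that an error-aware repeat finder is meant to avoid.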

Availability and Implementation

NCRF is implemented in C, supported by several Python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder.

Supplementary information

Supplementary data are available at Bioinformatics online.

 

 
