Advance articles alert 09 July 2019

whereis redhat

1. Cooler: scalable storage for Hi-C data and other genomically-labeled arrays

Nezar Abdennur, Leonid Mirny

Bioinformatics, btz540, https://doi.org/10.1093/bioinformatics/btz540

Published: 10 July 2019

Abstract

Motivation

Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis.
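To make the scaling argument concrete, here is a minimal arithmetic sketch in Python. The genome length (about 3.1 Gb) and the 4 bytes per cell are illustrative assumptions, not figures from the abstract, and the function names are hypothetical.

```python
def n_bins(genome_length_bp, bin_size_bp):
    """Number of fixed-size bins needed to tile the genome (ceiling division)."""
    return -(-genome_length_bp // bin_size_bp)


def dense_cells(genome_length_bp, bin_size_bp):
    """Cells in a dense, square bin-by-bin matrix at the given resolution."""
    n = n_bins(genome_length_bp, bin_size_bp)
    return n * n


def dense_gigabytes(genome_length_bp, bin_size_bp, bytes_per_cell=4):
    """Uncompressed size of the dense matrix in gigabytes (10**9 bytes)."""
    return dense_cells(genome_length_bp, bin_size_bp) * bytes_per_cell / 1e9


if __name__ == "__main__":
    GENOME_BP = 3_100_000_000  # assumed approximate genome length
    for bin_size in (1_000_000, 100_000, 10_000, 1_000):
        size_gb = dense_gigabytes(GENOME_BP, bin_size)
        print(f"bin size {bin_size:>9} bp: {size_gb:,.2f} GB dense")
```

Because the cell count grows with the square of the number of bins, each tenfold increase in resolution multiplies dense storage by roughly one hundred.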

Results

We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium.
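The idea of a sparse data model for a genomically-labeled matrix can be illustrated with a small pure-Python sketch: a table of bins describes the axes, and only the nonzero upper-triangle cells are kept as records. This is an illustration under assumptions, not cooler's actual schema, HDF5 layout, or API; the names `Bin`, `make_bins`, `to_sparse_pixels`, and `query` are hypothetical.

```python
from collections import namedtuple

# One row per genomic bin; the row index serves as the bin ID.
Bin = namedtuple("Bin", ["chrom", "start", "end"])


def make_bins(chrom_sizes, bin_size):
    """Tile each chromosome with fixed-size bins (the last bin may be shorter)."""
    bins = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, bin_size):
            bins.append(Bin(chrom, start, min(start + bin_size, length)))
    return bins


def to_sparse_pixels(dense):
    """Keep only nonzero upper-triangle cells as (bin1_id, bin2_id, count)."""
    pixels = []
    for i, row in enumerate(dense):
        for j in range(i, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


def query(pixels, lo, hi):
    """Rebuild the dense symmetric submatrix covering bin IDs lo..hi-1."""
    size = hi - lo
    sub = [[0] * size for _ in range(size)]
    for i, j, count in pixels:
        if lo <= i < hi and lo <= j < hi:
            sub[i - lo][j - lo] = count
            sub[j - lo][i - lo] = count  # mirror into the lower triangle
    return sub


if __name__ == "__main__":
    bins = make_bins({"chr1": 250, "chr2": 100}, bin_size=100)
    dense = [
        [5, 1, 0, 0],
        [1, 4, 2, 0],
        [0, 2, 3, 0],
        [0, 0, 0, 6],
    ]
    pixels = to_sparse_pixels(dense)
    print(len(bins), "bins;", len(pixels), "records instead of 16 cells")
    print(query(pixels, 1, 3))
```

The linear scan in `query` is only for clarity; a real format would add an index so that a region can be fetched without reading every record.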

Availability

Cooler is cross-platform, BSD-licensed, and can be installed from the Python package index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.

Supplementary information

Supplementary data are available at Bioinformatics online.

2. MemBlob database and server for identifying transmembrane regions using cryo-EM maps

Bianka Farkas, Georgina Csizmadia, Eszter Katona, Gábor E Tusnády, Tamás Hegedűs

Bioinformatics, btz539, https://doi.org/10.1093/bioinformatics/btz539

Published: 10 July 2019

Abstract

The identification of transmembrane helices in transmembrane proteins is crucial, not only to understand their mechanism of action, but also to develop new therapies. While experimental data on the boundaries of membrane-embedded regions are sparse, this information is present in cryo-electron microscopy (cryo-EM) density maps and has not yet been utilized for determining membrane regions. We developed a computational pipeline that takes as input a cryo-EM map of a given protein, the corresponding atomistic structure, and the potential bilayer orientation determined by the TMDET algorithm, and outputs the residues assigned to the bulk water phase, the lipid interface, and the lipid hydrophobic core. Based on this method, we built a database of published cryo-EM protein structures and a server that computes these data for newly obtained structures.
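The three-way output labeling can be pictured with a purely geometric sketch in Python. This is not the MemBlob pipeline, which derives the boundaries from the cryo-EM density; it only shows how residues could be binned once a bilayer center and normal are known. The function name and both thickness thresholds (in ångströms) are assumptions made for illustration.

```python
import math


def classify_residue(coord, center, normal,
                     core_half_thickness=15.0, interface_width=10.0):
    """Label a residue by its distance from the bilayer midplane.

    coord, center, normal are (x, y, z) tuples; the thresholds are
    hypothetical values, not parameters taken from MemBlob or TMDET.
    """
    # Normalize the bilayer normal so the projection is a true distance.
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]
    # Unsigned distance of the residue from the midplane along the normal.
    depth = abs(sum((p - c) * n for p, c, n in zip(coord, center, unit)))
    if depth <= core_half_thickness:
        return "hydrophobic core"
    if depth <= core_half_thickness + interface_width:
        return "lipid interface"
    return "bulk water"


if __name__ == "__main__":
    center, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    for z in (5.0, -20.0, 40.0):
        print(z, classify_residue((0.0, 0.0, z), center, normal))
```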

Availability

http://memblob.hegelab.org

Supplementary information

Supplementary data are available at Bioinformatics online.

3. Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Robert S Harris Monika Cechova Kateryna D Makova

Bioinformatics, btz484, https://doi.org/10.1093/bioinformatics/btz484

Published: 10 July 2019

Abstract

Summary

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools that take the high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.
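To show why error tolerance matters when searching for a specified motif, here is a toy Python sketch that greedily chains motif copies while allowing a limited edit distance per copy. It is a simplified stand-in, not NCRF's algorithm; the function names, `max_errors`, and `min_copies` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def extend_repeat(read, start, motif, max_errors=1):
    """Greedily consume noisy motif copies from `start`; return (end, copies)."""
    pos, copies = start, 0
    while True:
        best = None
        # Try the exact motif length first, then one base shorter or longer,
        # so that a single deletion or insertion inside a copy is tolerated.
        for width in (len(motif), len(motif) - 1, len(motif) + 1):
            chunk = read[pos:pos + width]
            if width < 1 or len(chunk) < width:
                continue
            errors = edit_distance(chunk, motif)
            if errors <= max_errors and (best is None or errors < best[0]):
                best = (errors, width)
        if best is None:
            return pos, copies
        pos += best[1]
        copies += 1


def find_tandem_repeats(read, motif, min_copies=3, max_errors=1):
    """Return (start, end, copies) for each run of at least min_copies copies."""
    hits, pos = [], 0
    while pos < len(read):
        end, copies = extend_repeat(read, pos, motif, max_errors)
        if copies >= min_copies:
            hits.append((pos, end, copies))
            pos = end
        else:
            pos += 1
    return hits


if __name__ == "__main__":
    # Four AATGG copies, one with a substitution and one with a deletion.
    noisy = "CCCC" + "AATGG" + "AATCG" + "AAGG" + "AATGG" + "TTTT"
    print(find_tandem_repeats(noisy, "AATGG"))
```

An exact string search for three consecutive AATGG copies would miss the noisy array above entirely. The greedy chaining is crude, though: run boundaries can shift by a base when a flanking base is absorbed as an error, which is one reason a real tool needs a proper alignment model.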

Availability and Implementation

NCRF is implemented in C, supported by several Python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder.

Supplementary information

Supplementary data are available at Bioinformatics online.