1. diSTruct v1.0: Generating Biomolecular Structures from Distance Constraints
Oskar Taubert, Ines Reinartz, Henning Meyerhenke, Alexander Schug
Bioinformatics, btz578, https://doi.org/10.1093/bioinformatics/btz578
Published: 22 July 2019
Abstract
Summary
The distance geometry problem is often encountered in molecular biology and the life sciences at large, as a host of experimental methods produce ambiguous and noisy distance data. In this note, we present diSTruct, an adaptation of the generic MaxEnt-Stress graph drawing algorithm to the domain of biological macromolecules. diSTruct is fast, provides reliable structural models even from incomplete or noisy distance data, and integrates access to graph analysis tools.
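To make the distance geometry problem in the summary above concrete, here is a minimal Python sketch that recovers 3D coordinates from a complete and exact distance matrix. It is not diSTruct and not the MaxEnt-Stress algorithm: it uses the classical Gram-matrix embedding (a truncated Cholesky factorisation) as a stand-in, and it assumes noise-free data with no missing distances, which is exactly the easy case that diSTruct is designed to go beyond. The function names `embed_from_distances` and `pairwise_distances` are hypothetical.

```python
import math


def embed_from_distances(D, dim=3):
    """Recover coordinates from a complete, exact distance matrix D.

    Assumption: points 0..dim are in general position (e.g. not coplanar
    for dim=3). Point 0 is placed at the origin.
    """
    n = len(D)
    m = n - 1
    # Gram matrix relative to point 0: G[i][j] = <x_i - x_0, x_j - x_0>,
    # obtained from distances via the law of cosines.
    G = [[(D[0][i] ** 2 + D[0][j] ** 2 - D[i][j] ** 2) / 2.0
          for j in range(1, n)] for i in range(1, n)]
    # Truncated Cholesky factorisation G = L L^T; rows of L are coordinates.
    L = [[0.0] * dim for _ in range(m)]
    for k in range(min(dim, m)):
        pivot = G[k][k] - sum(L[k][c] ** 2 for c in range(k))
        if pivot <= 1e-12:
            raise ValueError("leading points are degenerate; reorder the input")
        L[k][k] = math.sqrt(pivot)
        for i in range(k + 1, m):
            L[i][k] = (G[i][k] - sum(L[i][c] * L[k][c] for c in range(k))) / L[k][k]
    return [[0.0] * dim] + L


def pairwise_distances(points):
    """Full Euclidean distance matrix of a list of points."""
    return [[math.dist(p, q) for q in points] for p in points]


# Toy input: five points whose distances we pretend were measured.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
D = pairwise_distances(points)
coords = embed_from_distances(D)

# The embedding is only defined up to rotation, reflection and translation,
# so compare distances rather than coordinates.
max_error = max(abs(a - b)
                for row_a, row_b in zip(pairwise_distances(coords), D)
                for a, b in zip(row_a, row_b))
```

With noisy or incomplete distances the Gram matrix is no longer exactly of rank three and this factorisation breaks down, which is why stress-minimising approaches such as the one the abstract describes are used in practice.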
Availability and Implementation
diSTruct is written in C++, Cython and Python 3. It is available from https://github.com/KIT-MBS/distruct.git or in the Python package index under the MIT license.
Supplementary information
Supplementary data is available at Bioinformatics online.
2. How sequence alignment scores correspond to probability models
Martin C Frith
Bioinformatics, btz576, https://doi.org/10.1093/bioinformatics/btz576
Published: 22 July 2019
Abstract
Motivation
Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions, and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts, and measure sequence similarity integrated over possible alignments.
Results
This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a “temperature” parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias towards either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment.
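The conversion from scores to probabilities mentioned above can be illustrated with a small Python sketch. It assumes a deliberately simple scoring scheme (one match score, one mismatch score, a linear gap score) and global alignment; the paper's models are more general, and the function name and default parameters here are hypothetical. The partition function Z sums exp(score / T) over all alignments, so a single alignment with score S has probability exp(S / T) / Z.

```python
import math


def alignment_partition_function(a, b, temperature,
                                 match=1.0, mismatch=-1.0, gap=-1.0):
    """Sum of exp(score / temperature) over all global alignments of a and b.

    Assumption: linear gap scores, and an insertion followed by a deletion
    is counted separately from the reverse order.
    """
    w_gap = math.exp(gap / temperature)
    Z = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    Z[0][0] = 1.0  # the empty alignment has score 0, weight exp(0) = 1
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                s = match if a[i - 1] == b[j - 1] else mismatch
                total += Z[i - 1][j - 1] * math.exp(s / temperature)
            if i > 0:  # a[i-1] aligned to a gap
                total += Z[i - 1][j] * w_gap
            if j > 0:  # b[j-1] aligned to a gap
                total += Z[i][j - 1] * w_gap
            Z[i][j] = total
    return Z[-1][-1]


def alignment_probability(score, Z, temperature):
    """Probability of one alignment with the given total score."""
    return math.exp(score / temperature) / Z


T = 0.1
Z_cold = alignment_partition_function("ACG", "AG", T)
# At low temperature, T * ln(Z) approaches the best alignment score.
best_score_estimate = T * math.log(Z_cold)
# At very high temperature every alignment gets weight ~1, so Z counts them.
Z_hot = alignment_partition_function("ACG", "AG", 1e9)
```

The two limits show what the temperature does: a low temperature concentrates almost all probability on the highest-scoring alignment, while a high temperature spreads it evenly over all alignments.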
Supplementary information
Supplementary data are available at Bioinformatics online.
3. Genetic association testing using the GENESIS R/Bioconductor package
Stephanie M Gogarten, Tamar Sofer, Han Chen, Chaoyu Yu, Jennifer A Brody, Timothy A Thornton, Kenneth M Rice, Matthew P Conomos
Bioinformatics, btz567, https://doi.org/10.1093/bioinformatics/btz567
Published: 22 July 2019
Abstract
Summary
The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components, and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment.
Availability and Implementation
https://bioconductor.org/packages/GENESIS; vignettes included.
Supplementary Information
Supplementary tables and figures are available at Bioinformatics online.
4. Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data
Chan Wang, Jiyuan Hu, Martin J Blaser, Huilin Li
Bioinformatics, btz565, https://doi.org/10.1093/bioinformatics/btz56
Published: 22 July 2019
Abstract
Motivation
Recent microbiome association studies have revealed important associations between the microbiome and disease/health status. Such findings encourage scientists to dive deeper to uncover the causal role of the microbiome in the underlying biological mechanism, and have led to applying statistical models to quantify causal microbiome effects and to identify the specific microbial agents. However, no existing causal mediation methods are specifically designed to handle high-dimensional and compositional microbiome data.
Results
We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high-dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are proposed to estimate the causal direct effect of treatment and the causal mediation effects of the microbiome at both the community and individual taxon levels. Regularization techniques are used to perform variable selection in the proposed model framework to identify signature causal microbes. Two hypothesis tests on the overall mediation effect are proposed, and their statistical significance is estimated by permutation procedures. Extensive simulated scenarios show that SparseMCMM has excellent performance in estimation and hypothesis testing. Finally, we showcase the utility of the proposed SparseMCMM method in a study in which the murine microbiome has been manipulated, where it provides a clear and sensible causal path among antibiotic treatment, microbiome composition and mouse weight.
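The abstract states that significance is estimated by permutation procedures. As a rough illustration of that general idea only, the Python sketch below computes a permutation p-value for a simple stand-in statistic, the absolute difference in mean outcome between treated and control units. This is not the SparseMCMM test statistic, and the function names, the toy data and the choice of 999 permutations are all assumptions made for illustration.

```python
import random


def mean_difference(labels, outcomes):
    """Stand-in statistic: |mean(outcome | treated) - mean(outcome | control)|."""
    treated = [y for t, y in zip(labels, outcomes) if t == 1]
    control = [y for t, y in zip(labels, outcomes) if t == 0]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


def permutation_p_value(labels, outcomes, statistic=mean_difference,
                        n_permutations=999, seed=0):
    """Estimate a p-value by shuffling treatment labels.

    Shuffling breaks any link between treatment and outcome, so the shuffled
    statistics approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = statistic(labels, outcomes)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(shuffled, outcomes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + hits) / (1 + n_permutations)


# Hypothetical toy data: 5 treated and 5 control mice with made-up weights.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
weights = [10.1, 9.8, 10.4, 10.0, 9.9, 5.2, 4.9, 5.1, 5.0, 4.7]
p_separated = permutation_p_value(labels, weights)
p_null = permutation_p_value(labels, [1.0] * 10)
```

The same loop works for any statistic that takes the labels and the data, which is what makes permutation procedures attractive when the null distribution of a test statistic has no convenient closed form.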
Availability
https://sites.google.com/site/huilinli09/software and https://github.com/chanw0/SparseMCMM.
Supplementary information
Supplementary data are available at Bioinformatics online.
5. GSMA: an approach to identify robust global and test Gene Signatures using Meta-Analysis
Adib Shafi, Tin Nguyen, Azam Peyvandipour, Sorin Draghici
Bioinformatics, btz561, https://doi.org/10.1093/bioinformatics/btz561
Published: 22 July 2019
Abstract
Motivation
Recent advances in biomedical research have made massive amounts of transcriptomic data from different sources available in public repositories. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: i) they are sensitive to outliers, and ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.
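For readers unfamiliar with the four classical methods named above, the following Python sketch shows their textbook forms for combining independent p-values, one per study. It is not part of GSMA; the function names are hypothetical, and the sketch assumes independent studies, equal weights, and p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.o.f."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer(p_values):
    """Stouffer's method: average the z-scores, then convert back."""
    std = NormalDist()
    z = sum(std.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - std.cdf(z)


def min_p(p_values):
    """minP: significance of the smallest p-value among k studies."""
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)


def max_p(p_values):
    """maxP: significance of the largest p-value among k studies."""
    return max(p_values) ** len(p_values)


# Hypothetical p-values for one gene: four unremarkable studies, one extreme.
study_p = [0.40, 0.55, 0.35, 0.60, 1e-12]
combined = {f.__name__: f(study_p) for f in (fisher, stouffer, min_p, max_p)}
```

Running the four functions on `study_p` illustrates the first limitation in the paragraph above: a single extreme study can pull Fisher's method and minP towards significance even though the other studies show little evidence, while maxP is governed entirely by the least significant study.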
Results
Here we propose GSMA, an intra- and inter-level meta-analysis framework that overcomes these limitations and provides a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using 9 data sets including 1,108 individuals. These signatures are then validated on 12 independent data sets including 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets.
Availability
For review purposes, source code is currently available at https://bit.ly/2AXg3qS. It will be available as a package in Bioconductor soon.
Supplementary information
Supplementary data are available at Bioinformatics online.