StochHMM: A Hidden Markov Model C++ Project Integrating Multiple Methods

https://code.google.com/p/stochhmm/


StochHMM is a free, easy-to-use, open source C++ library and application that implements HMMs from simple text files. It implements the traditional HMM algorithms and adds flexibility by allowing researchers to integrate additional data sources and applications into the HMM framework.

StochHMM implements standard HMMs and, in preliminary form, HMMs with duration. It lets researchers integrate additional datasets into their HMMs to improve predictions. It also adapts HMM algorithms to provide stochastic decoding, giving researchers the ability to explore and rank sub-optimal predictions. We provide StochHMM as a standalone application and a C++ library so that researchers can rapidly develop HMMs.

Integrating Data

Here are a few of the ways that StochHMM allows users to integrate additional data sources:
  1. Multiple Emission States
  2. Weighting or Explicitly Defining State paths on a sequence
  3. Linking State Emissions/Transitions to external user-defined functions

Multiple Emission States

StochHMM allows the user to provide multiple sequences, which are then handled by the emissions. These sequences can be real numbers or discrete characters/words. Each state can have many emissions (discrete or continuous). Discrete emissions can be independent of each other or form joint distributions. Continuous emissions can be treated in two ways:
  1. As raw probabilities, which are integrated without transformation.
  2. As values to be plugged into a univariate probability density function (PDF), or into a multivariate PDF in the case of multiple real-valued sequences.
Each state's emissions are user-defined, so one state may have emissions from two different sequences, while another may have only a single emission from a single sequence.
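To make the idea of several emissions per state concrete, here is a minimal sketch of how a discrete emission and a continuous emission could be combined when they are treated as independent. It does not use the StochHMM API: the type TwoTrackState and the functions logNormalPdf and logEmission are hypothetical names chosen for illustration, and the normal distribution is only one possible choice of univariate PDF.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>

// Hypothetical state with two emissions: one discrete track (e.g. DNA
// letters) and one real-valued track scored by a univariate normal PDF.
struct TwoTrackState {
    std::map<char, double> discrete;  // P(symbol | state)
    double mean;                      // mean of the normal PDF
    double stddev;                    // standard deviation, must be > 0
};

// Log density of a normal distribution evaluated at x.
double logNormalPdf(double x, double mean, double stddev) {
    assert(stddev > 0.0);
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return -0.5 * z * z - std::log(stddev) - 0.5 * std::log(2.0 * kPi);
}

// Combined log emission score under the independence assumption:
//   log P(symbol, value | state) = log P(symbol | state) + log pdf(value)
// Symbols that the state cannot emit score -infinity.
double logEmission(const TwoTrackState& s, char symbol, double value) {
    const auto it = s.discrete.find(symbol);
    if (it == s.discrete.end() || it->second <= 0.0) {
        return -std::numeric_limits<double>::infinity();
    }
    return std::log(it->second) + logNormalPdf(value, s.mean, s.stddev);
}
```

Working in log space turns the product of independent emission terms into a sum, which avoids numerical underflow on long sequences. A joint distribution over several discrete tracks would instead need a single table indexed by the combination of symbols.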

Weighting or Explicitly Defining State paths on a sequence

Often, we have some prior knowledge about the sequence. In that case, we may want to integrate that knowledge into the model without redesigning or retraining it (a time-consuming endeavor). StochHMM allows the user to explicitly define a state path, either by state name or by state category. StochHMM also allows the user to weight a state path, again by state name or by user-defined state category. This lets the user restrict the predicted path or weight it according to prior knowledge.
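One common way to express this kind of prior knowledge is an extra per-position, per-state term added to the Viterbi score. The sketch below illustrates that technique under stated assumptions and is not StochHMM's implementation: the function name constrainedViterbi, the Matrix alias, and the logWeight parameter are hypothetical. A weight of 0 leaves a state untouched, a weight of negative infinity forbids it at that position (an explicitly defined path), and any other value biases the decoder toward or away from the state.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Viterbi decoding in log space with an additional prior-knowledge term.
//   logInit[k]      : log P(first state = k)
//   logTrans[j][k]  : log P(next state = k | current state = j)
//   logEmit[t][k]   : log P(observation at t | state k)
//   logWeight[t][k] : 0 = neutral, -infinity = state k forbidden at t
// Returns the highest-scoring state path as state indices.
std::vector<int> constrainedViterbi(const std::vector<double>& logInit,
                                    const Matrix& logTrans,
                                    const Matrix& logEmit,
                                    const Matrix& logWeight) {
    const std::size_t T = logEmit.size();
    const std::size_t K = logInit.size();
    assert(T > 0 && K > 0 && logWeight.size() == T && logTrans.size() == K);
    const double negInf = -std::numeric_limits<double>::infinity();

    Matrix score(T, std::vector<double>(K, negInf));
    std::vector<std::vector<int>> back(T, std::vector<int>(K, 0));

    for (std::size_t k = 0; k < K; ++k) {
        score[0][k] = logInit[k] + logEmit[0][k] + logWeight[0][k];
    }
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t k = 0; k < K; ++k) {
            // Best predecessor for state k at position t.
            for (std::size_t j = 0; j < K; ++j) {
                const double cand = score[t - 1][j] + logTrans[j][k];
                if (cand > score[t][k]) {
                    score[t][k] = cand;
                    back[t][k] = static_cast<int>(j);
                }
            }
            score[t][k] += logEmit[t][k] + logWeight[t][k];
        }
    }

    // Trace back from the best final state.
    int best = 0;
    for (std::size_t k = 1; k < K; ++k) {
        if (score[T - 1][k] > score[T - 1][best]) best = static_cast<int>(k);
    }
    std::vector<int> path(T);
    path[T - 1] = best;
    for (std::size_t t = T - 1; t > 0; --t) {
        path[t - 1] = back[t][path[t]];
    }
    return path;
}
```

Because the weights enter only as an additive term, the transition and emission parameters stay unchanged, which is why no retraining is needed in this scheme.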

Linking State Emissions or Transitions to external user-defined functions

StochHMM allows a state's emission or transition to be linked to an external user-defined function. When that transition or emission is evaluated, the function is called and can provide the value. This may offer one way of addressing a weakness of HMMs, namely that they do not handle long-range dependencies, but we see it rather as a way to link together existing utilities or functions that provide additional information to the decoding algorithms. In this way, we can link divergent datasets or functions within the HMM trellis in order to arrive at a better prediction.
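The general pattern of delegating an emission to an external function can be sketched with a callback. The code below is an illustration only and does not reflect how StochHMM registers functions: EmissionFn, FunctionLinkedState, gcWindowScore, and makeGcState are hypothetical names, and the GC-content score is an arbitrary example of an external utility, not a calibrated probability.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback type: receives the whole sequence and the position
// being evaluated, and returns a log score for that position.
using EmissionFn = std::function<double(const std::string&, std::size_t)>;

// Hypothetical state whose emission is supplied by an external function.
struct FunctionLinkedState {
    std::string name;
    EmissionFn emission;
};

// Example external function: log of the smoothed GC fraction in a window
// centered on pos. Because it sees the whole sequence, it can use context
// beyond the current position.
double gcWindowScore(const std::string& seq, std::size_t pos,
                     std::size_t halfWidth) {
    assert(pos < seq.size());
    const std::size_t start = pos > halfWidth ? pos - halfWidth : 0;
    const std::size_t end = std::min(seq.size(), pos + halfWidth + 1);
    std::size_t gc = 0;
    for (std::size_t i = start; i < end; ++i) {
        if (seq[i] == 'G' || seq[i] == 'C') ++gc;
    }
    // Add-one smoothing keeps the score finite when the window has no G/C.
    const double window = static_cast<double>(end - start);
    return std::log((static_cast<double>(gc) + 1.0) / (window + 2.0));
}

// Builds a state whose emission is delegated to gcWindowScore.
FunctionLinkedState makeGcState(std::size_t halfWidth) {
    return FunctionLinkedState{
        "GC_RICH",
        [halfWidth](const std::string& seq, std::size_t pos) {
            return gcWindowScore(seq, pos, halfWidth);
        }};
}
```

A decoder would call state.emission(seq, t) wherever it would otherwise look up a table entry, so the rest of the trellis computation is unchanged.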


Features

Brief list of features implemented in StochHMM:
  • General settings within Hidden Markov Models
    1. User-defined HMM model via simple human readable text file
    2. User-defined Alphabet
    3. User-defined Ambiguous Characters
  • States
    1. Emissions
      • Multiple emission states (Discrete / Continuous)
        • Independent (Single or Multiple Discrete)
        • Joint Distribution (Multiple Discrete)
        • Univariate PDF (Single Sequence - Continuous)
        • Multivariate PDF (Multiple Sequence - Continuous)
      • Linkable to user-defined function
    2. Transitions
      • Standard Transitions
      • Lexical Transitions (Single or multiple emission)
      • (Preliminary) Explicit Duration Transitions
      • Linkable to user-defined functions
  • Decoding
    1. Traditional Decoding Algorithms
      • Forward/Backward/Posterior
      • Viterbi
      • N-best Viterbi
    2. Stochastic Sampling Decoding Algorithms
      • Stochastic Forward
      • Stochastic Viterbi
      • Stochastic Posterior
  • Decoding Traceback Path output formats
    • State Path Index
    • State Path Label
    • GFF
    • Hit Table (Stochastic Algorithms)
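As a rough illustration of stochastic decoding, the sketch below samples state paths from a forward table and tallies them per position. It is a generic textbook-style sketch, not StochHMM's code: scaledForward, samplePath, and hitTable are hypothetical names, and the reading of a hit table as "how many sampled paths visit each state at each position" is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward table with each row rescaled to sum to 1 to avoid underflow.
//   init[k], trans[j][k], emit[t][k] are plain probabilities.
Matrix scaledForward(const std::vector<double>& init, const Matrix& trans,
                     const Matrix& emit) {
    const std::size_t T = emit.size();
    const std::size_t K = init.size();
    assert(T > 0 && K > 0);
    Matrix f(T, std::vector<double>(K, 0.0));
    for (std::size_t t = 0; t < T; ++t) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            double in = 0.0;
            if (t == 0) {
                in = init[k];
            } else {
                for (std::size_t j = 0; j < K; ++j) in += f[t - 1][j] * trans[j][k];
            }
            f[t][k] = in * emit[t][k];
            total += f[t][k];
        }
        assert(total > 0.0);  // the observations must be possible under the model
        for (std::size_t k = 0; k < K; ++k) f[t][k] /= total;
    }
    return f;
}

// Samples one state path by stochastic traceback: the last state is drawn
// in proportion to the final forward row, and each earlier state j is drawn
// in proportion to f[t-1][j] * trans[j][next state].
std::vector<int> samplePath(const Matrix& f, const Matrix& trans,
                            std::mt19937& rng) {
    const std::size_t T = f.size();
    const std::size_t K = f[0].size();
    std::vector<int> path(T);
    std::discrete_distribution<int> last(f[T - 1].begin(), f[T - 1].end());
    path[T - 1] = last(rng);
    for (std::size_t t = T - 1; t > 0; --t) {
        std::vector<double> w(K);
        for (std::size_t j = 0; j < K; ++j) w[j] = f[t - 1][j] * trans[j][path[t]];
        std::discrete_distribution<int> prev(w.begin(), w.end());
        path[t - 1] = prev(rng);
    }
    return path;
}

// hits[t][k] = number of sampled paths that are in state k at position t.
std::vector<std::vector<int>> hitTable(const Matrix& f, const Matrix& trans,
                                       int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::vector<int>> hits(f.size(), std::vector<int>(f[0].size(), 0));
    for (int s = 0; s < samples; ++s) {
        const std::vector<int> path = samplePath(f, trans, rng);
        for (std::size_t t = 0; t < path.size(); ++t) ++hits[t][path[t]];
    }
    return hits;
}
```

Rescaling each forward row does not change the sampling, because every draw depends only on the relative sizes of the entries within a row. Positions where the counts are split across states are the ones where sub-optimal paths differ from the single best path.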


Developed by:

Korf Lab, Genome Center, University of California, Davis


References

1. Schroeder, D. I., Blair, J. D., Lott, P., Yu, H. O., Hong, D., Crary, F., Ashwood, P., Walker, C., Korf, I., Robinson, W. P., LaSalle, J. M. The human placenta methylome. PNAS 110, 6037–6042 (2013).

2. Lott, P., Dunaway, K., Yu, K., Korf, I. StochHMM: A Flexible Hidden Markov Model Framework for Rapid Development of HMMs. Poster presented at: Genome Informatics, 2012 Sep 6-9, Cambridge, UK.

3. Schroeder, D. I., Lott, P., Korf, I., LaSalle, J. M. Large-scale methylation domains mark a functional subset of neuronally expressed genes. Genome Res 21, 1583–1591 (2011).

4. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I., Chédin, F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell 45, 814–825 (2012).


Documentation

Code Documentation can be found at http://korflab.github.io/StochHMM

Model file documentation and additional support can be found at https://github.com/KorfLab/StochHMM/wiki


StochHMM is provided as free open source code and compiles on Windows, Mac OSX, and Linux. We are providing StochHMM under the MIT open source license to increase accessibility and to give researchers the ability to use it in derivative works without restrictions.

Please feel free to contact us with bug reports, suggestions, or questions: lottpaul@gmail.com



