https://code.google.com/p/stochhmm/
StochHMM is a free, easy-to-use, open source C++ library and application that implements HMMs from simple text files. It implements the traditional HMM algorithms while also providing additional flexibility, which is achieved by allowing researchers to integrate additional data sources and applications into the HMM framework.
StochHMM implements standard HMMs and, in preliminary form, HMMs with explicit duration. It lets researchers integrate additional datasets into their HMMs to improve predictions. Finally, it adapts HMM algorithms to provide stochastic decoding, giving researchers the ability to explore and rank sub-optimal predictions. We provide StochHMM as a standalone application and a C++ library so that researchers can rapidly develop HMMs.
Integrating Data
Here are a few of the ways that StochHMM allows the users to integrate additional data sources:
- Multiple Emission States
- Weighting or Explicitly Defining State paths on a sequence
- Linking State Emissions/Transitions to external user-defined functions
Multiple Emission States
StochHMM allows the user to provide multiple sequences, which are then handled by the emissions. These sequences can be REAL numbers or discrete characters/words. Each state can have many emissions (discrete or continuous). Discrete emissions can be independent of each other or form joint distributions. Continuous emissions can be treated in two ways: 1) as raw probabilities, which are integrated without transformation, or 2) as values to be plugged into a univariate probability density function (PDF), or into a multivariate PDF in the case of multiple REAL sequences.
Each state's emissions are user-defined, so one state may have emissions from two different sequences, while another may have only a single emission from a single sequence.
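The idea of a state that reads two sequences at once can be sketched as follows. This is a minimal illustration, not the StochHMM API: the types DiscreteEmission, GaussianEmission, and TwoTrackState are hypothetical, and the sketch assumes the two emissions are independent and that the continuous track is scored with a univariate normal density.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Hypothetical types for illustration only; not part of the StochHMM API.
struct DiscreteEmission {
    std::map<char, double> prob;  // P(symbol | state)
    double logProb(char symbol) const {
        auto it = prob.find(symbol);
        assert(it != prob.end() && "symbol not in alphabet");
        return std::log(it->second);
    }
};

struct GaussianEmission {
    double mean;
    double stddev;
    // Log density of a univariate normal distribution at x.
    double logProb(double x) const {
        const double kPi = 3.14159265358979323846;
        const double z = (x - mean) / stddev;
        return -0.5 * z * z - std::log(stddev) - 0.5 * std::log(2.0 * kPi);
    }
};

// A state that reads one discrete track and one real-valued track.
// The two emissions are assumed independent, so their log scores add.
struct TwoTrackState {
    DiscreteEmission dna;
    GaussianEmission signal;
    double logEmission(char base, double value) const {
        return dna.logProb(base) + signal.logProb(value);
    }
};
```

A state with only one emission would simply drop one of the two terms. A joint distribution over several discrete tracks would replace the sum with a single table lookup keyed on the combination of symbols.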
Weighting or Explicitly Defining State paths to follow on a sequence
Often, we have some prior knowledge about the sequence. In that case, we may want to integrate it into the model without redesigning or retraining the model (a time-consuming endeavor). StochHMM allows the user to explicitly define a state path (by state name or by state category). In addition, StochHMM allows the user to weight a state path (by state name or by a user-defined state category). This allows the user either to restrict the predicted path or to weight it according to their prior knowledge.
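One way to picture path weighting is as an extra per-position, per-state score added inside Viterbi decoding. The sketch below is an assumption about how such a scheme could work, not a description of StochHMM's internals: weightedViterbi and its parameters are hypothetical names, states are plain integer indices, and all scores are in log space.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Viterbi in log space with an extra per-position, per-state log weight.
// logWeight[t][s] == 0 leaves the model unchanged, a negative value
// penalizes state s at position t, and -infinity forbids it.
// All names are hypothetical; this is not the StochHMM API.
std::vector<int> weightedViterbi(const std::vector<double>& logInit,
                                 const Matrix& logTrans,     // [from][to]
                                 const Matrix& logEmit,      // [t][state]
                                 const Matrix& logWeight) {  // [t][state]
    const std::size_t T = logEmit.size();
    assert(T > 0 && logWeight.size() == T);
    const std::size_t S = logInit.size();
    Matrix score(T, std::vector<double>(S));
    std::vector<std::vector<int>> back(T, std::vector<int>(S, -1));

    for (std::size_t s = 0; s < S; ++s)
        score[0][s] = logInit[s] + logEmit[0][s] + logWeight[0][s];

    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t s = 0; s < S; ++s) {
            double best = -std::numeric_limits<double>::infinity();
            int arg = 0;
            for (std::size_t p = 0; p < S; ++p) {
                const double v = score[t - 1][p] + logTrans[p][s];
                if (v > best) { best = v; arg = static_cast<int>(p); }
            }
            score[t][s] = best + logEmit[t][s] + logWeight[t][s];
            back[t][s] = arg;
        }
    }

    // Trace back from the best-scoring final state.
    int last = 0;
    for (std::size_t s = 1; s < S; ++s)
        if (score[T - 1][s] > score[T - 1][last]) last = static_cast<int>(s);
    std::vector<int> path(T);
    path[T - 1] = last;
    for (std::size_t t = T - 1; t > 0; --t)
        path[t - 1] = back[t][path[t]];
    return path;
}
```

Under this scheme, explicitly defining the path at a position amounts to setting the weight of every other state at that position to negative infinity, while a soft prior uses finite penalties.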
Linking State Emissions or Transitions to external user-defined functions
A state's emission or transition can be linked to an external, user-defined function. When that transition or emission is evaluated, the function is called and can provide the emission. Although this offers one way to address a known weakness of HMMs, namely that they do not handle long-range dependencies, we see it primarily as a way to link together existing utilities or functions that provide additional information to the decoding algorithms. In this way, we can link divergent datasets or functions within the HMM trellis in order to arrive at a better prediction.
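The linking mechanism can be pictured as a callback stored on the state. The sketch below is illustrative only: EmissionFunction, LinkedState, and tataUpstreamScore are hypothetical names, the callback signature is an assumption rather than the one StochHMM uses, and the returned scores are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback signature: given the whole sequence and the current
// position, return a log score to use for the emission.
using EmissionFunction =
    std::function<double(const std::string& seq, std::size_t pos)>;

struct LinkedState {
    std::string name;
    EmissionFunction emit;  // called each time the emission is evaluated
    double logEmission(const std::string& seq, std::size_t pos) const {
        assert(emit && pos < seq.size());
        return emit(seq, pos);
    }
};

// Example external function: favors a position when a complete "TATA" motif
// lies within `window` bases upstream of it, a dependency that a
// fixed-order emission table cannot look back far enough to see.
double tataUpstreamScore(const std::string& seq, std::size_t pos) {
    const std::size_t window = 40;
    const std::size_t start = pos > window ? pos - window : 0;
    const std::size_t hit = seq.find("TATA", start);
    const bool found = hit != std::string::npos && hit + 4 <= pos;
    return found ? 0.0 : -2.0;  // illustrative log scores, not fitted values
}
```

Because the function receives the full sequence, it can wrap any existing utility (a motif scanner, a lookup into an external dataset) and hand its result to the decoding algorithm at that cell of the trellis.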
Features
Brief list of features implemented in StochHMM:
- General settings within Hidden Markov Models
  - User-defined HMM model via simple human readable text file
  - User-defined Alphabet
  - User-defined Ambiguous Characters
- States
  - Emissions
    - Multiple emission states (Discrete / Continuous)
      - Independent (Single or Multiple Discrete)
      - Joint Distribution (Multiple Discrete)
      - Univariate PDF (Single Sequence - Continuous)
      - Multivariate PDF (Multiple Sequence - Continuous)
      - Linkable to user-defined function
  - Transitions
    - Standard Transitions
    - Lexical Transitions (Single or multiple emission)
    - (Preliminary) Explicit Duration Transitions
    - Linkable to user-defined functions
- Decoding
  - Traditional Decoding Algorithms
    - Forward/Backward/Posterior
    - Viterbi
    - N-best Viterbi
  - Stochastic Sampling Decoding Algorithms
    - Stochastic Forward
    - Stochastic Viterbi
    - Stochastic Posterior
- Decoding Traceback Path output formats
  - State Path Index
  - State Path Label
  - GFF
  - Hit Table (Stochastic Algorithms)
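To make the stochastic decoding entries above more concrete, here is a minimal sketch of sampling state paths by a random traceback through a forward table, then counting how often each path appears so that paths can be ranked. It is a textbook-style illustration under simplifying assumptions (probability space rather than log space, no end state, integer state indices), and forwardTable, samplePath, and pathCounts are hypothetical names rather than StochHMM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward table in probability space (fine for short sequences; a real
// implementation would work in log space or rescale to avoid underflow).
Matrix forwardTable(const std::vector<double>& init,
                    const Matrix& trans,   // [from][to]
                    const Matrix& emit) {  // [t][state]
    const std::size_t T = emit.size(), S = init.size();
    assert(T > 0);
    Matrix f(T, std::vector<double>(S, 0.0));
    for (std::size_t s = 0; s < S; ++s) f[0][s] = init[s] * emit[0][s];
    for (std::size_t t = 1; t < T; ++t)
        for (std::size_t s = 0; s < S; ++s) {
            double sum = 0.0;
            for (std::size_t p = 0; p < S; ++p) sum += f[t - 1][p] * trans[p][s];
            f[t][s] = sum * emit[t][s];
        }
    return f;
}

// Sample one state path from the posterior over paths by tracing back
// through the forward table, choosing each predecessor at random in
// proportion to its contribution to the current cell.
std::vector<int> samplePath(const Matrix& f, const Matrix& trans,
                            std::mt19937& rng) {
    const std::size_t T = f.size(), S = f[0].size();
    std::vector<int> path(T);
    std::discrete_distribution<int> last(f[T - 1].begin(), f[T - 1].end());
    path[T - 1] = last(rng);
    for (std::size_t t = T - 1; t > 0; --t) {
        std::vector<double> w(S);
        for (std::size_t p = 0; p < S; ++p)
            w[p] = f[t - 1][p] * trans[p][path[t]];
        std::discrete_distribution<int> prev(w.begin(), w.end());
        path[t - 1] = prev(rng);
    }
    return path;
}

// Count how often each distinct path is sampled, so paths can be ranked.
std::map<std::vector<int>, int> pathCounts(const Matrix& f, const Matrix& trans,
                                           int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::map<std::vector<int>, int> counts;
    for (int i = 0; i < samples; ++i) ++counts[samplePath(f, trans, rng)];
    return counts;
}
```

Sorting the resulting counts in descending order gives a ranking of the optimal and sub-optimal paths by how often they were sampled.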
Developed by:
Korf Lab Genome Center, University of California, Davis
For suggestions or support:
- korflab AT ucdavis DOT edu
- KorfLab Github
- Google Groups
References
1. Schroeder, D.I., Blair J.D., Lott P., Yu H.O., Hong D., Crary F., Ashwood P., Walker C., Korf I., Robinson W.P., LaSalle J.M. The human placenta methylome. PNAS 15:6037-6042 (2013)
2. Lott, P., Dunaway, K., Yu, K., Korf, I. StochHMM: A Flexible Hidden Markov Model Framework for Rapid Development of HMMs. Poster presented at: Genome Informatics, 2012 Sep 6-9, Cambridge, UK.
3. Schroeder, D. I., Lott, P., Korf, I., LaSalle, J. M. Large-scale methylation domains mark a functional subset of neuronally expressed genes. Genome Res 21, 1583–1591 (2011).
4. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I., Chédin, F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell 45, 814–825 (2012).
Documentation
Code documentation can be found at http://korflab.github.io/StochHMM
Model file documentation and additional support can be found at https://github.com/KorfLab/StochHMM/wiki
StochHMM is provided as free open source code and compiles on Windows, Mac OSX, and Linux. We are providing StochHMM under the MIT open source license to increase accessibility and to give researchers the ability to use it in derivative works without restrictions.
Please feel free to contact us with bugs, suggestions, or questions: lottpaul@gmail.com