https://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/
Using in your projects
As with any Java library, all you need to do to use sphinx4 is add its jars to your project's dependencies; you can then write code against the API.
Many IDEs, such as Eclipse, NetBeans, and IntelliJ IDEA, support Gradle either through a plugin or with built-in features. In that case you can include the sphinx4 libraries in your project with the help of the IDE. Please check the relevant part of your IDE documentation, for example the IDEA documentation on Gradle.
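As a sketch, a minimal Gradle build that pulls in sphinx4 might look like the following. The coordinates and version shown are the ones published for the 5prealpha snapshot at the time of writing; adjust them to match the release you downloaded:

```groovy
repositories {
    mavenCentral()
    // Snapshot builds are published to the Sonatype OSS repository.
    maven { url 'https://oss.sonatype.org/content/repositories/snapshots' }
}

dependencies {
    // Core recognizer classes (Configuration, recognizers, results).
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-core', version: '5prealpha-SNAPSHOT'
    // Default English acoustic model, dictionary and language model.
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-data', version: '5prealpha-SNAPSHOT'
}
```

With these dependencies in place, the `resource:/edu/cmu/sphinx/models/en-us/...` paths used below resolve from the sphinx4-data jar on the classpath.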
Basic Usage
To quickly start with sphinx4, create a Java project as described above, add the required dependencies, and type the following simple code:
package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        InputStream stream = new FileInputStream(new File("test.wav"));

        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
The first three attributes are set up using a Configuration object, which is then passed to the recognizer. The way the speech source is supplied depends on the concrete recognizer and is usually passed as a method parameter.
Configuration
Configuration is used to supply required and optional attributes to the recognizer.
Configuration configuration = new Configuration();
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
LiveSpeechRecognizer
LiveSpeechRecognizer uses microphone as the speech source.
LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();
// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();
StreamSpeechRecognizer
StreamSpeechRecognizer uses an InputStream as the speech source. This way you can pass data from a file, from a network socket, or from an existing byte array.
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
recognizer.startRecognition(new FileInputStream("speech.wav"));
SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
Please note that the audio for this decoding must be in one of the two following formats:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
or
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
The decoder does not support other formats. If the audio format does not match, you will not get any results; you need to convert the audio to a proper format before decoding. If you want to decode telephone-quality audio with a sample rate of 8000 Hz, you also need to call
configuration.setSampleRate(8000);
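To verify an input file before decoding, you can inspect its header with the JDK's javax.sound.sampled API; the helper below is a small sketch (the class and method names are ours, not part of sphinx4) that checks for signed 16-bit little-endian PCM, mono, at 16000 or 8000 Hz:

```java
import javax.sound.sampled.AudioFormat;

public class FormatCheck {

    // Returns true if the format matches what the decoder expects:
    // signed 16-bit little-endian PCM, mono, at 16000 Hz or 8000 Hz.
    static boolean isSupportedFormat(AudioFormat f) {
        return f.getEncoding() == AudioFormat.Encoding.PCM_SIGNED
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian()
                && (f.getSampleRate() == 16000f || f.getSampleRate() == 8000f);
    }

    public static void main(String[] args) {
        // AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian)
        AudioFormat ok = new AudioFormat(16000f, 16, 1, true, false);
        AudioFormat bad = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println(isSupportedFormat(ok));   // true
        System.out.println(isSupportedFormat(bad));  // false
    }
}
```

For a real file, obtain the format with AudioSystem.getAudioFileFormat(new File("test.wav")).getFormat() and pass it to the helper.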
You can retrieve multiple results until the end of the file:
while ((result = recognizer.getResult()) != null) {
    System.out.println(result.getHypothesis());
}
SpeechAligner
SpeechAligner time-aligns a text transcript with audio speech.
SpeechAligner aligner = new SpeechAligner(configuration);
List<WordResult> results = aligner.align(new File("101-42.wav").toURI().toURL(), "one oh one four two");
SpeechResult
SpeechResult provides access to various parts of the recognition result, such as the recognized utterance, the list of words with time stamps, the recognition lattice, and so forth.
// Print utterance string without filler words.
System.out.println(result.getHypothesis());
// Get individual words and their times.
for (WordResult r : result.getWords()) {
System.out.println(r);
}
// Save lattice in Graphviz dot format.
result.getLattice().dumpDot("lattice.dot", "lattice");