https://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/
Using in your projects
As with any Java library, all you need to do to use sphinx4 is add its jars to your project's dependencies; you can then write code against the API.
Many IDEs, such as Eclipse, NetBeans, and IntelliJ IDEA, support Gradle either through a plugin or with built-in features. In that case you can include the sphinx4 libraries in your project with the help of the IDE. Please check the relevant part of your IDE documentation, for example the IDEA documentation on Gradle.
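As a sketch, a minimal Gradle build that pulls in sphinx4 might look like the following. The coordinates and version shown are the ones published for the 5prealpha snapshot at the time of writing; adjust them to match the release you downloaded:

```groovy
repositories {
    mavenCentral()
    // Snapshot builds are published to the Sonatype OSS repository.
    maven { url 'https://oss.sonatype.org/content/repositories/snapshots' }
}

dependencies {
    // Core recognizer classes (Configuration, recognizers, results).
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-core', version: '5prealpha-SNAPSHOT'
    // Default English acoustic model, dictionary and language model.
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-data', version: '5prealpha-SNAPSHOT'
}
```

With these dependencies in place, the `resource:/edu/cmu/sphinx/models/en-us/...` paths used below resolve from the sphinx4-data jar on the classpath.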
Basic Usage
To quickly start with sphinx4, create a Java project as described above, add the required dependencies, and type the following simple code:
package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        InputStream stream = new FileInputStream(new File("test.wav"));

        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
The first three attributes are set up using a Configuration object, which is then passed to the recognizer. The way the speech source is supplied depends on the concrete recognizer and is usually passed as a method parameter.
Configuration
Configuration is used to supply required and optional attributes to the recognizer.
Configuration configuration = new Configuration();
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
LiveSpeechRecognizer
LiveSpeechRecognizer uses microphone as the speech source.
LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();
// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();
StreamSpeechRecognizer
StreamSpeechRecognizer uses an InputStream as the speech source. This way you can pass data from a file, from a network socket, or from an existing byte array.
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
recognizer.startRecognition(new FileInputStream("speech.wav"));
SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
Please note that the audio for this decoding must be in one of the two following formats:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
or
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
The decoder does not support other formats. If the audio format does not match, you will not get any results; you need to convert the audio to a proper format before decoding. If you want to decode telephone-quality audio with a sample rate of 8000 Hz, you also need to call
configuration.setSampleRate(8000);
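To verify an input file before decoding, you can inspect its header with the JDK's javax.sound.sampled API; the helper below is a small sketch (the class and method names are ours, not part of sphinx4) that checks for signed 16-bit little-endian PCM, mono, at 16000 or 8000 Hz:

```java
import javax.sound.sampled.AudioFormat;

public class FormatCheck {

    // Returns true if the format matches what the decoder expects:
    // signed 16-bit little-endian PCM, mono, at 16000 Hz or 8000 Hz.
    static boolean isSupportedFormat(AudioFormat f) {
        return f.getEncoding() == AudioFormat.Encoding.PCM_SIGNED
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian()
                && (f.getSampleRate() == 16000f || f.getSampleRate() == 8000f);
    }

    public static void main(String[] args) {
        // AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian)
        AudioFormat ok = new AudioFormat(16000f, 16, 1, true, false);
        AudioFormat bad = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println(isSupportedFormat(ok));   // true
        System.out.println(isSupportedFormat(bad));  // false
    }
}
```

For a real file, obtain the format with AudioSystem.getAudioFileFormat(new File("test.wav")).getFormat() and pass it to the helper.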
You can retrieve multiple results until the end of the file:
while ((result = recognizer.getResult()) != null) {
    System.out.println(result.getHypothesis());
}
SpeechAligner
SpeechAligner time-aligns a text transcript with audio speech.
SpeechAligner aligner = new SpeechAligner(configuration);
List<WordResult> results = aligner.align(new File("101-42.wav").toURI().toURL(), "one oh one four two");
SpeechResult
SpeechResult provides access to various parts of the recognition result, such as the recognized utterance, the list of words with time stamps, the recognition lattice, and so forth.
// Print utterance string without filler words.
System.out.println(result.getHypothesis());
// Get individual words and their times.
for (WordResult r : result.getWords()) {
System.out.println(r);
}
// Save lattice in Graphviz dot format.
result.getLattice().dumpDot("lattice.dot", "lattice");