OpenEars speech processing
Welcome to OpenEars: free speech recognition and speech synthesis for the iPhone

OpenEars includes offline speech processing and more.

http://www.politepix.com/openears/


If you aren't quite ready to read the documentation, visit the quickstart tool so you can get started with OpenEars in just a few minutes! You can come back and read the docs or the FAQ once you have specific questions.

Introduction

OpenEars is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement round-trip English-language speech recognition and text-to-speech on the iPhone and iPad, uses the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and is free to use in an iPhone or iPad app. It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba.


Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on the iPhone given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars uses) is capable of local recognition on the iPhone of vocabularies with hundreds of words depending on the environment and other factors, and performs very well with command-and-control language models. The best part is that it uses no network connectivity because all processing occurs locally on the device.

The current version of OpenEars is 1.2.4. Download OpenEars 1.2.4 or read its changelog.

Features of OpenEars

OpenEars can:

  • Listen continuously for speech on a background thread, while suspending or resuming speech processing on demand, all while using less than 4% CPU on average on an iPhone 4 (decoding speech, text-to-speech, updating the UI and other intermittent functions use more CPU),
  • Use any of 9 voices for speech, including male and female voices with a range of speed/quality levels, and switch between them on the fly,
  • Change the pitch, speed and variance of any text-to-speech voice,
  • Know whether headphones are plugged in and continue voice recognition during text-to-speech only when they are plugged in,
  • Support bluetooth audio devices (experimental),
  • Dispatch information to any part of your app about the results of speech recognition and speech, or changes in the state of the audio session (such as an incoming phone call or headphones being plugged in),
  • Deliver level metering for both speech input and speech output so you can design visual feedback for both states.
  • Support JSGF grammars,
  • Dynamically generate new ARPA language models in-app based on input from an NSArray of NSStrings,
  • Switch between ARPA language models or JSGF grammars on the fly,
  • Get n-best lists with scoring,
  • Test existing recordings,
  • Be easily interacted with via standard and simple Objective-C methods,
  • Control all audio functions with text-to-speech and speech recognition in memory instead of writing audio files to disk and then reading them,
  • Drive speech recognition with a low-latency Audio Unit driver for highest responsiveness,
  • Be installed in a Cocoa-standard fashion using an easy-peasy already-compiled framework,
  • Offer improved recognition accuracy and faster recognition/text-to-speech responsiveness in addition to its various new features,
  • Be used for free in an iPhone or iPad app.
Warning
Before using OpenEars, please note that it has to use a different, less accurate audio driver on the Simulator, so it is always necessary to evaluate accuracy on a real device. Please don't submit support requests for accuracy issues encountered only on the Simulator.


Warning
Because Apple has removed armv6 architecture compiling from Xcode 4.5, and it is only possible to support upcoming devices using the armv7s architecture available in Xcode 4.5, there was no other option than to end support for armv6 devices after OpenEars 1.2. That means that the current version of OpenEars only supports armv7 and armv7s devices (iPhone 3GS and later). If your app supports older devices like the first-generation iPhone or the iPhone 3G, you can continue to download the legacy edition of OpenEars 1.2 here, but that edition will not be updated further: all versions of OpenEars starting with 1.2.1 support only armv7 and armv7s devices, not armv6. If you have previously been supporting older devices and you want to submit an app update removing that support, you must set your minimum deployment target to iOS 4.3 or later, or your app will be rejected by Apple. The framework is 100% compatible with LLVM-using versions of Xcode that precede version 4.5, but your app must be set not to compile the armv6 architecture in order to use it.

Installation

To use OpenEars:

  • Create your own app, and add the iOS frameworks AudioToolbox and AVFoundation to it.
  • Inside your downloaded distribution there is a folder called "Frameworks". Drag the "Frameworks" folder into your app project in Xcode.

OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars. Give the sample app a spin to try out the features (the sample app uses ARC so you'll need a recent Xcode version) and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars.

If the steps on this page didn't work for you, you can get free support at the forums, read the FAQ, brush up on the documentation, or open a private email support incident at the Politepix shop. If you'd like to read the documentation, simply read onward.

Basic concepts

There are a few basic concepts to understand about voice recognition and OpenEars that will make it easiest to create an app.

  • Local or offline speech recognition versus server-based or online speech recognition: most speech recognition on the iPhone is done by streaming the speech audio to servers. OpenEars works by doing the recognition inside the iPhone without using the network. This saves bandwidth and results in faster response, but since a server is much more powerful than a phone it means that we have to work with much smaller vocabularies to get accurate recognition.
  • Language Models. The language model is the vocabulary that you want OpenEars to understand, in a format that its speech recognition engine can understand. The smaller and better-adapted to your users' real usage cases the language model is, the better the accuracy. An ideal language model for PocketsphinxController has fewer than 200 words.
  • The parts of OpenEars. OpenEars has a simple, flexible and very powerful architecture. PocketsphinxController recognizes speech using a language model that was dynamically created by LanguageModelGenerator. FliteController creates synthesized speech (TTS). And OpenEarsEventsObserver dispatches messages about every feature of OpenEars (what speech was understood by the engine, whether synthesized speech is in progress, if there was an audio interruption) to any part of your app.
BACK TO TOP

FliteController Class Reference

Detailed Description

The class that controls speech synthesis (TTS) in OpenEars.

Usage examples

Preparing to use the class:

To use FliteController, you need to have at least one Flite voice added to your project. When you added the "Frameworks" folder of OpenEars to your app, you already imported a voice called Slt, so these instructions will use the Slt voice. You can get eight more free voices in OpenEarsExtras, available at https://bitbucket.org/Politepix/openearsextras.

What to add to your header:

Add the following lines to your header (the .h file). Under the imports at the very top:
#import <Slt/Slt.h>
#import <OpenEars/FliteController.h>
In the middle part where instance variables go:
FliteController *fliteController;
Slt *slt;
In the bottom part where class properties go:
@property (strong, nonatomic) FliteController *fliteController;
@property (strong, nonatomic) Slt *slt;

What to add to your implementation:

Add the following to your implementation (the .m file). Under the @implementation keyword at the top:
@synthesize fliteController;
@synthesize slt;
Among the other methods of the class, add these lazy accessor methods for confident memory management of the object:
- (FliteController *)fliteController {
	if (fliteController == nil) {
		fliteController = [[FliteController alloc] init];
	}
	return fliteController;
}

- (Slt *)slt {
	if (slt == nil) {
		slt = [[Slt alloc] init];
	}
	return slt;
}

How to use the class methods:

In the method where you want to call speech (to test this out, add it to your viewDidLoad method), add the following method call:
[self.fliteController say:@"A short statement" withVoice:self.slt];
Warning
There can only be one FliteController instance in your app at any given moment.

Method Documentation

- (void) say:(NSString *)statement withVoice:(FliteVoice *)voiceToUse

This takes an NSString which is the word or phrase you want to say, and the FliteVoice to use to say the phrase. Usage Example:

[self.fliteController say:@"Say it, don't spray it." withVoice:self.slt];

There are a total of nine FliteVoices available for use with OpenEars. The Slt voice is the most popular one and it ships with OpenEars. The other eight voices can be downloaded as part of the OpenEarsExtras package available at the URL http://bitbucket.org/Politepix/openearsextras. To use them, just drag the desired downloaded voice's framework into your app, import its header at the top of your calling class (e.g. #import <Slt/Slt.h> or #import <Rms/Rms.h>) and instantiate it as you would any other object, then pass the instantiated voice to this method.
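As a sketch, assuming you have downloaded the Rms voice framework from OpenEarsExtras and dragged it into your project alongside Slt, using it works exactly like the Slt example above:

#import <Rms/Rms.h>

// In a class that already has the lazily-instantiated fliteController property shown above:
Rms *rms = [[Rms alloc] init]; // instantiate the downloaded voice like any other object
[self.fliteController say:@"This sentence uses the Rms voice." withVoice:rms];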

- (Float32) fliteOutputLevel      

A read-only attribute that tells you the volume level of synthesized speech in progress. This is a UI hook. You can't read it on the main thread.
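A minimal sketch of one way to poll it off the main thread with a GCD timer (the queue label, polling interval, and the outputLevelLabel UI element are assumptions for illustration, not part of OpenEars):

// Keep a strong reference to this timer source (for example in an instance variable) so it isn't deallocated.
dispatch_queue_t meteringQueue = dispatch_queue_create("com.example.flitemetering", NULL); // hypothetical queue label
dispatch_source_t meteringTimer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, meteringQueue);
dispatch_source_set_timer(meteringTimer, DISPATCH_TIME_NOW, (uint64_t)(0.1 * NSEC_PER_SEC), (uint64_t)(0.02 * NSEC_PER_SEC));
dispatch_source_set_event_handler(meteringTimer, ^{
	Float32 outputLevel = self.fliteController.fliteOutputLevel; // read away from the main thread
	dispatch_async(dispatch_get_main_queue(), ^{
		self.outputLevelLabel.text = [NSString stringWithFormat:@"%f", outputLevel]; // hypothetical UILabel; update UI on the main thread
	});
});
dispatch_resume(meteringTimer);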

Property Documentation

- (float) duration_stretch

duration_stretch changes the speed of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (float) target_mean

target_mean changes the pitch of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (float) target_stddev

target_stddev changes the variance of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (BOOL) userCanInterruptSpeech

Set userCanInterruptSpeech to TRUE in order to let new incoming human speech cut off synthesized speech in progress.
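Taken together, a short illustrative example of adjusting these properties before speaking (the specific values here are arbitrary, not recommendations):

self.fliteController.duration_stretch = 1.2; // a little slower than the 1.0 default
self.fliteController.target_mean = 1.1; // a slightly higher pitch
self.fliteController.target_stddev = 1.0; // the default variance
self.fliteController.userCanInterruptSpeech = TRUE; // let incoming speech cut off TTS in progress
[self.fliteController say:@"These values are only illustrative." withVoice:self.slt];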

BACK TO TOP

LanguageModelGenerator Class Reference

Detailed Description

The class that generates the vocabulary the PocketsphinxController is able to understand.

Usage examples

What to add to your implementation:

Add the following to your implementation (the .m file). Under the imports at the very top:
#import <OpenEars/LanguageModelGenerator.h>
Wherever you need to instantiate the language model generator, do it as follows:
LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];

How to use the class methods:

In the method where you want to create your language model (for instance your viewDidLoad method), add the following method call (replacing the placeholders like "WORD" and "A PHRASE" with actual words and phrases you want to be able to recognize):
NSArray *words = [NSArray arrayWithObjects:@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE", nil];
NSString *name = @"NameIWantForMyLanguageModelFiles";
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name];


NSDictionary *languageGeneratorResults = nil;

NSString *lmPath = nil;
NSString *dicPath = nil;
	
if([err code] == noErr) {
	
	languageGeneratorResults = [err userInfo];
		
	lmPath = [languageGeneratorResults objectForKey:@"LMPath"];
	dicPath = [languageGeneratorResults objectForKey:@"DictionaryPath"];
		
} else {
	NSLog(@"Error: %@",[err localizedDescription]);
}
If you are using the default English-language model generation, it is a requirement to enter your words and phrases in all capital letters, since the model is generated against a dictionary in which the entries are capitalized (meaning that if the words in the array aren't capitalized, they will not match the dictionary and you will not have the widest variety of pronunciations understood for the word you are using).

If you need to create a fixed language model ahead of time instead of creating it dynamically in your app, just use this method (or generateLanguageModelFromTextFile:withFilesNamed:) to submit your full language model using the Simulator and then use the Simulator documents folder script to get the language model and dictionary file out of the documents folder and add it to your app bundle, referencing it from there.

Method Documentation

- (NSError *) generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName

Generate a language model from an array of NSStrings which are the words and phrases you want PocketsphinxController or PocketsphinxController+RapidEars to understand. Putting a phrase in as a string makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. The NSError that this method returns contains the paths to the files created by a successful generation effort in its userInfo when its code is noErr. The words and phrases in languageModelArray must be written with capital letters exclusively, for instance "word" must appear in the array as "WORD".

- (NSError *) generateLanguageModelFromTextFile:(NSString *)pathToTextFile withFilesNamed:(NSString *)fileName

Generate a language model from a text file containing words and phrases you want PocketsphinxController to understand. The file should be formatted with every word or contiguous phrase on its own line with a line break afterwards. Putting a phrase on its own line makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. Give the correct full path to the text file as a string. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. The NSError that this method returns contains the paths to the files created by a successful generation effort in its userInfo when its code is noErr. The words and phrases in the text file must be written with capital letters exclusively, for instance "word" must appear in the file as "WORD".
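As a minimal sketch, assuming you have added a one-word-or-phrase-per-line corpus file named MyCorpus.txt (a hypothetical name) to your app bundle:

LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
NSString *pathToTextFile = [[NSBundle mainBundle] pathForResource:@"MyCorpus" ofType:@"txt"]; // hypothetical corpus file with all-caps entries
NSError *err = [lmGenerator generateLanguageModelFromTextFile:pathToTextFile withFilesNamed:@"MyTextFileLanguageModel"];
if([err code] == noErr) {
	NSString *lmPath = [[err userInfo] objectForKey:@"LMPath"];
	NSString *dicPath = [[err userInfo] objectForKey:@"DictionaryPath"];
	NSLog(@"Language model at %@, dictionary at %@", lmPath, dicPath);
} else {
	NSLog(@"Error: %@", [err localizedDescription]);
}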

Property Documentation

- (BOOL) verboseLanguageModelGenerator

Set this to TRUE to get verbose output.

- (BOOL) useFallbackMethod

Advanced: turn this off if the words in your input array or text file aren't in English and you are using a custom dictionary file.

- (NSString *) dictionaryPathAsString

Advanced: if you have your own pronunciation dictionary you want to use instead of CMU07a.dic you can assign its full path to this property before running the language model generation.
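A hedged sketch of pointing the generator at your own dictionary, assuming a file named MyCustomDictionary.dic (a hypothetical name) in your app bundle:

LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
lmGenerator.useFallbackMethod = FALSE; // turned off because a custom, non-English dictionary is being used
lmGenerator.dictionaryPathAsString = [[NSBundle mainBundle] pathForResource:@"MyCustomDictionary" ofType:@"dic"]; // hypothetical dictionary file
NSArray *words = [NSArray arrayWithObjects:@"WORD ONE", @"WORD TWO", nil]; // placeholder entries that must exist in the custom dictionary
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:@"MyCustomDictionaryModel"];
if([err code] != noErr) NSLog(@"Error: %@", [err localizedDescription]);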

BACK TO TOP

OpenEarsEventsObserver Class Reference

Detailed Description

OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously. All of the documentation for the use of OpenEarsEventsObserver is found in the section OpenEarsEventsObserverDelegate.

Property Documentation

- (id< OpenEarsEventsObserverDelegate >) delegate

To use the OpenEarsEventsObserverDelegate methods, assign this delegate to the class hosting OpenEarsEventsObserver and then use the delegate methods documented under OpenEarsEventsObserverDelegate. There is a complete example of how to do this explained under the OpenEarsEventsObserverDelegate documentation.

BACK TO TOP

OpenEarsLogging Class Reference

Detailed Description

A singleton which turns logging on or off for the entire framework. The type of logging is related to overall framework functionality such as the audio session and timing operations. Please turn OpenEarsLogging on for any issue you encounter. It will probably show the problem, but if not you can show the log on the forum and get help.

Warning
The individual classes such as PocketsphinxController and LanguageModelGenerator have their own verbose flags which are separate from OpenEarsLogging.

Method Documentation

+ (id) startOpenEarsLogging      

This just turns on logging. If you don't want logging in your session, don't send the startOpenEarsLogging message.

Example Usage:

Before implementation:

#import <OpenEars/OpenEarsLogging.h>

In implementation:

[OpenEarsLogging startOpenEarsLogging];

BACK TO TOP

PocketsphinxController Class Reference

Detailed Description

The class that controls local speech recognition in OpenEars.

Usage examples

Preparing to use the class:

To use PocketsphinxController, you need a language model and a phonetic dictionary for it. These files define which words PocketsphinxController is capable of recognizing. They are created above by using LanguageModelGenerator.

What to add to your header:

Add the following lines to your header (the .h file). Under the imports at the very top:
#import <OpenEars/PocketsphinxController.h>
In the middle part where instance variables go:
PocketsphinxController *pocketsphinxController;
In the bottom part where class properties go:
@property (strong, nonatomic) PocketsphinxController *pocketsphinxController;

What to add to your implementation:

Add the following to your implementation (the .m file). Under the @implementation keyword at the top:
@synthesize pocketsphinxController;
Among the other methods of the class, add this lazy accessor method for confident memory management of the object:
- (PocketsphinxController *)pocketsphinxController {
	if (pocketsphinxController == nil) {
		pocketsphinxController = [[PocketsphinxController alloc] init];
	}
	return pocketsphinxController;
}

How to use the class methods:

In the method where you want to recognize speech (to test this out, add it to your viewDidLoad method), add the following method call:
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];
Warning
There can only be one PocketsphinxController instance in your app.

Method Documentation

- (void) startListeningWithLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

Start the speech recognition engine up. You provide the full paths to a language model and a dictionary file which are created using LanguageModelGenerator.

- (void) stopListening      

Shut down the engine. You must do this before releasing a parent view controller that contains PocketsphinxController.

- (void) suspendRecognition      

Keep the engine going but stop listening to speech until resumeRecognition is called. Takes effect instantly.

- (void) resumeRecognition      

Resume listening for speech after suspendRecognition has been called.

- (void) changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString

Change from one language model to another. This lets you change which words you are listening for depending on the context in your app.
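For example, assuming you generated a second language model earlier and kept its output paths (secondModelResults, secondLmPath, and secondDicPath are hypothetical names), you could switch to it when the context in your app changes:

NSString *secondLmPath = [secondModelResults objectForKey:@"LMPath"]; // hypothetical results dictionary from a second LanguageModelGenerator run
NSString *secondDicPath = [secondModelResults objectForKey:@"DictionaryPath"];
[self.pocketsphinxController changeLanguageModelToFile:secondLmPath withDictionary:secondDicPath];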

- (Float32) pocketsphinxInputLevel      

Gives the volume of the incoming speech. This is a UI hook. You can't read it on the main thread or it will block.
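One hedged approach is a polling loop on a background thread that you start once listening begins and stop when it ends (the helper method, flag, and UILabel here are assumptions for illustration, not part of OpenEars):

// Kick this off on a background thread, for example after the pocketsphinxDidStartListening delegate method fires:
[self performSelectorInBackground:@selector(updateInputLevelMeter) withObject:nil];

- (void) updateInputLevelMeter { // hypothetical helper method
	@autoreleasepool {
		while(self.shouldUpdateInputLevelMeter) { // hypothetical BOOL property you clear when listening stops
			Float32 inputLevel = self.pocketsphinxController.pocketsphinxInputLevel; // read away from the main thread
			dispatch_async(dispatch_get_main_queue(), ^{
				self.inputLevelLabel.text = [NSString stringWithFormat:@"%f", inputLevel]; // hypothetical UILabel
			});
			[NSThread sleepForTimeInterval:0.1];
		}
	}
}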

- (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

You can use this to run recognition on an already-recorded WAV file for testing. The WAV file has to be 16-bit and 16000 samples per second.
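A small sketch, assuming a 16-bit, 16000-samples-per-second recording named TestRecording.wav (a hypothetical file) is bundled with your app, and that lmPath and dicPath were produced by LanguageModelGenerator as shown earlier:

NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"TestRecording" ofType:@"wav"]; // hypothetical 16-bit/16kHz WAV file
[self.pocketsphinxController runRecognitionOnWavFileAtPath:wavPath usingLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];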

Property Documentation

- (float) secondsOfSilenceToDetect

This is how long PocketsphinxController should wait after speech ends to attempt to recognize speech. This defaults to 0.7 seconds.

- (BOOL) returnNbest

Advanced: set this to TRUE to receive n-best results.

- (int) nBestNumber

Advanced: the number of n-best results to return. This is a maximum number to return; if some of the hypotheses are null, fewer than this number will be returned.

- (int) calibrationTime

How long to calibrate for. This can only be one of the values '1', '2', or '3'. Defaults to 1.

- (BOOL) verbosePocketSphinx

Turn on verbose output. Do this any time you encounter an issue and any time you need to report an issue on the forums.

- (BOOL) returnNullHypotheses

By default, PocketsphinxController won't return a hypothesis if for some reason the hypothesis is null (this can happen if the perceived sound was just noise). If you need even empty hypotheses to be returned, you can set this to TRUE before starting PocketsphinxController.
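Taken together, a hedged example of setting these properties before starting to listen (the values are only illustrative):

self.pocketsphinxController.secondsOfSilenceToDetect = 0.5; // end utterances a little sooner than the 0.7 default
self.pocketsphinxController.returnNbest = TRUE;
self.pocketsphinxController.nBestNumber = 5;
self.pocketsphinxController.verbosePocketSphinx = TRUE; // verbose output while debugging
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];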

BACK TO TOP

<OpenEarsEventsObserverDelegate> Protocol Reference

Detailed Description

OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously.

Usage examples

What to add to your header:

Add the following lines to your header (the .h file). Under the imports at the very top:
#import <OpenEars/OpenEarsEventsObserver.h>
At the @interface declaration, add the OpenEarsEventsObserverDelegate protocol. An example of this for a view controller called ViewController would look like this:
@interface ViewController : UIViewController <OpenEarsEventsObserverDelegate> {
In the middle part where instance variables go:
OpenEarsEventsObserver *openEarsEventsObserver;
In the bottom part where class properties go:
@property (strong, nonatomic) OpenEarsEventsObserver *openEarsEventsObserver;

What to add to your implementation:

Add the following to your implementation (the .m file). Under the @implementation keyword at the top:
@synthesize openEarsEventsObserver;
Among the other methods of the class, add this lazy accessor method for confident memory management of the object:
- (OpenEarsEventsObserver *)openEarsEventsObserver {
	if (openEarsEventsObserver == nil) {
		openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
	}
	return openEarsEventsObserver;
}
and then right before you start your first OpenEars functionality (for instance, right before your first self.fliteController say:withVoice: message or right before your first self.pocketsphinxController startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF: message) send this message:
[self.openEarsEventsObserver setDelegate:self];

How to use the class methods:

Add these delegate methods of OpenEarsEventsObserver to your class:
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
	NSLog(@"The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID);
}

- (void) pocketsphinxDidStartCalibration {
	NSLog(@"Pocketsphinx calibration has started.");
}

- (void) pocketsphinxDidCompleteCalibration {
	NSLog(@"Pocketsphinx calibration is complete.");
}

- (void) pocketsphinxDidStartListening {
	NSLog(@"Pocketsphinx is now listening.");
}

- (void) pocketsphinxDidDetectSpeech {
	NSLog(@"Pocketsphinx has detected speech.");
}

- (void) pocketsphinxDidDetectFinishedSpeech {
	NSLog(@"Pocketsphinx has detected a period of silence, concluding an utterance.");
}

- (void) pocketsphinxDidStopListening {
	NSLog(@"Pocketsphinx has stopped listening.");
}

- (void) pocketsphinxDidSuspendRecognition {
	NSLog(@"Pocketsphinx has suspended recognition.");
}

- (void) pocketsphinxDidResumeRecognition {
	NSLog(@"Pocketsphinx has resumed recognition."); 
}

- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
	NSLog(@"Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}

- (void) pocketSphinxContinuousSetupDidFail { // This can let you know that something went wrong with the recognition loop startup. Turn on OpenEarsLogging to learn why.
	NSLog(@"Setting up the continuous recognition loop has failed for some reason, please turn on OpenEarsLogging to learn more.");
}

Method Documentation

- (void) audioSessionInterruptionDidBegin      

There was an interruption.

- (void) audioSessionInterruptionDidEnd      

The interruption ended.

- (void) audioInputDidBecomeUnavailable      

The input became unavailable.

- (void) audioInputDidBecomeAvailable      

The input became available again.

- (void) audioRouteDidChangeToRoute:   (NSString *)  newRoute  

The audio route changed.

- (void) pocketsphinxDidStartCalibration      

Pocketsphinx isn't listening yet but it started calibration.

- (void) pocketsphinxDidCompleteCalibration      

Pocketsphinx isn't listening yet but calibration completed.

- (void) pocketsphinxRecognitionLoopDidStart      

Pocketsphinx isn't listening yet but it has entered the main recognition loop.

- (void) pocketsphinxDidStartListening      

Pocketsphinx is now listening.

- (void) pocketsphinxDidDetectSpeech      

Pocketsphinx heard speech and is about to process it.

- (void) pocketsphinxDidDetectFinishedSpeech      

Pocketsphinx detected a period of silence indicating the end of an utterance.

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID

Pocketsphinx has a hypothesis.

- (void) pocketsphinxDidReceiveNBestHypothesisArray:   (NSArray *)  hypothesisArray  

Pocketsphinx has an n-best hypothesis dictionary.
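If returnNbest is turned on in PocketsphinxController, you can implement this delegate method; a minimal sketch that simply logs each entry without assuming its internal structure:

- (void) pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray {
	for(id hypothesisEntry in hypothesisArray) {
		NSLog(@"n-best entry: %@", hypothesisEntry);
	}
}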

- (void) pocketsphinxDidStopListening      

Pocketsphinx has exited the continuous listening loop.

- (void) pocketsphinxDidSuspendRecognition      

Pocketsphinx has not exited the continuous listening loop but it will not attempt recognition.

- (void) pocketsphinxDidResumeRecognition      

Pocketsphinx has not exited the continuous listening loop and it will now start attempting recognition again.

- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString

Pocketsphinx switched language models inline.

- (void) pocketSphinxContinuousSetupDidFail      

Some aspect of setting up the continuous loop failed; turn on OpenEarsLogging for more info.

- (void) fliteDidStartSpeaking      

Flite started speaking. You probably don't have to do anything about this.

- (void) fliteDidFinishSpeaking      

Flite finished speaking. You probably don't have to do anything about this.
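For instance, a small sketch that uses this pair of callbacks to toggle a hypothetical UI indicator while synthesized speech is playing:

- (void) fliteDidStartSpeaking {
	self.speakingIndicatorView.hidden = NO; // hypothetical UI element shown while Flite is speaking
}

- (void) fliteDidFinishSpeaking {
	self.speakingIndicatorView.hidden = YES;
}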

