OpenEars includes offline speech processing and more.
http://www.politepix.com/openears/
- Welcome to OpenEars: free speech recognition and speech synthesis for the iPhone
- Introduction
- Installation
- Basic concepts
- FliteController Class Reference
- LanguageModelGenerator Class Reference
- OpenEarsEventsObserver Class Reference
- OpenEarsLogging Class Reference
- PocketsphinxController Class Reference
- <OpenEarsEventsObserverDelegate> Protocol Reference
Introduction
OpenEars is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement round-trip English-language speech recognition and text-to-speech on the iPhone and iPad using the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and it is free to use in an iPhone or iPad app. It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and the Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba.
Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on the iPhone given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars uses) is capable of local recognition on the iPhone of vocabularies with hundreds of words depending on the environment and other factors, and performs very well with command-and-control language models. The best part is that it uses no network connectivity because all processing occurs locally on the device.
The current version of OpenEars is 1.2.4. Download OpenEars 1.2.4 or read its changelog.
Features of OpenEars
OpenEars can:
- Listen continuously for speech on a background thread, while suspending or resuming speech processing on demand, all while using less than 4% CPU on average on an iPhone 4 (decoding speech, text-to-speech, updating the UI and other intermittent functions use more CPU),
- Use any of 9 voices for speech, including male and female voices with a range of speed/quality levels, and switch between them on the fly,
- Change the pitch, speed and variance of any text-to-speech voice,
- Know whether headphones are plugged in and continue voice recognition during text-to-speech only when they are plugged in,
- Support bluetooth audio devices (experimental),
- Dispatch information to any part of your app about the results of speech recognition and speech, or changes in the state of the audio session (such as an incoming phone call or headphones being plugged in),
- Deliver level metering for both speech input and speech output so you can design visual feedback for both states.
- Support JSGF grammars,
- Dynamically generate new ARPA language models in-app based on input from an NSArray of NSStrings,
- Switch between ARPA language models or JSGF grammars on the fly,
- Get n-best lists with scoring,
- Test existing recordings,
- Be easily interacted with via standard and simple Objective-C methods,
- Control all audio functions with text-to-speech and speech recognition in memory instead of writing audio files to disk and then reading them,
- Drive speech recognition with a low-latency Audio Unit driver for highest responsiveness,
- Be installed in a Cocoa-standard fashion using an easy-peasy already-compiled framework.
- In addition to its various new features and faster recognition/text-to-speech responsiveness, OpenEars now has improved recognition accuracy.
- OpenEars is free to use in an iPhone or iPad app.
Warning
Before using OpenEars, please note that it has to use a different, less accurate audio driver on the Simulator, so it is always necessary to evaluate accuracy on a real device. Please don't submit support requests for accuracy issues with the Simulator.
Warning
Because Apple removed armv6 architecture compiling from Xcode 4.5, and upcoming devices using the armv7s architecture can only be supported with Xcode 4.5, there was no option but to end support for armv6 devices after OpenEars 1.2. That means the current version of OpenEars only supports armv7 and armv7s devices (iPhone 3GS and later). If your app supports older devices like the first-generation iPhone or the iPhone 3G, you can continue to download the legacy edition of OpenEars 1.2 here, but that edition will not be updated further – all versions of OpenEars starting with 1.2.1 support only armv7 and armv7s, not armv6. If you have previously been supporting older devices and you want to submit an app update removing that support, you must set your minimum deployment target to iOS 4.3 or later, or your app will be rejected by Apple. The framework is 100% compatible with LLVM-using versions of Xcode that precede version 4.5, but your app must be set not to compile the armv6 architecture in order to use it.
Installation
To use OpenEars:
- Download the distribution and unpack it.
- Create your own app, and add the iOS frameworks AudioToolbox and AVFoundation to it.
- Inside your downloaded distribution there is a folder called "Frameworks". Drag the "Frameworks" folder into your app project in Xcode.
OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars. Give the sample app a spin to try out the features (the sample app uses ARC so you'll need a recent Xcode version) and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars.
If the steps on this page didn't work for you, you can get free support at the forums, read the FAQ, brush up on the documentation, or open a private email support incident at the Politepix shop. If you'd like to read the documentation, simply read onward.
Basic concepts
There are a few basic concepts to understand about voice recognition and OpenEars that will make it easiest to create an app.
- Local or offline speech recognition versus server-based or online speech recognition: most speech recognition on the iPhone is done by streaming the speech audio to servers. OpenEars works by doing the recognition inside the iPhone without using the network. This saves bandwidth and results in faster response, but since a server is much more powerful than a phone it means that we have to work with much smaller vocabularies to get accurate recognition.
- Language Models. The language model is the vocabulary that you want OpenEars to understand, in a format that its speech recognition engine can understand. The smaller and better-adapted to your users' real usage cases the language model is, the better the accuracy. An ideal language model for PocketsphinxController has fewer than 200 words.
- The parts of OpenEars. OpenEars has a simple, flexible and very powerful architecture. PocketsphinxController recognizes speech using a language model that was dynamically created by LanguageModelGenerator. FliteController creates synthesized speech (TTS). And OpenEarsEventsObserver dispatches messages about every feature of OpenEars (what speech was understood by the engine, whether synthesized speech is in progress, if there was an audio interruption) to any part of your app.
Detailed Description
The class that controls speech synthesis (TTS) in OpenEars.
Usage examples
Preparing to use the class:
To use FliteController, you need at least one Flite voice added to your project. When you added the "Frameworks" folder of OpenEars to your app, you already imported a voice called Slt, so these instructions will use the Slt voice. You can get eight more free voices in OpenEarsExtras, available at https://bitbucket.org/Politepix/openearsextras
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <Slt/Slt.h>
#import <OpenEars/FliteController.h>

In the middle part where instance variables go:

FliteController *fliteController;
Slt *slt;

In the bottom part where class properties go:

@property (strong, nonatomic) FliteController *fliteController;
@property (strong, nonatomic) Slt *slt;
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize fliteController;
@synthesize slt;

Among the other methods of the class, add these lazy accessor methods for confident memory management of the objects:

- (FliteController *)fliteController {
	if (fliteController == nil) {
		fliteController = [[FliteController alloc] init];
	}
	return fliteController;
}

- (Slt *)slt {
	if (slt == nil) {
		slt = [[Slt alloc] init];
	}
	return slt;
}
In the method where you want to call speech (to test this out, add it to your viewDidLoad method), add the following method call:
[self.fliteController say:@"A short statement" withVoice:self.slt];
Warning
There can only be one FliteController instance in your app at any given moment.
Method Documentation
- (void) say:(NSString *)statement withVoice:(FliteVoice *)voiceToUse
This takes an NSString which is the word or phrase you want to say, and the FliteVoice to use to say it.
There are a total of nine FliteVoices available for use with OpenEars. The Slt voice is the most popular one and it ships with OpenEars. The other eight voices can be downloaded as part of the OpenEarsExtras package available at http://bitbucket.org/Politepix/openearsextras. To use them, just drag the desired downloaded voice's framework into your app, import its header at the top of your calling class (e.g. #import <Slt/Slt.h> or #import <Rms/Rms.h>), instantiate it as you would any other object, and then pass the instantiated voice to this method.
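As a sketch of how a second voice might be used, assuming you have added the Rms framework from OpenEarsExtras and set up a hypothetical rms property with a lazy accessor like the Slt example above:

```objc
#import <Rms/Rms.h>

// Speak two statements with different voices, switching on the fly.
[self.fliteController say:@"A short statement" withVoice:self.slt];
[self.fliteController say:@"Another short statement" withVoice:self.rms]; // self.rms is a hypothetical Rms * property set up like self.slt
```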
- (Float32) fliteOutputLevel
A read-only attribute that tells you the volume level of synthesized speech in progress. This is a UI hook. Don't read it on the main thread, or it will block.
Property Documentation
duration_stretch changes the speed of the voice. It is on a scale of 0.0-2.0, where 1.0 is the default.

target_mean changes the pitch of the voice. It is on a scale of 0.0-2.0, where 1.0 is the default.

target_stddev changes the variance of the voice. It is on a scale of 0.0-2.0, where 1.0 is the default.

Set userCanInterruptSpeech to TRUE in order to let new incoming human speech cut off synthesized speech in progress.
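A minimal sketch of adjusting these properties before speaking (the values here are arbitrary examples):

```objc
// Make the voice a bit slower, higher-pitched, and more varied before speaking.
self.fliteController.duration_stretch = 1.2; // slower than the 1.0 default
self.fliteController.target_mean = 1.3;      // higher pitch than the 1.0 default
self.fliteController.target_stddev = 1.1;    // slightly more variance than the 1.0 default
[self.fliteController say:@"A short statement" withVoice:self.slt];
```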
Detailed Description
The class that generates the vocabulary the PocketsphinxController is able to understand.
Usage examples
Add the following to your implementation (the .m file).

Under the imports at the very top:

#import <OpenEars/LanguageModelGenerator.h>

Wherever you need to instantiate the language model generator, do it as follows:

LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
In the method where you want to create your language model (for instance your viewDidLoad method), add the following method call (replacing the placeholders like "WORD" and "A PHRASE" with actual words and phrases you want to be able to recognize):

NSArray *words = [NSArray arrayWithObjects:@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE", nil];
NSString *name = @"NameIWantForMyLanguageModelFiles";
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name];

NSDictionary *languageGeneratorResults = nil;

NSString *lmPath = nil;
NSString *dicPath = nil;

if([err code] == noErr) {
	languageGeneratorResults = [err userInfo];
	lmPath = [languageGeneratorResults objectForKey:@"LMPath"];
	dicPath = [languageGeneratorResults objectForKey:@"DictionaryPath"];
} else {
	NSLog(@"Error: %@",[err localizedDescription]);
}

If you are using the default English-language model generation, it is a requirement to enter your words and phrases in all capital letters, since the model is generated against a dictionary in which the entries are capitalized (meaning that if the words in the array aren't capitalized, they will not match the dictionary and you will not have the widest variety of pronunciations understood for the word you are using).

If you need to create a fixed language model ahead of time instead of creating it dynamically in your app, just use this method (or generateLanguageModelFromTextFile:withFilesNamed:) to submit your full language model using the Simulator, then use the Simulator documents folder script to get the language model and dictionary file out of the documents folder, and add them to your app bundle, referencing them from there.
Method Documentation
- (NSError *) generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName
Generate a language model from an array of NSStrings which are the words and phrases you want PocketsphinxController or PocketsphinxController+RapidEars to understand. Putting a phrase in as a single string makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. fileName is the way you want the output files to be named; for instance, if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. When generation succeeds, the NSError that this method returns has a code of noErr and contains the paths to the created files in its userInfo. The words and phrases in languageModelArray must be written exclusively in capital letters; for instance, "word" must appear in the array as "WORD".
- (NSError *) generateLanguageModelFromTextFile:(NSString *)pathToTextFile withFilesNamed:(NSString *)fileName
Generate a language model from a text file containing the words and phrases you want PocketsphinxController to understand. The file should be formatted with every word or contiguous phrase on its own line, with a line break afterwards. Putting a phrase on its own line makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. Give the correct full path to the text file as a string. fileName is the way you want the output files to be named; for instance, if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. When generation succeeds, the NSError that this method returns has a code of noErr and contains the paths to the created files in its userInfo. The words and phrases in the text file must be written exclusively in capital letters; for instance, "word" must appear in the file as "WORD".
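A sketch of the text-file variant, assuming a hypothetical file named vocabulary.txt (one word or phrase per line, all capitals) has been added to your app bundle:

```objc
// Generate a language model from a bundled text file rather than an NSArray.
NSString *pathToTextFile = [[NSBundle mainBundle] pathForResource:@"vocabulary" ofType:@"txt"]; // hypothetical file
NSError *err = [lmGenerator generateLanguageModelFromTextFile:pathToTextFile withFilesNamed:@"MyTextFileLanguageModel"];

if([err code] == noErr) {
	NSString *lmPath = [[err userInfo] objectForKey:@"LMPath"];
	NSString *dicPath = [[err userInfo] objectForKey:@"DictionaryPath"];
	NSLog(@"Language model: %@, dictionary: %@", lmPath, dicPath);
} else {
	NSLog(@"Error: %@", [err localizedDescription]);
}
```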
Property Documentation
Set this to TRUE to get verbose output.

Advanced: turn this off if the words in your input array or text file aren't in English and you are using a custom dictionary file.

Advanced: if you have your own pronunciation dictionary that you want to use instead of CMU07a.dic, you can assign its full path to this property before running the language model generation.
Detailed Description
OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously. All of the documentation for the use of OpenEarsEventsObserver is found in the section OpenEarsEventsObserverDelegate.
Property Documentation
To use the OpenEarsEventsObserverDelegate methods, assign this delegate to the class hosting OpenEarsEventsObserver and then use the delegate methods documented under OpenEarsEventsObserverDelegate. There is a complete example of how to do this explained under the OpenEarsEventsObserverDelegate documentation.
Detailed Description
A singleton which turns logging on or off for the entire framework. The type of logging is related to overall framework functionality such as the audio session and timing operations. Please turn OpenEarsLogging on for any issue you encounter. It will probably show the problem, but if not you can show the log on the forum and get help.
Warning
The individual classes such as PocketsphinxController and LanguageModelGenerator have their own verbose flags which are separate from OpenEarsLogging.
Method Documentation
+ (id) startOpenEarsLogging
This just turns on logging. If you don't want logging in your session, don't send the startOpenEarsLogging message.
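A minimal sketch of turning logging on; the header name OpenEarsLogging.h is an assumption that follows the same naming pattern as the other OpenEars headers:

```objc
#import <OpenEars/OpenEarsLogging.h>

// Somewhere early in your app's lifecycle, before starting other OpenEars functionality:
[OpenEarsLogging startOpenEarsLogging]; // turns on framework-wide logging
```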
Detailed Description
The class that controls local speech recognition in OpenEars.
Usage examples
Preparing to use the class:
To use PocketsphinxController, you need a language model and a phonetic dictionary for it. These files define which words PocketsphinxController is capable of recognizing. They are created using LanguageModelGenerator, as shown above.
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <OpenEars/PocketsphinxController.h>

In the middle part where instance variables go:

PocketsphinxController *pocketsphinxController;

In the bottom part where class properties go:

@property (strong, nonatomic) PocketsphinxController *pocketsphinxController;
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize pocketsphinxController;

Among the other methods of the class, add this lazy accessor method for confident memory management of the object:

- (PocketsphinxController *)pocketsphinxController {
	if (pocketsphinxController == nil) {
		pocketsphinxController = [[PocketsphinxController alloc] init];
	}
	return pocketsphinxController;
}
In the method where you want to recognize speech (to test this out, add it to your viewDidLoad method), add the following method call:
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];
Warning
There can only be one PocketsphinxController instance in your app.
Method Documentation
- (void) startListeningWithLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF
Start the speech recognition engine up. You provide the full paths to a language model and a dictionary file, which are created using LanguageModelGenerator.
- (void) stopListening
Shut down the engine. You must do this before releasing a parent view controller that contains PocketsphinxController.
- (void) suspendRecognition
Keep the engine going but stop listening to speech until resumeRecognition is called. Takes effect instantly.
- (void) resumeRecognition
Resume listening for speech after suspendRecognition has been called.
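As a sketch, the suspend/resume pair might bracket a stretch of your app during which live speech should be ignored (for example, while your own audio is playing):

```objc
// Stop processing incoming speech without shutting the engine down.
[self.pocketsphinxController suspendRecognition];

// ...do work during which speech should be ignored...

// Then start processing speech again. Both calls take effect instantly.
[self.pocketsphinxController resumeRecognition];
```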
- (void) changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString
Change from one language model to another. This lets you change which words you are listening for depending on the context in your app.
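A sketch of switching vocabularies on the fly, assuming you generated a second model and dictionary with LanguageModelGenerator beforehand (the variable names here are hypothetical):

```objc
// Paths taken from a second LanguageModelGenerator run (hypothetical results dictionary).
NSString *secondLmPath = [secondLanguageGeneratorResults objectForKey:@"LMPath"];
NSString *secondDicPath = [secondLanguageGeneratorResults objectForKey:@"DictionaryPath"];

// Swap the active vocabulary while the engine keeps listening.
[self.pocketsphinxController changeLanguageModelToFile:secondLmPath withDictionary:secondDicPath];
```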
- (Float32) pocketsphinxInputLevel
Gives the volume of the incoming speech. This is a UI hook. Don't read it on the main thread, or it will block.
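Since the level must not be read on the main thread, one possible sketch is to poll it from a background thread and marshal the value back to the UI. The helper method names and the shouldPollLevels flag here are hypothetical:

```objc
// Hypothetical helper: kick off polling of the input level off the main thread.
- (void) startLevelPolling {
	[self performSelectorInBackground:@selector(pollInputLevel) withObject:nil];
}

- (void) pollInputLevel { // runs on a background thread
	@autoreleasepool {
		while(self.shouldPollLevels) { // hypothetical BOOL property you manage
			Float32 level = [self.pocketsphinxController pocketsphinxInputLevel];
			[self performSelectorOnMainThread:@selector(updateMeterWithLevel:)
			                       withObject:[NSNumber numberWithFloat:level]
			                    waitUntilDone:NO]; // update your meter UI on the main thread
			[NSThread sleepForTimeInterval:0.1]; // roughly 10 updates per second
		}
	}
}
```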
- (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF
You can use this to run recognition on an already-recorded WAV file for testing. The WAV file has to be 16-bit and 16000 samples per second.
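A sketch of testing recognition against a bundled recording; the file name test.wav is a placeholder, the file must be 16-bit at 16000 samples per second, and lmPath/dicPath are the paths produced by LanguageModelGenerator as shown earlier:

```objc
// Run recognition over a prerecorded WAV file instead of live audio.
NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"wav"]; // hypothetical file
[self.pocketsphinxController runRecognitionOnWavFileAtPath:wavPath
                                  usingLanguageModelAtPath:lmPath
                                          dictionaryAtPath:dicPath
                                       languageModelIsJSGF:NO];
```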
Property Documentation
This is how long PocketsphinxController should wait after speech ends before attempting to recognize the speech. It defaults to .7 seconds.

Advanced: set this to TRUE to receive n-best results.

Advanced: the number of n-best results to return. This is a maximum; if there are null hypotheses, fewer than this number will be returned.

How long to calibrate for, in seconds. This can only be one of the values '1', '2', or '3'. Defaults to 1.

Turn on verbose output. Do this any time you encounter an issue and any time you need to report an issue on the forums.

By default, PocketsphinxController won't return a hypothesis if for some reason the hypothesis is null (this can happen if the perceived sound was just noise). If you need even empty hypotheses to be returned, you can set this to TRUE before starting PocketsphinxController.
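The property names were lost from this page's formatting. As a sketch, configuring the silence timeout and n-best behavior might look like the following; the names secondsOfSilenceToDetect, returnNbest and nBestNumber are assumptions and should be checked against the OpenEars headers:

```objc
// Configure recognition behavior before starting to listen. Property names are assumptions.
self.pocketsphinxController.secondsOfSilenceToDetect = 0.5; // wait half a second of silence before recognizing
self.pocketsphinxController.returnNbest = TRUE;             // ask for n-best results
self.pocketsphinxController.nBestNumber = 5;                // return at most five hypotheses
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];
```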
Detailed Description
OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously.
Usage examples
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <OpenEars/OpenEarsEventsObserver.h>

At the @interface declaration, add the OpenEarsEventsObserverDelegate protocol. An example of this for a view controller called ViewController would look like this:

@interface ViewController : UIViewController <OpenEarsEventsObserverDelegate> {

In the middle part where instance variables go:

OpenEarsEventsObserver *openEarsEventsObserver;

In the bottom part where class properties go:

@property (strong, nonatomic) OpenEarsEventsObserver *openEarsEventsObserver;
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize openEarsEventsObserver;

Among the other methods of the class, add this lazy accessor method for confident memory management of the object:

- (OpenEarsEventsObserver *)openEarsEventsObserver {
	if (openEarsEventsObserver == nil) {
		openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
	}
	return openEarsEventsObserver;
}

Then, right before you start your first OpenEars functionality (for instance, right before your first self.fliteController say:withVoice: message or right before your first self.pocketsphinxController startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF: message), send this message:

[self.openEarsEventsObserver setDelegate:self];
Add these delegate methods of OpenEarsEventsObserver to your class:
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
	NSLog(@"The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID);
}

- (void) pocketsphinxDidStartCalibration {
	NSLog(@"Pocketsphinx calibration has started.");
}

- (void) pocketsphinxDidCompleteCalibration {
	NSLog(@"Pocketsphinx calibration is complete.");
}

- (void) pocketsphinxDidStartListening {
	NSLog(@"Pocketsphinx is now listening.");
}

- (void) pocketsphinxDidDetectSpeech {
	NSLog(@"Pocketsphinx has detected speech.");
}

- (void) pocketsphinxDidDetectFinishedSpeech {
	NSLog(@"Pocketsphinx has detected a period of silence, concluding an utterance.");
}

- (void) pocketsphinxDidStopListening {
	NSLog(@"Pocketsphinx has stopped listening.");
}

- (void) pocketsphinxDidSuspendRecognition {
	NSLog(@"Pocketsphinx has suspended recognition.");
}

- (void) pocketsphinxDidResumeRecognition {
	NSLog(@"Pocketsphinx has resumed recognition.");
}

- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
	NSLog(@"Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@", newLanguageModelPathAsString, newDictionaryPathAsString);
}

- (void) pocketSphinxContinuousSetupDidFail { // This can let you know that something went wrong with the recognition loop startup. Turn on OpenEarsLogging to learn why.
	NSLog(@"Setting up the continuous recognition loop has failed for some reason, please turn on OpenEarsLogging to learn more.");
}
Method Documentation
There was an interruption.

The interruption ended.

The input became unavailable.

The input became available again.

The audio route changed.

Pocketsphinx isn't listening yet, but it has started calibration.

Pocketsphinx isn't listening yet, but calibration has completed.

Pocketsphinx isn't listening yet, but it has entered the main recognition loop.

Pocketsphinx is now listening.

Pocketsphinx heard speech and is about to process it.

Pocketsphinx detected a second of silence, indicating the end of an utterance.

Pocketsphinx has a hypothesis.

Pocketsphinx has an n-best hypothesis dictionary.

Pocketsphinx has exited the continuous listening loop.

Pocketsphinx has not exited the continuous listening loop, but it will not attempt recognition.

Pocketsphinx has not exited the continuous listening loop, and it will now start attempting recognition again.

Pocketsphinx switched language models inline.

Some aspect of setting up the continuous loop failed; turn on OpenEarsLogging for more info.

Flite started speaking. You probably don't have to do anything about this.

Flite finished speaking. You probably don't have to do anything about this.