General Framework for Acoustic Modeling (AM):
Building an ASR system incrementally
Context-independent ➔ context-dependent modeling
Monophone ➔ triphone HMMs
Single Gaussian per state ➔ Gaussian mixture (multiple components) per state
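The monophone ➔ triphone step can be illustrated with a minimal Python sketch. It assumes the common `left-center+right` label convention and a hypothetical `sil` boundary symbol; neither is specified in these notes, and the helper name `to_triphones` is illustrative only.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into left-center+right triphone labels.

    `boundary` is an assumed padding symbol for utterance edges.
    """
    padded = [boundary] + list(phones) + [boundary]
    # Each original phone becomes one context-dependent unit.
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


monophones = ["k", "ae", "t"]          # context-independent units
triphones = to_triphones(monophones)   # context-dependent units
print(triphones)  # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

The number of units per utterance stays the same; what grows is the number of distinct unit types, which is why the incremental recipe starts from monophones.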
Data Preparation:
Acoustic Unit Selection:
Criteria
Accurate: the unit accurately represents the acoustic realization in each context where it appears
Trainable: enough training data is available to estimate the unit's parameters
Generalizable: any new word can be derived from the predefined unit inventory, enabling task-independent speech recognition
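The "trainable" criterion can be checked roughly by counting how often each unit occurs in the training transcripts. The sketch below is only an illustration: the function name `undertrained_units` and the threshold `min_count` are hypothetical, and a real system would choose the threshold based on model size.

```python
from collections import Counter


def undertrained_units(transcripts, min_count):
    """Return units that occur fewer than `min_count` times (assumed threshold)."""
    counts = Counter(unit for utterance in transcripts for unit in utterance)
    return sorted(unit for unit, n in counts.items() if n < min_count)


# Toy transcripts, each a list of unit labels.
transcripts = [["k", "ae", "t"], ["b", "ae", "t"]]
print(undertrained_units(transcripts, min_count=2))  # ['b', 'k']
```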
Units available
- Word
- Syllable
- Initial/Final (Chinese-specific)
- Phoneme
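To make the Initial/Final unit concrete, here is a simplified Python sketch that splits a toneless Pinyin syllable into its initial and final. The initial list and the treatment of `y` and `w` as initials are assumptions made for illustration; actual ASR lexicons differ on these details.

```python
# Assumed initial inventory; two-letter initials come first so they match before
# their one-letter prefixes (e.g. "zh" before "z").
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]


def split_initial_final(syllable):
    """Split a toneless Pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an"


for syl in ["zhong", "guo", "an"]:
    print(syl, split_initial_final(syl))
```

A syllable inventory of this kind is much smaller than a word inventory, which is the trade-off the unit list above points at.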