SAPI TTS Events Explained


seasuncs



Article link: https://blog.csdn.net/seasuncs/article/details/1856479


Explanation of TTS-Related Events

Adapted from the SAPI 5.1 documentation; compiled by seasun, 2007-10-29.

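The TTS engine passes event information to the application in SPEVENT structures. An application calls ISpEventSource::SetInterest to select the events it wants to receive. Because ISpVoice inherits from ISpEventSource, this method can also be called through an ISpVoice pointer. The application then calls ISpEventSource::GetEvents to retrieve the details of each event.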
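A sketch of how the interest masks passed to SetInterest are built. Each argument (ullEventInterest and ullQueuedInterest) is a 64-bit mask in which bit n selects the event whose SPEVENTENUM value is n; sapi.h builds these bits with its SPFEI() macro, which also sets two reserved flag-check bits. The portable C below re-creates that macro so it compiles without the Windows SDK. The reserved bit positions (30 and 33) are assumed to match sapi.h, and interest_mask/is_interested are hypothetical helpers, not SAPI functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the sapi.h event IDs used below; the reserved
   positions (30 and 33) are assumed to match sapi.h. */
enum {
    SPEI_START_INPUT_STREAM = 1,
    SPEI_END_INPUT_STREAM   = 2,
    SPEI_WORD_BOUNDARY      = 5,
    SPEI_RESERVED1          = 30,
    SPEI_RESERVED2          = 33
};

/* Reserved "flag check" bits that SPFEI() always adds. */
#define SPFEI_FLAGCHECK ((UINT64_C(1) << SPEI_RESERVED1) | (UINT64_C(1) << SPEI_RESERVED2))

/* Portable equivalent of SPFEI(id): the event's own bit plus the flag-check bits. */
#define SPFEI(id) ((UINT64_C(1) << (id)) | SPFEI_FLAGCHECK)

/* Hypothetical helper: OR together SPFEI() for every event ID in the list. */
static uint64_t interest_mask(const int *ids, int count)
{
    uint64_t mask = 0;
    for (int i = 0; i < count; i++) {
        assert(ids[i] > 0 && ids[i] < 64);
        mask |= SPFEI(ids[i]);
    }
    return mask;
}

/* Hypothetical helper: nonzero if event `id` is selected in `mask`. */
static int is_interested(uint64_t mask, int id)
{
    return (int)((mask >> id) & 1u);
}
```

With the real headers the equivalent call would be along the lines of pVoice->SetInterest(SPFEI(SPEI_WORD_BOUNDARY) | SPFEI(SPEI_END_INPUT_STREAM), SPFEI(SPEI_WORD_BOUNDARY) | SPFEI(SPEI_END_INPUT_STREAM)), passing the same mask for the events to be signaled and the events to be queued for GetEvents.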

The TTS engine events are listed below; they form a subset of SPEVENTENUM.

typedef enum SPEVENTENUM
{
    //--- TTS engine
    SPEI_START_INPUT_STREAM = 1,
    SPEI_END_INPUT_STREAM   = 2,
    SPEI_VOICE_CHANGE       = 3,  // LPARAM_IS_TOKEN
    SPEI_TTS_BOOKMARK       = 4,  // LPARAM_IS_STRING
    SPEI_WORD_BOUNDARY      = 5,
    SPEI_PHONEME            = 6,
    SPEI_SENTENCE_BOUNDARY  = 7,
    SPEI_VISEME             = 8,
    SPEI_TTS_AUDIO_LEVEL    = 9
} SPEVENTENUM;

The SPEVENT structure carries the information for each event:

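typedef struct SPEVENT
{
    WORD         eEventId;              // an SPEVENTENUM value
    WORD         elParamType;           // how to interpret lParam (SPEVENTLPARAMTYPE)
    ULONG        ulStreamNum;           // stream number from Speak/SpeakStream
    ULONGLONG    ullAudioStreamOffset;  // byte offset in the output audio stream
    WPARAM       wParam;
    LPARAM       lParam;
} SPEVENT;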

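Applications interpret these fields according to the event type. ulStreamNum identifies the stream an event belongs to; it matches the stream number that ISpVoice::Speak or ISpVoice::SpeakStream returns through its pulStreamNumber output parameter (the methods themselves return an HRESULT).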
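The sketch below shows the usual shape of event handling: walk the SPEVENT records returned by GetEvents, keep only those whose ulStreamNum matches the Speak call of interest, and switch on eEventId. To keep it compilable outside Windows, sp_event is a hypothetical stand-in for SPEVENT (WORD mapped to uint16_t, WPARAM/LPARAM to pointer-sized integers), and stream_stats/tally_stream are illustrative names. In real code, events whose lParam holds a string or object token should be released after use, for example with SpClearEvent from sphelper.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for SPEVENT. */
typedef struct {
    uint16_t  eEventId;
    uint16_t  elParamType;
    uint32_t  ulStreamNum;
    uint64_t  ullAudioStreamOffset;
    uintptr_t wParam;
    intptr_t  lParam;
} sp_event;

enum { SPEI_START_INPUT_STREAM = 1, SPEI_END_INPUT_STREAM = 2, SPEI_WORD_BOUNDARY = 5 };

/* Per-stream tally for one Speak call, identified by its stream number. */
typedef struct {
    int started;
    int finished;
    int words;
} stream_stats;

/* Tally the events in a batch (as GetEvents would return them) that belong
   to `stream`; events for other streams are skipped. */
static stream_stats tally_stream(const sp_event *events, size_t count, uint32_t stream)
{
    stream_stats s = { 0, 0, 0 };
    assert(events != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const sp_event *e = &events[i];
        if (e->ulStreamNum != stream)
            continue;
        switch (e->eEventId) {
        case SPEI_START_INPUT_STREAM: s.started = 1;  break;
        case SPEI_END_INPUT_STREAM:   s.finished = 1; break;
        case SPEI_WORD_BOUNDARY:      s.words++;      break;
        default:                      break; /* other event types ignored here */
        }
    }
    return s;
}
```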

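SPEI_START_INPUT_STREAM: Fired when the output object begins receiving output for a particular stream. The eEventId field is set to SPEI_START_INPUT_STREAM; the other fields have no meaning for this event.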

SPEI_END_INPUT_STREAM: Fired when the output object has received the last output for a particular stream. The other fields have no meaning for this event.

SPEI_VOICE_CHANGE: Fired when XML tags in the input text or stream change the voice's properties; it is also fired at the start of each Speak call. For details, see the Object Tokens and Registry Settings white paper.

SPEVENT Field	Voice Change event
eEventId	SPEI_VOICE_CHANGE
elParamType	SPET_LPARAM_IS_TOKEN
wParam
lParam	Object token of the new voice.

SPEI_TTS_BOOKMARK: Fired when a bookmark is reached. Bookmarks are inserted into the input text with the <Bookmark> XML tag.

SPEVENT Field	Bookmark event
eEventId	SPEI_TTS_BOOKMARK
elParamType	SPET_LPARAM_IS_STRING
wParam	Value of the bookmark string when converted to a long (_wtol(...) can be used).
lParam	Null-terminated copy of the bookmark string.
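Because wParam already holds the numeric value of the bookmark string, an application that uses numeric marks (for example <Bookmark Mark="3"/> at the start of paragraph 3) can switch on wParam without parsing lParam. A minimal sketch of that conversion, assuming the standard C function wcstol() as a portable stand-in for the MSVC-specific _wtol(); the helper name bookmark_wparam is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: the value an engine would place in wParam for a
   bookmark string, i.e. the string's leading integer in base 10.
   wcstol() stands in for _wtol(); both return 0 if no digits are found. */
static long bookmark_wparam(const wchar_t *mark)
{
    assert(mark != NULL);
    return wcstol(mark, NULL, 10);
}
```

Non-numeric marks such as "intro" convert to 0, so a value of 0 is ambiguous; applications that mix numeric and text marks should read the string from lParam instead.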

SPEI_WORD_BOUNDARY: Fired when synthesis reaches the beginning of a new word.

SPEVENT Field	Word Boundary event
eEventId	SPEI_WORD_BOUNDARY
elParamType	SPET_LPARAM_IS_UNDEFINED
wParam	Character length of the word in the current input stream being synthesized.
lParam	Character offset at the beginning of the word being synthesized.
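To highlight the word currently being spoken, an application can cut it out of the text it passed to Speak using the character offset and character length that the word boundary event reports. A minimal sketch, assuming offsets count wide characters (WCHARs) from the start of that input text; extract_span is a hypothetical helper and wchar_t stands in for WCHAR:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Copy text[offset, offset + length) into `out` (capacity `cap` wide chars,
   including the terminator). Returns the number of characters copied, or -1
   if the span lies outside the text or does not fit in `out`. */
static int extract_span(const wchar_t *text, size_t offset, size_t length,
                        wchar_t *out, size_t cap)
{
    assert(text != NULL && out != NULL);
    size_t n = wcslen(text);
    /* Checked in this order so length + 1 is only computed when length <= n. */
    if (offset > n || length > n - offset || length + 1 > cap)
        return -1;
    wmemcpy(out, text + offset, length);
    out[length] = L'\0';
    return (int)length;
}
```

The same helper applies to SPEI_SENTENCE_BOUNDARY, which reports a sentence's offset and length in the same way.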

SPEI_SENTENCE_BOUNDARY: Fired at the boundary of a new sentence.

SPEVENT Field	Sentence Boundary event
eEventId	SPEI_SENTENCE_BOUNDARY
elParamType	SPET_LPARAM_IS_UNDEFINED
wParam	Character length of the sentence in the current input stream being synthesized.
lParam	Character offset at the beginning of the sentence being synthesized.

SPEI_PHONEME: Fired at a phoneme boundary.

SPEVENT Field	Phoneme event
eEventId	SPEI_PHONEME
elParamType	SPET_LPARAM_IS_UNDEFINED
wParam	The high word is the duration, in milliseconds, of the current phoneme. The low word is the PhoneID of the next phoneme.
lParam	The low word is the PhoneID of the current phoneme. The high word is the SPVFEATURE value associated with the current phoneme.

PhoneID values are listed in the phoneme tables in the appendices (American English, Chinese, and Japanese).

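The high word of lParam holds the SPVFEATURE value, which can contain two flags. SPVFEATURE_STRESSED means the phoneme is stressed relative to the other phonemes in the word; it is typically set on a stressed vowel. SPVFEATURE_EMPHASIS marks the stressed phoneme of an emphasized word. Stress refers to the accented vowel within a word, whereas emphasis refers to a word that is emphasized within a sentence.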

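SPEI_VISEME: Fired when a new viseme (mouth position) is reached. (Translator's note: pronouncing each sound requires a corresponding mouth shape.)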

SPEVENT Field	Viseme event
eEventId	SPEI_VISEME
elParamType	SPET_LPARAM_IS_UNDEFINED
wParam	The high word is the duration, in milliseconds, of the current viseme. The low word is the code of the next viseme.
lParam	The low word is the code of the current viseme. The high word is the SPVFEATURE value associated with the current viseme (and phoneme).

For the complete list of visemes, see SPVISEMES (in the appendix).
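For lip-sync, each viseme event can be turned into an animation keyframe: its start time comes from ullAudioStreamOffset, a byte position in the output audio, and its length from the duration in the high word of wParam. A minimal sketch, assuming the event offset marks the start of the viseme and that the caller knows the output format's average bytes per second (nAvgBytesPerSec in its WAVEFORMATEX; for example 44100 for 22.05 kHz, 16-bit, mono PCM). offset_to_ms, viseme_keyframe and make_keyframe are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte offset in the output audio stream to milliseconds. */
static uint64_t offset_to_ms(uint64_t audio_offset, uint32_t avg_bytes_per_sec)
{
    assert(avg_bytes_per_sec > 0);
    /* Split the division so audio_offset * 1000 cannot overflow. */
    return (audio_offset / avg_bytes_per_sec) * 1000u
         + (audio_offset % avg_bytes_per_sec) * 1000u / avg_bytes_per_sec;
}

typedef struct {
    uint16_t viseme;    /* LOWORD(lParam): current viseme code */
    uint16_t features;  /* HIWORD(lParam): SPVFEATURE flags */
    uint16_t next;      /* LOWORD(wParam): next viseme code */
    uint64_t start_ms;  /* event position in the audio, in ms */
    uint64_t end_ms;    /* start_ms + HIWORD(wParam) duration */
} viseme_keyframe;

static viseme_keyframe make_keyframe(uint64_t audio_offset, uint32_t avg_bytes_per_sec,
                                     uint64_t wparam, uint64_t lparam)
{
    viseme_keyframe k;
    k.viseme   = (uint16_t)(lparam & 0xFFFFu);
    k.features = (uint16_t)((lparam >> 16) & 0xFFFFu);
    k.next     = (uint16_t)(wparam & 0xFFFFu);
    k.start_ms = offset_to_ms(audio_offset, avg_bytes_per_sec);
    k.end_ms   = k.start_ms + ((wparam >> 16) & 0xFFFFu);
    return k;
}
```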

SPEI_TTS_AUDIO_LEVEL: Reports the audio level reached by the synthesized output.

SPEVENT Field	Audio Level event
eEventId	SPEI_TTS_AUDIO_LEVEL
elParamType	SPET_LPARAM_IS_UNDEFINED
wParam	TTS audio level (ULONG).
lParam	NULL

Appendix A: American English Phoneme Representation

This appendix briefly describes how phonemes are represented in SAPI and how that representation is used.

Symbolic and Numeric Representation

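Application developers can use the American English phoneme set in the table below to specify pronunciations for new words that are not in the dictionary. The phoneme set consists of symbols that represent speech sounds.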

Developers can use the XML tag <PRON> to supply a new pronunciation, or they can create a new pronunciation lexicon. Phonemes are separated by spaces.

The <PRON SYM> tag inserts a pronunciation. For example, "Hello" can be written with the following tag:

<PRON SYM = "h eh l ow"/>

For greater precision, you can also add primary stress (1) and secondary stress (2) markers, as well as syllable boundaries (-).

The following example adds a primary stress marker (1) and a syllable boundary:

<PRON SYM = "h eh - l ow 1"/>

American English Phoneme Table

SYM	Example	PhoneID
-	syllable boundary (hyphen)	1
!	Sentence terminator (exclamation mark)	2
&	word boundary	3
,	Sentence terminator (comma)	4
.	Sentence terminator (period)	5
?	Sentence terminator (question mark)	6
_	Silence (underscore)	7
1	Primary stress	8
2	Secondary stress	9
aa	father	10
ae	cat	11
ah	cut	12
ao	dog	13
aw	foul	14
ax	ago	15
ay	bite	16
b	big	17
ch	chin	18
d	dig	19
dh	then	20
eh	pet	21
er	fur	22
ey	ate	23
f	fork	24
g	gut	25
h	help	26
ih	fill	27
iy	feel	28
jh	joy	29
k	cut	30
l	lid	31
m	mat	32
n	no	33
ng	sing	34
ow	go	35
oy	toy	36
p	put	37
r	red	38
s	sit	39
sh	she	40
t	talk	41
th	thin	42
uh	book	43
uw	too	44
v	vat	45
w	with	46
y	yard	47
z	zap	48
zh	pleasure	49
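The table can be applied mechanically. The sketch below converts a space-separated SYM string, as written in <PRON SYM="..."/>, into the PhoneIDs listed above and rejects unknown symbols. sym_to_ids and phone_id are hypothetical helpers for illustration; in a real SAPI application, the ISpPhoneConverter interface (PhoneToId/IdToPhone) is, as far as we know, the intended way to perform this conversion for the selected language.

```c
#include <assert.h>
#include <string.h>

/* The American English phoneme table above, as (symbol, PhoneID) pairs. */
static const struct { const char *sym; int id; } kPhones[] = {
    {"-", 1}, {"!", 2}, {"&", 3}, {",", 4}, {".", 5}, {"?", 6}, {"_", 7},
    {"1", 8}, {"2", 9}, {"aa", 10}, {"ae", 11}, {"ah", 12}, {"ao", 13},
    {"aw", 14}, {"ax", 15}, {"ay", 16}, {"b", 17}, {"ch", 18}, {"d", 19},
    {"dh", 20}, {"eh", 21}, {"er", 22}, {"ey", 23}, {"f", 24}, {"g", 25},
    {"h", 26}, {"ih", 27}, {"iy", 28}, {"jh", 29}, {"k", 30}, {"l", 31},
    {"m", 32}, {"n", 33}, {"ng", 34}, {"ow", 35}, {"oy", 36}, {"p", 37},
    {"r", 38}, {"s", 39}, {"sh", 40}, {"t", 41}, {"th", 42}, {"uh", 43},
    {"uw", 44}, {"v", 45}, {"w", 46}, {"y", 47}, {"z", 48}, {"zh", 49},
};

/* Look up the symbol sym[0..len); returns its PhoneID, or -1 if unknown. */
static int phone_id(const char *sym, size_t len)
{
    for (size_t i = 0; i < sizeof kPhones / sizeof kPhones[0]; i++)
        if (strlen(kPhones[i].sym) == len && strncmp(kPhones[i].sym, sym, len) == 0)
            return kPhones[i].id;
    return -1;
}

/* Convert a space-separated SYM string to PhoneIDs in `out` (capacity `cap`).
   Returns the number of IDs written, or -1 on an unknown symbol or overflow. */
static int sym_to_ids(const char *sym, int *out, int cap)
{
    assert(sym != NULL && out != NULL);
    int count = 0;
    while (*sym) {
        while (*sym == ' ')
            sym++;                       /* skip separators */
        if (!*sym)
            break;
        size_t len = strcspn(sym, " ");  /* length of this symbol */
        int id = phone_id(sym, len);
        if (id < 0 || count == cap)
            return -1;
        out[count++] = id;
        sym += len;
    }
    return count;
}
```

For the earlier example, "h eh - l ow 1" maps to 26 21 1 31 35 8.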

Appendix B: Chinese Phoneme Table

Chinese Phonemes

The following table defines the Chinese language phoneme set.

Symbol	PhoneID	Example
-	1	Syllable boundary (hyphen)
!	2	Sentence terminator (exclamation mark)
&	3	word boundary
,	4	Sentence terminator (comma)
.	5	Sentence terminator (period)
?	6	Sentence terminator (question mark)
_	7	Silence (underscore)
+	8	primary stress
*	9	secondary stress
1	10