Voice-Activated Web Browsing


artlife


[Translation] Voice-Activated Web Browsing

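Introduction

This article describes an ActiveX control that can be embedded in an HTML web page to provide a voice-activated menu tree.

To compile the code you need VC6, Microsoft's Speech SDK 5.1, and the Internet Explorer headers. The source notes that these come with XP, although the Speech SDK is described later as a separate 68 MB download.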

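The Demo Program

The demo project is a simple web page with two <iframe> elements: the first <iframe> embeds the ActiveX control, and the second displays the content.

After compiling and registering webvoicectl.dll, open the DEMO folder and double-click webvoice.html. You should see the tree control on the left side of the page, as in the figure above. Press the "voice" button and wait a while for the large speech engines to load.

Once loading has finished, say "go to class one" to start browsing. The control asks for confirmation with "please confirm class one". After you answer "positive", the requested item is displayed on the right.

Say "help" at any time to hear the list of commands. Once you have navigated to a page, the help response covers the viewing commands: scroll up, scroll down, jump to the top or bottom of the page, go back, or navigate. Speak the scrolling commands you need, then say "navigate" to return to navigation mode.

Tip: turn the microphone volume down to avoid echo.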

Background

The article touches on the following technologies:

ATL, ActiveX (and wide character string manipulation)
Tree view searching, expanding and collapsing
Owner drawn buttons, edit controls and static controls
Image lists, overlays (and painting inside an ATL composite control)
Using Microsoft's MSXML parser to load and manipulate an XML file
Using C++ to interface with the web browser and the html page
SAPI 5.1, speech recognition and text to speech engines and Visemes

You do not need to understand all of these to use the control in a project, but you may find some of the solutions useful in other code projects.

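Creating Your Own Menu Tree

Your menu items are read from the configuration file "data/WebVoice.xml" (the file name is currently hard-coded). This file holds the information for both the menu tree and the SAPI grammar. The contents are stored in an array of key structures for later retrieval. A short XML sample and the key structure are shown below.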

<menu>
  <item>
    <txt>Class One</txt> <!-- menu text and grammar phrase -->
    <ref>../html/class1.html</ref> <!-- hyperlink reference -->
  </item>
  <item>
    <txt>Source One</txt>
  </item>
  <!-- more items here -->
</menu>

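typedef struct tag_key
{
    int mid;            // menu item id
    int pid;            // parent item id
    int chd;
    HTREEITEM hItem;    // tree view handle of this item
    HTREEITEM hParent;  // tree view handle of the parent item
    char txt[32];       // menu text and grammar phrase (<txt>)
    char ref[128];      // hyperlink reference (<ref>)
} KEY;

KEY aKeys[NUMBER_OF_KEYS];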

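You must take care that the menu item ids are in sequence and that each item's parent id refers to a node that actually exists in the tree. There is currently no error checking at load time, so an invalid XML file will leave the control unusable.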
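Since the loader performs no checks, a small validation pass over the loaded keys could catch a bad file early. The following is only a sketch under stated assumptions: `MenuKey` and `FindInvalidKey` are hypothetical names, ids are assumed to run 1, 2, 3, ... in file order, and a parent id of 0 is assumed to mean a top-level item. None of these conventions is confirmed by the article.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the article's KEY record; only the fields
// needed for validation are kept (the tree handles are omitted).
struct MenuKey {
    int mid;  // menu item id
    int pid;  // parent id, 0 for a top-level item (assumed convention)
};

// Returns the index of the first invalid entry, or -1 if all are valid.
// Assumed rules: ids run 1, 2, 3, ... in file order, and a parent must
// be either 0 (root) or the id of an item that appears earlier.
int FindInvalidKey(const std::vector<MenuKey>& keys)
{
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const int expectedId = static_cast<int>(i) + 1;
        if (keys[i].mid != expectedId) {
            return static_cast<int>(i);   // id out of sequence
        }
        if (keys[i].pid < 0 || keys[i].pid >= keys[i].mid) {
            return static_cast<int>(i);   // parent missing or not yet defined
        }
    }
    return -1;
}
```

A loader could call such a function right after parsing and refuse to build the tree when the result is not -1.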

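SAPI Initialization

The WebVoice control initializes SAPI in its InitSapi() method, as follows:

Create the recognition engine
Create the recognition context
Set up a notification mechanism (Windows message) for messages from the recognition engine
Set the recognition events of interest
Load the grammar file
Create the text-to-speech (TTS) engine
Set the TTS events
Set up a notification mechanism (Windows message) for messages from the TTS engine
Set the active rule

The Speech SDK documentation and samples show the SAPI initialization calls clearly, so I will not repeat them here. The static grammar file and the dynamic grammar, however, need some explanation.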

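SAPI Grammar

SAPI grammars can be loaded statically from an XML file or built dynamically at run time. The WebVoice control uses both methods. The static part is loaded from Grammar.xml, which has the following format: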

</DEFINE>

<L>

</L>

</RULE>

<P VAL="1">Dummy Item</P>

</L>

</RULE>

<!-- more rules -->

</GRAMMAR>

</PRE>

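As you can see, this file fragment creates two rules. The first rule, RID_Tree, defines the static navigation phrases and references the second rule, RID_MenuItem. The second rule holds a dummy phrase that is replaced at run time by the names of the menu items. The file is compiled to Grammar.cfg by SAPI's gc.exe grammar compiler and is then loaded into the DLL as a resource. The dynamic rule is filled in as follows: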

<PRE id=pre3 style="MARGIN-TOP: 0px" nd="43">HRESULT CWebVoice::LoadGrammar()
{
    USES_CONVERSION;
    HRESULT hr;
    SPSTATEHANDLE hRule;
    SPPROPERTYINFO pi;

    ZeroMemory(&pi, sizeof(SPPROPERTYINFO));
    pi.ulId = RID_MenuItem;       // property ID
    pi.vValue.vt = VT_UI4;

    // hRule is not declared in the excerpt as published. Fetching the
    // dynamic rule's initial state and clearing its dummy item like this
    // is an assumption, not necessarily the author's code.
    hr = m_cpGrammar->GetRule(NULL, RID_MenuItem, SPRAF_Dynamic, FALSE, &hRule);
    if (FAILED(hr)) return hr;
    hr = m_cpGrammar->ClearRule(hRule);
    if (FAILED(hr)) return hr;

    // add menu items to the dynamic grammar rule
    for (int i = 0; i < m_nNumKeys; i++) {
        pi.vValue.ulVal = i + 1;  // Property_Value == data_index + 1
        hr = m_cpGrammar->AddWordTransition(hRule, NULL,
                 T2W(aKeys[i].txt), L" ", SPWT_LEXICAL, 1, &pi);
        if (FAILED(hr)) return hr;
    }

    // add a wildcard phrase
    pi.vValue.ulVal = 0;
    hr = m_cpGrammar->AddWordTransition(hRule,
             NULL, L"*", L" ", SPWT_LEXICAL, 1, &pi);
    if (FAILED(hr)) return hr;

    hr = m_cpGrammar->Commit(NULL);
    if (FAILED(hr)) return hr;

    hr = m_cpGrammar->SetGrammarState(SPGS_ENABLED);
    return hr;
}
</PRE>

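Note that each phrase (taken from aKeys[i].txt) is assigned the RID_MenuItem property ID and a unique property value (between 1 and m_nNumKeys), and is then added to the grammar with AddWordTransition(). Note also that a wildcard phrase ("*") is added at the end to catch anything spoken that is not in the grammar.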
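The relationship between property values and array indices can be sketched without SAPI. The helper names below (`PropertyValueForIndex`, `IndexForPropertyValue`, `kWildcardValue`) are hypothetical and do not appear in the article. They only restate the "index + 1, with 0 reserved for the wildcard" convention used in LoadGrammar().

```cpp
// Property value reserved for the catch-all phrase in the article's code.
const unsigned long kWildcardValue = 0;

// Mirrors "Property_Value == data_index + 1" from LoadGrammar().
unsigned long PropertyValueForIndex(int index)
{
    return static_cast<unsigned long>(index) + 1;
}

// Converts a recognized property value back to an index into the menu
// array. Returns -1 for the wildcard value or for anything out of range.
int IndexForPropertyValue(unsigned long value, int numKeys)
{
    if (value == kWildcardValue) {
        return -1;
    }
    if (numKeys <= 0 || value > static_cast<unsigned long>(numKeys)) {
        return -1;
    }
    return static_cast<int>(value) - 1;
}
```

Keeping 0 out of the valid range means a handler can tell "phrase not in the menu" apart from the first menu item before it indexes the array.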


Recognition

The recognition engine compares your spoken words to the active grammar rule. When the engine makes either a recognition or a false recognition, your callback routine is called to handle the request. The following shows a section of the recognition handler:

<PRE id=pre4 style="MARGIN-TOP: 0px" nd="56">void CWebVoice::ExecuteCommand(ISpRecoResult *pPhrase, HWND hWnd)
{
    USES_CONVERSION;
    SPPHRASE *pElements;
    static int ind;   // index of the item awaiting confirmation
    int pos;

    if (SUCCEEDED(pPhrase->GetPhrase(&pElements))) {
        m_cpRecoCtxt->Pause(NULL);   // pause recognition while loading
        switch (pElements->Rule.ulId) {
        case RID_Tree:
            // note: the wildcard phrase carries value 0, which would give ind == -1
            pos = pElements->pProperties->vValue.ulVal;
            ind = pos - 1;                 // store the index into the data array
            SetActiveRule(RID_Confirm);    // change the active rule
            wcscpy(wcs, L"Please confirm: \r\n");   // wcs is declared outside this excerpt
            wcscat(wcs, T2W(aKeys[ind].txt));
            HandleReply(0, wcs);
            break;
        case RID_Confirm:
            pos = pElements->pProperties->vValue.ulVal;
            switch (pos) {
            case 1:
                HandleConfirm(ind);        // expand the tree and navigate to item
                SetActiveRule(RID_View);   // change the active rule
                break;
            case 2:
            default:
                SetActiveRule(RID_Tree);
                HandleReply(MID_Tree, NULL);
                break;
            }
            break;   // do not fall through into the cases below
        // more cases for other rules
        default:
            SetActiveRule(RID_Tree);
            HandleReply(RID_Tree, NULL);
            break;
        }
        ::CoTaskMemFree(pElements);
        m_cpRecoCtxt->Resume(NULL);
    }
}</PRE>

When a navigation rule is matched, its property value is stored in the static variable ind and, after confirmation, is passed to the HandleConfirm(ind) function, which uses it to index the data array (aKeys[ind]) and retrieve the correct data item. If successful, the tree view is opened to show the selection and the hyperlink is navigated.
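The navigate, confirm, view sequence described here is a small state machine, and it can be sketched independently of SAPI. In this sketch `Session`, `OnRecognized`, and the `RULE_*` constants are hypothetical stand-ins for the control's members and rule ids, and the `pending` field plays the role of the static variable ind.

```cpp
// Grammar rules named in the article; the numeric values here are arbitrary.
enum Rule { RULE_TREE, RULE_CONFIRM, RULE_VIEW };

struct Session {
    Rule active = RULE_TREE;  // currently active grammar rule
    int pending = -1;         // index awaiting confirmation (the article's ind)
    int opened = -1;          // index of the last item navigated to
};

// Feeds one recognition result into the session. value is the property
// value attached to the recognized phrase: under RULE_TREE it is
// index + 1, under RULE_CONFIRM the value 1 means "positive".
void OnRecognized(Session& s, Rule matched, unsigned long value)
{
    switch (matched) {
    case RULE_TREE:
        s.pending = static_cast<int>(value) - 1;  // remember the requested item
        s.active = RULE_CONFIRM;                  // wait for confirmation
        break;
    case RULE_CONFIRM:
        if (value == 1 && s.pending >= 0) {
            s.opened = s.pending;                 // stands in for HandleConfirm(ind)
            s.active = RULE_VIEW;
        } else {
            s.active = RULE_TREE;                 // rejected: back to navigation
        }
        break;
    default:
        s.active = RULE_TREE;
        break;
    }
}
```

Each case ends with its own break, so a confirmed request stays in the viewing state instead of dropping into the default branch and resetting to navigation.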

Points of Interest

Every time I write an ActiveX control or a web browser plugin in ATL, I have to relearn how to use wide character strings, and SAPI uses wide character strings exclusively. If your code does not have to run on Win98, you can just define UNICODE and, as long as your strings are declared as TCHAR*, the usual API calls will work fine.

But if discounting Win98 users is not an option, you are forced to convert from multibyte to wide strings whenever you use the Win32 API. Fortunately, ATL has a wonderful set of conversion macros defined in <atlbase.h>. You just place the macro USES_CONVERSION at the beginning of each function that needs to convert strings, then use the W2T() or T2W() macros to perform the conversion.

I have no doubt that the overhead of these macros is alarming; after all, they have to allocate memory, copy the string, then release the memory in each conversion call. However, these macros are so convenient and tidy that I've even started including <atlbase.h> in my MFC programs.
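To make the allocate-and-copy cost concrete, here is a portable sketch of a multibyte-to-wide conversion using the C runtime. It is a stand-in for illustration only: `ToWide` is a hypothetical helper, and this is not how ATL's T2W() is implemented.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a multibyte string to a wide string with the C runtime.
// The conversion follows the current C locale.
std::wstring ToWide(const std::string& narrow)
{
    // Allocate: a multibyte string never yields more wide characters
    // than it has bytes, so size() + 1 leaves room for the terminator.
    std::vector<wchar_t> buffer(narrow.size() + 1);

    // Copy: convert into the temporary buffer.
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();   // invalid multibyte sequence
    }

    // The temporary buffer is released when this function returns.
    return std::wstring(buffer.data(), written);
}
```

Every call pays for one buffer allocation and one copy, which is the kind of per-conversion cost the paragraph above refers to.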

Another problem I encountered was the need to use owner-drawn buttons: the standard dialog-box grey does not cut it on a web page. In MFC I would handle the WM_CTLCOLOR message and change the background colour there. In ATL I found that I had to make the buttons owner-drawn and then handle the WM_DRAWITEM message.

All well and good, but then I discovered that I needed both a toggle button and a momentary button, and that I now had to code the required responses myself. It was all great fun, but it took some time before I was able to get to the SAPI part of the code.
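The difference between the two button behaviours comes down to a little state kept per button. The sketch below shows one way to model it. `ButtonState`, `OnMouseDown`, `OnMouseUp`, and `DrawPressed` are hypothetical names, and the article does not show how the control actually implements its buttons.

```cpp
enum ButtonKind { BUTTON_MOMENTARY, BUTTON_TOGGLE };

struct ButtonState {
    ButtonKind kind;
    bool held;     // mouse button currently down over the control
    bool latched;  // on/off state, meaningful for BUTTON_TOGGLE only

    explicit ButtonState(ButtonKind k) : kind(k), held(false), latched(false) {}
};

void OnMouseDown(ButtonState& b)
{
    b.held = true;
}

void OnMouseUp(ButtonState& b)
{
    // A toggle button flips its latched state once per completed click.
    if (b.held && b.kind == BUTTON_TOGGLE) {
        b.latched = !b.latched;
    }
    b.held = false;
}

// Whether the owner-draw code should paint the button in its "down" look.
bool DrawPressed(const ButtonState& b)
{
    return b.held || (b.kind == BUTTON_TOGGLE && b.latched);
}
```

A WM_DRAWITEM handler could consult a function like DrawPressed() to pick which image to paint for each button.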

The Microsoft Speech SDK 5.1 is a 68 MB download, and if you need to package the SAPI runtime modules with your code, you must download the full redistribution package, which is 131.58 MB.

Unfortunately, Microsoft does not package the runtime modules by themselves. Either your clients must download the SDK (including the extra 30 MB of developer code and documentation), or you must prepare a runtime module package yourself as a separate download from your application.

Revisions

29 January 2004 - Subclassed the picture control to avoid Win98 problems and fixed a small script error in the demo.

About Geoff Bailey


……

(THE END)

Original article: http://www.codeproject.com/audio/WebVoicePkg.asp