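While debugging today I found that the command mode of the Microsoft speech recognition engine does not support the complete GBK character set. It supports only GB2312 (more precisely, a subset of it, since many symbols are not supported). As soon as the XML file contains a character the engine cannot recognize, LoadCmdFromFile fails. Command text therefore has to be filtered before it is added to the XML. I wrote a filter function; the code is as follows: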
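/*
 * 1. Symbol codes, 682 in total:
 *    1) Row 01 (94): high byte A1, low byte A1-FE
 *    2) Row 02 (72): high byte A2, low byte B1-E2, E5-EE, F1-FC
 *    3) Row 03 (94): high byte A3, low byte A1-FE
 *    4) Row 04 (83): high byte A4, low byte A1-F3
 *    5) Row 05 (86): high byte A5, low byte A1-F6
 *    6) Row 06 (48): high byte A6, low byte A1-B8, C1-D8
 *    7) Row 07 (66): high byte A7, low byte A1-C1, D1-F1
 *    8) Row 08 (63): high byte A8, low byte A1-BA, C5-E9
 *    9) Row 09 (76): high byte A9, low byte A4-EF
 * 2. Hanzi codes, 6763 in total:
 *    high byte B0-F7, low byte A1-FE, excluding the following codes:
 *    D7FA-D7FE
 */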
VOID CSREngine::GBK2GB2312(LPSTR lpszChineseStr)
{
    if(lpszChineseStr)
    {
        UINT32 iWritePos = 0;   // next write position; the string is compacted in place
        for(UINT32 iScanPos=0; lpszChineseStr[iScanPos]; ++iScanPos)
        {
            // Single-byte (ASCII) characters are never copied, so they are dropped as well.
            if((UCHAR)lpszChineseStr[iScanPos] >= 0x80)
            {
                UINT8 iChHigh = lpszChineseStr[iScanPos];
                UINT8 iChLow = lpszChineseStr[iScanPos+1];
                if(iChLow == 0)
                    break;  // truncated double-byte character: do not scan past the terminator
                // The symbol ranges are commented out; only the Hanzi range is kept.
                if(/*iChHigh==0xA1 && iChLow>=0xA1 && iChLow<=0xFE
                    || iChHigh==0xA2 && iChLow>=0xB1 && iChLow<=0xE2
                    || iChHigh==0xA2 && iChLow>=0xE5 && iChLow<=0xEE
                    || iChHigh==0xA2 && iChLow>=0xF1 && iChLow<=0xFC
                    || iChHigh==0xA3 && iChLow>=0xA1 && iChLow<=0xFE
                    || iChHigh==0xA4 && iChLow>=0xA1 && iChLow<=0xF3
                    || iChHigh==0xA5 && iChLow>=0xA1 && iChLow<=0xF6
                    || iChHigh==0xA6 && iChLow>=0xA1 && iChLow<=0xB8
                    || iChHigh==0xA6 && iChLow>=0xC1 && iChLow<=0xD8
                    || iChHigh==0xA7 && iChLow>=0xA1 && iChLow<=0xC1
                    || iChHigh==0xA7 && iChLow>=0xD1 && iChLow<=0xF1
                    || iChHigh==0xA8 && iChLow>=0xA1 && iChLow<=0xBA
                    || iChHigh==0xA8 && iChLow>=0xC5 && iChLow<=0xE9
                    || iChHigh==0xA9 && iChLow>=0xA4 && iChLow<=0xEF
                    || */iChHigh>=0xB0 && iChHigh<=0xF7 && iChLow>=0xA1 && iChLow<=0xFE
                    && !(iChHigh==0xD7 && iChLow>=0xFA && iChLow<=0xFE))
                {
                    // Copy only if earlier characters were dropped; otherwise the bytes are already in place.
                    if(iWritePos != iScanPos)
                    {
                        lpszChineseStr[iWritePos] = iChHigh;
                        lpszChineseStr[iWritePos+1] = iChLow;
                    }
                    iWritePos += 2;
                }
                ++iScanPos;     // skip the low byte of the double-byte character
            }
        }
        lpszChineseStr[iWritePos] = '\0';   // was '/0', which is not the NUL terminator
    }
}
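The code removes all symbol characters (the symbol ranges in the condition are commented out). The engine can recognize some of them but not others; since symbols cannot be read aloud anyway, I simply removed them all. In my testing, the filtered command text works well.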
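For experimenting with the filter outside the Windows project, the same logic can be sketched in portable C++. This is only a sketch under assumptions: the names FilterGb2312, IsGb2312Hanzi, IsGb2312Symbol and the keepSymbols switch are hypothetical and not part of the original code. The ranges come from the comment table above, and the function returns a new string instead of editing the buffer in place.

```cpp
#include <cstddef>
#include <string>

// True if (hi, lo) is one of the 6763 GB2312 Hanzi codes (B0A1-F7FE minus D7FA-D7FE).
bool IsGb2312Hanzi(unsigned char hi, unsigned char lo) {
    if (hi < 0xB0 || hi > 0xF7 || lo < 0xA1 || lo > 0xFE) return false;
    return !(hi == 0xD7 && lo >= 0xFA);
}

// True if (hi, lo) falls in one of the symbol ranges listed in the comment table (rows 01-09).
bool IsGb2312Symbol(unsigned char hi, unsigned char lo) {
    struct Range { unsigned char hi, loFirst, loLast; };
    static const Range kRanges[] = {
        {0xA1, 0xA1, 0xFE}, {0xA2, 0xB1, 0xE2}, {0xA2, 0xE5, 0xEE}, {0xA2, 0xF1, 0xFC},
        {0xA3, 0xA1, 0xFE}, {0xA4, 0xA1, 0xF3}, {0xA5, 0xA1, 0xF6},
        {0xA6, 0xA1, 0xB8}, {0xA6, 0xC1, 0xD8}, {0xA7, 0xA1, 0xC1}, {0xA7, 0xD1, 0xF1},
        {0xA8, 0xA1, 0xBA}, {0xA8, 0xC5, 0xE9}, {0xA9, 0xA4, 0xEF},
    };
    for (const Range& r : kRanges) {
        if (hi == r.hi && lo >= r.loFirst && lo <= r.loLast) return true;
    }
    return false;
}

// Keeps only GB2312 Hanzi (and, if keepSymbols is true, the listed symbols).
// Single-byte characters are dropped, mirroring the behavior of the function above.
std::string FilterGb2312(const std::string& in, bool keepSymbols = false) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char hi = static_cast<unsigned char>(in[i]);
        if (hi < 0x80) continue;        // single-byte character: not copied
        if (i + 1 >= in.size()) break;  // truncated double-byte character at the end
        unsigned char lo = static_cast<unsigned char>(in[++i]);
        if (IsGb2312Hanzi(hi, lo) || (keepSymbols && IsGb2312Symbol(hi, lo))) {
            out.push_back(static_cast<char>(hi));
            out.push_back(static_cast<char>(lo));
        }
    }
    return out;
}
```

Calling FilterGb2312(text) should match what the filter above does to the same bytes, while FilterGb2312(text, true) would keep the symbols from the table, which may be useful for checking which of them a given engine actually accepts.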