Composition String
The composition string is the current text in the composition window. This is the text that the IME converts to final characters. Each composition string consists of one or more clauses, where a clause is the smallest combination of characters that the IME can convert to a final character. To get and set the composition string, call the ImmGetCompositionString and ImmSetCompositionString functions.
As the user enters text in the composition window, the IME tracks the status of the composition string. This status includes attribute information, clause information, typing information, and cursor position. You can retrieve the composition status by using the ImmGetCompositionString function.
The attribute information is an array of 8-bit values that specifies the status of characters in the composition string. There is one value for each byte in the string, including one byte each for the lead and second bytes of any double-byte characters in the string. In this array, all characters of one clause must have the same attribute. For each value in the array, bits 0 through 3 can be one of the following values.
Value | Meaning |
ATTR_INPUT | Character being entered by the user. It is yet to be converted by the IME. |
ATTR_INPUT_ERROR | Character is an error character and cannot be converted by the IME. For example, some consonants cannot be put together. |
ATTR_TARGET_CONVERTED | Character converted by the IME. The user has selected this character and the IME has converted it. |
ATTR_CONVERTED | A converted character. The IME has already converted this character. |
ATTR_TARGET_NOTCONVERTED | Character being converted. The user has selected this character but the IME has not yet converted it. |
ATTR_FIXEDCONVERTED | Characters that will not be converted. The IME will not convert these characters anymore. |
All other values are reserved. In Japanese, any unconverted character having the ATTR_INPUT attribute is a Hiragana, Katakana, or alphanumeric character. In Korean, this character is a Hangeul character that the IME has not yet converted. In Traditional and Simplified Chinese, each IME may restrict its characters to a particular range.
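As a rough illustration of reading one attribute byte, the sketch below masks bits 0 through 3 and classifies the result. The numeric values are the ones I understand the Windows SDK header imm.h to define; they are repeated locally so the code builds without windows.h. The helper names (attr_value, attr_is_target, attr_is_reserved) are hypothetical and not part of the IMM API.

```c
#include <stdint.h>

/* Attribute values as I understand them from imm.h (an assumption),
   repeated here so the sketch builds without windows.h. */
enum {
    ATTR_INPUT               = 0x00,
    ATTR_TARGET_CONVERTED    = 0x01,
    ATTR_CONVERTED           = 0x02,
    ATTR_TARGET_NOTCONVERTED = 0x03,
    ATTR_INPUT_ERROR         = 0x04,
    ATTR_FIXEDCONVERTED      = 0x05
};

/* Hypothetical helper: keep only bits 0 through 3 of one attribute byte. */
static int attr_value(uint8_t attr)
{
    return attr & 0x0F;
}

/* Hypothetical helper: nonzero if the byte belongs to the clause the user
   has selected, whether or not the IME has converted it yet. */
static int attr_is_target(uint8_t attr)
{
    int v = attr_value(attr);
    return v == ATTR_TARGET_CONVERTED || v == ATTR_TARGET_NOTCONVERTED;
}

/* Hypothetical helper: nonzero if the value is outside the documented set. */
static int attr_is_reserved(uint8_t attr)
{
    return attr_value(attr) > ATTR_FIXEDCONVERTED;
}
```

An application that draws its own composition window might use a test like attr_is_target to decide which characters to highlight.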
The clause information is an array of 32-bit values that specifies the positions of the clauses in the composition string. There is one value for each clause and a final value that specifies the length of the full string. Each value in the array specifies the offset, in bytes, from the beginning of the string to the clause. The first value is always 0 because the first clause always starts at the beginning of the string. For example, if a string has two clauses, the clause information has three values: the first value is 0, the second value is the offset of the second clause, and the third value is the length of the string. For Unicode, the position of a clause is the position counted in Unicode characters, and the length of a string is the size in Unicode characters.
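The layout described above (one offset per clause plus a final total length) can be walked with a few lines of C. This is a minimal sketch that assumes the array has already been copied out of the IME into a plain buffer; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of clauses described by an array of `count` 32-bit values:
   one offset per clause plus the final string length. */
static size_t clause_count(size_t count)
{
    return count > 0 ? count - 1 : 0;
}

/* Length of clause `i`, computed as the distance to the next offset.
   Returns 0 when `i` is not a valid clause index. */
static uint32_t clause_length(const uint32_t *info, size_t count, size_t i)
{
    if (i + 1 >= count)
        return 0;
    return info[i + 1] - info[i];
}

/* Index of the clause that contains position `pos`, or -1 if `pos`
   lies at or beyond the end of the string. */
static int clause_at(const uint32_t *info, size_t count, uint32_t pos)
{
    for (size_t i = 0; i + 1 < count; ++i) {
        if (pos >= info[i] && pos < info[i + 1])
            return (int)i;
    }
    return -1;
}
```

For the two-clause example in the text, an array such as {0, 4, 10} describes a first clause of length 4 and a second clause of length 6. Whether those units are bytes or Unicode characters depends on which version of the function filled the array.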
The typing information is a null-terminated character string representing the characters entered at the keyboard.
The cursor position is a value indicating the position of the cursor relative to the characters in the composition string. The value is the offset, in bytes, from the beginning of the string. If this value is 0, the cursor is immediately before the first character in the string. If the value is equal to the length of the string, the cursor is immediately after the last character. If –1, the cursor is not present. For Unicode, both position and length are measured in Unicode characters.
You can set the composition string or elements of the composition status by using the ImmSetCompositionString function. To ensure that the composition window updates its appearance based on these changes, the function allows you to send a notification message to the window. Applications that set a combination of composition status elements typically set the fNotify parameter to FALSE for all but the last call to this function so that only one notification message is generated for the composition window.
Finally, the edit control supports two messages for changing the IME's handling of composition strings. For more information, see EM_GETIMESTATUS and EM_SETIMESTATUS. For more information on the edit control, see Edit Controls.
Candidate Lists
A candidate list is a CANDIDATELIST structure consisting of an array of strings that specifies the characters or character strings that the user may choose from. You can retrieve the candidate lists by using the ImmGetCandidateListCount
Hot keys give the user a way to quickly change the input mode of the IME or to switch to another IME. Although applications cannot add hot keys to the system, they can initiate the same action as a hot key by using the ImmSimulateHotKey function.
The HexToUnicode IME also permits conversion between hexadecimal and Unicode characters. For an explanation, see HexToUnicode IME.
Rich Edit 3.0 supports the HexToUnicode IME, which allows a user to convert between hexadecimal and Unicode characters by using hot keys in one of two ways.
In the first method, the user types the character code in hexadecimal and then types ALT+X. The IME replaces the hexadecimal digits preceding the insertion point with the Unicode character. If the current font does not support the character code, an appropriate font is chosen that does support it. To convert from Unicode to hexadecimal, type SHIFT+ALT+X. This replaces the Unicode character that precedes the insertion point with the hexadecimal digits. In particular, this allows you to determine the character that is indicated by a "missing glyph" indicator. If the hexadecimal character code immediately follows other characters that are valid hexadecimal digits but are not part of the code, select the specific digits that you want to convert before typing ALT+X. A problem with this first method is that ALT+X is sometimes used as a key combination for the exit command (that is, eXit). In Microsoft Office, for example, this happens only as an option of the File menu.
The second method involves the number pad. Here the user types ALT+NumPad numbers (with values greater than 255) to enter Unicode characters using decimal values. This method is not as useful as the first method because you cannot see what hexadecimal digits you typed. Also, you cannot correct them except by reentering them all again.
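The two conversions behind ALT+X and SHIFT+ALT+X can be sketched in portable C. This is not Rich Edit's implementation, only an illustration of the arithmetic. The function names are hypothetical, and rejecting values above 0x10FFFF is my own assumption about what counts as a valid code.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a string of hexadecimal digits into a code
   point. Returns 1 on success, 0 if the string is empty, contains a
   non-hex character, or exceeds 0x10FFFF (assumed upper limit). */
static int hex_to_codepoint(const char *digits, uint32_t *out)
{
    uint32_t value = 0;
    if (digits == NULL || *digits == '\0')
        return 0;
    for (const char *p = digits; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (!isxdigit(c))
            return 0;
        int d = isdigit(c) ? c - '0' : toupper(c) - 'A' + 10;
        value = value * 16 + (uint32_t)d;
        if (value > 0x10FFFF)
            return 0;
    }
    *out = value;
    return 1;
}

/* Hypothetical helper: write the uppercase hexadecimal digits of a code
   point into `buf`. Returns 1 if the digits and terminator fit. */
static int codepoint_to_hex(uint32_t cp, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%lX", (unsigned long)cp);
    return n > 0 && (size_t)n < size;
}
```

Typing 4E2D and then ALT+X corresponds to hex_to_codepoint("4E2D", ...), which yields the code point U+4E2D; the reverse direction produces the digits again.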
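2. Using Input Method Editors
2.1 Processing the WM_IME_COMPOSITION Message
An application that processes the WM_IME_COMPOSITION message tests the bits of the lParam parameter and calls the ImmGetCompositionString function to retrieve the indicated string or data. The following example checks for the result string, allocates enough memory for the string, and retrieves the result string from the IME.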
HIMC hIMC;
HWND hWnd;
DWORD dwSize;
HGLOBAL hstr;
LPSTR lpstr;
case WM_IME_COMPOSITION:
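The lParam test and the buffer sizing described in this section can be sketched without windows.h. The GCS_* values below are the ones I understand imm.h to define, repeated locally as an assumption. The helpers are hypothetical, and reserving two extra bytes for a terminator is likewise an assumption about how the buffer is sized.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as I understand them from imm.h (an assumption). */
enum {
    GCS_COMPSTR    = 0x0008,
    GCS_COMPATTR   = 0x0010,
    GCS_COMPCLAUSE = 0x0020,
    GCS_CURSORPOS  = 0x0080,
    GCS_RESULTSTR  = 0x0800
};

/* Hypothetical helper: nonzero if lParam says a result string is ready. */
static int has_result_string(uintptr_t lparam)
{
    return (lparam & GCS_RESULTSTR) != 0;
}

/* Hypothetical helper: nonzero if lParam says the composition string changed. */
static int has_composition_string(uintptr_t lparam)
{
    return (lparam & GCS_COMPSTR) != 0;
}

/* Hypothetical helper: bytes to allocate for a string whose size query
   returned `reported` bytes. A negative value is treated as an error and
   yields 0; otherwise room for one 16-bit terminator is added. */
static size_t result_buffer_size(long reported)
{
    if (reported < 0)
        return 0;
    return (size_t)reported + sizeof(uint16_t);
}
```

In a real window procedure the size passed to result_buffer_size would come from calling ImmGetCompositionString with a NULL buffer, and the allocated block would be filled by a second call.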
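2.2 IME and Unicode
Both Microsoft Windows NT/2000 and Microsoft Windows 98 support Unicode interfaces for the IME, in addition to the ANSI interfaces that have been supported since Windows 95. Windows 98 supports all Unicode functions except ImmIsUIMessage. In addition, all messages in Windows 98 are ANSI based. Because Windows 98 does not support Unicode messages, applications can use ImmGetCompositionString to retrieve Unicode characters from a Unicode-based IME on Windows 98.
Two issues arise with Unicode handling and the IME. One is that the Unicode versions of the IME routines return the size of a buffer in bytes rather than in 16-bit Unicode characters. The other is that the IME normally returns Unicode characters (rather than DBCS) in the WM_CHAR and WM_IME_CHAR messages.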
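The first issue is easy to get wrong when sizing or indexing a buffer. The sketch below converts a byte count into a count of 16-bit units; the function name is hypothetical, and treating a negative count as "no data" is an assumption about error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a size reported in bytes into the number
   of 16-bit Unicode units it holds. Negative input yields 0. */
static size_t utf16_units_from_bytes(long byte_count)
{
    if (byte_count <= 0)
        return 0;
    return (size_t)byte_count / sizeof(uint16_t);
}
```

A three-character Unicode string is therefore reported as 6 bytes, and a loop over its characters should run three times, not six.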
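Use RegisterClassW to make the WM_CHAR and WM_IME_CHAR messages return Unicode characters in the wParam parameter instead of DBCS characters. This works only on Windows NT/2000; it is not available on Windows 95/98.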
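2.3 Reconversion
In Windows 98/2000, the IME has a new feature called reconversion. Ordinarily, an IME determines the candidate list from what is typed and nothing else. Reconversion lets the IME choose the candidate characters (or a single candidate) according to the sentence in which they appear, that is, the context. There are three types of reconversion: simple, normal, and advanced.
Reconversion is useful when the user finds a typing error in a document. The user selects the error and chooses reconversion from a menu, and the IME uses the context to determine the best replacement. ImmSetCompositionString can be used to support reconversion.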
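2.4 Developing IME-Aware Multithreaded Applications
The current Input Method Manager (IMM) architecture does not support asynchronous (cross-thread) access to IMM handles.
The Windows 2000 IMM includes thread validation checks that verify whether the calling thread is the creator of the specified input context handle (hIMC) or window handle (hWnd). If it is not, the function call fails and the GetLastError function returns ERROR_INVALID_ACCESS.
This requires developers to follow these rules:
⑴ A thread should not access an input context created by another thread.
⑵ A thread should not associate an input context with a window created by another thread, or vice versa.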
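2.5 IME on Windows 2000
Windows 2000 provides full IME support in every language version. An IME can therefore be installed and used on any version of Windows 2000, not only on versions designed for Asian languages.
However, the IMM is enabled only when the user has installed an Asian language pack on Windows 2000. An IME-aware application calls the GetSystemMetrics function with SM_IMMENABLED to determine whether the IMM is enabled.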
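Switching Input Methods Precisely in an Application
If your program involves a lot of data entry, such as all kinds of forms, automatically switching the input method to the right state whenever the user enters an input box would be a very cool feature. Data-entry staff who type until their hands cramp will care about this far more than about a flashy interface. Switching the input method in WinForms is simple: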
foreach (InputLanguage iL in InputLanguage.InstalledInputLanguages)
{
    if (iL.LayoutName == "智能ABC")
    {
        InputLanguage.CurrentInputLanguage = iL;
        break;
    }
}
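This switches the input method to 智能ABC (Intelligent ABC) with very little effort.
The result is not perfect, though: this approach cannot specify the IME's state, that is, its conversion status. The concept exists only on Far East versions of Windows. English-speaking countries need no input method at all, and alphabetic languages such as French and Russian only need a different key mapping. Only the East Asian regions influenced by Chinese writing, such as Japan and Korea, have the notion of an input method. To make typing easier, these IMEs have many states. Take the Microsoft Japanese IME as an example:
Japanese input includes states such as full-width and half-width hiragana and full-width and half-width katakana, and in addition:
four modes for converting kana to tōyō kanji: General, Names, Conversation, and No Conversion.
Chinese input methods are simpler, because there is no conversion step for Chinese characters. They do have a Chinese/English input mode switch for alternating between the two languages (Japanese users tend to write English words in katakana, so they have no such alternation), a Chinese/English punctuation switch, and a full-width/half-width switch:
If you use the very old 智能ABC, there is also a switch between double-pinyin (双打) mode and standard mode.
With such a variety of input modes, the InputLanguage class falls short when you need to select not only the input method but also one specific input mode.
The Windows IMM exposes the conversion status of an IME, which determines its working mode. In the Win32 API the relevant functions are ImmGetConversionStatus and ImmSetConversionStatus.
Both functions take three parameters: the input context handle (HIMC), a mode, and a sentence. The mode and sentence parameters matter most, because they are the data that determines the state of the input method. Testing showed that the values differ from one input method to another, so I did not try to work out from MSDN what each enumeration value in these two int variables means; that seemed too complicated. To switch precisely to a given mode, it is enough to record the mode and sentence values observed in that mode.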
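The following tables map the states of Chinese input methods to mode values.
Double-pinyin mode (including mixed single/double input, for example Microsoft Pinyin)
Input method state | mode value |
Chinese input, half-width, Chinese punctuation | -2147482623 |
Chinese input, full-width, Chinese punctuation | -2147482615 |
Chinese input, half-width, English punctuation | -2147483647 |
Chinese input, full-width, English punctuation | -2147483639 |
English input, half-width, Chinese punctuation | -2147482624 |
English input, full-width, Chinese punctuation | -2147482616 |
English input, half-width, English punctuation | -2147483648 |
English input, full-width, English punctuation | -2147483640 |
Standard mode (full pinyin, for example the standard mode of 智能ABC)
Input method state | mode value |
Chinese input, half-width, Chinese punctuation | 1025 |
Chinese input, full-width, Chinese punctuation | 1033 |
Chinese input, half-width, English punctuation | 1 |
Chinese input, full-width, English punctuation | 9 |
English input, half-width, Chinese punctuation | 1024 |
English input, full-width, Chinese punctuation | 1032 |
English input, half-width, English punctuation | 0 |
English input, full-width, English punctuation | 8 |
Different input methods support these modes differently. Sogou, for example, takes the standard-mode values even though it can also do double pinyin. Microsoft Pinyin uses the double-pinyin values, yet its sentence value differs from that of 智能ABC in the same double-pinyin mode. You therefore still have to test each input method to decide which values to use.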
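The recorded values do appear to follow a pattern. Read as bit fields, the standard-mode numbers line up with the IME_CMODE_* flags that I understand imm.h to define (1 for native input, 8 for full shape, 0x400 for symbol mode), and the double-pinyin numbers are the same values with the top bit set. The sketch below decodes a value on that assumption. The struct, the function name, and the reading of the top bit as an opaque IME-specific marker are mine, not something the tables or MSDN confirm.

```c
#include <stdint.h>

/* Flag values as I understand them from imm.h (an assumption). */
enum {
    IME_CMODE_NATIVE    = 0x0001,
    IME_CMODE_FULLSHAPE = 0x0008,
    IME_CMODE_SYMBOL    = 0x0400
};

/* Hypothetical decoded view of a recorded mode value. */
struct chinese_mode {
    int chinese_input;  /* 1 = Chinese input, 0 = English input */
    int full_width;     /* 1 = full-width, 0 = half-width */
    int chinese_punct;  /* 1 = Chinese punctuation, 0 = English */
    int high_bit;       /* 1 = top bit set (meaning not confirmed) */
};

/* Hypothetical helper: split a mode value into the fields above. */
static struct chinese_mode decode_chinese_mode(int32_t mode)
{
    uint32_t bits = (uint32_t)mode;
    struct chinese_mode m;
    m.chinese_input = (bits & IME_CMODE_NATIVE) != 0;
    m.full_width    = (bits & IME_CMODE_FULLSHAPE) != 0;
    m.chinese_punct = (bits & IME_CMODE_SYMBOL) != 0;
    m.high_bit      = (bits & 0x80000000u) != 0;
    return m;
}
```

Under this reading, 1033 and -2147482615 describe the same visible state (Chinese input, full-width, Chinese punctuation) and differ only in the top bit. Values for any particular IME should still be confirmed by testing, as the text recommends.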
Appendix 1:
Conversion code:
First, declare the Win32 API methods:
// Requires: using System.Runtime.InteropServices;
[DllImport("imm32.dll")]
public static extern IntPtr ImmGetContext(IntPtr hWnd);

[DllImport("imm32.dll")]
public static extern bool ImmGetConversionStatus(IntPtr hIMC, ref int conversion, ref int sentence);

[DllImport("imm32.dll")]
public static extern bool ImmSetConversionStatus(IntPtr hIMC, int conversion, int sentence);
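// Select the input method by its layout name.
foreach (InputLanguage iL in InputLanguage.InstalledInputLanguages)
{
    if (iL.LayoutName == "中文(简体)-搜狗拼音输入法")
    {
        InputLanguage.CurrentInputLanguage = iL;
        break;
    }
}

// Get the input context of this window and set the recorded mode values.
IntPtr prt = ImmGetContext(this.Handle);
int iMode = 1033;   // Chinese input, full-width, Chinese punctuation
int iSentence = 0;
if (!ImmSetConversionStatus(prt, iMode, iSentence))
{
    MessageBox.Show("change error");
}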
This sets the input method to Sogou Pinyin with full-width digits, symbols, and English letters, and with Chinese punctuation.
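The value 1033 used here can also be built from its parts rather than looked up. The sketch below assumes the mode is a combination of the IME_CMODE_* flags as I understand them from imm.h; the function name is hypothetical, and an IME that uses the double-pinyin values from the earlier table would need its own tested values.

```c
#include <stdint.h>

/* Flag values as I understand them from imm.h (an assumption). */
enum {
    IME_CMODE_NATIVE    = 0x0001,
    IME_CMODE_FULLSHAPE = 0x0008,
    IME_CMODE_SYMBOL    = 0x0400
};

/* Hypothetical helper: build a standard-mode value from three choices. */
static int32_t make_chinese_mode(int chinese_input, int full_width, int chinese_punct)
{
    uint32_t bits = 0;
    if (chinese_input)
        bits |= IME_CMODE_NATIVE;
    if (full_width)
        bits |= IME_CMODE_FULLSHAPE;
    if (chinese_punct)
        bits |= IME_CMODE_SYMBOL;
    return (int32_t)bits;
}
```

Calling make_chinese_mode(1, 1, 1) gives 1033, which matches the standard-mode table entry for Chinese input, full-width, Chinese punctuation.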
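Appendix 2:
Values for the Japanese input method
In the Japanese input method, the kana mode and the full-width/half-width mode are controlled by the mode value
Mode | mode value |
DirectInput | 25 |
Hiragana | 25 |
Full Width Katakana | 27 |
Full Width Alphanumeric | 24 |
Half Width Katakana | 19 |
Half Width Alphanumeric | 16 |
The kanji conversion mode is controlled by the sentence value
Mode | sentence value |
General | 8 |
Names | 1 |
Conversation | 16 |
None | 0 |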
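The Japanese values can be read the same way as the Chinese ones. The sketch below assumes the IME_CMODE_* and IME_SMODE_* values as I understand them from imm.h; the enum kana_kind and the helper names are hypothetical. Note that the table lists the same mode value for DirectInput and Hiragana, so the mode value alone does not tell those two apart.

```c
#include <stdint.h>

/* Conversion mode flags as I understand them from imm.h (an assumption). */
enum {
    IME_CMODE_NATIVE    = 0x0001,
    IME_CMODE_KATAKANA  = 0x0002,
    IME_CMODE_FULLSHAPE = 0x0008,
    IME_CMODE_ROMAN     = 0x0010
};

/* Sentence mode values as I understand them from imm.h (an assumption);
   they match the General, Names, Conversation and None rows above. */
enum {
    IME_SMODE_NONE          = 0x0000,
    IME_SMODE_PLAURALCLAUSE = 0x0001,
    IME_SMODE_PHRASEPREDICT = 0x0008,
    IME_SMODE_CONVERSATION  = 0x0010
};

/* Hypothetical classification of the character set a mode selects. */
enum kana_kind { KIND_ALPHANUMERIC, KIND_HIRAGANA, KIND_KATAKANA };

/* Hypothetical helper: which character set does this mode value select? */
static enum kana_kind kana_mode_kind(int32_t mode)
{
    if (!(mode & IME_CMODE_NATIVE))
        return KIND_ALPHANUMERIC;
    return (mode & IME_CMODE_KATAKANA) ? KIND_KATAKANA : KIND_HIRAGANA;
}

/* Hypothetical helper: nonzero if the mode value selects full-width input. */
static int mode_is_full_width(int32_t mode)
{
    return (mode & IME_CMODE_FULLSHAPE) != 0;
}
```

With these helpers, 27 decodes to full-width katakana and 19 to half-width katakana, in line with the table.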