Windows IME (Part 2)

Composition String

The composition string is the current text in the composition window. This is the text that the IME converts to final characters. Each composition string consists of one or more clauses, where a clause is the smallest combination of characters that the IME can convert to a final character. To get and set the composition string, call the ImmGetCompositionString and ImmSetCompositionString functions.

As the user enters text in the composition window, the IME tracks the status of the composition string. This status includes attribute information, clause information, typing information, and cursor position. You can retrieve the composition status by using the ImmGetCompositionString function.

In the attribute information array, all characters of one clause must have the same attribute. The attribute information is an array of 8-bit values that specifies the status of characters in the composition string. There is one value for each byte in the string, including one byte each for the lead and second bytes of any double-byte characters in the string. For each value in the array, bits 0 through 3 can be one combination of the following values.

Value	Meaning
ATTR_INPUT	Character being entered by the user. It is yet to be converted by the IME.
ATTR_INPUT_ERROR	Character is an error character and cannot be converted by the IME. For example, some consonants cannot be put together.
ATTR_TARGET_CONVERTED	Character converted by the IME. The user has selected this character and the IME has converted it.
ATTR_CONVERTED	A converted character. The IME has already converted this character.
ATTR_TARGET_NOTCONVERTED	Character being converted. The user has selected this character but the IME has not yet converted it.
ATTR_FIXEDCONVERTED	Characters that will not be converted. The IME will not convert these characters anymore.

All other values are reserved. In Japanese, any unconverted character having the ATTR_INPUT attribute is a Hiragana, Katakana, or alphanumeric character. In Korean, this character is a Hangeul character that is not converted by IME yet. In Traditional and Simplified Chinese, each IME may limit its character in some range.
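
As a rough illustration of the attribute array described above, the sketch below names the attribute stored in bits 0 through 3 of one entry and checks the rule that all characters of one clause share the same attribute. The numeric values follow the ATTR_* constants in the Windows header imm.h and are repeated here only so the sketch builds without the Windows SDK. The helper names attr_name and clause_attr_is_uniform are hypothetical and not part of the IMM API.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as defined in imm.h; repeated so the sketch builds anywhere. */
#ifndef ATTR_INPUT
#define ATTR_INPUT               0x00
#define ATTR_TARGET_CONVERTED    0x01
#define ATTR_CONVERTED           0x02
#define ATTR_TARGET_NOTCONVERTED 0x03
#define ATTR_INPUT_ERROR         0x04
#define ATTR_FIXEDCONVERTED      0x05
#endif

/* Hypothetical helper: name the attribute held in bits 0-3 of one entry. */
const char *attr_name(uint8_t attr)
{
    switch (attr & 0x0F) {
    case ATTR_INPUT:               return "ATTR_INPUT";
    case ATTR_TARGET_CONVERTED:    return "ATTR_TARGET_CONVERTED";
    case ATTR_CONVERTED:           return "ATTR_CONVERTED";
    case ATTR_TARGET_NOTCONVERTED: return "ATTR_TARGET_NOTCONVERTED";
    case ATTR_INPUT_ERROR:         return "ATTR_INPUT_ERROR";
    case ATTR_FIXEDCONVERTED:      return "ATTR_FIXEDCONVERTED";
    default:                       return "reserved";
    }
}

/* Hypothetical helper: returns 1 if every entry in attrs[start..end) carries
   the same attribute, as required for the characters of a single clause. */
int clause_attr_is_uniform(const uint8_t *attrs, size_t start, size_t end)
{
    size_t i;
    for (i = start + 1; i < end; i++) {
        if ((attrs[i] & 0x0F) != (attrs[start] & 0x0F))
            return 0;
    }
    return 1;
}
```

In a real application the array would come from ImmGetCompositionString with the GCS_COMPATTR index, and the clause boundaries from the clause information.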

The clause information is an array of 32-bit values that specifies the positions of the clauses in the composition string. There is one value for each clause and a final value that specifies the length of the full string. Each value in the array specifies the offset, in bytes, from the beginning of the string to the clause. The first value is always 0 because the first clause always starts at the beginning of the string. For example, if a string has two clauses, the clause information has three values: the first value is 0, the second value is the offset of the second clause, and the third value is the length of the string. For Unicode, the position of a clause is the position counted in Unicode characters, and the length of a string is the size in Unicode characters.
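
The layout just described (one offset per clause, plus a final value holding the string length) can be walked with a few lines of code. The following is a minimal sketch on a plain array of 32-bit values. The function names clause_count and clause_span are hypothetical, and the validity checks are an assumption about what a well-formed array looks like rather than something the IMM documentation states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. clause_info holds value_count 32-bit offsets: one per
   clause plus a final entry equal to the string length. Returns the number of
   clauses, or -1 if the array is too short, does not start at 0, or has
   offsets that decrease. */
int clause_count(const uint32_t *clause_info, size_t value_count)
{
    size_t i;
    if (value_count < 2 || clause_info[0] != 0)
        return -1;
    for (i = 1; i < value_count; i++) {
        if (clause_info[i] < clause_info[i - 1])
            return -1;
    }
    return (int)(value_count - 1);
}

/* Hypothetical helper: report the start offset and length of one clause.
   Units are bytes for ANSI strings and characters for Unicode strings.
   Returns 1 on success, 0 if the index or the array is invalid. */
int clause_span(const uint32_t *clause_info, size_t value_count,
                size_t index, uint32_t *start, uint32_t *length)
{
    int n = clause_count(clause_info, value_count);
    if (n < 0 || index >= (size_t)n)
        return 0;
    *start = clause_info[index];
    *length = clause_info[index + 1] - clause_info[index];
    return 1;
}
```

For the two-clause example in the text, an array such as {0, 4, 10} yields a first clause covering offsets 0 to 3 and a second clause covering offsets 4 to 9.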

The typing information is a null-terminated character string representing the characters entered at the keyboard.

The cursor position is a value indicating the position of the cursor relative to the characters in the composition string. The value is the offset, in bytes, from the beginning of the string. If this value is 0, the cursor is immediately before the first character in the string. If the value is equal to the length of the string, the cursor is immediately after the last character. If –1, the cursor is not present. For Unicode, both position and length are measured in Unicode characters.

You can set the composition string or elements of the composition status by using the ImmSetCompositionString function. To ensure that the composition window updates its appearance based on these changes, the function allows you to send a notification message to the window. Applications that set a combination of composition status elements typically set the fNotify parameter to FALSE for all but the last call to this function so that only one notification message is generated for the composition window.

Finally, the edit control supports two messages for changing the IME's handling of composition strings. For more information, see EM_GETIMESTATUS and EM_SETIMESTATUS. For more information on the edit control, see Edit Controls.

Candidate Lists

A candidate list is a CANDIDATELIST structure consisting of an array of strings that specifies the characters or character strings that the user may choose from. You can retrieve the candidate lists by using the ImmGetCandidateListCount and ImmGetCandidateList functions.
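
To make the "array of strings" idea concrete, the sketch below reads one candidate out of a buffer laid out like a CANDIDATELIST: a fixed header followed by a table of byte offsets, each pointing from the start of the structure to one string. The struct mirrors the field order of CANDIDATELIST in imm.h but is redeclared under a hypothetical name so the code builds without the Windows SDK. It assumes ANSI (single-byte or DBCS) strings; the helper candidate_at is not part of the IMM API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the CANDIDATELIST layout; the name is hypothetical. */
typedef struct {
    uint32_t dwSize;       /* size of the whole buffer, in bytes */
    uint32_t dwStyle;
    uint32_t dwCount;      /* number of candidate strings */
    uint32_t dwSelection;
    uint32_t dwPageStart;
    uint32_t dwPageSize;
    uint32_t dwOffset[1];  /* first entry of the offset table */
} CandidateListSketch;

/* Hypothetical helper: return candidate number `index`, or NULL if the index
   or the stored offset is out of range. */
const char *candidate_at(const void *buffer, uint32_t index)
{
    const unsigned char *bytes = (const unsigned char *)buffer;
    const size_t table = offsetof(CandidateListSketch, dwOffset);
    CandidateListSketch head;
    uint32_t offset;

    /* Copy only the fixed part of the header; the offset table follows it. */
    memcpy(&head, bytes, table);
    if (index >= head.dwCount)
        return NULL;

    memcpy(&offset, bytes + table + (size_t)index * sizeof(uint32_t),
           sizeof offset);
    if (offset >= head.dwSize)
        return NULL;
    return (const char *)(bytes + offset);
}
```

In an application, the buffer would be filled by ImmGetCandidateList after asking ImmGetCandidateListCount (or a first call with a zero-length buffer) how large it needs to be.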

Hot Keys

Hot keys give the user a way to quickly change the input mode of the IME or to switch to another IME. Although applications cannot add hot keys to the system, they can initiate the same action as a hot key by using the ImmSimulateHotKey function.

The HexToUnicode IME also permits conversion between hexadecimal and Unicode characters. For an explanation, see HexToUnicode IME.

HexToUnicode IME

Rich Edit 3.0 supports the HexToUnicode IME, which allows a user to convert between hexadecimal and Unicode characters by using hot keys in one of two ways.

In the first method, the user types the character code in hexadecimal and then types ALT+X. The IME replaces the hexadecimal digits preceding the insertion point with the Unicode character. If the current font does not support the character code, an appropriate font is chosen that does support it. To convert from Unicode to hexadecimal, type SHIFT+ALT+X. This replaces the Unicode character that precedes the insertion point with the hexadecimal digits. In particular, this allows you to determine the character that is indicated by a "missing glyph" indicator. If the hexadecimal character code immediately follows some legitimate (noncharacter) hexadecimal characters, select the specific digits that you want to convert before typing ALT+X. A problem with this first method is that ALT+X is sometimes used as a key combination for the exit command (that is, eXit). For example, in Microsoft Office, this only happens as an option of the File menu.

The second method involves the number pad. Here the user types ALT+NumPad numbers (with values greater than 255) to enter Unicode characters using decimal values. This method is not as useful as the first method because you cannot see what hexadecimal digits you typed. Also, you cannot correct them except by reentering them all again.
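
The conversion behind the first method can be sketched in portable code: parse the hexadecimal digits into a code point, then encode that code point as UTF-16. This is only an illustration of the idea, not how Rich Edit implements it. The function names are hypothetical, and rejecting surrogate values is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse a string of hexadecimal digits into a Unicode code point.
   Returns 1 on success, 0 if the text is empty, contains a non-hex
   character, or does not name a Unicode scalar value. */
int hex_to_code_point(const char *hex, uint32_t *code_point)
{
    uint32_t value = 0;
    size_t i;

    if (hex[0] == '\0')
        return 0;
    for (i = 0; hex[i] != '\0'; i++) {
        char c = hex[i];
        uint32_t digit;
        if (c >= '0' && c <= '9')
            digit = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            digit = (uint32_t)(c - 'A' + 10);
        else
            return 0;
        value = value * 16 + digit;
        if (value > 0x10FFFF)
            return 0;  /* beyond the Unicode range */
    }
    if (value >= 0xD800 && value <= 0xDFFF)
        return 0;      /* surrogates are not characters on their own */
    *code_point = value;
    return 1;
}

/* Encode a code point as UTF-16. Returns the number of 16-bit units written
   to out (1, or 2 for a surrogate pair). */
int encode_utf16(uint32_t code_point, uint16_t out[2])
{
    if (code_point < 0x10000) {
        out[0] = (uint16_t)code_point;
        return 1;
    }
    code_point -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (code_point >> 10));
    out[1] = (uint16_t)(0xDC00 + (code_point & 0x3FF));
    return 2;
}
```

For example, the digits 4E2D map to the single unit 0x4E2D, while 1F600 needs the surrogate pair 0xD83D 0xDE00.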

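2. Using Input Method Editors
2.1 Processing the WM_IME_COMPOSITION Message
An application that processes the WM_IME_COMPOSITION message tests the bits of the lParam parameter and calls the ImmGetCompositionString function to retrieve the indicated strings or data. The following example checks for a result string, allocates enough memory for it, and retrieves it from the IME.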
HIMC hIMC;
HWND hWnd;
LONG lSize;
DWORD dwSize;
HGLOBAL hstr;
LPSTR lpstr;

case WM_IME_COMPOSITION:
    if (lParam & GCS_RESULTSTR)
    {
        hIMC = ImmGetContext(hWnd);
        if (!hIMC)
        {
            MyError(ERROR_NULLCONTEXT);
            break;
        }

        // Get the size of the result string, in bytes.
        // A negative return value is an IMM error code.
        lSize = ImmGetCompositionString(hIMC, GCS_RESULTSTR, NULL, 0);
        if (lSize < 0)
        {
            ImmReleaseContext(hWnd, hIMC);
            break;
        }

        // Increase the buffer size for the null terminator;
        // the string may be Unicode.
        dwSize = (DWORD)lSize + sizeof(WCHAR);

        // GHND zero-initializes the block, so the string stays terminated.
        hstr = GlobalAlloc(GHND, dwSize);
        if (hstr == NULL)
        {
            MyError(ERROR_GLOBALALLOC);
            ImmReleaseContext(hWnd, hIMC);
            break;
        }

        lpstr = (LPSTR)GlobalLock(hstr);
        if (lpstr == NULL)
        {
            MyError(ERROR_GLOBALLOCK);
            GlobalFree(hstr);
            ImmReleaseContext(hWnd, hIMC);
            break;
        }

        // Copy the result string generated by the IME into lpstr.
        ImmGetCompositionString(hIMC, GCS_RESULTSTR, lpstr, dwSize);
        ImmReleaseContext(hWnd, hIMC);

        // Add this string to the application's text buffer.

        GlobalUnlock(hstr);
        GlobalFree(hstr);
    }
    break;
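2.2 IMEs and Unicode
Microsoft® Windows NT®/2000 and Microsoft® Windows® 98 both support a Unicode interface for IMEs, in addition to the ANSI interface supported since Windows 95. Windows 98 supports all of the Unicode functions except ImmIsUIMessage. In addition, all messages in Windows 98 are ANSI-based. Because Windows 98 does not support Unicode messages, an application can use ImmGetCompositionString to retrieve Unicode characters from a Unicode-based IME on Windows 98.
Two issues arise when handling Unicode with IMEs. The first is that the Unicode versions of the IME routines return the size of a buffer in bytes rather than in 16-bit Unicode characters. The second is that the IME normally returns Unicode characters (not DBCS) in the WM_CHAR and WM_IME_CHAR messages.
Use RegisterClassW so that the WM_CHAR and WM_IME_CHAR messages return Unicode characters rather than DBCS characters in the wParam parameter. This works only on Windows NT/2000; it is not available on Windows 95/98.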
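The first issue above is easy to get wrong when sizing buffers, so here is a minimal sketch of the arithmetic. It assumes the length comes from a Unicode call such as ImmGetCompositionStringW, which reports bytes and uses negative values for errors. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert the byte length reported by a Unicode IMM
   call into a count of 16-bit characters. Zero and negative (error) values
   yield 0. */
size_t utf16_units_from_bytes(long byte_length)
{
    if (byte_length <= 0)
        return 0;
    return (size_t)byte_length / sizeof(uint16_t);
}

/* Hypothetical helper: number of bytes to allocate so the string fits
   together with a 16-bit null terminator. */
size_t buffer_bytes_with_terminator(long byte_length)
{
    if (byte_length < 0)
        byte_length = 0;
    return (size_t)byte_length + sizeof(uint16_t);
}
```

A reported length of 6 bytes therefore means 3 characters and an 8-byte buffer.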
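2.3 IME Reconversion
Reconversion is a new IME feature in Windows 98/2000. Normally an IME can determine its candidate list only from what is typed. Reconversion lets the IME use the surrounding sentence (the context) to determine the candidate characters, or a single candidate. There are three types of reconversion: simple, normal, and enhanced.
Reconversion is useful when a user finds a typing error in a document. The user selects the erroneous text and chooses reconversion from a menu, and the IME uses the context to determine the best replacement. ImmSetCompositionString can be used to support reconversion.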
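2.4 Developing IME-Aware Multithreaded Applications
The current Input Method Manager (IMM) architecture does not support asynchronous access to IMM handles.
The IMM in Windows 2000 includes thread verification checks that confirm whether the calling thread is the creator of the specified input context handle (hIMC) or window handle (hWnd). If it is not, the function call fails and GetLastError returns ERROR_INVALID_ACCESS.
This requires developers to follow these rules:
⑴ A thread should not access an input context created by another thread.
⑵ A thread should not associate an input context with a window created by another thread, or vice versa.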
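2.5 IMEs on Windows 2000
Windows 2000 provides full IME functionality in every language version. An IME can therefore be installed and used on any version of Windows 2000, not only on versions designed for Asian languages.
However, the IMM is enabled only when the user has installed an Asian language pack on Windows 2000. An IME-aware application calls the GetSystemMetrics function with SM_IMMENABLED to determine whether the IMM is enabled.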

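Switching the Input Method Precisely in an Application

If your program involves a lot of data entry, such as forms of various kinds, automatically switching the input method to the right state whenever the user enters a field is a very nice feature. Data-entry staff who type until their hands cramp will care about this far more than about a flashy interface. Switching the input method in WinForms is simple: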

            foreach (InputLanguage iL in InputLanguage.InstalledInputLanguages)
            {
                if (iL.LayoutName == "智能ABC")
                {
                    InputLanguage.CurrentInputLanguage = iL;
                    break;
                }
            }

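That is all it takes to switch the input method to 智能ABC (Intelligent ABC).

The result is not perfect, however: this approach cannot set the state of the IME, that is, its conversion status. The concept only matters on East Asian versions of Windows. English needs no input method at all, and alphabetic languages such as French and Russian only need a different key mapping. Only the East Asian languages influenced by Chinese writing, such as Japanese and Korean, have the notion of an input method, and to make typing easier their IMEs have many states. Take the Microsoft Japanese IME as an example:

Japanese input has entry states such as full-width and half-width hiragana and full-width and half-width katakana, and it also has:

four modes for converting kana to kanji: general, names, conversational, and no conversion

Chinese input methods are simpler. They have no conversion to kanji, but they do have a Chinese/English input switch for alternating between the two languages (Japanese users tend to write English words in katakana, so they have no such alternation), a Chinese/English punctuation switch, and a full-width/half-width switch:

If you use the very old 智能ABC, there is also a switch between double-key (双打) mode and standard mode.

With input modes this varied, the InputLanguage class falls short when you need to select not just an input method but one specific mode within it.

The Windows IMM exposes a conversion status interface for determining the working mode of an IME. In the Win32 API these are

ImmGetConversionStatus and ImmSetConversionStatus

Both functions take three parameters: the input context handle, a mode, and a sentence. The mode and sentence parameters matter most, because they are the data that define the state of the input method. Testing showed that the values differ from one input method to another, so working through MSDN to find out what the flags behind these two int values mean is unnecessary and too complicated. To switch precisely to a given mode, it is enough to record the mode and sentence values observed in that mode.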

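The tables below map Chinese input method states to mode values.

Double-key mode (including mixed single/double-key input, for example the Microsoft IME)

Input method state	mode value
Chinese input, half-width, Chinese punctuation	-2147482623
Chinese input, full-width, Chinese punctuation	-2147482615
Chinese input, half-width, English punctuation	-2147483647
Chinese input, full-width, English punctuation	-2147483639
English input, half-width, Chinese punctuation	-2147482624
English input, full-width, Chinese punctuation	-2147482616
English input, half-width, English punctuation	-2147483648
English input, full-width, English punctuation	-2147483640

Standard mode (full pinyin, for example the standard mode of 智能ABC)

Input method state	mode value
Chinese input, half-width, Chinese punctuation	1025
Chinese input, full-width, Chinese punctuation	1033
Chinese input, half-width, English punctuation	1
Chinese input, full-width, English punctuation	9
English input, half-width, Chinese punctuation	1024
English input, full-width, Chinese punctuation	1032
English input, half-width, English punctuation	0
English input, full-width, English punctuation	8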

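Input methods differ in which modes they support. Sogou, for example, takes the standard-mode values even though it can also do double-key input. Microsoft Pinyin uses the double-key values, yet its sentence value differs from that of 智能ABC in double-key mode. So you still have to test the specific input method to decide which values to use.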
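
Recording the values is enough in practice, but the numbers in the two tables above do follow a pattern. Every standard-mode value is a combination of three bits that match the IME_CMODE_NATIVE (0x0001), IME_CMODE_FULLSHAPE (0x0008), and IME_CMODE_SYMBOL (0x0400) flags from imm.h, and every double-key value is the same combination with bit 31 set. The sketch below decodes and rebuilds the values on that basis. The reading of each flag is inferred from the tables, the names ModeState, MODE_HIGH_BIT, decode_mode, and encode_mode are hypothetical, and a given IME may not follow the same pattern.

```c
#include <stdint.h>

/* Flag values as defined in imm.h; repeated so the sketch builds anywhere. */
#ifndef IME_CMODE_NATIVE
#define IME_CMODE_NATIVE    0x0001u
#define IME_CMODE_FULLSHAPE 0x0008u
#define IME_CMODE_SYMBOL    0x0400u
#endif

/* Hypothetical name for bit 31, which is set in every double-key value. */
#define MODE_HIGH_BIT 0x80000000u

typedef struct {
    int chinese_input;        /* 1 = Chinese input, 0 = English input */
    int full_width;           /* 1 = full-width, 0 = half-width */
    int chinese_punctuation;  /* 1 = Chinese, 0 = English punctuation */
    int high_bit;             /* 1 = value comes from the double-key table */
} ModeState;

/* Split a recorded mode value into the states listed in the tables. */
ModeState decode_mode(int32_t mode)
{
    uint32_t bits = (uint32_t)mode;
    ModeState s;
    s.chinese_input       = (bits & IME_CMODE_NATIVE) != 0;
    s.full_width          = (bits & IME_CMODE_FULLSHAPE) != 0;
    s.chinese_punctuation = (bits & IME_CMODE_SYMBOL) != 0;
    s.high_bit            = (bits & MODE_HIGH_BIT) != 0;
    return s;
}

/* Rebuild the signed mode value from the individual states. */
int32_t encode_mode(ModeState s)
{
    uint32_t bits = 0;
    if (s.chinese_input)       bits |= IME_CMODE_NATIVE;
    if (s.full_width)          bits |= IME_CMODE_FULLSHAPE;
    if (s.chinese_punctuation) bits |= IME_CMODE_SYMBOL;
    if (s.high_bit) {
        /* Add the low bits to INT32_MIN instead of casting a value above
           INT32_MAX, which would be implementation-defined. */
        return (int32_t)bits + INT32_MIN;
    }
    return (int32_t)bits;
}
```

Decoding 1033 gives Chinese input, full-width, Chinese punctuation, and encoding the same state with the high bit set gives -2147482615, which matches the corresponding row of the double-key table.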

Appendix 1:

The switching code:

First, declare the Win32 API methods:

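        // Requires: using System.Runtime.InteropServices;
        [DllImport("imm32.dll")]
        public static extern IntPtr ImmGetContext(IntPtr hWnd);

        [DllImport("imm32.dll")]
        public static extern bool ImmReleaseContext(IntPtr hWnd, IntPtr hIMC);

        [DllImport("imm32.dll")]
        public static extern bool ImmGetConversionStatus(IntPtr hIMC,
            ref int conversion, ref int sentence);

        [DllImport("imm32.dll")]
        public static extern bool ImmSetConversionStatus(IntPtr hIMC, int conversion, int sentence);

            // Select the input method by its layout name.
            foreach (InputLanguage iL in InputLanguage.InstalledInputLanguages)
            {
                if (iL.LayoutName == "中文（简体）-搜狗拼音输入法")
                {
                    InputLanguage.CurrentInputLanguage = iL;
                    break;
                }
            }

            // Set the conversion mode on this window's input context.
            IntPtr hIMC = ImmGetContext(this.Handle);
            int iMode = 1033;    // Chinese input, full-width, Chinese punctuation
            int iSentence = 0;
            if (!ImmSetConversionStatus(hIMC, iMode, iSentence))
            {
                MessageBox.Show("change error");
            }
            // Release the context obtained from ImmGetContext.
            ImmReleaseContext(this.Handle, hIMC);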