Studying Windows Core Programming, Part 2: ANSI/Unicode Characters and Strings


chinawash


Article link: https://blog.csdn.net/chinawash/article/details/655183


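TChar.h is a modified counterpart of String.h; it provides macros for writing strings and string calls that compile as either ANSI or Unicode.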

On Windows, every character of a Unicode string is 16 bits wide (one UTF-16 code unit).

Win9x supports only ANSI; Win2000/XP/2003 support both ANSI and Unicode; WinCE supports only Unicode.
Note: some Unicode functions can also be called on Win9x, but they may fail in unexpected ways.

wchar_t is the data type for a Unicode character.

All Unicode string functions begin with wcs, and their ANSI counterparts begin with str; ANSI C requires the C run-time library to support both ANSI and Unicode.
      ANSI                                     Unicode
      char *strcat(char *, const char *)       wchar_t *wcscat(wchar_t *, const wchar_t *)
      char *strchr(const char *, int)          wchar_t *wcschr(const wchar_t *, wchar_t)
      int strcmp(const char *, const char *)   int wcscmp(const wchar_t *, const wchar_t *)
      char *strcpy(char *, const char *)       wchar_t *wcscpy(wchar_t *, const wchar_t *)
      size_t strlen(const char *)              size_t wcslen(const wchar_t *)
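
As a small illustration of the str/wcs pairing, the sketch below builds a wide string using only the wcs functions from the table. The helper name build_greeting and the buffer handling are assumptions made for this example, not something from the original notes.

```c
#include <stddef.h>
#include <wchar.h>

/* Builds L"Hello, <name>" in dst using the wide counterparts of
   strcpy/strcat/strlen. Returns the resulting length in characters,
   or 0 if dst (cchDst characters, terminator included) is too small.
   build_greeting is a hypothetical helper for illustration only. */
size_t build_greeting(wchar_t *dst, size_t cchDst, const wchar_t *name)
{
    const wchar_t *prefix = L"Hello, ";
    size_t needed = wcslen(prefix) + wcslen(name) + 1; /* +1 for L'\0' */

    if (needed > cchDst)
        return 0;

    wcscpy(dst, prefix);   /* wide counterpart of strcpy */
    wcscat(dst, name);     /* wide counterpart of strcat */
    return wcslen(dst);    /* wide counterpart of strlen; returns size_t */
}
```

Note that every size here is counted in characters, not bytes; a caller would pass the element count of its wchar_t array as cchDst.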

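L"wash": the L prefix tells the compiler to store the literal as a Unicode (wide) string; this happens at compile time and is not a run-time conversion.
_TEXT("wash") expands to either the ANSI or the Unicode literal, depending on whether UNICODE / _UNICODE is defined.
Note: _UNICODE is used by the C run-time headers; UNICODE is used by the Windows headers.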
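
To make the mechanism concrete, here is a minimal sketch of how such a macro could be built. The names MY_UNICODE, MY_TCHAR and MY_TEXT are hypothetical stand-ins chosen for this example; the real definitions live in the Windows headers and tchar.h and may differ in detail.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for UNICODE / TCHAR / _TEXT. */
#define MY_UNICODE 1   /* remove this line to get the narrow (ANSI) build */

#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT_(s) L##s      /* paste the L prefix onto the literal */
#else
typedef char MY_TCHAR;
#define MY_TEXT_(s) s         /* leave the literal as a narrow string */
#endif
#define MY_TEXT(s) MY_TEXT_(s) /* extra level so macro arguments expand first */

/* One source line, two possible types depending on MY_UNICODE. */
static const MY_TCHAR g_name[] = MY_TEXT("wash");

/* Size in bytes of one character of g_name. */
size_t char_size(void) { return sizeof g_name[0]; }

/* Number of characters in g_name, excluding the terminator. */
size_t name_length(void) { return sizeof g_name / sizeof g_name[0] - 1; }
```

With MY_UNICODE defined, g_name is a wchar_t array; without it, the same line yields a plain char array of the same length in characters.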

Generic ANSI/Unicode data types
      Both (ANSI/Unicode)      ANSI         Unicode
      LPCTSTR                  LPCSTR       LPCWSTR
      LPTSTR                   LPSTR        LPWSTR
      PCTSTR                   PCSTR        PCWSTR
      PTSTR                    PSTR         PWSTR
      TBYTE (TCHAR)            CHAR         WCHAR

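When designing a DLL, it is best to export both an ANSI and a Unicode version of each function. The ANSI version should do nothing but allocate memory, convert its string arguments to Unicode, and call the Unicode version.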
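
The thin-wrapper idea can be sketched as follows. CountSpacesA and CountSpacesW are hypothetical function names, and the sketch uses the C run-time function mbstowcs as a portable stand-in for the conversion step; in a real Windows DLL the conversion would more likely go through MultiByteToWideChar. Calling mbstowcs with a NULL destination to query the length is supported by the common C libraries but is an assumption beyond the bare C standard.

```c
#include <stdlib.h>
#include <wchar.h>

/* Unicode version: the only place where the real work is done. */
size_t CountSpacesW(const wchar_t *psz)
{
    size_t n = 0;
    for (; *psz != L'\0'; ++psz)
        if (*psz == L' ')
            ++n;
    return n;
}

/* ANSI version: allocate, convert to wide, forward to CountSpacesW.
   Returns (size_t)-1 on failure. */
size_t CountSpacesA(const char *psz)
{
    size_t cch = mbstowcs(NULL, psz, 0);  /* required length, no terminator */
    if (cch == (size_t)-1)
        return (size_t)-1;                 /* invalid multibyte sequence */

    wchar_t *pwsz = malloc((cch + 1) * sizeof(wchar_t));
    if (pwsz == NULL)
        return (size_t)-1;

    mbstowcs(pwsz, psz, cch + 1);          /* convert, terminator included */
    size_t result = CountSpacesW(pwsz);    /* all logic lives in the W version */
    free(pwsz);
    return result;
}
```

Keeping the logic in one place means a bug fix in the Unicode version automatically applies to ANSI callers as well.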

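Prefer the operating system's string functions, and use the C run-time functions sparingly or not at all.
       E.g. the operating system string functions (ShlwApi.h):
               StrCat(), StrChr(), StrCmp(), StrCpy(), etc.
               Note that they are case-sensitive and come in separate ANSI and Unicode versions.
       Note: the ANSI version has an uppercase A appended to the function name;
               the Unicode version has an uppercase W appended.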

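Making code ready for both ANSI and Unicode
       • Think of text strings as arrays of characters, not as arrays of chars or arrays of bytes.
       • Use the generic data types (such as TCHAR and PTSTR) for text characters and strings.
       • Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.
       • Use the TEXT macro for literal characters and strings.
       • Fix string arithmetic so that it counts characters rather than bytes.
         For example: sizeof(szBuffer) -> sizeof(szBuffer) / sizeof(TCHAR)
                 malloc(charNum) -> malloc(charNum * sizeof(TCHAR))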
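
The two arithmetic fixes in the last bullet can be sketched like this. XCHAR is a hypothetical stand-in for TCHAR in a Unicode build, and COUNT_OF, buffer_capacity and alloc_chars are names invented for this example.

```c
#include <stdlib.h>
#include <wchar.h>

typedef wchar_t XCHAR;  /* hypothetical stand-in for TCHAR in a Unicode build */

/* Number of characters (not bytes) that fit in a fixed-size array. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Capacity, in characters, of a 64-character stack buffer. */
size_t buffer_capacity(void)
{
    XCHAR szBuffer[64];
    return COUNT_OF(szBuffer);  /* 64, whereas sizeof(szBuffer) is the byte count */
}

/* Allocates room for cch characters plus a terminator; the caller frees it. */
XCHAR *alloc_chars(size_t cch)
{
    XCHAR *p = malloc((cch + 1) * sizeof(XCHAR));  /* bytes = chars * sizeof */
    if (p != NULL)
        p[0] = L'\0';   /* start out as an empty string */
    return p;
}
```

Passing sizeof(szBuffer) where a character count is expected would overstate the capacity of a wide buffer, which is exactly the kind of overrun these rules are meant to prevent.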

Other functions that operate on Unicode strings (each also has an ANSI and a Unicode version):
lstrcat(), lstrcmp() / lstrcmpi() [these call CompareString() internally], lstrcpy(), lstrlen()
These names are implemented as macros.

      C run-time function      Windows function
      tolower()                PTSTR CharLower(PTSTR pszString)
      toupper()                PTSTR CharUpper(PTSTR pszString)
      isalpha()                BOOL IsCharAlpha(TCHAR ch)
                               BOOL IsCharAlphaNumeric(TCHAR ch)
      islower()                BOOL IsCharLower(TCHAR ch)
      isupper()                BOOL IsCharUpper(TCHAR ch)
      sprintf()                wsprintf()
      Converting a buffer: DWORD CharLowerBuff(PTSTR pszString, DWORD cchString)
                           DWORD CharUpperBuff(PTSTR pszString, DWORD cchString)
      A single character can also be converted by passing it in place of the pointer and casting the result, e.g.: TCHAR cLowerCaseChar = (TCHAR)(DWORD_PTR) CharLower((PTSTR)(DWORD_PTR) szString[0]);

Determining whether text is ANSI or Unicode
       BOOL IsTextUnicode(
                    const VOID * pBuffer, // input buffer to be examined
                    int cb,               // size of input buffer
                    LPINT lpi             // options
       );
      Note: on Win9x this function has no implementation and always returns FALSE.

Converting between Unicode and ANSI
       char szA[40];
       wchar_t szW[40];
       // Normal sprintf: all strings are ANSI
       sprintf(szA, "%s", "ANSI str");
       // Convert a Unicode string to ANSI (%S is a Microsoft-specific extension)
       sprintf(szA, "%S", L"Unicode str");
       // Normal swprintf: all strings are Unicode
       swprintf(szW, 40, L"%s", L"Unicode str");
       // Convert an ANSI string to Unicode
       swprintf(szW, 40, L"%S", "ANSI str");

       int MultiByteToWideChar(
             UINT uCodePage,         // code page, 0
             DWORD dwFlags,          // character-type options, 0
             PCSTR pMultiByteStr,    // source string address
             int cchMultiByte,       // source string length in bytes
             PWSTR pWideCharStr,     // destination string address
             int cchWideChar         // destination buffer size in characters
        );
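       The uCodePage parameter identifies the code page associated with the multibyte string. The dwFlags parameter gives additional control over the conversion, affecting characters with diacritical marks such as accents; these flags are usually not needed, so pass 0. The pMultiByteStr parameter specifies the string to convert, and cchMultiByte gives its length in bytes. If you pass -1 for cchMultiByte, the function determines the length of the source string itself. The converted Unicode string is written to the buffer whose address is given by pWideCharStr, and cchWideChar must specify the maximum size of that buffer in characters. If you call MultiByteToWideChar with 0 for cchWideChar, the function performs no conversion and instead returns the buffer size, in characters, required for the conversion to succeed.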

    A multibyte string can be converted to its Unicode equivalent with the following steps:
    1) Call MultiByteToWideChar, passing NULL for pWideCharStr and 0 for cchWideChar.
    2) Allocate a block of memory large enough to hold the converted Unicode string. The required size is the value returned by the previous call to MultiByteToWideChar.
    3) Call MultiByteToWideChar again, this time passing the address of the buffer as pWideCharStr and the size returned by the first call as cchWideChar.
    4) Use the converted string.
    5) Free the memory block occupied by the Unicode string.
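
The steps above can be sketched as a single helper. The name to_wide is hypothetical, CP_ACP is an assumed choice of code page, and the non-Windows branch is only a portable stand-in built on the C run-time function mbstowcs so that the sketch compiles on other platforms too.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Converts a NUL-terminated multibyte string to a newly allocated wide
   string using the query-size / allocate / convert pattern.
   Returns NULL on failure; the caller releases the result with free().
   to_wide is a hypothetical helper name. */
wchar_t *to_wide(const char *src)
{
#ifdef _WIN32
    /* Step 1: a NULL buffer and size 0 ask for the required size in
       characters; with cchMultiByte == -1 it includes the terminator. */
    int cch = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (cch == 0)
        return NULL;

    /* Step 2: allocate cch characters, not cch bytes. */
    wchar_t *dst = malloc((size_t)cch * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;

    /* Step 3: convert for real, passing the size obtained in step 1. */
    if (MultiByteToWideChar(CP_ACP, 0, src, -1, dst, cch) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
#else
    /* Portable stand-in so the sketch also builds off Windows. */
    size_t len = mbstowcs(NULL, src, 0);   /* length without terminator */
    if (len == (size_t)-1)
        return NULL;
    wchar_t *dst = malloc((len + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;
    mbstowcs(dst, src, len + 1);
    return dst;
#endif
}
```

Steps 4 and 5 are left to the caller: use the returned string, then pass it to free() once it is no longer needed.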

   int WideCharToMultiByte(
         UINT CodePage,             // code page
         DWORD dwFlags,             // performance and mapping flags
         LPCWSTR lpWideCharStr,     // wide-character string
         int cchWideChar,           // number of chars in string
         LPSTR lpMultiByteStr,      // buffer for new string
         int cbMultiByte,           // size of buffer
         LPCSTR lpDefaultChar,      // default for unmappable chars
         LPBOOL lpUsedDefaultChar   // set when default char used
    );
