Why You Should Use Unicode

xuwy48563526

When developing an application, we highly recommend that you use Unicode characters and strings. Here are some of the reasons why:

1. Unicode makes it easy for you to localize your application to world markets.

2. Unicode allows you to distribute a single binary (.exe or DLL) file that supports all languages.

3. Unicode improves the efficiency of your application because the code performs faster and uses less memory. Windows internally does everything with Unicode characters and strings, so when you pass an ANSI character or string, Windows must allocate memory and convert the ANSI character or string to its Unicode equivalent.

4. Using Unicode ensures that your application can easily call all nondeprecated Windows functions, as some Windows functions offer versions that operate only on Unicode characters and strings.

5. Using Unicode ensures that your code easily integrates with COM (which requires the use of Unicode characters and strings).

6. Using Unicode ensures that your code easily integrates with the .NET Framework (which also requires the use of Unicode characters and strings).

7. Using Unicode ensures that your code easily manipulates your own resources (where strings are always persisted as Unicode).

How We Recommend Working with Characters and Strings

Based on what you've read in this chapter, the first part of this section summarizes what you should always keep in mind when developing your code. The second part of the section provides tips and tricks for better Unicode and ANSI string manipulation. It's a good idea to start converting your application to be Unicode-ready even if you don't plan to use Unicode right away. Here are the basic guidelines you should follow:

1. Start thinking of text strings as arrays of characters, not as arrays of chars or arrays of bytes.

2. Use generic data types (such as TCHAR/PTSTR) for text characters and strings.

3. Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.

4. Use the TEXT or _T macro for literal characters and strings, but pick one and avoid mixing the two, for the sake of consistency and readability.

5. Perform global replaces. (For example, replace PSTR with PTSTR.)

6. Fix string arithmetic problems. For example, functions usually expect you to pass a buffer's size in characters, not bytes. This means you should pass _countof(szBuffer) instead of sizeof(szBuffer). Also, if you need to allocate a block of memory for a string and you have the number of characters in the string, remember that you allocate memory in bytes. This means that you must call malloc(nCharacters * sizeof(TCHAR)) and not call malloc(nCharacters). Of all the guidelines I've just listed, this is the most difficult one to remember, and the compiler offers no warnings or errors if you make a mistake. This is a good opportunity to define your own macros, such as the following:

#define chmalloc(nCharacters) ((TCHAR*)malloc((nCharacters) * sizeof(TCHAR)))

7. Avoid printf family functions, and in particular do not rely on the %s and %S field types to convert ANSI strings to Unicode and vice versa. Use MultiByteToWideChar and WideCharToMultiByte instead, as described in "Translating Strings Between Unicode and ANSI" later in the chapter.

8. Always specify both the UNICODE and _UNICODE symbols or neither of them.
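
Guideline 6 is the easiest one to get wrong, so a small sketch may help. The following is a minimal illustration in portable C, assuming wchar_t in place of TCHAR so that it builds outside Windows. COUNT_OF, alloc_chars, and dup_wide are hypothetical names introduced here (COUNT_OF plays the role of _countof); they are not part of the Windows headers or of the original text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for _countof: the number of elements in a true
   array (it gives a wrong answer if handed a pointer). */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical helper: allocate room for n_chars wide characters.
   malloc works in bytes, so the character count is scaled by
   sizeof(wchar_t). Returns NULL if the byte count would overflow. */
static wchar_t *alloc_chars(size_t n_chars)
{
    if (n_chars > SIZE_MAX / sizeof(wchar_t))
        return NULL;
    return (wchar_t *)malloc(n_chars * sizeof(wchar_t));
}

/* Hypothetical helper: duplicate a wide string. The buffer is sized in
   characters (length plus terminator), never in raw bytes. */
static wchar_t *dup_wide(const wchar_t *src)
{
    size_t n_chars;
    wchar_t *copy;

    assert(src != NULL);
    n_chars = wcslen(src) + 1;
    copy = alloc_chars(n_chars);
    if (copy != NULL)
        wmemcpy(copy, src, n_chars); /* wmemcpy counts characters too */
    return copy;
}
```

For a buffer declared as wchar_t szBuffer[16], COUNT_OF(szBuffer) yields 16, whereas sizeof(szBuffer) yields 16 * sizeof(wchar_t). Passing the latter where a character count is expected would overstate the buffer's capacity.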

In terms of string manipulation functions, here are the basic guidelines that you should follow:

1. Always work with safe string manipulation functions such as those suffixed with _s or prefixed with StringCch. Use the latter when you need explicit truncation handling, but prefer the former otherwise.

2. Don't use the unsafe C run-time library string manipulation functions. (See the previous recommendation.) More generally, don't use or implement any buffer manipulation routine that does not take the size of the destination buffer as a parameter. The C run-time library provides replacements for the buffer manipulation functions, such as memcpy_s, memmove_s, wmemcpy_s, and wmemmove_s. All these functions are available when the __STDC_WANT_SECURE_LIB__ symbol is defined, which is the case by default in CrtDefs.h. So don't undefine __STDC_WANT_SECURE_LIB__.

3. Take advantage of the /GS (http://msdn2.microsoft.com/en-us/library/aa290051(VS.71).aspx) and /RTCs compiler flags to automatically detect buffer overruns.

4. Don't use Kernel32 functions such as lstrcat and lstrcpy for string manipulation.

5. There are two kinds of strings that we compare in our code. Programmatic strings are file names, paths, XML elements and attributes, and registry keys/values. For these, use CompareStringOrdinal, as it is very fast and does not take the user's locale into account. This is good because these strings remain the same no matter where your application is running in the world. User strings are typically strings that appear in the user interface. For these, call CompareString(Ex), as it takes the locale into account when comparing strings.
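
The distinction in guideline 5 can be sketched in portable C. This is only an analogy, not the Windows API: it assumes the standard functions wcscmp (comparison by code value) and wcscoll (comparison under the current LC_COLLATE locale) as stand-ins for CompareStringOrdinal and CompareString(Ex). The helper names use_user_collation, compare_programmatic, and compare_for_user are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: adopt the user's collation rules from the
   environment. Returns 1 on success, 0 if the locale is unavailable. */
static int use_user_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL;
}

/* Hypothetical helper for programmatic strings (file names, registry
   keys, XML names): compares by code value, independent of locale. */
static int compare_programmatic(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    return wcscmp(a, b);
}

/* Hypothetical helper for strings shown to the user: the ordering
   follows the collation rules of the current locale. */
static int compare_for_user(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    return wcscoll(a, b);
}
```

With these helpers, compare_programmatic(L"Zebra", L"apple") is negative everywhere, because 'Z' has a lower code value than 'a'. The result of compare_for_user on the same pair depends on the locale in effect, which is exactly why it is unsuitable for keys and paths.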

You don't have a choice: as a professional developer, you can't write code based on unsafe buffer manipulation functions. This is why all the code in this book relies on these safer functions from the C run-time library.
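
To make the principle concrete, here is a minimal sketch of a copy routine that takes the destination size, counted in characters, and reports truncation instead of overrunning the buffer. It is written in portable C with wchar_t; copy_bounded is a hypothetical helper for illustration only. In real Windows code you would call the functions the text recommends (the _s family or the StringCch family) rather than write your own.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy src into dest, which can hold dest_count
   characters including the terminator. The result is always
   terminated. Returns 1 if the whole string fit, 0 if it was
   truncated. */
static int copy_bounded(wchar_t *dest, size_t dest_count, const wchar_t *src)
{
    size_t len;
    size_t n;

    assert(dest != NULL && src != NULL && dest_count > 0);

    len = wcslen(src);
    /* Copy at most dest_count - 1 characters, leaving room for L'\0'. */
    n = (len < dest_count) ? len : dest_count - 1;
    wmemcpy(dest, src, n);
    dest[n] = L'\0';
    return len < dest_count;
}
```

A caller would pass the element count of the buffer, not its size in bytes, and check the return value to decide how to handle a truncated result.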
