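Today, as planned, I started reading and studying Chapter 2. Compared with the previous edition, 《Windows核心编程4th》, this chapter has been substantially rewritten: it introduces new functions and features, stresses the importance of secure coding and the value of localization-friendly generic code, and explains from the inside why Unicode is both efficient and universal. I personally find this very important, because almost every programmer deals with characters and strings, yet for historical reasons we still use plenty of deprecated (or at least heavily criticized) functions such as lstrcpy and lstrcat, and we do so quite happily. Misusing these functions, or simply relying on their built-in security flaws, undermines the robustness and stability of our code and plants serious security holes. For Windows programmers the problem is especially acute.
OK, back to the point. Here are the key points and my takeaways from this chapter :)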
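Chapter 2 opens by explaining why Unicode matters (e.g. it solves the problem of handling each country's character set, simplifies localization, and speeds up worldwide software releases). One thing to note is that in Visual Studio on Windows, wchar_t can serve as a built-in type for Unicode characters, provided the /Zc:wchar_t compiler switch is enabled. On Windows wchar_t is 16 bits wide, and without that switch it is simply a typedef for unsigned short :). Another detail we rarely pay attention to is how some types are defined, for example in WinNT.h: typedef __nullterminated WCHAR *NWPSTR, *LPWSTR, *PWSTR; The __nullterminated prefix only pays off in the Enterprise edition of Visual Studio. It is an annotation that lets code analysis check whether a string is null-terminated, and the check runs at compile time when you turn on the /analyze switch (non-Enterprise editions do not have this switch. PS: shame on MS!).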
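To make the point about wchar_t being a real built-in type concrete, here is a minimal sketch. The function name IsWideOverload is made up for illustration. It relies only on standard C++ overload resolution, which matches the behavior of Visual C++ when /Zc:wchar_t is on; with the switch turned off, the two overloads below would collide because wchar_t would be a typedef for unsigned short.

```cpp
#include <type_traits>

// Hypothetical helpers: two overloads that can only coexist when wchar_t
// is a distinct built-in type rather than a typedef for unsigned short.
inline bool IsWideOverload(wchar_t)        { return true;  }
inline bool IsWideOverload(unsigned short) { return false; }

// Holds in standard C++ and with /Zc:wchar_t enabled.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is expected to be a distinct built-in type");
```

Calling IsWideOverload(L'A') picks the wchar_t overload, while passing a value of type unsigned short picks the other one.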
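The section on Unicode and ANSI inside Windows focuses on how Windows implements compatibility today: the ANSI version of a function converts its string arguments to Unicode internally and then calls the Unicode (W) version to do the real work. A few stubborn functions exist only for compatibility with 16-bit Windows and support only ANSI, such as WinExec and OpenFile; for these the book simply recommends switching to CreateProcess and CreateFile. It even points out that some functions exist only in a Unicode version, for example ReadDirectoryChangesW. All of this boils down to one sentence: forget the ANSI functions and move to Unicode, because Unicode is where efficiency and generality come from. (PS: now that we have been told how things work internally, it is clear the ANSI versions of the Windows functions are on their way out.)
Next we turn to the real focus of this chapter: security!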
Secure String Functions in the C Run-Time Library
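This part introduces the new secure string functions, which are recognizable by the "_s" suffix at the end of the function name (PS: _s means "secure"). What sets them apart is that they place strict limits on memory operations: the size of the destination is always taken into account, which avoids the potential danger of buffer overflows. A handy macro worth noting here is _countof, which computes the number of elements in an array [not to be confused with sizeof, which computes the number of bytes], so you no longer have to worry about how much to copy :). Another thing to note is that these secure functions return an errno_t that indicates the final status of the call. Using this return value for error handling is not quite that simple, though, because of how the functions are implemented: in a Debug build a failed secure function pops up an assertion dialog, and in a Release build the application is terminated outright. So how can we make use of it? It is not hard, and takes just two steps:
1. Define an error-handling function that matches the prototype below; the function name itself is up to you: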
void InvalidParameterHandler( PCTSTR expression, PCTSTR function, PCTSTR file, unsigned int line, uintptr_t/*pReserved*/)
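2. Register this function with _set_invalid_parameter_handler.
Note: in a Release build, all the arguments passed to the error handler are NULL, so the programmer has to define the error-handling logic without relying on the values passed in. Also, to keep the assertion dialog from popping up, you can call _CrtSetReportMode(_CRT_ASSERT, 0).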
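The chapter then introduces some more new secure functions, mainly for string manipulation, such as StringCchCat and StringCbCat. Note that Cch and Cb in the function names stand for "count of characters" and "count of bytes" respectively. It also covers the even more powerful variants of the same functions with the Ex suffix. For everyday string comparison, Windows recommends CompareString and CompareStringOrdinal, and I find CompareStringOrdinal the more practical of the two. The biggest difference between them is that one takes the user's locale into account and the other does not. In addition, CompareString offers more options through dwCmpFlags to control the comparison rules. All of this comes at a price, namely speed: CompareStringOrdinal is faster than CompareString. [PS: the return value is also worth noting. The result constants CSTR_LESS_THAN, CSTR_EQUAL and CSTR_GREATER_THAN have the values 1, 2 and 3, so subtracting 2 gives a result that lines up with strcmp, while 0 indicates failure. I think that is a rather nice design.]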
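The "subtract 2" trick can be shown without the Windows headers. The sketch below is not the Windows API: OrdinalCompareSketch is a hypothetical stand-in that compares code units one by one, the way an ordinal, case-sensitive comparison would, and the k-prefixed constants merely mirror the values of the CSTR_* constants.

```cpp
// Values mirror the CSTR_* constants from the Windows headers; they are
// redefined under hypothetical names so this compiles without <windows.h>.
enum : int {
    kCstrFailure     = 0,
    kCstrLessThan    = 1,
    kCstrEqual       = 2,
    kCstrGreaterThan = 3
};

// Hypothetical stand-in for an ordinal comparison: case-sensitive,
// locale-independent, reporting its result in the CSTR_* style.
int OrdinalCompareSketch(const wchar_t* a, const wchar_t* b)
{
    if (a == nullptr || b == nullptr) return kCstrFailure;
    while (*a != L'\0' && *a == *b) { ++a; ++b; }
    if (*a == *b) return kCstrEqual;   // both strings ended together
    return (static_cast<unsigned long>(*a) < static_cast<unsigned long>(*b))
               ? kCstrLessThan
               : kCstrGreaterThan;
}

// Maps 1/2/3 to -1/0/+1, the sign convention of strcmp and wcscmp.
// A failure result (0) must be checked before calling this.
int ToStrcmpStyle(int cstrResult)
{
    return cstrResult - 2;
}
```

With the real API the pattern is the same: check the result for 0 first, then subtract 2 if a strcmp-style value is more convenient.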
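Next, in case readers are still following blindly or being stubborn, Jeffrey explains in detail why you should use Unicode and which practices he recommends. I think it is a good section: he really spells everything out here, and if people remain stubborn after this, he will probably be left speechless. I list his reasons and recommendations below, quoted in the original wording: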
Why You Should Use Unicode
When developing an application, we highly recommend that you use Unicode characters and strings. Here are some of the reasons why:
- Unicode makes it easy for you to localize your application to world markets.
- Unicode allows you to distribute a single binary (.exe or DLL) file that supports all languages.
- Unicode improves the efficiency of your application because the code performs faster and uses less memory. Windows internally does everything with Unicode characters and strings, so when you pass an ANSI character or string, Windows must allocate memory and convert the ANSI character or string to its Unicode equivalent.
- Using Unicode ensures that your application can easily call all nondeprecated Windows functions, as some Windows functions offer versions that operate only on Unicode characters and strings.
- Using Unicode ensures that your code easily integrates with COM (which requires the use of Unicode characters and strings).
- Using Unicode ensures that your code easily integrates with the .NET Framework (which also requires the use of Unicode characters and strings).
- Using Unicode ensures that your code easily manipulates your own resources (where strings are always persisted as Unicode).
How We Recommend Working with Characters and Strings
Based on what you've read in this chapter, the first part of this section summarizes what you should always keep in mind when developing your code. The second part of the section provides tips and tricks for better Unicode and ANSI string manipulations. It's a good idea to start converting your application to be Unicode-ready even if you don't plan to use Unicode right away. Here are the basic guidelines you should follow:
- Start thinking of text strings as arrays of characters, not as arrays of chars or arrays of bytes.
- Use generic data types (such as TCHAR/PTSTR) for text characters and strings.
- Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.
- Use the TEXT or _T macro for literal characters and strings, but avoid mixing both for the sake of consistency and for better readability.
- Perform global replaces. (For example, replace PSTR with PTSTR.)
- Modify string arithmetic problems. For example, functions usually expect you to pass a buffer's size in characters, not bytes. This means you should pass _countof(szBuffer) instead of sizeof(szBuffer). Also, if you need to allocate a block of memory for a string and you have the number of characters in the string, remember that you allocate memory in bytes. This means that you must call malloc(nCharacters * sizeof(TCHAR)) and not call malloc(nCharacters). Of all the guidelines I've just listed, this is the most difficult one to remember, and the compiler offers no warnings or errors if you make a mistake. This is a good opportunity to define your own macros, such as the following:
- Avoid printf family functions, especially by using %s and %S field types to convert ANSI to Unicode strings and vice versa. Use MultiByteToWideChar and WideCharToMultiByte instead, as shown in "Translating Strings Between Unicode and ANSI" below.
- Always specify both UNICODE and _UNICODE symbols or neither of them.
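The characters-versus-bytes guideline in the list above is easy to get wrong, so here is a small sketch of it. It is an illustration, not the book's own macro: CountOf is a portable stand-in for the CRT's _countof, AllocWideChars is a hypothetical helper, and wchar_t is used in place of TCHAR on the assumption of a Unicode build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Portable stand-in for _countof: the number of elements in a real array.
// Unlike sizeof, the result does not depend on the size of each character.
template <typename T, std::size_t N>
constexpr std::size_t CountOf(const T (&)[N])
{
    return N;
}

// Hypothetical helper: allocates room for `count` wide characters.
// malloc works in bytes, so the character count is scaled by sizeof(wchar_t).
// The caller releases the memory with free().
wchar_t* AllocWideChars(std::size_t count)
{
    assert(count > 0);
    return static_cast<wchar_t*>(std::malloc(count * sizeof(wchar_t)));
}
```

For a buffer declared as wchar_t szBuffer[16], CountOf(szBuffer) is 16 while sizeof(szBuffer) is 16 * sizeof(wchar_t). Passing the latter to a function that expects a character count would overstate the buffer's capacity.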
In terms of string manipulation functions, here are the basic guidelines that you should follow:
- Always work with safe string manipulation functions such as those suffixed with _s or prefixed with StringCch. Use the latter for explicit truncation handling, but prefer the former otherwise.
- Don't use the unsafe C run-time library string manipulation functions. (See the previous recommendation.) In a more general way, don't use or implement any buffer manipulation routine that would not take the size of the destination buffer as a parameter. The C run-time library provides a replacement for buffer manipulation functions such as memcpy_s, memmove_s, wmemcpy_s, or wmemmove_s. All these methods are available when the __STDC_WANT_SECURE_LIB__ symbol is defined, which is the case by default in CrtDefs.h. So don't undefine __STDC_WANT_SECURE_LIB__.
- Take advantage of the /GS (http://msdn2.microsoft.com/en-us/library/aa290051(VS.71).aspx) and /RTCs compiler flags to automatically detect buffer overruns.
- Don't use Kernel32 methods for string manipulation such as lstrcat and lstrcpy.
- There are two kinds of strings that we compare in our code. Programmatic strings are file names, paths, XML elements and attributes, and registry keys/values. For these, use CompareStringOrdinal, as it is very fast and does not take the user's locale into account. This is good because these strings remain the same no matter where your application is running in the world. User strings are typically strings that appear in the user interface. For these, call CompareString(Ex), as it takes the locale into account when comparing strings.
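You don't have a choice: as a professional developer, you can't write code based on unsafe buffer manipulation functions. And this is the reason why all the code in this book relies on these safer functions from the C run-time library. (This is the fiercest sentence of all: the implication is that if you don't use these functions you are not a professional developer, so watch out or Jeffrey will look down on you!!!)
Finally, the chapter covers converting between ANSI and Unicode, exporting ANSI and Unicode versions of DLL functions, and how to tell whether a piece of text is Unicode or ANSI (using IsTextUnicode).
That wraps up this chapter. Its content is quite important, but what really counts is how we apply it in day-to-day practice. If you finish this chapter and still stubbornly carry on as before, Jeffrey will look down on you for being unprofessional o(∩_∩)o... haha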