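The product has an ID that users can enter themselves. It used to be restricted to 8 Arabic digits, and a validity check runs when it is set. Because that value space recently proved too small, the ID was redefined as a string of 8 characters, each of which may be a digit or a letter, i.e. {0-9, A-Z, a-z}. The feature is simple: calling isalnum(), declared in ctype.h, is enough. The declaration that turns up in a web search usually looks like Listing 1:
Listing 1: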
int isalnum (int __c);
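The ID check described above can be sketched as follows. This is a minimal sketch rather than the product's actual code: the names `ID_LEN` and `is_valid_id` are hypothetical, and it assumes the default "C" locale, where isalnum() accepts exactly {0-9, A-Z, a-z}.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_LEN 8 /* hypothetical constant: required ID length */

/* Returns true if id is exactly ID_LEN characters long and every
   character is a digit or a letter. is_valid_id is an illustrative name. */
bool is_valid_id(const char *id)
{
    if (id == NULL || strlen(id) != ID_LEN)
        return false;

    for (size_t i = 0; i < ID_LEN; i++) {
        /* isalnum() expects an int holding an unsigned char value (or EOF),
           so convert the plain char before passing it. */
        if (!isalnum((unsigned char)id[i]))
            return false;
    }
    return true;
}
```

With this sketch, is_valid_id("aB3dE6gH") is accepted, while "1234567" (too short) and "1234-678" (contains a punctuation character) are rejected.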
Not so fast. Opening the ctype.h header and pressing Ctrl+F turns up a second isalnum(), this time in the form of a macro:
Listing 2:
#define isalnum(__c) (__ctype_lookup(__c)&(_U|_L|_N))
A closer look at ctype.h reveals a whole series of macro definitions:
Listing 3:
#define _U 01
#define _L 02
#define _N 04
#define _S 010
#define _P 020
#define _C 040
#define _X 0100
#define _B 0200
#ifdef __HAVE_LOCALE_INFO__
const char *__locale_ctype_ptr (void);
#else
#define __locale_ctype_ptr() _ctype_
#endif
# define __CTYPE_PTR (__locale_ctype_ptr ())
#ifndef __cplusplus
/* These macros are intentionally written in a manner that will trigger
a gcc -Wall warning if the user mistakenly passes a 'char' instead
of an int containing an 'unsigned char'. Note that the sizeof will
always be 1, which is what we want for mapping EOF to __CTYPE_PTR[0];
the use of a raw index inside the sizeof triggers the gcc warning if
__c was of type char, and sizeof masks side effects of the extra __c.
Meanwhile, the real index to __CTYPE_PTR+1 must be cast to int,
since isalpha(0x100000001LL) must equal isalpha(1), rather than being
an out-of-bounds reference on a 64-bit machine. */
#define __ctype_lookup(__c) ((__CTYPE_PTR+sizeof(""[__c]))[(int)(__c)])
#define isalpha(__c) (__ctype_lookup(__c)&(_U|_L))
#define isupper(__c) ((__ctype_lookup(__c)&(_U|_L))==_U)
#define islower(__c) ((__ctype_lookup(__c)&(_U|_L))==_L)
#define isdigit(__c) (__ctype_lookup(__c)&_N)
#define isxdigit(__c) (__ctype_lookup(__c)&(_X|_N))
#define isspace(__c) (__ctype_lookup(__c)&_S)
#define ispunct(__c) (__ctype_lookup(__c)&_P)
#define isalnum(__c) (__ctype_lookup(__c)&(_U|_L|_N))
#define isprint(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N|_B))
#define isgraph(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N))
#define iscntrl(__c) (__ctype_lookup(__c)&_C)
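Following the macros all the way down, the last thing left is _ctype_, which does not expand any further. So what exactly is this _ctype_?
A search of the internal network turned up nothing. In the end, going through a proxy, I found a definition of _ctype_ in the source archive of a small operating system hosted on the University at Buffalo (State University of New York) website. It is an array of char, defined as in Listing 4:
Listing 4: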
#include <ctype.h>

char _ctype_[] = {
    0,
    _C, _C, _C, _C, _C, _C, _C, _C,
    _C, _S, _S, _S, _S, _S, _C, _C,
    _C, _C, _C, _C, _C, _C, _C, _C,
    _C, _C, _C, _C, _C, _C, _C, _C,
    _S, _P, _P, _P, _P, _P, _P, _P,
    _P, _P, _P, _P, _P, _P, _P, _P,
#ifdef linux
    _D, _D, _D, _D, _D, _D, _D, _D,
    _D, _D, _P, _P, _P, _P, _P, _P,
#else
    _N, _N, _N, _N, _N, _N, _N, _N,
    _N, _N, _P, _P, _P, _P, _P, _P,
#endif
    _P, _U|_X, _U|_X, _U|_X, _U|_X, _U|_X, _U|_X, _U,
    _U, _U, _U, _U, _U, _U, _U, _U,
    _U, _U, _U, _U, _U, _U, _U, _U,
    _U, _U, _U, _P, _P, _P, _P, _P,
    _P, _L|_X, _L|_X, _L|_X, _L|_X, _L|_X, _L|_X, _L,
    _L, _L, _L, _L, _L, _L, _L, _L,
    _L, _L, _L, _L, _L, _L, _L, _L,
    _L, _L, _L, _P, _P, _P, _P, _C
};
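Only at this point does the full mechanism behind macros such as isalnum(__c) become clear. First, a 129-byte array is defined. Its first element is 0, and the remaining 128 elements hold the character class (digit, uppercase letter, lowercase letter, hexadecimal digit, ...) of each ASCII code from 0 to 127. The class flags are described in Listing 5.
Listing 5: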
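#define _U 01 // uppercase letter
#define _L 02 // lowercase letter
#define _N 04 // decimal digit
#define _S 010 // whitespace character
#define _P 020 // punctuation character
#define _C 040 // control character
#define _X 0100 // hexadecimal digit
#define _B 0200 // blank (space) character, accepted by isprint() but not by isgraph()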
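This predefined array makes fast character classification possible. isalpha(), isdigit(), isspace() and the other functions that test a character's properties each reduce to one table lookup followed by one bit mask.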
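The table-plus-mask idea can be sketched in isolation. This is an illustrative reimplementation, not newlib's code: `MY_U`, `MY_L`, `MY_N`, `my_ctype`, `my_ctype_init` and `my_isalnum` are hypothetical names (chosen to avoid reserved identifiers), only the three flags needed for an alphanumeric test are filled in, and it assumes an ASCII execution character set and EOF equal to -1.

```c
#include <assert.h>
#include <stdio.h> /* EOF */

/* Hypothetical flags mirroring _U, _L and _N. */
enum { MY_U = 01, MY_L = 02, MY_N = 04 };

/* Slot 0 is reserved for EOF (-1); slots 1..128 describe ASCII 0..127.
   Static storage is zero-initialized, so every slot starts with no flags. */
static unsigned char my_ctype[1 + 128];

/* Fill in the flags for digits and letters (assumes ASCII ordering). */
void my_ctype_init(void)
{
    for (int c = '0'; c <= '9'; c++) my_ctype[c + 1] |= MY_N;
    for (int c = 'A'; c <= 'Z'; c++) my_ctype[c + 1] |= MY_U;
    for (int c = 'a'; c <= 'z'; c++) my_ctype[c + 1] |= MY_L;
}

/* One lookup and one mask. Valid only for c == EOF or 0 <= c <= 127. */
int my_isalnum(int c)
{
    assert(c >= -1 && c <= 127);
    return my_ctype[c + 1] & (MY_U | MY_L | MY_N);
}
```

After calling my_ctype_init(), my_isalnum('7') yields MY_N, my_isalnum('q') yields MY_L, and both my_isalnum('_') and my_isalnum(EOF) yield 0. Shifting every index by one is what lets EOF share the same table without a separate branch.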
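Taking isalnum(__c) as the concrete case, the expansion goes as follows:
isalnum(__c) expands to (__ctype_lookup(__c)&(_U|_L|_N)); __ctype_lookup(__c) in turn expands to ((__CTYPE_PTR+sizeof(""[__c]))[(int)(__c)]); and __CTYPE_PTR finally expands to _ctype_ (when __HAVE_LOCALE_INFO__ is not defined).
As the comment in the code explains, sizeof(""[__c]) is always 1 no matter what type __c has, because the element type of the string literal "" is char. The operand of sizeof is not evaluated, so the extra __c causes no side effects; its only purpose is to make gcc -Wall warn when __c is a plain char. The (int)(__c) cast narrows the real index to int, so that a 64-bit value such as 0x100000001LL is looked up as 1 instead of becoming an out-of-bounds reference.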
Listing 6:
/* These macros are intentionally written in a manner that will trigger
a gcc -Wall warning if the user mistakenly passes a 'char' instead
of an int containing an 'unsigned char'. Note that the sizeof will
always be 1, which is what we want for mapping EOF to __CTYPE_PTR[0];
the use of a raw index inside the sizeof triggers the gcc warning if
__c was of type char, and sizeof masks side effects of the extra __c.
Meanwhile, the real index to __CTYPE_PTR+1 must be cast to int,
since isalpha(0x100000001LL) must equal isalpha(1), rather than being
an out-of-bounds reference on a 64-bit machine. */
#define __ctype_lookup(__c) ((__CTYPE_PTR+sizeof(""[__c]))[(int)(__c)])
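The two claims in that comment can be checked with a small sketch. The helper names (`next_index`, `sizeof_probe`, `probe_calls`, `narrow_index`) are hypothetical. Note that converting an out-of-range value to int is implementation-defined in C; the sketch assumes the behaviour of gcc and clang, which keep the low 32 bits.

```c
#include <stddef.h>

static int calls; /* counts how often next_index() actually runs */

/* A function with a visible side effect, used as an array index below. */
static int next_index(void)
{
    calls++;
    return 0;
}

/* ""[...] has type char, so the sizeof is 1; the operand is never
   evaluated, which means next_index() is not called. */
size_t sizeof_probe(void)
{
    return sizeof(""[next_index()]);
}

/* Reports how many times next_index() has run so far. */
int probe_calls(void)
{
    return calls;
}

/* Mirrors the (int)(__c) cast in __ctype_lookup. For values that do not
   fit in int the result is implementation-defined. */
int narrow_index(long long c)
{
    return (int)c;
}
```

Under those assumptions, sizeof_probe() returns 1, probe_calls() stays at 0 afterwards, and narrow_index(0x100000001LL) returns 1, which matches the comment's statement that isalpha(0x100000001LL) must equal isalpha(1).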
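In the end, __ctype_lookup(__c) reduces to _ctype_[__c+1];
isalnum(__c) therefore expands to _ctype_[__c+1]&(_U|_L|_N). For a character that is neither a digit nor a letter the result is 0; for a digit or a letter it is one of _U, _L, or _N, that is, a nonzero value that is not necessarily 1.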
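Because the macro yields a flag value rather than 1, callers should only test the result against zero. A minimal sketch of a wrapper that normalizes it to a bool (`is_alnum_char` is a hypothetical helper name):

```c
#include <ctype.h>
#include <stdbool.h>

/* Normalizes the result of isalnum() to true/false. */
bool is_alnum_char(char c)
{
    /* Compare against 0. Writing isalnum(...) == 1 would be wrong
       whenever the implementation returns a flag such as _L or _N. */
    return isalnum((unsigned char)c) != 0;
}
```

The wrapper also takes care of the conversion to unsigned char, so it can be called directly on elements of a char string.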