A byte can represent 256 values, 0 through 255. In ASCII the English letters A-Z and a-z are laid out in order:
- 'A' = 65 = 0b01000001 = 0x41
- 'B' = 66 = 0b01000010 = 0x42
- ...
- 'Z' = 90 = 0b01011010 = 0x5a
- 'a' = 97 = 0b01100001 = 0x61
- 'b' = 98 = 0b01100010 = 0x62
- ...
- 'z' = 122 = 0b01111010 = 0x7a
The traditional approach is to test the two ranges directly:
- #define judgeletter_classic(ch) (((ch)>='A'&&(ch)<='Z')||((ch)>='a'&&(ch)<='z'))
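A closer look at the binary column, however, shows the following:
(1) The top two bits of every letter are always 01.
(2) The third bit from the top is 0 for an uppercase letter and 1 for a lowercase letter.
(3) The low 5 bits run from 00001 to 11010, 26 values in all, standing for A-Z and a-z.
That gives a test that works by inspecting bits:
- #define judgeletter_bit(ch) ((((ch)>>6)==1)&&((((ch)-1)&31)<26))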
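One way to gain confidence in the bit test is to compare it with the range test for every byte value. The following is a minimal sketch; `is_letter_range`, `is_letter_bit` and `count_mismatches` are hypothetical names that do not appear in the original program, and the tests are written as functions rather than macros so that each argument is evaluated only once.

```c
/* Range test: used here as the reference. */
static int is_letter_range(unsigned int ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}

/* Bit test: the top two bits of the byte are 01, and the low five
   bits, shifted down by one, fall in 0..25. */
static int is_letter_bit(unsigned int ch)
{
    return ((ch >> 6) == 1) && (((ch - 1) & 31) < 26);
}

/* Count the byte values (0..255) on which the two tests disagree. */
static int count_mismatches(void)
{
    unsigned int ch;
    int mismatches = 0;

    for (ch = 0; ch < 256; ch++) {
        if (is_letter_range(ch) != is_letter_bit(ch)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

If `count_mismatches()` returns 0, the two tests agree on the whole byte range, including the neighbours of the letter blocks such as '@', '[', '`' and '{'.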
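Another approach is table lookup. First build a table in which every letter is marked 1 and everything else 0; the test is then a single read of the entry at the corresponding index: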
- #define judgeletter_table(ch) (*(table + (ch)))
Building the table:
- unsigned char table[256];
- memset(table, 0, 256);
- for (i = 'A'; i <= 'Z'; i++)
- {
- *(table + i) = 1;
- *(table + i + 'a' - 'A') = 1;
- }
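The table approach can also be wrapped in functions and used on a whole string. This is only a sketch under assumptions: `letter_table`, `build_letter_table` and `count_letters` are hypothetical names, and the table is made a file-scope array so the lookup function can reach it.

```c
#include <stddef.h>
#include <string.h>

static unsigned char letter_table[256];

/* Mark every ASCII letter with 1; every other entry stays 0. */
static void build_letter_table(void)
{
    int i;

    memset(letter_table, 0, sizeof letter_table);
    for (i = 'A'; i <= 'Z'; i++) {
        letter_table[i] = 1;
        letter_table[i + 'a' - 'A'] = 1;
    }
}

/* Count the letters in a NUL-terminated string by table lookup.
   The cast keeps the index in 0..255 even where char is signed. */
static size_t count_letters(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        n += letter_table[(unsigned char)*s];
    }
    return n;
}
```

`build_letter_table()` has to run once before the first lookup; after that each test costs one array read.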
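Finally, the C standard library already provides isalpha, which tests whether a character is a letter. It is declared in ctype.h; in glibc's header it is defined as a macro: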
- # define isalpha(c) __isctype((c), _ISalpha)
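When calling isalpha on a plain char, the argument should be a value representable as unsigned char (or EOF), so a cast is the usual precaution. A small sketch follows; the wrapper name `is_letter_ctype` is hypothetical. Note also that isalpha follows the current locale, so it matches exactly A-Z and a-z only in the default "C" locale.

```c
#include <ctype.h>

/* Wrap isalpha so it is safe to call with a plain (possibly signed) char.
   Returns 1 for a letter in the current locale, 0 otherwise. */
static int is_letter_ctype(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```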
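Now let's compare the speed of these methods. Each one classifies every value from 0 to 255 as an ASCII letter or not, the full sweep is repeated 10000000 times per method, and the elapsed time is printed. The program: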
- #include <stdio.h>
- #include <string.h>
- #include <ctype.h>
- #include <sys/time.h>
- #define TEST_TIMES 10000000
- #define DEFINE_TIME \
- struct timeval time1, time2;
- #define START_RECORDING \
- gettimeofday(&time1, NULL);
- #define STOP_RECORDING \
- gettimeofday(&time2, NULL);
- #define PRINT_TIME \
- printf("%lu:%lu\n", time2.tv_sec - time1.tv_sec, time2.tv_usec - time1.tv_usec);
- #define judgeletter_bit(ch) ((((ch)>>6)==1)&&((((ch)-1)&31)<26))
- #define judgeletter_classic(ch) (((ch)>='A'&&(ch)<='Z')||((ch)>='a'&&(ch)<='z'))
- #define judgeletter_table(ch) (*(table + (ch)))
- int main(int argc, const char *argv[])
- {
- unsigned int ch;
- int i;
- int result;
- unsigned char table[256];
- memset(table, 0, 256);
- for (i = 'A'; i <= 'Z'; i++)
- {
- *(table + i) = 1;
- *(table + i + 'a' - 'A') = 1;
- }
- DEFINE_TIME;
- //Classic
- printf("classic:");
- START_RECORDING;
- for (i = 0; i < TEST_TIMES; i++) {
- for (ch = 0; ch < 256; ch++)
- {
- result = judgeletter_classic(ch);
- }
- }
- STOP_RECORDING;
- PRINT_TIME;
- //Bit
- printf("bit:");
- START_RECORDING;
- for (i = 0; i < TEST_TIMES; i++) {
- for (ch = 0; ch < 256; ch++)
- {
- result = judgeletter_bit(ch);
- }
- }
- STOP_RECORDING;
- PRINT_TIME;
- //Table
- printf("table:");
- START_RECORDING;
- for (i = 0; i < TEST_TIMES; i++) {
- for (ch = 0; ch < 256; ch++)
- {
- result = judgeletter_table(ch);
- }
- }
- STOP_RECORDING;
- PRINT_TIME;
- //ctype
- printf("isalpha:");
- START_RECORDING;
- for (i = 0; i < TEST_TIMES; i++) {
- for (ch = 0; ch < 256; ch++)
- {
- result = isalpha(ch);
- }
- }
- STOP_RECORDING;
- PRINT_TIME;
- return 0;
- }
On my machine, compiled with gcc 4.4.5 and no optimization flags, the run printed the following (each line is seconds:microseconds):
- classic:15:701542
- bit:11:172520
- table:16:4294363296
- isalpha:43:442001
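The program subtracts the seconds and microseconds fields separately, so the microsecond field on the table line went negative and was printed as an unsigned number: assuming a 32-bit unsigned long, 4294363296 is -604000, which makes the real time 16 s - 0.604 s, about 15.4 s. So in this unoptimized test the bit-operation method is the fastest at about 11.2 s, ahead of table lookup (about 15.4 s), the range test (about 15.7 s) and isalpha (about 43.4 s).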
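A wrapped value like the one on the table line can be avoided by folding both timeval fields into a single microsecond count before printing. This is a minimal sketch; `elapsed_usec` is a hypothetical helper that is not part of the original program.

```c
#include <sys/time.h>

/* Elapsed time from *start to *end, in microseconds.
   The two field differences are combined into one signed value, so a
   smaller tv_usec in *end simply borrows from the seconds. */
static long long elapsed_usec(const struct timeval *start,
                              const struct timeval *end)
{
    long long sec  = (long long)end->tv_sec  - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

    return sec * 1000000LL + usec;
}
```

With this helper, PRINT_TIME could store `elapsed_usec(&time1, &time2)` in a `long long us` and print it with `printf("%lld.%06lld s\n", us / 1000000, us % 1000000);`.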