glibc strlen 实现

最新推荐文章于 2022-07-04 15:24:30 发布

SkyGray

最新推荐文章于 2022-07-04 15:24:30 发布

阅读量1.1k

点赞数

文章标签： character 编译器 testing comments each java

glibc中的strlen的实现主要的思想就是每次检测4个字节(long int)。这样的话就降低了循环的次数，从而从整体上提高了效率。

　　这里它使用了两个技巧，一个是由于传进来的字符串的地址有可能不是4字节(long int)对其的，因此首先需要遍历字符串从而找到4字节对其的那个地址。然后再进行比较.

　　第二个技巧就是如何高效的判断4个字节中是否有字节为0.

　　下来来看源码，这个源码的注释还是满详细的。这里主要都是一些位计算的技巧：

　　Java代码

size_t　 strlen　(str)　　　　const　char　*str;　 {　　const　char　*char_ptr;　　const　unsigned　long　int　*longword_ptr;　　unsigned　long　int　longword,　himagic,　lomagic;　　　/*　Handle　the　first　few　characters　by　reading　one　character　at　a　time.　　　　Do　this　until　CHAR_PTR　is　aligned　on　a　longword　boundary.　*/　　　for　(char_ptr　=　str;　((unsigned　long　int)　char_ptr　　　&　(sizeof　(longword)　-　1))　!=　0;　　　　　++char_ptr)　　　if　(*char_ptr　==　'/0')　　　　return　char_ptr　-　str;　　　/*　All　these　elucidatory　comments　refer　to　4-byte　longwords,　　　　but　the　theory　applies　equally　well　to　8-byte　longwords.　*/　　　longword_ptr　=　(unsigned　long　int　*)　char_ptr;　　　/*　Bits　31,　24,　16,　and　8　of　this　number　are　zero.　Call　these　bits　　　　the　"holes."　Note　that　there　is　a　hole　just　to　the　left　of　　　　each　byte,　with　an　extra　at　the　end:　　　　　bits:　01111110　11111110　11111110　11111111　　　　bytes:　AAAAAAAA　BBBBBBBB　CCCCCCCC　DDDDDDDD　　　　　The　1-bits　make　sure　that　carries　propagate　to　the　next　0-bit.　　　　The　0-bits　provide　holes　for　carries　to　fall　into.　*/　　himagic　=　0x80808080L;　　lomagic　=　0x01010101L;　　if　(sizeof　(longword)　>　4)　　　{　　　　/*　64-bit　version　of　the　magic.　*/　　　　/*　Do　the　shift　in　two　steps　to　avoid　a　warning　if　long　has　32　bits.　*/　　　　himagic　=　((himagic　<<　16)　<<　16)　|　himagic;　　　　lomagic　=　((lomagic　<<　16)　<<　16)　|　lomagic;　　　}　　if　(sizeof　(longword)　>　8)　　　abort　();　　　/*　Instead　of　the　traditional　loop　which　tests　each　character,　　　　we　will　test　a　longword　at　a　time.　The　tricky　part　is　testing　　　　if　*any　of　the　four*　bytes　in　the　longword　in　question　are　zero.　*/　　for　(;;)　　　{　　　　longword　=　*longword_ptr++;　　　　　if　(((longword　-　lomagic)　&　~longword　&　himagic)　!=　0)　　{　　　/*　Which　of　the　bytes　was　the　zero?　If　none　of　them　were,　it　was　　　　a　misfire;　continue　the　search.　*/　　　　const　char　*cp　=　(const　char　*)　(longword_ptr　-　1);　　　　if　(cp[0]　==　0)　　　　return　cp　-　str;　　　if　(cp[1]　==　0)　　　　return　cp　-　str　+　1;　　　if　(cp[2]　==　0)　　　　return　cp　-　str　+　2;　　　if　(cp[3]　==　0)　　　　return　cp　-　str　+　3;　　　if　(sizeof　(longword)　>　4)　　　　{　　　　　if　(cp[4]　==　0)　　return　cp　-　str　+　4;　　　　　if　(cp[5]　==　0)　　return　cp　-　str　+　5;　　　　　if　(cp[6]　==　0)　　return　cp　-　str　+　6;　　　　　if　(cp[7]　==　0)　　return　cp　-　str　+　7;　　　　}　　}　　　}　 }

　代码比较长,不过注释也比较详细,这里主要分析下两个位运算:

　for　(char_ptr　=　str;　((unsigned　long　int)　char_ptr　　　&　(sizeof　(longword)　-　1))　!=　0;　　　　　++char_ptr)　　　if　(*char_ptr　==　'/0')　　　　return　char_ptr　-　str;

　　这里是做一个循环来判断4-byte对齐的那个地址,这里一般的编译器传递进来的字符串指针都是4-byte对其的,因此这步在很多编译器都是跳过的.

　　下来来看如何在4字节的数字中判断是否有某个字节为0.

　　Java代码

　himagic　=　0x80808080L;　　lomagic　=　0x01010101L;　 if　(((longword　-　lomagic)　&　~longword　&　himagic)　!=　0)　

　　这里分两个部分看.

　　第一部分(longword - lomagic) 这个表达式是用来判断最高位的,它的意思是: 如果longword的任何一个字节如果大于0x80或者等于0的情况下,它的这个字节的最高位的值都会是不等于0的.

　　第二部分(~longword & himagic)这个表达式是用来判断longword的每个字节的最高位的设置,也就是判断longword是否小于0x80,如果小于0x80则这个表达式的值是himagic(0x80808080).否则就是0.

　　因此这两部分合并起来刚好就是如果longword的某个字节小于0x80,并且某个字节为0,则整个表达式的值不为0.

　　而我们知道传进来的字符串的每个字符的大小的ascii码的范围就是[0,127],也就是肯定是小于0x80的.所以这个表达式就刚好是我们所要的.

SkyGray

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
glibc strlen 实现

 glibc中的strlen的实现主要的思想就是每次检测4个字节(long int)。这样的话就降低了循环的次数，从而从整体上提高了效率。 　　这里它使用了两个技巧，一个是由于传进来的字符串的地址有可能不是4字节(long int)对其的，因此首先需要遍历字符串从而找到4字节对其的那个地址。然后再进行比较. 　　第二个技巧就是如何高效的判断4个字节中是否有字节为0. 　　下来来看源码，这个源码的注释还是满详细的。这里主要都是一些位计算的技巧： 　　J
复制链接

扫一扫