An In-Depth Look at the strtok() Function


xx77009833


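Most people have run into strtok() at some point, yet it still seems to cause trouble, so this post takes a closer look at it.

First, here is the MSDN description: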

char *strtok( char *strToken, const char *strDelimit );

Parameters
strToken
String containing token or tokens.
strDelimit
Set of delimiter characters.
Return Value
Returns a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.

Remarks
The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call.

Security Note These functions incur a potential threat brought about by a buffer overrun problem. Buffer overrun problems are a frequent method of system attack, resulting in an unwarranted elevation of privilege. For more information, see Avoiding Buffer Overruns.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.

Note Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
Confusing, isn't it?

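Put simply: after the function returns the first delimited substring, call it again with the first argument set to NULL, and each such call returns the next remaining substring.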

Now let's look at an example:

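#include <stdio.h>
#include <string.h>

int main(void)
{
    char test1[] = "feng,ke,wei";   /* writable array */
    char *test2 = "feng,ke,wei";    /* pointer to a string literal; not used yet, discussed below */
    char *p;

    p = strtok(test1, ",");
    while (p)
    {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }

    return 0;
}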

Output:

feng
ke
wei

But if you call p = strtok(test2, ",") instead, you get a memory error. Why? Is it related to the static variable inside the function? Let's look at its source code:

/***
*strtok.c - tokenize a string with given delimiters
*
*         Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*         defines strtok() - breaks string into series of token
*         via repeated calls.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>
#ifdef _MT
#include <mtdll.h>
#endif /* _MT */

/***
*char *strtok(string, control) - tokenize string with delimiter in control
*
*Purpose:
*         strtok considers the string to consist of a sequence of zero or more
*         text tokens separated by spans of one or more control chars. the first
*         call, with string specified, returns a pointer to the first char of the
*         first token, and will write a null char into string immediately
*         following the returned token. subsequent calls with zero for the first
*         argument (string) will work thru the string until no tokens remain. the
*         control string may be different from call to call. when no tokens remain
*         in string a NULL pointer is returned. remember the control chars with a
*         bit map, one bit per ascii char. the null char is always a control char.
*       // (Blogger's note: this is already very detailed, better than MSDN!)
*Entry:
*         char *string - string to tokenize, or NULL to get next token
*         char *control - string of characters to use as delimiters
*
*Exit:
*         returns pointer to first token in string, or if string
*         was NULL, to next token
*         returns NULL when no more tokens remain.
*
*Uses:
*
*Exceptions:
*
*******************************************************************************/

char * __cdecl strtok (
          char * string,
          const char * control
          )
{
          unsigned char *str;
          const unsigned char *ctrl = control;

          unsigned char map[32];
          int count;

#ifdef _MT
          _ptiddata ptd = _getptd();
#else /* _MT */
          static char *nextoken;          // static variable that saves the rest of the string
#endif /* _MT */

          /* Clear control map */
          for (count = 0; count < 32; count++)
                  map[count] = 0;

          /* Set bits in delimiter table */
          do {
                  map[*ctrl >> 3] |= (1 << (*ctrl & 7));
          } while (*ctrl++);

          /* Initialize str. If string is NULL, set str to the saved
           * pointer (i.e., continue breaking tokens out of the string
           * from the last strtok call) */
          if (string)
                  str = string;           // the original string, used on the first call
          else
#ifdef _MT
                  str = ptd->_token;
#else /* _MT */
                  str = nextoken;         // the saved remainder, used when the first argument is NULL
#endif /* _MT */

          /* Find beginning of token (skip over leading delimiters). Note that
           * there is no token iff this loop sets str to point to the terminal
           * null (*str == '\0') */
          while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
                  str++;

          string = str;                   // string now points at the start of the token to return

          /* Find the end of the token. If it is not the end of the string,
           * put a null there. */
          // This is the core of the function: find the delimiter and overwrite
          // it with '\0', which then terminates the returned token.
          for ( ; *str ; str++ )
                  if ( map[*str >> 3] & (1 << (*str & 7)) ) {
                          *str++ = '\0';  // this modifies the contents of the string ①
                          break;
                  }

          /* Update nextoken (or the corresponding field in the per-thread data
           * structure */
#ifdef _MT
          ptd->_token = str;
#else /* _MT */
          nextoken = str;                 // save the remainder in the static variable for the next call
#endif /* _MT */

          /* Determine if a token has been found. */
          if ( string == str )
                  return NULL;
          else
                  return string;
}
As the source shows, the function modifies the original string.

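So when char *test2 = "feng,ke,wei" is passed as the first argument, the write at position ① fails: test2 points at a string literal, which is stored in the string-constant area (typically read-only memory) and must not be modified, hence the memory error. With char test1[] = "feng,ke,wei", the array test1 holds its own copy of the characters on the stack, so it can be modified.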

By now you should have a more concrete understanding of the string-constant area.

Source: http://hi.baidu.com/summy00/blog/item/d1be73a8766226b4ca130cc5.html

I came across a post about using the strtok function on hy0kl's PHP blog; see http://hi.baidu.com/hy0kl/blog/item/2e7a3224a0303228d40742fc.html

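The function is very handy, but it also left me with questions and some irritation. I did not understand why the argument must be NULL on the second call; if str is passed again on the second call, the result is always just the first substring. The reason is that after one call the original string has been changed: the first delimiter is now a null character, so scanning from the start of str again finds the same first token and then stops.

Testing confirmed this. After a single call to result = strtok( str, delims );, str no longer prints as the original string. It prints the same as result, just the first substring "now ".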

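Viewed through str, the original long string appears to have been replaced by its first substring. Strictly speaking, the later characters are still in the array; the first '#' has been overwritten with a null character, so the C string now ends there. Either way the original string has been damaged, which feels unpleasant.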

Take the example from that post:

---------------------

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "now # is the time for all # good men to come to the # aid of their country";
    char delims[] = "#";
    char *result = NULL;

    printf( "befor delim str is \"%s\"\n", str );
    result = strtok( str, delims );
    while( result != NULL ) {
        printf( "result is \"%s\"\n", result );
        result = strtok( NULL, delims );
    }

    printf( "after delim str is \"%s\"\n", str );
    return 0;
}

---------------------

Adding printf( "after delim str is \"%s\"\n", str ); after the loop shows what str looks like afterwards.

Printing str now shows only "now ". There is no going back to the original long string.

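This feels wrong to me. Why didn't Microsoft, when implementing this function, restore str to its original state at the end? I don't understand it. (The in-place modification is how standard C specifies strtok, so it is not particular to Microsoft's implementation.)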

Appendix: output

------------------

befor delim str is "now # is the time for all # good men to come to the # aid of their country"
result is "now "
result is " is the time for all "
result is " good men to come to the "
result is " aid of their country"
after delim str is "now "

---------------

Source: http://hi.baidu.com/flyskymlf/blog/item/1ca249cba1d0571cbf09e667.html

This article comes from a CSDN blog; please credit the source when reposting: http://blog.csdn.net/ixidof/archive/2010/01/07/5146813.aspx