strtok and strtok_r, and the Thread-Safety Problem

kitesxian

Last modified: 2024-09-10 21:36:35

First published: 2024-09-10 18:42:30

Article link: https://blog.csdn.net/kitesxian/article/details/142093255

#include <string.h>

char *strtok(char *str, const char *delim);

char *strtok_r(char *str, const char *delim, char **saveptr);

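In short: both functions split a string into tokens, but the former is not thread-safe while the latter is.

Let's start from a practical angle with an exercise: read a string from the keyboard and print the number of words it contains.

My first attempt: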

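#include <cstdio>
#include <iostream>
int main() {
	int ch; // int rather than char, so EOF can be told apart from real characters
	for (;;) {
		int numOfWord = 0;
		while ((ch = getchar()) != '\n' && ch != EOF) {
			if (ch == ' ') {
				numOfWord++;
			}
		}
		if (ch == EOF) {
			break; // stop at end of input instead of looping forever
		}
		std::cout << numOfWord + 1 << std::endl;
	} // BUG: every space increments numOfWord, so runs of consecutive spaces make the count wrong
	return 0;
}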

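Testing it confirms what the comment in the code says: when the input contains several consecutive spaces, the program can no longer count the words correctly.

But that won't stop us: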

#include <iostream>
#include <cstring>
using namespace std;
#define N 128
int main() {
    char buff[N];
    // getline reads one line from the keyboard and does not store the '\n',
    // so '\0' marks the end of the string
    while (cin.getline(buff, N)) {
        int pos = 0, numOfWord = 0;
        bool inWord = false;
        while (buff[pos] != '\0') { // '\0' marks the end of the string
            if (buff[pos] != ' ') {
                if (!inWord) {
                    numOfWord++;  // a new word starts here
                    inWord = true;
                }
            }
            else {
                inWord = false;  // the current word has ended
            }
            pos++;
        }
        cout << numOfWord << endl;
        memset(buff, 0, sizeof(buff));
    }
    return 0;
}

Now runs of spaces are handled correctly! But this code is clearly not easy to get right on the first try. Is there an easier way?

Of course there is: strtok

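#define _CRT_SECURE_NO_WARNINGS // MSVC only: silences the warning about strtok

#include <cstdio>
#include <cstring>
#include <iostream>

int main() {
	const char s[2] = " ";
	char buffer[128];
	// std::cin >> buffer; // would stop at the first space
	// fgets is used here as a portable replacement for the MSVC-specific gets_s(buffer)
	if (fgets(buffer, sizeof(buffer), stdin) == NULL) {
		return 1;
	}
	buffer[strcspn(buffer, "\n")] = '\0'; // drop the '\n' that fgets keeps
	int numOfWord = 0;
	char* p = strtok(buffer, s);
	while (p != NULL) {
		printf("Word %d is %s\n", numOfWord + 1, p);
		numOfWord++;
		p = strtok(NULL, s);
	}
	std::cout << "Total number of words: " << numOfWord << std::endl;
	return 0;
}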

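PS: do not read the input with std::cin >> buffer; it stops at the first space, so only the first word would be read.

That is one way to use strtok. While studying Linux system programming, however, I found that it is not thread-safe: several threads can run the same code and still get different results.

For example: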

#include<iostream>
#include<unistd.h>
#include<cstdio>

#include<semaphore.h>
#include<pthread.h>
#include<cstdlib>
#include<cstring>

using namespace std;


void *fun(void *arg)
{
  char buff[128] = {"a b c d e f"};
  char *s = strtok(buff, " ");
  while (s != NULL)
  {
    printf("fun s=%s\n", s);
    sleep(1);
    s = strtok(NULL, " ");
  }
  return NULL;
}
int main()
{
  pthread_t id;
  pthread_create(&id, NULL, fun, NULL);

  char buff[128] = {"1 2 3 4 5 6"};
  char *s = strtok(buff, " ");
  while (s != NULL)
  {
    printf("main s=%s\n", s);
    sleep(1);
    s = strtok(NULL, " ");
  }

  exit(0);
}

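The output of this program is not stable. The original post showed screenshots of two runs that produced different results, and other outcomes are possible too: the same multithreaded code, run twice, behaves differently. The culprit is strtok itself, which keeps its position in a single static variable shared by all callers: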

The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races).

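In our program, main runs as the main thread and fun as the child thread. The two run concurrently, but strtok has only one static variable for the whole process, so whatever one thread stores there gets overwritten by the other.

Let's trace one possible interleaving. Suppose main runs first: strtok reaches the space delimiter, stops, and returns "1". If fun runs next, its line char *s = strtok(buff, " "); starts a fresh tokenization (fresh, because the first argument is not NULL): it cuts off "a", and the static variable now points at "b". When the main thread runs again, it reaches s = strtok(NULL, " "); inside its while loop. Passing NULL as the first argument tells strtok to continue from where the previous call stopped. But the program has been fooled by the static variable: as just noted, it now holds the position where the call in fun stopped, not the one in main. So main gets "b", and the static variable moves on to "c". The remaining steps can be traced the same way, all the way to "f". In particular, as the second screenshot showed, the two threads can read strtok's static variable at the same moment, and both end up with "f".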
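The interleaving traced above can be replayed in a single thread, with no scheduler involved, which makes the effect easy to reproduce. The following is only a minimal sketch: the helper name second_token_after_interleave is made up for this illustration, and it assumes a strtok that keeps one shared position between calls, as described above.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of any library): replays the
 * "main, then fun, then main" order using plain sequential calls.
 * Both buffers must be writable, because strtok modifies them.
 */
char *second_token_after_interleave(char *main_buf, char *fun_buf)
{
    strtok(main_buf, " ");    /* "main" cuts off its first token */
    strtok(fun_buf, " ");     /* "fun" starts over on its own buffer */
    return strtok(NULL, " "); /* "main" continues, but from fun's position */
}
```

Called with writable copies of "1 2 3" and "a b c", the helper returns "b" rather than "2": the second call has replaced the saved position that the first call left behind.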

That static variable is the troublemaker, so what should we do? We can use strtok_r instead:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

void *fun(void *arg)
{
  char buff[128] = {"a b c d e f"};
  char *ptr = NULL;
  char *s = strtok_r(buff, " ", &ptr);
  while (s != NULL)
  {
    printf("fun s=%s\n", s);
    sleep(1);
    s = strtok_r(NULL, " ", &ptr);
  }
  return NULL;
}
int main()
{
  pthread_t id;
  pthread_create(&id, NULL, fun, NULL);

  char buff[128] = {"1 2 3 4 5 6"};
  char *ptr = NULL;
  char *s = strtok_r(buff, " ", &ptr);
  while (s != NULL)
  {
    printf("main s=%s\n", s);
    sleep(1);
    s = strtok_r(NULL, " ", &ptr);
  }

  exit(0);
}

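No matter how the threads are scheduled, each thread now receives exactly the tokens of its own string: main gets 1 through 6 and fun gets a through f. Only the relative order of the two threads' output lines can vary.

strtok_r is thread-safe because its third parameter is a pointer to a pointer (char **saveptr): each caller records how far it has tokenized in its own local variable.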
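To see what the caller-owned save pointer buys us, it helps to interleave two tokenizations by hand in a single thread. The sketch below is illustrative only; the helper name second_token_with_saveptr and the variable names are made up here, and the feature-test macro at the top is an assumption about the build environment (it asks glibc to declare strtok_r when compiling in strict ISO C mode).

```c
#define _POSIX_C_SOURCE 200809L /* assumption: needed for strtok_r in strict C mode */
#include <string.h>

/*
 * Hypothetical helper: two tokenizations are interleaved, but each
 * one keeps its position in its own local save pointer.
 * Both buffers must be writable, because strtok_r modifies them.
 */
char *second_token_with_saveptr(char *main_buf, char *fun_buf)
{
    char *main_save = NULL; /* position of the "main" tokenization */
    char *fun_save = NULL;  /* position of the "fun" tokenization */

    strtok_r(main_buf, " ", &main_save);    /* "main" cuts off its first token */
    strtok_r(fun_buf, " ", &fun_save);      /* "fun" starts on its own buffer */
    return strtok_r(NULL, " ", &main_save); /* "main" resumes from its own position */
}
```

Called with writable copies of "1 2 3" and "a b c", the helper returns "2": the call on the second buffer only touches fun_save, so the first tokenization is not disturbed.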

Let's walk through one more piece of strtok code:

#define _CRT_SECURE_NO_WARNINGS -1
#include<stdio.h>
#include<string.h>
int main() {
    char str[] = "Hello, World! Welcome to C programming.";
    const char delim[] = " ,!.";
    char* token;
    token = strtok(str, delim);
    while (token != NULL) {
        printf("token is:%s\n", token);
        token = strtok(NULL, delim);
    }
    printf("The string after strtok is: %s\n", str);
    return 0;
}

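In this code the delim array lists the delimiter characters (space, comma, exclamation mark, period). When strtok reaches one of them at the end of a token, it does not merely skip it: it overwrites it with '\0', returns a pointer to the token in front of it, and saves the current position in its static variable for the next call. Inside the while loop we print the token with %s and then call strtok again, this time passing NULL as the first argument, which tells strtok: hey, just carry on from where the last split stopped! This repeats until the whole string has been consumed; strtok then returns NULL, the while condition becomes false, and the loop exits, its work finally done.

However, this cutting modifies the string in place: the delimiter that ends each token has been overwritten with '\0' (here, first of all, the comma right after Hello), and %s stops at the first '\0' it meets. So the last line of the program prints only Hello.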
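If the original string is still needed afterwards, one common workaround is to tokenize a copy. The following is a minimal sketch under that assumption, not the only way to do it; the helper name count_tokens_keep_original is made up for this illustration, and the feature-test macro is an assumption about the build environment (it asks glibc to declare strtok_r in strict ISO C mode).

```c
#define _POSIX_C_SOURCE 200809L /* assumption: needed for strtok_r in strict C mode */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: counts the tokens in text without modifying it.
 * Returns -1 if the temporary copy cannot be allocated.
 */
int count_tokens_keep_original(const char *text, const char *delim)
{
    size_t len = strlen(text) + 1; /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, len);

    int count = 0;
    char *save = NULL; /* per-call position, as with any strtok_r use */
    for (char *tok = strtok_r(copy, delim, &save);
         tok != NULL;
         tok = strtok_r(NULL, delim, &save)) {
        count++; /* only the copy is cut up */
    }

    free(copy);
    return count;
}
```

For the string and delimiters used above, the helper reports 6 tokens, and the caller's string still prints in full afterwards because only the copy was modified.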