Reading Notes on "Pointers on C" (8): A Collection of String Handling Functions

Article link: https://blog.csdn.net/weixin_43708622/article/details/109826624

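This post continues the summary of "Pointers on C" with Chapter 9. The chapter is about strings, and it explains how to process them with standard library functions.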

Contents

Strings
String length
Copying strings
Concatenating strings
Comparing strings
Searching strings
Follow me

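Strings

A string is a sequence of zero or more characters terminated by a NUL byte. The NUL byte is a special byte whose bits are all 0. It is also called the "trailing zero".

A string cannot contain a NUL byte among its characters. This restriction rarely causes problems, because no printable character is associated with the NUL byte, which is also why it was chosen as the string terminator. The NUL byte terminates the string but is not itself part of the string, so the length of a string does not include it.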
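To make the role of the terminator concrete, here is a minimal sketch of a length function that simply walks the bytes until it reaches the NUL byte. The name count_until_nul is made up for this illustration; real code should call the standard strlen() instead.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: counts the bytes that come
 * before the terminating NUL byte, which is what strlen() reports. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')   /* stop at the terminator, do not count it */
        n++;
    return n;
}
```

With this sketch, count_until_nul("hello") yields 5 and count_until_nul("ab\0cd") yields 2, because everything after the first NUL byte is invisible to string functions.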

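The C standard library provides a set of string handling functions, so we usually do not need to write our own; learning to use the library functions is enough. The sections below introduce the commonly used ones one by one.

In addition, any source file that uses the standard string functions needs to include the string.h header, which contains their prototypes and declarations.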

String length

#include <string.h>

size_t strlen(const char *s);

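strlen() computes the length of a string, not counting the NUL byte. It returns a value of type size_t, which is defined in the header stddef.h. size_t is an unsigned integer type whose exact width is implementation-defined; on 64-bit Linux with glibc it is unsigned long (long unsigned int). The portable printf conversion for size_t is %zu.

size_t len = strlen("hello");
printf("len=%zu\n", len);
// prints: len=5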

Copying strings

#include <string.h>

char *strcpy(char *dest, const char *src);

char *strncpy(char *dest, const char *src, size_t n);

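Both functions copy the string in src to dest. strcpy() copies everything including the NUL byte, so dest must be large enough to hold the whole string. In production code strncpy() is usually preferred because it explicitly specifies the maximum number of bytes to copy, n.

Two things need to be noted when using strncpy:

If the actual length of src is less than n, dest is padded with NUL bytes until n bytes have been written;

If the actual length of src is greater than or equal to n, the first n bytes of src contain no NUL byte, so the n bytes copied to dest have no string terminator (NUL byte).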

char buf[8] = {0};
char *p_dst = NULL;
p_dst = strncpy(buf, "hello", 8); // src is 5 bytes long, so buf is large enough to hold it
printf("p_dst=%s\n", p_dst);
printf("address of p_dst:%p\n", p_dst);
printf("address of buf  :%p\n", &buf[0]);

p_dst = strncpy(buf, "123456789a", 8); // src is longer than 8 bytes, so buf ends up without a NUL byte
buf[7] = '\0'; // to be safe, the last byte of buf is usually set to the NUL byte
printf("2->p_dst=%s\n", p_dst);
printf("2->address of p_dst:%p\n", p_dst);
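The habit of terminating the buffer by hand after strncpy() can be wrapped in a small helper. The following is only a sketch; copy_truncate and its return convention are assumptions made for this example, not part of the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest (capacity dest_size bytes),
 * always leaving dest NUL-terminated.
 * Returns 1 if src had to be truncated, 0 otherwise. */
int copy_truncate(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 1;                      /* no room even for the terminator */

    strncpy(dest, src, dest_size - 1); /* copy at most dest_size-1 bytes */
    dest[dest_size - 1] = '\0';        /* strncpy may not have terminated */

    return strlen(src) >= dest_size;
}
```

Called with an 8-byte buffer and the source "123456789a", this sketch stores "1234567" and reports truncation, so the caller can decide whether a shortened string is acceptable.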

Concatenating strings

#include <string.h>

char *strcat(char *dest, const char *src);

char *strncat(char *dest, const char *src, size_t n);

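The "cat" is the cat in the word concatenate. These functions append the string in src to dest: the first byte of src overwrites the NUL byte that ended dest, and once src has been appended a new NUL byte is written at the end of the resulting string. dest must have enough storage for its original contents plus the appended bytes and the NUL byte; otherwise the behavior is undefined. Note that the n of strncat() limits how many bytes are taken from src; it is not the size of dest.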

// case 1: buf is large enough
char buf[16] = {0};
strcpy(buf, "hello ");
printf("1->buf:%s, len of buf:%zu\n", buf, strlen(buf));
strncat(buf, "yudao", 5);
printf("2->buf:%s, len of buf:%zu\n", buf, strlen(buf));

// case 2: buf1 is too small
char buf1[8] = {0};
strcpy(buf1, "hello ");
printf("3->buf:%s, len of buf1:%zu\n", buf1, strlen(buf1));
strncat(buf1, "yudao", 5);
printf("4->buf:%s, len of buf1:%zu\n", buf1, strlen(buf1));

// output
1->buf:hello , len of buf:6
2->buf:hello yudao, len of buf:11
3->buf:hello , len of buf:6
4->buf:hello yudao, len of buf:11 // The string is clearly longer than the 8 bytes buf1 was defined with. The program reported no error in this run, but the string has overflowed buf1. This is undefined behavior and very unsafe.

In general, using the string concatenation functions requires the programmer to take particular care that dest has enough capacity and does not overflow.
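One way to handle the capacity of dest is to compute the remaining room before appending and pass that as the n of strncat(). The sketch below assumes a helper named append_bounded, which is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits into dest,
 * whose total capacity is dest_size bytes. dest must already hold
 * a NUL-terminated string. Returns 1 if all of src fit, 0 if not. */
int append_bounded(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t room;

    if (used + 1 >= dest_size)
        return src[0] == '\0';        /* dest is already full */

    room = dest_size - used - 1;      /* keep one byte for the NUL */
    strncat(dest, src, room);         /* appends at most room bytes + NUL */

    return strlen(src) <= room;
}
```

With an 8-byte buffer that already holds "hello ", appending "yudao" through this sketch leaves "hello y" in the buffer and returns 0 instead of writing past the end of the array.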

Comparing strings

#include <string.h>

int strcmp(const char *s1, const char *s2);

int strncmp(const char *s1, const char *s2, size_t n);

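These functions compare string s1 with string s2. If s1 is less than s2 they return a value less than 0; if s1 equals s2 they return 0; if s1 is greater than s2 they return a value greater than 0. The two strings are compared character by character until a mismatch is found. The string whose first mismatching character has the smaller value is considered less than the other string, and the other one is considered greater. If one string ends first, its NUL byte is the smaller character, so a proper prefix is less than the longer string. strncmp compares at most the first n bytes (fewer if one of the strings is shorter than n).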

char *s1 = "hello";
char *s2 = "hello";
char *s3 = "helloo";
char *s4 = "hell";
int ret = -1;

ret = strcmp(s1, s2);
printf("%s == %s, comparing result:%d\n", s1, s2, ret);

ret = strcmp(s1, s3);
printf("%s < %s, comparing result:%d\n", s1, s3, ret);

ret = strcmp(s1, s4);
printf("%s > %s, comparing result:%d\n", s1, s4, ret);

ret = strncmp(s1, s2, strlen(s1));
printf("%s == %s, comparing %zu bytes, result:%d\n", s1, s2, strlen(s1), ret);

ret = strncmp(s1, s3, strlen(s1));
printf("%s == %s, comparing %zu bytes, result:%d\n", s1, s3, strlen(s1), ret);

ret = strncmp(s1, s4, strlen(s4));
printf("%s == %s, comparing %zu bytes, result:%d\n", s1, s4, strlen(s4), ret);

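Output (from a glibc system; the standard only guarantees the sign of a nonzero result, so the values -111 and 111 may differ elsewhere):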

hello == hello, comparing result:0
hello < helloo, comparing result:-111
hello > hell, comparing result:111
hello == hello, comparing 5 bytes, result:0
hello == helloo, comparing 5 bytes, result:0
hello == hell, comparing 4 bytes, result:0
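Since only the sign of the result carries meaning, strcmp() fits naturally as the ordering function for qsort(). The following sketch sorts an array of string pointers; the names compare_strings and sort_strings are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements;
 * each element here is itself a `const char *`. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);   /* only the sign of the result matters */
}

/* Hypothetical helper: sorts an array of n string pointers in place. */
void sort_strings(const char **strings, size_t n)
{
    qsort(strings, n, sizeof strings[0], compare_strings);
}
```

Sorting { "hello", "hell", "helloo" } this way gives hell, hello, helloo, which matches the rule that a proper prefix compares less than the longer string.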

Searching strings

Finding a character

#include <string.h>

char *strchr(const char *s, int c);

char *strrchr(const char *s, int c);

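These functions look for the character c in the string s. If it is found they return the address of the matching character; otherwise they return a NULL pointer.

strchr searches from left to right and returns the first occurrence; strrchr searches from right to left and returns the last occurrence.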

char *s = "hello, world";
char *p = NULL;

p = strchr(s, 'o');
printf("offset of the match: %td, %s\n", p-s, p);

p = strrchr(s, 'o');
printf("offset of the match: %td, %s\n", p-s, p);

Output:

offset of the match: 4, o, world
offset of the match: 8, orld
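Because strchr() returns a pointer into the string, the search can be resumed one byte after each match. As a small sketch of that idea, the hypothetical helper count_char below counts every occurrence of a character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many times c occurs in s. */
size_t count_char(const char *s, char c)
{
    size_t count = 0;
    const char *p;

    if (c == '\0')
        return 0;   /* strchr() would match the terminator itself */

    for (p = strchr(s, c); p != NULL; p = strchr(p + 1, c))
        count++;
    return count;
}
```

For "hello, world" this sketch counts 2 for 'o' and 3 for 'l'. The check for '\0' is there because strchr() treats the terminator as part of the string, and stepping past it would leave the array.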

Finding any character from a set

#include <string.h>

char *strpbrk(const char *s, const char *accept);

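strpbrk finds the first position in the string s at which any character of the set accept occurs. It returns NULL if none of them occurs.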

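// find any character from a set
const char *accept = "abc";
char *s = "hello";
char *p = NULL;

p = strpbrk(s, accept);
printf("match: %s\n", p); // prints with glibc: match: (null)
// s contains none of the characters in accept, so p is NULL.
// Passing NULL to %s is undefined behavior; glibc happens to print (null).

p = strpbrk(s, "abcde");
printf("match: %s\n", p); // prints: match: ello
// The e in s belongs to the set "abcde", so p points to that e.

char *s1 = "hello-hello";
p = strpbrk(s1, "abcde");
printf("match: %s\n", p); // prints: match: ello-hello
// The address in p shows that strpbrk stops at the first character that is in the set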

Finding a substring

#include <string.h>

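char *strstr(const char *haystack, const char *needle);
// Finds the first occurrence of the substring needle in haystack and returns a pointer to where it starts, or NULL if needle does not occur.

char *strcasestr(const char *haystack, const char *needle);
// strcasestr works like strstr but ignores case. It is a nonstandard GNU extension; with glibc, define _GNU_SOURCE before including string.h.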

const char *s1 = "hello,world";
const char *s2 = "old";
const char *s3 = "world";

char *p = NULL;

p = strstr(s1, s2);
printf("substring match: %s\n", p); // with glibc: substring match: (null), because "old" does not occur in s1

p = strstr(s1, s3);
printf("substring match: %s\n", p); // substring match: world
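The pointer returned by strstr() can also serve as the starting point of the next search. The sketch below counts non-overlapping occurrences of a substring; count_substring is a name invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of needle. */
size_t count_substring(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t step = strlen(needle);
    const char *p = haystack;

    if (step == 0)
        return 0;   /* an empty needle matches at once and would loop forever */

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += step;  /* resume the search right after this match */
    }
    return count;
}
```

Note that the result of strstr() is compared against NULL before it is used, which avoids passing a NULL pointer on to printf() or to further string functions.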

Getting the length of a string prefix

#include <string.h>

size_t strspn(const char *s, const char *accept);
// Counts the characters at the start of s that match any character in accept

size_t strcspn(const char *s, const char *reject);
// Counts the characters at the start of s that match none of the characters in reject

// get the length of a string prefix
int len1 = 0;
int len2 = 0;
char buffer[] = "25,142,330,Smith,J,239-4123";

len1 = strspn(buffer, "0123456789"); // Scans from the first character of buffer until a character is not in "0123456789"; the scan stops there and the number of characters matched so far is returned
// Here the scan stops at the third character, ',', which matches nothing in "0123456789"; the matched prefix "25" has length 2
printf("len1=%d\n", len1); // len1=2

len2 = strspn(buffer, ",0123456789");
// Here the scan only stops at the character 'S'; the matched prefix "25,142,330," has length 11
printf("len2=%d\n", len2); // len2=11


// In some text-processing scenarios you want to display each line
// without its leading whitespace; strspn() can be used for that
char line1[] = "    渔道的博客";
const char* accept1 = "\n\r \t";
int l1 = 0;
l1 = strspn(line1, accept1);
printf("original line:%s\n", line1);		// original line:    渔道的博客
printf("processed line:%s\n", line1+l1);	// processed line:渔道的博客

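// Strip a given prefix
// Caution: strspn treats prefix as a set of characters, not as a string.
// It works here only because 'y' is not in the set "pre_".
const char *prefix = "pre_";
const char *filename1 = "pre_yudao.jpg";
printf("file name without the prefix:%s\n", filename1+strspn(filename1, prefix)); // file name without the prefix:yudao.jpg

// In data labeling, file names sometimes need to be normalized
const char *prefix2 = "pre_";
const char *filename2 = "xxpre_yudao.jpg";
printf("length of the invalid prefix:%zu\n", strcspn(filename2, prefix2)); // length of the invalid prefix:2
printf("file name without the invalid prefix:%s\n", filename2+strcspn(filename2, prefix2)); // file name without the invalid prefix:pre_yudao.jpg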
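One caveat about stripping a prefix with strspn(): its second argument is a set of characters, not a string. For a file name such as "pre_report.jpg", strspn(name, "pre_") returns 7, because the r, e and p of "report" are also in the set, and "ort.jpg" is all that remains. When the prefix has to match as a whole string, strncmp() is the safer tool. The helper strip_prefix below is a hypothetical sketch of that approach.

```c
#include <string.h>

/* Hypothetical helper: if s begins with the exact string prefix,
 * returns a pointer just past it; otherwise returns s unchanged. */
const char *strip_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(s, prefix, n) == 0)
        return s + n;
    return s;
}
```

With this sketch, strip_prefix("pre_report.jpg", "pre_") points at "report.jpg", and a name that does not start with the prefix is returned untouched.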

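Finding tokens

When processing text that is separated by specific characters, you sometimes need to extract the parts between the separators (also called fields or tokens). strtok() does this job well.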

#include <string.h>

char *strtok(char *str, const char *delim);

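strtok is a little more complicated to use than the other string functions in the standard library, so it is explained below through concrete examples.

str must not be a string literal

strtok modifies the contents of str while it runs. Modifying a string literal is undefined behavior, and on typical systems, where literals are stored in read-only memory, it causes a segmentation fault.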

// incorrect example
const char *delim = ","; // the delimiter is a comma
char *str = "渔道,专注,技术,经验,分享"; // str points to a string literal
char *p = NULL;

p = strtok(str, delim);
printf("token:%s\n", p);

// Running the code above produces a segmentation fault
// Segmentation fault (core dumped)

// correct example
const char *delim = ",";
// char *str = "渔道,专注,技术,经验,分享"; // incorrect: a string literal
char str[] = "渔道,专注,技术,经验,分享"; // correct: a writable array

char *p = NULL;

p = strtok(str, delim);
printf("token:%s\n", p); // token:渔道
printf("next str:%s\n", p+strlen(p)+1); // next str:专注,技术,经验,分享

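The example above shows that strtok replaced the delimiter in the original string with a NUL byte. However, this example calls strtok only once. To retrieve every token separated by the delimiter, strtok has to be called in a loop until it returns a NULL pointer.

Retrieving tokens in a loop with strtok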

const char *delim = ",";
char str[] = "渔道,专注,技术,经验,分享"; // a writable array

char *p = NULL;

p = strtok(str, delim);
printf("token:%s\n", p);
printf("next str:%s\n", p+strlen(p)+1);


while ((p = strtok(NULL, delim)) != NULL)
    printf("token:%s\n", p);

/* Output:
token:渔道
next str:专注,技术,经验,分享
token:专注
token:技术
token:经验
token:分享
*/

When calling strtok in a loop, only the first call needs the original string; in later calls, pass NULL as the first argument.
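The loop pattern can be packaged into a function that collects the token pointers into an array. This is only a sketch; split_fields and its parameters are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits str in place on any byte in delim and
 * stores up to max token pointers in out. Returns the number stored.
 * str must be writable, because strtok() overwrites delimiters. */
size_t split_fields(char *str, const char *delim, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(str, delim);        /* first call: pass the string */
         tok != NULL && n < max;
         tok = strtok(NULL, delim))       /* later calls: pass NULL */
        out[n++] = tok;
    return n;
}
```

One property worth knowing: strtok() treats a run of consecutive delimiters as a single separator, so splitting "a,b,,c" on "," with this sketch yields three tokens, not four. Empty fields are silently skipped.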

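strtok_r is reentrant

Note also that at the end the memory of the original character array str holds 渔道\0专注\0技术\0经验\0分享: every delimiter has been replaced by a NUL byte. strtok_r (a POSIX function) works the same way, but it keeps its position in a caller-supplied saveptr argument instead of in hidden internal state.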

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv)
{
    const char *delim = ",";
    char str[] = "渔道,专注,技术,经验,分享";

    char *p = NULL;
    char *saveptr = NULL;

    p = strtok_r(str, delim, &saveptr);
    printf("token:%s\n", p);
    printf("next str:%s\n", p+strlen(p)+1);
    printf("savestr:%s\n", saveptr);


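    // while(p = strtok_r(saveptr, delim, &saveptr)) // also works with glibc, but it is not the documented usage
    while((p = strtok_r(NULL, delim, &saveptr)) != NULL) // documented form: pass NULL to continue from the position kept in saveptr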
    {
        printf("token:%s\n", p);
        printf("savestr:%s\n", saveptr);
    }

    return 0;
}

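saveptr holds the position at which the next call resumes scanning, that is, the start of the rest of the string.

To sum up, strtok keeps its position in the original string in internal state, so it can only parse one string at a time and cannot parse two strings simultaneously. If the body of the while loop calls some function that itself calls strtok, the program will no longer produce the result we expected. In the example below, the outer loop stops after its second token because the inner call has replaced the saved state.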

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void str_parse(char *str)
{
    char *p = NULL;
    const char *delim = ",";

    p = strtok(str, delim);
    printf("token:%s\n", p);
    while(p = strtok(NULL, delim))                              
        printf("token:%s\n", p);

}

int main(int argc, char** argv)
{
    const char *delim = ",";
    char str[] = "渔道,专注,技术,经验,分享";
    char str2[] = "2渔道,2专注,2技术,2经验,2分享";

    char *p = NULL;

    p = strtok(str, delim);
    printf("token:%s\n", p);
    printf("next str:%s\n", p+strlen(p)+1);


    while(p = strtok(NULL, delim))
    {
        printf("token:%s\n", p);
        str_parse(str2);
    }
    

    return 0;
}   


/* Output:
token:渔道
next str:专注,技术,经验,分享
token:专注
token:2渔道
token:2专注
token:2技术
token:2经验
token:2分享
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void str_parse_r(char *str)
{
    char *p = NULL;
    const char *delim = ",";
    char *saveptr = NULL;

    p = strtok_r(str, delim, &saveptr);
    printf("2token:%s\n", p);
    while(p = strtok_r(saveptr, delim, &saveptr))
        printf("2token:%s\n", p);
}

int main(int argc, char** argv)
{
    const char *delim = ",";
    char str[] = "渔道,专注,技术,经验,分享";

    char *p = NULL;
    char *saveptr = NULL;

    p = strtok_r(str, delim, &saveptr);
    printf("token:%s\n", p);
    printf("next str:%s\n", p+strlen(p)+1);
    printf("savestr:%s\n", saveptr);


    while(p = strtok_r(saveptr, delim, &saveptr))
    {
        printf("token:%s\n", p);
        printf("savestr:%s\n", saveptr);
        str_parse_r(saveptr);
    }

    return 0;
}
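/* Output (observed with glibc):
token:渔道
next str:专注,技术,经验,分享
savestr:专注,技术,经验,分享
token:专注
savestr:技术,经验,分享
2token:技术
2token:经验
2token:分享
token:技术
savestr:
2token:(null)

Notes:
- str_parse_r() is handed the rest of the same buffer, so it consumes
  the remaining tokens with its own saveptr. The outer saveptr is not
  disturbed, which is why the outer loop then returns 技术 again, now
  terminated by the NUL byte that the inner scan wrote.
- The final (null) comes from passing a NULL pointer to %s. glibc
  tolerates this, but the C standard does not define it.
*/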
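A more typical use of strtok_r() is nested tokenizing, where an outer scan splits records and an inner scan splits the fields of each record. The sketch below assumes records separated by ';' and fields separated by ','; the function name and this data format are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r() is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: records are separated by ';' and the fields
 * inside each record by ','. Returns the total number of fields.
 * text is modified in place. */
size_t count_nested_fields(char *text)
{
    size_t total = 0;
    char *outer_save = NULL;
    char *inner_save = NULL;
    char *record;
    char *field;

    for (record = strtok_r(text, ";", &outer_save);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_save)) {
        /* The inner scan has its own save pointer, so it cannot
         * disturb the position of the outer scan. */
        for (field = strtok_r(record, ",", &inner_save);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_save))
            total++;
    }
    return total;
}
```

For the input "a,b;c;d,e,f" this sketch counts 6 fields. The same nesting written with plain strtok() would end the outer loop after the first record, since the inner scan would overwrite the single shared state.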