Notes: String Functions

听风若依

Last modified 2024-06-01 00:15:15

Tags: c#

First published 2024-05-12 21:17:16

Article link: https://blog.csdn.net/venti0411/article/details/138679886

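I recently picked up the basics of the string functions. Since I don't have many occasions to use them right now, I wrote this post as a refresher so that I don't forget everything in a few weeks.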

Contents

Character classification functions
Character conversion functions
String functions

Character classification functions

As the name suggests, these functions classify characters. They all share one header: ctype.h

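Function	Returns true if the character is
iscntrl	any control character
isspace	a whitespace character: space ' ', form feed '\f', newline '\n', carriage return '\r', tab '\t' or vertical tab '\v'
isdigit	a decimal digit, 0 to 9
isxdigit	a hexadecimal digit: every decimal digit, lowercase a to f, uppercase A to F
islower	a lowercase letter, a to z
isupper	an uppercase letter, A to Z
isalpha	a letter, a to z or A to Z
isalnum	a letter or digit: a to z, A to Z, 0 to 9
ispunct	a punctuation character: any printable graphic character that is not a letter or digit
isgraph	any graphic character
isprint	any printable character: the graphic characters plus the space character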

If the condition does not hold, they return false (0). "True" here means any nonzero value, not necessarily 1.

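Take isupper as an example:
Write a program that converts the uppercase letters in a string to lowercase and leaves every other character unchanged.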

#include<stdio.h>
#include<ctype.h>

int main()
{
	char str[] = "HeLlo WoRd";
	int i = 0;
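	printf("Before: %s\n", str);
	while (str[i])
	{
		//Cast to unsigned char: passing a negative char to isupper is undefined
		if (isupper((unsigned char)str[i]))
		{
			str[i] += 32;	//'a' - 'A' is 32 in ASCII
		}
		i++;
	}
	printf("After: %s\n", str);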
	return 0;
}

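Character conversion functions

Like the character classification functions, these are declared in ctype.h. They convert characters between uppercase and lowercase.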

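Function	Effect
tolower	Given an uppercase letter, returns the corresponding lowercase letter; any other character is returned unchanged
toupper	Given a lowercase letter, returns the corresponding uppercase letter; any other character is returned unchanged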

With the conversion functions, the code above can be written like this:

#include<stdio.h>
#include<ctype.h>

int main()
{
	char str[] = "HeLlo WoRd";
	int i = 0;
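	printf("Before: %s\n", str);
	while (str[i])
	{
		//tolower leaves other characters alone, so the isupper check is optional
		if (isupper((unsigned char)str[i]))
		{
			str[i] = (char)tolower((unsigned char)str[i]);
		}
		i++;
	}
	printf("After: %s\n", str);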
	return 0;
}

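Since this post focuses on the string functions, I won't expand on the two groups of character functions above. For more information, see the two sites C library and the C/C++ reference manual.
You can search for the functions above directly in C library, or look up the header ctype.h in the C/C++ reference manual.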

String functions

The string functions are all declared in string.h

strlen

size_t strlen ( const char * str );

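A string uses '\0' as its terminator. strlen returns the number of characters that appear before the '\0' (the '\0' itself is not counted).
The string the argument points to must end with '\0'.
Note that the return type is size_t, which is unsigned (an easy mistake to make).
Using strlen requires including the header <string.h>.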

When I was just starting to learn C, I ran into this question:
Predict the output of the following code:

#include<stdio.h>
#include<string.h>

int main()
{
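	const char* str1 = "hello word";
	const char* str2 = "I am a student.";
	if (strlen(str1) - strlen(str2) > 0)
	{
		printf("String 1 is longer than string 2");
	}
	else
	{
		printf("String 2 is longer than string 1");
	}
	return 0;
}

A. String 1 is longer than string 2
B. String 2 is longer than string 1
That was roughly the question, and back then I picked B right away.
In fact, because strlen returns size_t (unsigned), the difference can never be negative: 10 - 15 wraps around to a huge positive value. So unless the two strings have the same length, strlen(str1) - strlen(str2) > 0 is always true, and the program prints A.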

A mock implementation of strlen:

#include<stdio.h>
#include<assert.h>
#include<string.h>
//Three approaches; the first two create a temporary variable

//Method 1: counter
size_t my_strlen_1(const char* source)
{
	assert(source);
	size_t count = 0;
	while (*source)
	{
		count++;
		source++;
	}
	return count;
}

//Method 2: pointer minus pointer
size_t my_strlen_2(const char* const source)
{
	assert(source);
	const char* end = source;
	while (*end)
	{
		end++;
	}
	return (size_t)(end - source);
}

//Method 3: recursion
size_t my_strlen_3(const char* source)
{
	assert(source);
	if (*source)
	{
		return 1 + my_strlen_3(source + 1);
	}
	else
	{
		return 0;
	}
}

int main()
{
	char str[] = { "hello word" };
	//size_t is unsigned, so it is printed with %zu
	printf("Standard library: %zu\n", strlen(str));
	printf("%d:%zu\n", 1, my_strlen_1(str));
	printf("%d:%zu\n", 2, my_strlen_2(str));
	printf("%d:%zu\n", 3, my_strlen_3(str));
	return 0;
}

strcpy

char* strcpy(char * destination, const char * source );

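The source string must end with '\0'.
The '\0' of the source string is copied into the destination as well.
The destination must be large enough to hold the source string.
The destination must be modifiable.
Returns the starting address of the destination string.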

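When I first met strcpy, I had an idea that now seems odd: to use strcpy, the destination string (strictly speaking, the storage that holds it) had to be empty. By the time I wrote this post I could no longer recall where that idea came from. Because strcpy also copies the source's '\0' into the destination, it makes little difference if the destination already holds content.
For example: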

#include<stdio.h>
#include<string.h>

int main()
{
	char  str1[50] = "xxxxxxxxxxxxxxxxxxxxxx";
	const char* str2 = "I am a student.";
	strcpy(str1, str2);
	printf("%s\n", str1);
	return 0;
}

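The program still prints "I am a student."; the x characters left over after the copied '\0' are not printed.
Stepping through in a debugger makes this even clearer:

[Screenshot: debugger view of str1 after strcpy]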

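The rule that the destination must be modifiable can be harder to grasp. As I understand it, str1 and str2 in the code above can both be called strings, but they are not the same thing. str1 is a character array that stores the string "xxxxxxxxxxxxxxxxxxxxxx"; it is only the container of that string. str1 is like a cup whose contents can change: a cup that held water can hold cola later. (The analogy is not exact. To put cola into a cup of water you pour the water out first, whereas turning an array that holds "xxxxxxxxxxxxxxxxxxxxxx" into one that holds "I am a student." has no pouring-out step; the storage is simply overwritten in place.) str2 is different. It points straight at a string literal, which is the water or the cola itself rather than a cup, and nobody talks about "turning water directly into cola". In C terms, modifying a string literal is undefined behavior, so a literal cannot serve as a destination.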

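As for the const qualifier, it probably needs little explanation:
const written before the asterisk (*) qualifies what the pointer points to; const written after the asterisk qualifies the pointer itself.
For example:

		const char* p and char const* p both qualify what the pointer points to
		char * const p qualifies the pointer itself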

A mock implementation of strcpy:

#include<stdio.h>
#include<assert.h>
#include<string.h>

//First version
char* my_strcpy_0(char* destination, const char* source)
{
	assert(destination && source);
	char* ret = destination;
	while (*source)
	{
		*destination = *source;
		destination++;
		source++;
	}
	*destination = *source;
	return ret;
}

//First optimization
char* my_strcpy_1(char* destination, const char* source)
{
	assert(destination && source);
	char* ret = destination;
	do
	{
		*destination++ = *source++;

	} while (*(source - 1));
	return ret;
}

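//Standard-library style
char* my_strcpy(char* destination, const char* source)
{
	assert(destination && source);
	char* ret = destination;
	//Copy first, then test the copied character: the loop ends right after '\0' is copied
	while ((*destination++ = *source++) != '\0')
	{
		;
	}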

	return ret;
}

int main()
{
	char to[] = { "chnwsduihjfgsadygfuyshui" };
	const char* go = "hello word";
	const char* rs = "chnwsduihjfgsadygfuyshui";
	printf("Standard library: %s\n", strcpy(to, go));
	strcpy(to, rs);	//restore the original content before the next test
	printf("First version: %s\n", my_strcpy_0(to, go));
	strcpy(to, rs);
	printf("Optimized version: %s\n", my_strcpy_1(to, go));
	strcpy(to, rs);
	printf("Standard-library style: %s\n", my_strcpy(to, go));

	return 0;
}

strcat

char * strcat ( char * destination, const char * source );

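The source string must end with '\0'.
The destination string must also contain a '\0'; otherwise there is no way to know where appending should start.
The destination must be large enough to hold its existing content plus the source string.
The destination must be modifiable.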

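Note: the first character of the source overwrites the '\0' at the end of the destination, and the source's own '\0' is copied along with the rest, so the result is still properly terminated.
Here is the debugger view:

[Screenshot: debugger view of the destination after strcat]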

A mock implementation of strcat

#include<stdio.h>
#include<assert.h>
#include<string.h>

char* my_strcat(char* destination, const char* source)
{
	assert(destination && source);
	char* ret = destination;
	while (*destination)
	{
		destination++;
	}
	while (*destination++ = *source++)
	{
		;
	}
	return ret;
}

int main()
{
	char to[50] = "hello ";
	char* go = "word";
	char* rs = "hello ";
	printf("%s\n", strcat(to, go));
	strcpy(to, rs);
	printf("%s\n", my_strcat(to, go));
	strcpy(to, rs);
	return 0;
}

strcmp

int strcmp ( const char * str1, const char * str2 );

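If the first string is greater than the second, it returns a number greater than 0
If the first string equals the second, it returns 0
If the first string is less than the second, it returns a number less than 0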

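The C standard only specifies the sign of the result. The strcmp in VS 2022 happens to return exactly 1 when the first string is greater, 0 when the two are equal, and -1 when the first is less. Other implementations may return any positive or negative value, so portable code should test only the sign.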

A mock implementation of strcmp

#include<stdio.h>
#include<assert.h>
#include<string.h>

int my_strcmp(const char* str1, const char* str2)
{
	assert(str1 && str2);
	while (*str1 == *str2)
	{
		if (*str1 == '\0')
			return 0;
		str1++;
		str2++;
	}
	//The standard compares the characters as unsigned char
	return (unsigned char)*str1 - (unsigned char)*str2;
}

int main()
{
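	const char* str1 = "abcdef";
	const char* str2 = "abcdfe";
	int result = my_strcmp(str1, str2);
	if (result > 0)
	{
		printf("String 1 is greater than string 2\n");
	}
	else if (result < 0)
	{
		printf("String 2 is greater than string 1\n");
	}
	else
	{
		printf("The two strings are equal\n");
	}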
	return 0;
}

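If you use strcpy or strcat in VS, the compiler complains, much as it does about scanf. VS considers them unsafe because they never check whether the destination is large enough for the incoming data, which easily leads to out-of-bounds access (strcmp only reads, but it can still run past the end of a string that lacks its '\0'). That is where strncpy, strncat and strncmp come in. Compared with strcpy, strcat and strcmp, they are relatively safer because they take one more parameter that makes you state the maximum number of characters to process, which reduces the chance of an out-of-bounds access.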

strncpy

char * strncpy ( char * destination, const char * source, size_t num );

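Copies up to num characters from the source string to the destination.
If the source string is shorter than num, then after the source has been copied, the destination is padded with '\0' (ASCII 0) until num characters in total have been written.
If the length of the source string is greater than or equal to num, exactly num characters are copied and no '\0' is appended, so the destination may be left without a terminator.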

The debugger shows the effect more clearly:

[Screenshot: debugger view of the destination after strncpy]

A mock implementation of strncpy
First version:

#include<stdio.h>
#include<assert.h>
#include<string.h>


char* my_strncpy(char* destination, const char* source, size_t num)
{
	assert(destination && source);
	char* ret = destination;
	size_t len = strlen(source);
	if (len < num)
	{
		while (*destination++ = *source++)
		{
			;
		}
		size_t i = 0;
		for (; i < num - len - 1; i++)
		{
			*destination++ = 0;
		}
	}
	else
	{
		while (num--)
		{
			*destination++ = *source++;
		}
	}
	return ret;
}

int main()
{
	char arr[50] = "xxxxxxxxxxxxxxxxxxxxxxx";
	char* str = "hello word";
	size_t n = strlen(str);
	my_strncpy(arr, str, n - 2);
	printf("%s\n", arr);
	return 0;
}

Second version:

#include<stdio.h>
#include<assert.h>
#include<string.h>
char* MyStrncpy(char* destination, const char* source, size_t num)
{
	assert(destination && source);
	char* ret = destination;
	size_t i = 0;
	for (; source[i] != '\0' && i < num; i++)
	{
		destination[i] = source[i];
	}
	if (num > i)
	{
		for (; i < num; i++)
		{
			destination[i] = 0;
		}
	}
	return ret;
}


int main()
{
	char arr[50] = "xxxxxxxxxxxxxxxxxxxxxxx";
	char* str = "hello word";
	size_t n = strlen(str);
	MyStrncpy(arr, str, n);
	printf("%s\n", arr);
	return 0;
}

strncat

char * strncat ( char * destination, const char * source, size_t num );

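Appends the first num characters of the string pointed to by source to the end of the string pointed to by destination, then appends one more '\0'.
If the string pointed to by source is shorter than num, only the content up to its '\0' is appended to the end of the destination string.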

Again, the debugger view:

[Screenshot: debugger view of the destination after strncat]

A mock implementation of strncat

#include<stdio.h>
#include<assert.h>
#include<string.h>

char* my_strncat(char* destination, const char* source, size_t num)
{
	assert(destination && source);
	char* ret = destination;
	while (*destination)
	{
		destination++;
	}
	size_t len = strlen(source);
	if (len > num)
	{
		while (num--)
		{
			*destination++ = *source++;
		}
		*destination = 0;
	}
	else
	{
		while (*destination++ = *source++)
		{
			;
		}
	}

	return ret;
}

int main()
{
	char arr[100] = "xxxxxxxxxxx\0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
	char* str = "Hope Is the Thing With Feathers";
	size_t n = strlen(str);
	my_strncat(arr, str, n + 2);
	printf("%s\n", arr);
	return 0;
}

strncmp

int strncmp ( const char * str1, const char * str2, size_t num );

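Compares at most the first num characters of str1 and str2. Characters are compared pair by pair; at the first pair that differs, the comparison stops early, and the string holding the larger character is the greater one. It also stops early if both strings end. If all num characters are equal, the strings count as equal and 0 is returned.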

strstr

char * strstr ( const char * str1, const char * str2);

Searches the string str1 for the string str2.

If *str2 is '\0', it returns str1
If str2 is found, it returns the address of its first occurrence in str1
If it is not found, it returns NULL

A mock implementation of strstr:

#include<stdio.h>
#include<assert.h>
#include<string.h>

char* my_strstr(const char* str1, const char* str2)
{
	assert(str1 && str2);
	if (*str2 == '\0')
	{
		return (char*)str1;
	}
	//Try every position in str1 as a candidate start of the match
	for (; *str1; str1++)
	{
		const char* s1 = str1;
		const char* s2 = str2;
		while (*s1 == *s2 && *s2 != '\0')
		{
			s1++;
			s2++;
		}
		//Reaching the end of str2 means every character matched
		if (*s2 == '\0')
		{
			return (char*)str1;
		}
	}
	return NULL;
}

int main()
{
	char* str1 = "hello word";
	char* str2 = "llo w";
	printf("%s\n", my_strstr(str1, str2));
	return 0;
}

strtok

char * strtok ( char * str, const char * sep);

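strtok is rather abstract, and it is not used all that often in practice.

The sep parameter points to a string that defines the set of characters used as delimiters.
The first parameter specifies a string containing zero or more tokens separated by one or more of the delimiters in sep.
strtok finds the next token in str, terminates it with '\0', and returns a pointer to that token. (Note: strtok modifies the string it operates on, so the string passed to it is usually a modifiable temporary copy.)
If the first argument is not NULL, strtok finds the first token in str and saves its position in the string.
If the first argument is NULL, strtok starts from the saved position in the same string and looks for the next token.
If there are no more tokens in the string, it returns NULL.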

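That may still be hard to follow. As I understand it, strtok splits a string into parts. Take the string wind3344764904@gmail.com (a Gmail address I made up, not my own): it breaks into three parts, wind3344764904, gmail and com, with the characters '@' and '.' acting as delimiters.
The first call is strtok(str, sep): str tells the function which string to split, and sep tells it which characters count as delimiters. strtok then scans str character by character. When it finds a delimiter, it overwrites that delimiter with '\0', remembers where it stopped in a static variable for the next call, and returns the starting address of this part. On the second call, pass NULL instead of str as the first argument; strtok then picks up from the saved position and scans onward until it finds another delimiter, overwrites it with '\0', updates the static variable, and returns the starting address of the second part, and so on. The last part has no delimiter after it and is returned as it is; once no parts are left, strtok returns NULL. To split a different string, pass that string as the first argument.
Here is the example code: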

#include<stdio.h>
#include<string.h>

int main()
{
	char* str = "wind3344764904@gmail.com";
	char* sep = "@.";
	char arr[50] = "\0";
	char* store = NULL;
	size_t num = strlen(str);
	strncpy(arr, str, num);
	for (store = strtok(arr, sep); store != NULL; store = strtok(NULL, sep))
	{
		printf("%s\n", store);
	}
	return 0;
}

Console output:

wind3344764904
gmail
com

strerror

char * strerror ( int errnum );

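strerror returns the address of the error-message string that corresponds to the error code passed as its argument.
Different systems and C standard library implementations each define a set of error codes, generally described in the header errno.h (you can locate the file with Everything). A C program has a global variable, errno, that records its current error code. When the program starts, errno is 0, meaning no error; when something goes wrong while we are using a standard library function, the corresponding error code is stored in errno. An error code is just an integer, which is hard to interpret on its own, so every error code has a corresponding error message. strerror returns the address of that message string.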

#include<stdio.h>
#include<string.h>
#include<errno.h>

int main()
{
	//fopen opens the file for reading ("r"); if the file does not exist under the project path, the call fails
	FILE* pf = fopen("main.txt", "r");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));
		return 1;
	}
	return 0;
}

Console output:

No such file or directory
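There is also a function called perror, declared in stdio.h, which combines printf and strerror. It takes a single string argument that labels the error message, so that when several messages appear you can tell which is which. It prints the label, a colon and a space, and then the message for the current errno. The code is as follows: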

#include<stdio.h>
#include<string.h>
#include<errno.h>

int main()
{
	//fopen opens the file for reading ("r"); if the file does not exist under the project path, the call fails
	FILE* pf = fopen("main.txt", "r");
	if (pf == NULL)
	{
		perror("First call");
		return 1;
	}
	return 0;
}