A Detailed Look at string (Part 1)

烦躁的大鼻嘎

Last modified 2024-08-12 15:24:51

Tags: c++, algorithms, programming languages, C

First published 2024-08-11 11:08:13

Article link: https://blog.csdn.net/2401_83431652/article/details/140724790

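1. Strings in C

In C, a string is a collection of characters terminated by '\0'. For convenience the C standard library provides the str family of functions, but these functions are separate from the string data itself, which does not fit OOP well. The underlying storage also has to be managed by the user, and a moment of carelessness can lead to out-of-bounds access.

In OJ (online judge) problems, strings almost always appear as the string class, and in everyday work the string class is preferred for simplicity, convenience and speed; the C library string functions are rarely used.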

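2. Common interfaces of the standard library string class

(1) Overview

To use the string class, include its header with #include <string>, and either write using namespace std; or refer to the class as std::string.

A string is an object that represents a sequence of characters (a sequential list).

(2) Member functions

● Common constructors of string objects

● Destructor.

When a string's lifetime ends, its destructor is called automatically.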

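int main()
{
	string s1;              // default-construct an empty string s1
	string s2("123");       // construct s2 from a C-style string
	string s3(s2);          // copy constructor
	string s4(s2, 2, 3);    // copy up to 3 characters starting at index 2
	string s5(s2, 2, 10);   // the length runs past the end of s2, so copying stops at the end of s2
	string s6("123456", 4); // copy the first 4 characters of the C-style string
	string s7(10, 'x');     // initialize the string with 10 copies of 'x'
	return 0;
}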

● Assignment operator overloads
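The assignment overloads accept another string, a C-style string, or a single character. Here is a minimal sketch of all three; the function name assign_demo is made up for illustration and is not part of the library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: applies the three operator= overloads in turn
// and records the value of dst after each assignment.
std::vector<std::string> assign_demo()
{
    std::vector<std::string> steps;
    std::string src("hello");
    std::string dst;

    dst = src;      // assign from another string
    steps.push_back(dst);

    dst = "world";  // assign from a C-style string
    steps.push_back(dst);

    dst = 'x';      // assign from a single character
    steps.push_back(dst);

    return steps;
}
```

Each assignment replaces the previous content entirely, so the recorded values are independent of one another.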

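(3) Iterators

<1> Forward iterator: iterator

Iterators provide a general way to access containers (all of them), for example strings, linked lists and so on.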

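// an iterator behaves much like a pointer
string::iterator it = s1.begin(); // begin() returns an iterator to the first character
while (it != s1.end()) // end() returns an iterator to the position just past the last valid character
{
	*it += 2; // the character can be modified through the iterator
	cout << *it << " ";
	++it;
}
cout << endl;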

● begin

● end

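<2> Reverse iterator: reverse_iterator

string s1("1234");
string::reverse_iterator rit = s1.rbegin(); // rbegin() returns a reverse iterator to the last character
while (rit != s1.rend()) // rend() returns a reverse iterator to the position before the first character
{
	*rit += 2; // the character can be modified through the iterator
	cout << *rit << " ";
	++rit; // ++ on a reverse iterator moves toward the front of the string
}
cout << endl;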

● rbegin

● rend

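<3> Iterating over a const container

An ordinary iterator can both read and write. A const iterator, used to access a const container, can only read the element it points to, not write it.

Note that a const iterator is not an iterator that is itself declared const: the iterator can still be changed (moved), while the element it points to cannot be modified.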

const string s2("1234");
string::const_iterator cit = s2.begin(); // const iterator
while (cit != s2.end()) // the pointed-to character can be read but not written
{
	cout << *cit << " ";
	++cit;
}
cout << endl;

const string s3("1234");
string::const_reverse_iterator rit = s3.rbegin(); // const reverse iterator
while (rit != s3.rend()) // the pointed-to character can be read but not written
{
	cout << *rit << " ";
	++rit;
}
cout << endl;

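(3) Common capacity operations on string objects

● Using the capacity-related methods of string (growth):

The internals of std::string may use optimizations such as the small string optimization (SSO), in which a short string is stored directly inside the std::string object instead of on the heap. Once the string grows beyond that internal buffer, heap allocation and growth are triggered.

When characters are added to a std::string through repeated push_back calls, its capacity grows with the number of characters. The initial capacity may be 15 bytes, and when size reaches capacity the capacity is increased, often by a fixed increment such as 16 bytes. In some implementations, if the current capacity is below 32 the new capacity may be 16 larger; if the current capacity is 32 or more, the new capacity may be 1.5 times the current one or double it, which reduces the number of future reallocations and improves performance. These figures are implementation details and differ between standard libraries. (For background only.)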

void TestPushBack()
{
	string s;
	size_t sz = s.capacity();
	cout << "making s grow:\n";
	for (int i = 0; i < 100; ++i)
	{
		s.push_back('c');
		if (sz != s.capacity()) // the capacity has grown
		{
			sz = s.capacity();
			cout << "capacity changed: " << sz << '\n';
		}
	}
}

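capacity() counts only the characters that can be stored; the underlying buffer is one slot larger than the capacity, to hold the terminating '\0'.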
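The relationship between size, capacity and the terminator can be checked with a small sketch. The two helper names below are made up for illustration; the only library facts relied on are that c_str() returns a null-terminated array and that capacity() is never smaller than size().

```cpp
#include <string>

// Hypothetical helper: true when the character just past the last valid
// one is the terminator '\0'.
bool has_terminator(const std::string& s)
{
    // c_str() returns a null-terminated array, so index size() is readable.
    return s.c_str()[s.size()] == '\0';
}

// Hypothetical helper: capacity() never reports less than size().
bool capacity_covers_size(const std::string& s)
{
    return s.capacity() >= s.size();
}
```

The terminator is not counted by either size() or capacity(), which is why the buffer needs one slot more than the capacity.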

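● Notes:

1. size() and length() are implemented identically. size() was introduced to keep the interface consistent with other containers, and in practice size() is what is normally used.

2. clear() only empties the valid characters of the string; it does not change the size of the underlying storage.

3. resize(size_t n) and resize(size_t n, char c) both change the number of valid characters to n. The difference shows when the count grows: resize(n) fills the new positions with '\0', while resize(size_t n, char c) fills them with the character c. Note: when resize increases the number of characters it may change the underlying capacity; when it decreases the number of characters, the total size of the underlying storage stays the same.

4. reserve(size_t res_arg = 0) reserves space for the string without changing the number of valid characters. When the argument is smaller than the current capacity, reserve generally does not change the capacity (before C++20 such a call was a non-binding shrink request; since C++20 reserve never shrinks). Under VS, when the number of valid characters is smaller than the reserved space, the capacity is not shrunk.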

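Growing the capacity with reserve:
// reserve a capacity of 100; because the allocation is rounded up, the resulting capacity may be larger than 100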
s.reserve(100);
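The notes on resize, clear and reserve can be tried out together. This is a rough sketch: the struct CapacityReport and the function capacity_demo are names invented here, and the observation that clear() keeps the capacity reflects common implementations rather than a guarantee spelled out by the standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical struct collecting the observations made below.
struct CapacityReport {
    std::size_t size_after_grow;
    char default_fill;
    char custom_fill;
    std::size_t size_after_shrink;
    bool reserve_met;
    bool capacity_kept_after_clear;
};

// Hypothetical helper: runs resize / reserve / clear on one string.
CapacityReport capacity_demo()
{
    CapacityReport r{};
    std::string s("abc");

    s.resize(5);                 // grows to 5; the two new characters are '\0'
    r.size_after_grow = s.size();
    r.default_fill = s[4];

    s.resize(7, 'x');            // grows to 7; the two new characters are 'x'
    r.custom_fill = s[6];

    s.resize(2);                 // shrinks to "ab"
    r.size_after_shrink = s.size();

    s.reserve(100);              // capacity becomes at least 100, size stays 2
    r.reserve_met = s.capacity() >= 100 && s.size() == 2;

    std::size_t before = s.capacity();
    s.clear();                   // size becomes 0
    r.capacity_kept_after_clear = (s.capacity() == before);
    return r;
}
```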

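(4) Element access

● operator[] overload

A simulated implementation of what happens underneath (_str and _size are the members of the simulated class):

char& operator[](size_t pos)
{
	assert(pos < _size); // an out-of-range index triggers an assertion failure
	return _str[pos];
}

A reference is returned, which makes both reading and modifying convenient: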

string s1("hello world");
cout << s1 << endl;
s1[0] = 'd';
cout << s1 << endl;
// index + []
for (size_t i = 0; i < s1.size(); i++)
{
	cout << s1[i] << " ";
}
cout << endl;

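● at: get a character of the string

Returns a reference to the character at position pos in the string.

The function automatically checks whether pos is a valid position in the string (that is, whether pos is less than the string length) and throws an out_of_range exception if it is not.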

// string::at
#include <iostream>
#include <string>

int main ()
{
  std::string str ("Test string");
  for (unsigned i=0; i<str.length(); ++i)
  {
    std::cout << str.at(i);
  }
  return 0;
}
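Because at() reports a bad index with an exception instead of an assertion, the caller can recover from it. A minimal sketch, assuming a made-up helper char_at_or that returns a fallback character when the index is out of range:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at pos, or fallback when
// pos is not a valid index.
char char_at_or(const std::string& s, std::size_t pos, char fallback)
{
    try {
        return s.at(pos);   // throws std::out_of_range when pos >= s.size()
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

With operator[] the same out-of-range index would be undefined behavior in a release build, so at() is the safer choice when the index comes from untrusted input.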

(5) Modifiers

● push_back

string s1("hello world");
cout << s1 << endl;
s1.push_back('1');
cout << s1 << endl << endl;

● append

string s2;
s2.append(s1);
cout << s2 << endl;

● operator+= overload

s1 += "234";
cout << s1 << endl;

● insert

s2.insert(0, "123");
cout << s2 << endl;

● erase

string s3("hello world");
s3.erase(0, 1); // erase the first character
cout << s3 << endl;
s3.erase(6); // erase everything from position 6 to the end
cout << s3 << endl;

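● replace

string s4("hello world");
s4.replace(5, 1, "00");
// replace the one character at index 5 with "00"
cout << s4 << endl;

replace has a cost: when fewer characters are replaced by more, the following characters have to be shifted and several reallocations may occur, which hurts efficiency. It is a reasonable choice when the replaced and replacement lengths are equal.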

(6) String operations

● c_str (C compatibility)

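string file;
cin >> file; // enter a file name, e.g. Test.cpp
FILE* fout = fopen(file.c_str(), "r"); // fopen expects a const char*, which c_str() supplies
if (fout == nullptr) // the file could not be opened
{
	perror("fopen");
	return 1;
}
int ch = fgetc(fout); // fgetc returns int so that EOF can be told apart from a valid character
while (ch != EOF)
{
	cout << (char)ch;
	ch = fgetc(fout);
}
fclose(fout);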

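● data (similar to c_str)

● substr (substring)

Returns a newly constructed string object whose value is a copy of the part of the original string that starts at position pos and spans len characters (or runs to the end of the string, whichever comes first).

	string s1("test.cpp");
	size_t pos = s1.find("."); // position of the first '.'
	string suffix = s1.substr(pos); // take the extension: the substring from pos to the end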

● find (forward search)

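Replace every " " with "%%":

void test1()
{
	string s1("hello world");
	cout << s1 << endl;
	// Approach 1: replace in place (less efficient)
	size_t pos = s1.find(" ");
	while (pos != string::npos)
	{
		s1.replace(pos, 1, "%%");
		// one space becomes two characters, so the tail is shifted every time
		pos = s1.find(" ", pos + 2);
		// resume the search at pos + 2, just after the inserted "%%"
	}
	cout << s1 << endl;

	// Approach 2: build the result in a new string, which avoids shifting characters repeatedly
	string s2("hello world");
	string tmp;
	for (auto ch : s2)
	{
		if (ch == ' ')
			tmp += "%%";
		else
			tmp += ch;
	}
	cout << tmp << endl;
	//s2 = tmp; // plain assignment would copy the result into s2
	s2.swap(tmp); // swapping hands the buffer built by tmp over to s2 (more efficient)
	cout << s2 << endl;
}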

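● rfind (search backward)

Problem: find the extension of a file name

    string s1("test.cpp");
    size_t pos = s1.find("."); // position of the first '.'
    string suffix = s1.substr(pos); // take the extension: the substring from pos to the end
    cout << suffix << endl;

    // this time the expected result is .zip
    string s2("test.cpp.zip");
    size_t rpos = s2.rfind("."); // position of the last '.'
    string rsuffix = s2.substr(rpos); // take the extension: the substring from rpos to the end
    cout << rsuffix << endl;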

● find_first_of

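Searches the string for the first character that matches any of the characters specified in its argument.

	string str("Please, replace the vowels in this sentence by asterisks.");
	size_t found = str.find_first_of("aeiou"); // returns the index of the first character in str that is any one of "aeiou"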
	while (found != string::npos)
	{
		str[found] = '*';
		found = str.find_first_of("aeiou", found + 1);
	}
	cout << str << '\n';

● find_last_of (search backward)

Searches the string for the last character that matches any of the characters specified in the argument.

// Split a path into its directory and file name
void SplitFilename(const string& str)
{
	cout << "Splitting: " << str << '\n';
	size_t found = str.find_last_of("/\\");
	// the index of the last separator is returned whether it is the Linux one or the Windows one
	cout << " path: " << str.substr(0, found) << '\n';
	cout << " file: " << str.substr(found + 1) << '\n';
}
int main()
{
	string str1("/usr/bin/man"); // the separator on Linux is /
	string str2("c:\\windows\\winhelp.exe"); // the separator on Windows is \, which must be written as \\ in a string literal

	SplitFilename(str1);
	SplitFilename(str2);
	return 0;
}

● find_first_not_of

Searches the string for the first character that does not match any of the characters specified in the argument.

● find_last_not_of

Searches the string for the last character that does not match any of the characters specified in the argument.
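A common use of find_first_not_of and find_last_not_of together is trimming unwanted characters from both ends of a string. The function trim below is a sketch written for this article, not a standard library function, and its default character set is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes the characters listed in chars from both
// ends of s. The default set (space, tab, newline) is chosen for illustration.
std::string trim(const std::string& s, const std::string& chars = " \t\n")
{
    std::size_t first = s.find_first_not_of(chars);
    if (first == std::string::npos)
        return "";   // the string is empty or consists only of trimmed characters

    std::size_t last = s.find_last_not_of(chars);
    return s.substr(first, last - first + 1);
}
```

Characters in the middle of the string are left alone, since only the first and the last non-matching positions are looked up.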

(7) Member constant npos

npos is a static member constant whose value is the largest value that the type size_t can represent.
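npos plays two roles: the find family returns it to mean "not found", and functions such as substr accept it as a length meaning "up to the end". A short sketch of both uses; contains and tail_from are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: find returns npos when needle does not occur in text.
bool contains(const std::string& text, const std::string& needle)
{
    return text.find(needle) != std::string::npos;
}

// Hypothetical helper: passing npos as the length means "to the end of the string".
std::string tail_from(const std::string& text, std::size_t pos)
{
    return text.substr(pos, std::string::npos);
}
```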

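(8) Non-member functions

● operator+ overload

Returns a newly constructed string object whose value is the characters of lhs followed by the characters of rhs.

An operator overload must have at least one parameter of class type.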

void test5()
{	
	string s1("hello");
	string s2 = s1 + "world";
	string s3 = "world" + s1;
	cout << s1 << endl;
	cout << s2 << endl;
	cout << s3 << endl;
}

● Relational operator overloads

Perform the corresponding comparison between the string objects lhs and rhs.
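The comparison is lexicographic, character by character, and a C-style string can appear on either side. A minimal sketch with a made-up helper order_of:

```cpp
#include <string>

// Hypothetical helper: returns -1, 0 or 1 depending on whether lhs sorts
// before, equal to, or after rhs.
int order_of(const std::string& lhs, const std::string& rhs)
{
    if (lhs < rhs)
        return -1;
    if (lhs == rhs)
        return 0;
    return 1;
}
```

When one string is a prefix of the other, the shorter one compares as smaller.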

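● getline (read a line from a stream into a string)

Extracts characters from is and stores them in str until the delimiter delim is found (or the newline character '\n', for overload (2)).

Problem: length of the last word in a string (Nowcoder practice problem)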

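#include <iostream>
#include <string>
using namespace std;

int main() {
   string str;
   //cin >> str; // cin >> and scanf treat spaces and newlines as separators
   getline(cin, str); // without a delimiter argument, input stops at the newline
   size_t pos = str.rfind(' ');
   // if there is no space, pos is npos and pos + 1 wraps around to 0
   cout << str.size() - (pos + 1) << endl;
   return 0;
}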

For example:

getline(cin, str, '*'); // extract from the stream until a * is met (the delimiter is a single char)
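getline with a delimiter works on any input stream, not only cin, which makes it handy for splitting text. A sketch assuming a made-up function split that reads from a std::istringstream:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of delim by
// calling getline repeatedly on a string stream.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, delim))  // the delimiter is consumed, not stored
        parts.push_back(piece);
    return parts;
}
```

The delimiter itself never ends up in the extracted pieces.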

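3. auto and range-based for

(1) The auto keyword

● In early C/C++, auto meant that the variable it modified was a local variable with automatic storage duration; that meaning later became unimportant. In C++11 the standards committee repurposed the keyword: auto is no longer a storage-class specifier but a type specifier that tells the compiler to deduce the type, so the type of a variable declared with auto must be deduced by the compiler at compile time.

● When declaring a pointer, auto and auto* make no difference (with auto* the initializer must be a pointer), but when declaring a reference the & is required.

● When several variables are declared on the same line, they must all deduce to the same type; otherwise the compiler reports an error.

● Before C++20, auto cannot be used as a function parameter type. It can be used as a return type, but this is best done with care.

● auto cannot be used directly to declare an array.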

#include <iostream>
#include <string>
#include <map>
using namespace std;
int main()
{
	std::map<std::string, std::string> dict = { { "apple", "苹果" }, { "orange", "橙子" }, { "pear", "梨" } };
	// this is where auto is useful
	//std::map<std::string, std::string>::iterator it = dict.begin();
	auto it = dict.begin();
	while (it != dict.end())
	{
		cout << it->first << ":" << it->second << endl;
		++it;
	}
	return 0;
}

#include <iostream>
#include <typeinfo>
using namespace std;
int func1()
{
	return 10;
}
// auto cannot be a parameter type (before C++20), so this is commented out
//void func2(auto a)
//{}
// auto can be a return type, but use it with care
auto func3()
{
	return 3;
}
int main()
{
	int a = 10;
	auto b = a;
	auto c = 'a';
	auto d = func1();
	// compile error C3531: 'e': a symbol whose type contains 'auto' must have an initializer
	//auto e; // there is no initializer to deduce the type from
	cout << typeid(b).name() << endl; // print the deduced type
	cout << typeid(c).name() << endl;
	cout << typeid(d).name() << endl;
	int x = 10;
	auto y = &x;
	auto* z = &x;
	auto& m = x;
	cout << typeid(x).name() << endl;
	cout << typeid(y).name() << endl;
	cout << typeid(z).name() << endl;
	auto aa = 1, bb = 2;
	// compile error C3538: in a declarator-list 'auto' must always deduce to the same type
	//auto cc = 3, dd = 4.0;
	// compile error C3318: 'auto []': an array cannot have an element type that contains 'auto'
	//auto array[] = { 4, 5, 6 };
	return 0;
}

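(2) Range-based for

● For a collection with a known range, having the programmer spell out the loop bounds is redundant and sometimes error-prone. C++11 therefore introduced the range-based for loop. The parentheses after for are split by a colon ":" into two parts: the first is the variable used for iteration (a copy of each element, unless it is declared as a reference), and the second is the range being iterated. Iteration, element retrieval and the end check all happen automatically.

● Range-based for can be used to traverse arrays and container objects.

● Underneath, range-based for is simple: traversing a container is rewritten in terms of iterators, which can also be seen at the assembly level.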

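// elements are assigned automatically, iteration advances automatically, the end is detected automatically
// underneath, this is done with iterators
// C++11 traversal
int array[] = { 1, 2, 3, 4, 5 }; // sample data for illustration
for (auto& e : array)
	e *= 2;
for (auto e : array)
	cout << e << " " << endl;

for (auto ch : s1)
{
	ch -= 2; // only the copy ch is modified here (ch is a local variable); s1 is unchanged
	cout << ch << " ";
}
cout << endl;

for (auto& ch : s1) // ch is an alias for each character of s1
{
	ch -= 2; // this modifies the string s1 itself
	cout << ch << " ";
}
cout << endl;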
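To make the "range-based for is iterators underneath" point concrete, here are the same loop written both ways. This is a simplified sketch of the rewrite, not the exact code a compiler generates, and both function names are invented for the example.

```cpp
#include <string>

// Hypothetical helper: shifts every character using range-based for.
std::string shift_range_for(std::string s, int delta)
{
    for (auto& ch : s)
        ch = static_cast<char>(ch + delta);
    return s;
}

// Hypothetical helper: the same loop spelled out with iterators, roughly
// what the range-based for above is rewritten into.
std::string shift_iterators(std::string s, int delta)
{
    for (auto it = s.begin(), last = s.end(); it != last; ++it) {
        auto& ch = *it;   // the loop variable is bound to *it
        ch = static_cast<char>(ch + delta);
    }
    return s;
}
```

Both functions take the string by value, so the caller's string is left untouched and the modified copy is returned.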

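4. Notes on commonly used string interfaces

(1) Traversal and access operations on string objects

(2) Modification operations on string objects

1. When appending a character to the end of a string, s.push_back(c), s.append(1, c) and s += 'c' are implemented in much the same way. In practice += is used most often, because it can append a string as well as a single character.

2. When the approximate number of characters is known in advance, call reserve first to set the space aside.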
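The effect of reserving up front can be observed by counting how often the capacity changes while characters are appended. A rough sketch; count_reallocations is a made-up helper, and the exact count without reserve depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters and returns how many times
// the capacity changed along the way.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::string s;
    if (reserve_first)
        s.reserve(n);            // set aside room for all n characters at once

    std::size_t changes = 0;
    std::size_t cap = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'c';
        if (s.capacity() != cap) {   // the buffer was reallocated
            cap = s.capacity();
            ++changes;
        }
    }
    return changes;
}
```

With reserve_first set to true the loop should see no capacity change at all, while without it the string has to grow several times on the way to n characters.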