stl string and stream


mindva


Tags: string stream iterator vector token list

Article link: https://blog.csdn.net/mindva/article/details/2669744


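toupper, tolower

Everyone knows that C++'s string has no toupper. Fortunately this is not a big problem, because we have the STL algorithms:

    string s("heLLo");
    transform(s.begin(), s.end(), s.begin(), ::toupper);
    cout << s << endl;
    transform(s.begin(), s.end(), s.begin(), ::tolower);
    cout << s << endl;

Of course, I know many people would prefer s.to_upper(), but for something as general as basic_string there really is no way to build in such specialized methods. If you use boost stringalgo, none of this is an issue, and you do not need to read this article.

------------------------------------------------------------------------

trim

We also know that string has no trim, but doing it yourself is not hard, even simpler than toupper:

    string s(" hello ");
    s.erase(0, s.find_first_not_of(" \n"));
    cout << s << endl;
    s.erase(s.find_last_not_of(' ') + 1);
    cout << s << endl;

Note that find_first_not_of and find_last_not_of also accept a string, in which case they look for the first (or last) character that is not any of the characters in that string, so you can trim several kinds of characters in one pass.

------------------------------------------------------------------------

erase

string's own erase is decent, but it can only erase a contiguous range. What if you want to remove every occurrence of a character from a string? STL's erase + remove_if does it; note that remove_if alone is not enough.

    string s(" hello, world. say bye ");
    s.erase(remove_if(s.begin(), s.end(), bind2nd(equal_to<char>(), ' ')), s.end());

This removes all the spaces, giving hello,world.saybye.

------------------------------------------------------------------------

replace

string provides replace, but it is not substring-oriented. The operation we use most often, replacing one substr with another, takes a small combination:

    string s("hello, world");
    string sub("ello, ");
    s.replace(s.find(sub), sub.size(), "appy ");
    cout << s << endl;

The output is happy world. Note that the original substr and the replacement do not have to be the same length.

------------------------------------------------------------------------

startwith, endwith

These two are used all the time, but if you look closely at string's interface you will find there is no real need for dedicated methods; the existing interface does the job well:

    string s("hello, world");
    string head("hello");
    string tail("ld");
    bool startwith = s.compare(0, head.size(), head) == 0;
    cout << boolalpha << startwith << endl;
    // The size check keeps s.size() - tail.size() from wrapping around
    // when tail is longer than s.
    bool endwith = s.size() >= tail.size() &&
                   s.compare(s.size() - tail.size(), tail.size(), tail) == 0;
    cout << boolalpha << endwith << endl;

Of course, this is not as convenient as s.startwith("hello").

------------------------------------------------------------------------

toint, todouble, tobool...

This is an old topic too. Both the C way and the C++ way work, and each has its own character:

    string s("123");
    int i = atoi(s.c_str());
    cout << i << endl;

    int ii;
    stringstream(s) >> ii;
    cout << ii << endl;

    string sd("12.3");
    double d = atof(sd.c_str());
    cout << d << endl;

    double dd;
    stringstream(sd) >> dd;
    cout << dd << endl;

    string sb("true");
    bool b;
    stringstream(sb) >> boolalpha >> b;
    cout << boolalpha << b << endl;

The C way is concise, and conversion and assignment happen in a single statement, while the C++ way is general.

------------------------------------------------------------------------

split

This one is troublesome. What we would really like is an interface such as s.split(vect, ','). Doing that with STL algorithms is somewhat difficult, so let us start with the simple case. If the separators are spaces, tabs, newlines and the like, this is enough:

    string s("hello world, bye.");
    vector<string> vect;
    // istream_iterator needs an lvalue stream, so the stringstream is named.
    stringstream ss(s);
    vect.assign(istream_iterator<string>(ss), istream_iterator<string>());

Be aware that if s is very large there is an efficiency concern, because stringstream makes its own copy of the string.

------------------------------------------------------------------------

concat

How do you join all the strings held in a container of string? I hope you are not going to say a hand-coded loop. Isn't this better?

    vector<string> vect;
    vect.push_back("hello");
    vect.push_back(", ");
    vect.push_back("world");
    cout << accumulate(vect.begin(), vect.end(), string(""));

In terms of efficiency, though, it leaves room for optimization.

------------------------------------------------------------------------

reverse

I rather doubt anyone really needs to reverse a string, but doing it is certainly easy:

    std::reverse(s.begin(), s.end());

That reverses in place. If you need the reversed copy in another string, it is just as simple:

    s1.assign(s.rbegin(), s.rend());

The efficiency is quite good too.

------------------------------------------------------------------------

Parsing a file extension

The wordier way:

    std::string filename("hello.exe");
    std::string::size_type pos = filename.rfind('.');
    std::string ext = filename.substr(pos == std::string::npos ? filename.length() : pos + 1);

That is only two lines. Can they be merged into one? It is not impossible:

    std::string ext = filename.substr(filename.rfind('.') == std::string::npos ? filename.length() : filename.rfind('.') + 1);

I know, rfind runs twice. But first, you can hope the compiler optimizes it away; second, extensions are usually short, so even one extra call should make a very small difference.

STL algorithms

distance

Often we want to find the position of a value in a vector, a list, or some other container. find alone does not give that, so some people fall back on a hand-written loop, and a primitive one at that:

    for ( int i = 0; i < vect.size(); ++i)
        if ( vect[i] == value ) break;

If the compiler treats i as part of the for scope, you also have to move the declaration of i outside the loop. Is that really necessary? Look at this:

    int dist = distance(col.begin(), find(col.begin(), col.end(), 5));

Here col can be many kinds of container: list, vector, deque... Of course, this is for the case where you are sure 5 is in col. If you are not sure, add a check:

    int dist;
    list<int>::iterator pos = find(col.begin(), col.end(), 5);
    if ( pos != col.end() )
        dist = distance(col.begin(), pos);

I think this is still better than the hand-written loop.

------------------------------------------------------------------------

max, min

These have direct algorithm support. The complexity is O(n), which is for unsorted containers; if the container is sorted... well, do you even need an algorithm?

    max_element(col.begin(), col.end());
    min_element(col.begin(), col.end());

Note that these return an iterator. If all you care about is the value, then:

    *max_element(col.begin(), col.end());
    *min_element(col.begin(), col.end());

max_element and min_element both compare with less by default, and both accept a binary predicate. If you are bored enough, you can even use max_element as min_element, or the other way round:

    *max_element(col.begin(), col.end(), greater<int>()); // returns the minimum!
    *min_element(col.begin(), col.end(), greater<int>()); // returns the maximum

Of course that is not what the predicate is for. It is there so you can use these algorithms in more special situations, for example when you want to compare a member of each element, or the return value of a member function. For example:

    #include <iostream>
    #include <string>
    #include <list>
    #include <algorithm>
    #include <boost/bind.hpp>
    using namespace boost;
    using namespace std;

    struct Person
    {
        Person(const string& _name, int _age) : age(_age), name(_name) {}
        int age;
        string name;
    };

    int main()
    {
        list<Person> col;
        list<Person>::iterator pos;
        col.push_back(Person("Tom", 10));
        col.push_back(Person("Jerry", 12));
        col.push_back(Person("Mickey", 9));
        // operator< on bind expressions requires Boost >= 1.33
        Person eldest = *max_element(col.begin(), col.end(),
            boost::bind(&Person::age, _1) < boost::bind(&Person::age, _2));
        cout << eldest.name;
    }

The output is Jerry. This uses boost.bind; forgive me for not knowing how to write it with bind2nd and mem_fun, and I do not want to know...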
------------------------------------------------------------------------

copy_if

That's right, the STL (before C++11 added std::copy_if) has no copy_if at all, which is why we need this:

    template <typename InputIterator, typename OutputIterator, typename Predicate>
    OutputIterator copy_if(InputIterator begin, InputIterator end,
                           OutputIterator destBegin, Predicate p)
    {
        while (begin != end) {
            if (p(*begin)) *destBegin++ = *begin;
            ++begin;
        }
        return destBegin;
    }

Keeping it in your own toolbox is a wise choice.

------------------------------------------------------------------------

Idiom: erase(iter++)

If you want to remove certain elements from a list, be very careful. (The code below is WRONG!)

    #include <iostream>
    #include <list>
    #include <algorithm>
    #include <iterator>

    int main()
    {
        int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        std::list<int> lst(arr, arr + 10);
        for ( std::list<int>::iterator iter = lst.begin(); iter != lst.end(); ++iter)
            if ( *iter % 2 == 0 )
                lst.erase(iter);    // iter is invalid after this call
        std::copy(lst.begin(), lst.end(), std::ostream_iterator<int>(std::cout, " "));
    }

Once iter has been erased it is invalid, yet ++iter is still applied to it afterwards, and the behavior is unpredictable! If you do not want to use remove_if, the only option is:

    #include <iostream>
    #include <list>
    #include <algorithm>
    #include <iterator>

    int main()
    {
        int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        std::list<int> lst(arr, arr + 10);
        for ( std::list<int>::iterator iter = lst.begin(); iter != lst.end(); )
            if ( *iter % 2 == 0 )
                lst.erase(iter++);  // advance first, then erase the old position
            else
                ++iter;
        std::copy(lst.begin(), lst.end(), std::ostream_iterator<int>(std::cout, " "));
    }

But the code above cannot be used with vector, string, or deque, because for those containers erase invalidates not only iter but also every iterator after it!

------------------------------------------------------------------------

Idiom: erase(remove...)

The loop above is so hard to write, so non-general, and so hard to understand that it is better to use the STL algorithms. But note that remove_if alone does nothing useful; you must use the erase(remove...) idiom:

    #include <iostream>
    #include <list>
    #include <algorithm>
    #include <iterator>
    #include <functional>
    #include <boost/bind.hpp>

    int main()
    {
        int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        std::list<int> lst(arr, arr + 10);
        lst.erase(remove_if(lst.begin(), lst.end(),
                            boost::bind(std::modulus<int>(), _1, 2) == 0),
                  lst.end());
        std::copy(lst.begin(), lst.end(), std::ostream_iterator<int>(std::cout, " "));
    }

Of course, this relies on boost.bind so that we do not have to write a throwaway functor.

Simple common sense: about streams

Reading one line from a file

This is simple, and the following will do:

    ifstream ifs("input.txt");
    char buf[1000];
    ifs.getline(buf, sizeof buf);
    string input(buf);

There is nothing wrong with that, but it involves unnecessary fuss and copying, and if a line is longer than 1000 characters you need a loop and more awkward buffer management. Isn't the following simpler?

    string input;
    input.reserve(1000);
    ifstream ifs("input.txt");
    getline(ifs, input);

It is not only simpler but also safer, because the global function getline takes care of problems such as running out of buffer. If you do not want allocations to happen too often, just reserve a bit more space. That is what "simple common sense" means: many things are already there, I just never got around to using them.

------------------------------------------------------------------------

Reading a whole file into a string in one go

I hope your answer is not this:

    string input;
    while( !ifs.eof() )
    {
        string line;
        getline(ifs, line);
        input.append(line).append(1, '\n');
    }

Of course it is not wrong as such, and it works (although, because eof() is tested before reading, it ends up appending one newline that was not in the file). But isn't the following more in the spirit of C++?

    // The extra parentheses around the first argument keep the compiler
    // from parsing this line as a function declaration.
    string input( (istreambuf_iterator<char>(ifs.rdbuf())), istreambuf_iterator<char>() );

Likewise, allocating space in advance may help performance:

    string input;
    input.reserve(10000);
    input.assign( istreambuf_iterator<char>(ifs.rdbuf()), istreambuf_iterator<char>() );

Simple, isn't it? Yet these are facts we often overlook. One more remark: doing it this way is a problem:

    string input;
    input.assign( istream_iterator<char>(ifs), istream_iterator<char>() );

because it skips all whitespace, and you end up with a string of nothing but non-whitespace characters. Finally, if you just want to send the contents of a file to another stream, nothing is faster than this:

    fstream fs("temp.txt");
    cout << fs.rdbuf();

So if you want to copy a file by hand, this is the best way (short of using the operating system's API):

    ifstream ifs("in.txt");
    ofstream ofs("out.txt");
    ofs << ifs.rdbuf();

------------------------------------------------------------------------

The options for opening a file

    ios::in       Open file for reading
    ios::out      Open file for writing
    ios::ate      Initial position: end of file
    ios::app      Every output is appended at the end of the file
    ios::trunc    If the file already exists, its contents are erased
    ios::binary   Binary mode

------------------------------------------------------------------------

And the ios flags

    flag                    effect if set
    ios_base::boolalpha     input/output bool objects as alphabetic names (true, false).
    ios_base::dec           input/output integers in decimal base format.
    ios_base::fixed         output floating-point values in fixed-point notation.
    ios_base::hex           input/output integers in hexadecimal base format.
    ios_base::internal      the output is filled at an internal point, enlarging the output up to the field width.
    ios_base::left          the output is filled at the end, enlarging the output up to the field width.
    ios_base::oct           input/output integers in octal base format.
    ios_base::right         the output is filled at the beginning, enlarging the output up to the field width.
    ios_base::scientific    output floating-point values in scientific notation.
    ios_base::showbase      output integer values preceded by the numeric base.
    ios_base::showpoint     output floating-point values always including the decimal point.
    ios_base::showpos       output non-negative numeric values preceded by a plus sign (+).
    ios_base::skipws        skip leading whitespace on certain input operations.
    ios_base::unitbuf       flush output after each inserting operation.
    ios_base::uppercase     output uppercase letters replacing certain lowercase letters.
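Three other constants are also defined that can be used as masks:

    constant                  value
    ios_base::adjustfield     left | right | internal
    ios_base::basefield       dec | oct | hex
    ios_base::floatfield      scientific | fixed

------------------------------------------------------------------------

Parsing a string with the delimiter I want, and reading data from a stream

This used to be a rather troublesome topic, and all the more so because it comes up so often. But getline actually does a decent job:

    getline(cin, s, ';');
    while ( s != "quit" )
    {
        cout << s << endl;
        getline(cin, s, ';');
    }

Simple, right? But note that since getline now treats only ; as the delimiter, you need to type ;quit; to end the input; otherwise getline reads the surrounding spaces and newlines into s. Of course, that can be handled in code. Likewise, for simple string parsing we hardly need to bring in a Tokenizer or the like:

    #include <iostream>
    #include <sstream>
    #include <string>
    using namespace std;

    int main()
    {
        string s("hello,world, this is a sentence; and a word, end.");
        stringstream ss(s);
        for ( ; ; )
        {
            string token;
            getline(ss, token, ',');
            if ( ss.fail() ) break;
            cout << token << endl;
        }
    }

Output (the last two tokens keep the space that follows the comma):

    hello
    world
     this is a sentence; and a word
     end.

Pretty, isn't it? The drawback of this approach is that only a single character can serve as the delimiter.

------------------------------------------------------------------------

Sending what used to go to the screen into a file, without changing cout to fs everywhere

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;

    int main()
    {
        ofstream outf("out.txt");
        streambuf *strm_buf = cout.rdbuf();   // remember the original buffer
        cout.rdbuf(outf.rdbuf());
        cout << "write something to file" << endl;
        cout.rdbuf(strm_buf);                 // recover
        cout << "display something on screen" << endl;
        system("PAUSE");                      // Windows-only
        return 0;
    }

What goes to the screen is:

    display something on screen

What goes to the file is:

    write something to file

In other words, changing the rdbuf of an ostream is all it takes to redirect it, but this trick does not work for fstream and stringstream.

------------------------------------------------------------------------

About istream_iterator and ostream_iterator

The classic ostream_iterator example is output via copy:

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    using namespace std;

    int main()
    {
        vector<int> vect;
        for ( int i = 1; i <= 9; ++i )
            vect.push_back(i);
        copy(vect.begin(), vect.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        ostream_iterator<double> os_iter(cout, " ~ ");
        *os_iter = 1.0;
        os_iter++;
        *os_iter = 2.0;
        *os_iter = 3.0;
    }

Output:

    1 2 3 4 5 6 7 8 9
    1 ~ 2 ~ 3 ~

Clearly, the purpose of ostream_iterator is to allow iterator operations on a stream, so that algorithms can be applied to streams; this is the essence of the STL. Combined with the earlier "reading a file", we get the most convenient way to display a file:

    copy(istreambuf_iterator<char>(ifs.rdbuf()),
         istreambuf_iterator<char>(),
         ostreambuf_iterator<char>(cout));

Likewise, with the following statement you get output with all the whitespace removed:

    copy(istream_iterator<char>(ifs),
         istream_iterator<char>(),
         ostream_iterator<char>(cout));

That is probably not what you want. What if you insist on istream_iterator rather than istreambuf_iterator? There is still a way:

    copy(istream_iterator<char>(ifs >> noskipws),
         istream_iterator<char>(),
         ostream_iterator<char>(cout));

But this is not the recommended approach; it is considerably less efficient than the first one. If a file temp.txt has the content below, then this way of reading the data from the file into a vector should impress you.

    12345 234 567
    89 10

Program:

    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    using namespace std;

    int main()
    {
        ifstream ifs("temp.txt");
        vector<int> vect;
        vect.assign(istream_iterator<int>(ifs), istream_iterator<int>());
        copy(vect.begin(), vect.end(), ostream_iterator<int>(cout, " "));
    }

Output:

    12345 234 567 89 10

Cool, isn't it? The drudgery of detecting end of file and moving the file pointer is all handled by istream_iterator.

------------------------------------------------------------------------

Other algorithms combined with iterators

Counting the lines in a file:

    int line_count = count(istreambuf_iterator<char>(ifs.rdbuf()),
                           istreambuf_iterator<char>(), '\n');

Strictly speaking, this counts the newline characters in the file. In the same way you can count any character in the file, or the occurrences of some token:

    int token_count = count(istream_iterator<string>(ifs),
                            istream_iterator<string>(), "#include");

Note that this counts "#include" as a whole token; if it is joined to other characters, it does not count.

------------------------------------------------------------------------

Manipulator

What is a manipulator? Simply put, it is a function that takes a stream as its argument and returns a stream. Take the noskipws used above; its definition looks like this:

    inline ios_base& noskipws(ios_base& __base)
    {
        __base.unsetf(ios_base::skipws);
        return __base;
    }

Here it uses the more general ios_base. Knowing this, you will probably no longer be afraid of writing a manipulator yourself. The following rather pointless manipulator ignores all input on the stream up to the first semicolon (including that semicolon):

    template <class charT, class traits>
    inline std::basic_istream<charT, traits>&
    ignoreToSemicolon(std::basic_istream<charT, traits>& s)
    {
        s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(';'));
        return s;
    }

Note that it does not ignore later semicolons, because ignore runs only once. More generally, a manipulator can also take arguments. The following is the general version of ignoreToSemicolon: it takes one argument, and the stream ignores all input up to the first occurrence of that argument. It is slightly more work to write:

    struct IgnoreTo
    {
        char ignoreTo;
        IgnoreTo(char c) : ignoreTo(c) {}
    };

    std::istream& operator >> (std::istream& s, const IgnoreTo& manip)
    {
        s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(manip.ignoreTo));
        return s;
    }

But it is used in much the same way:

    copy(istream_iterator<char>(ifs >> noskipws >> IgnoreTo(';')),
         istream_iterator<char>(),
         ostream_iterator<char>(cout));

The effect is the same as ignoreToSemicolon.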
