Commonly Used C++ Code

zhu2ruby

Article link: https://blog.csdn.net/zhu2ruby/article/details/3941941

toupper, tolower
Everyone knows that C++'s string has no toupper member. Fortunately this is not a big problem, because we have the STL algorithms:

string s("heLLo");
transform(s.begin(), s.end(), s.begin(), ::toupper);
cout << s << endl;
transform(s.begin(), s.end(), s.begin(), ::tolower);
cout << s << endl;

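Of course, I know that what many people really want is s.to_upper(), but for something as generic as basic_string there is really no way to build in such specialized methods. If you use the Boost string algorithms library, none of this is an issue, and you do not need to read this article.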

------------------------------------------------------------------------
trim
We also know that string has no trim, but doing it ourselves is not hard; it is even simpler than toupper:

     string s("    hello    ");
     s.erase(0, s.find_first_not_of(" \n"));
     cout << s << endl;
     s.erase(s.find_last_not_of(' ') + 1);
     cout << s << endl;

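Note that find_first_not_of and find_last_not_of both accept a string, in which case they look for the first (or last) character that is not any of the characters in that string, so you can trim several kinds of characters in one go.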

-----------------------------------------------------------------------
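erase
string's own erase is decent, but it can only erase a contiguous run of characters. What if you want to strip every occurrence of a certain character from a string? STL's erase + remove_if does the job; note that remove_if alone is not enough.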

     string s("    hello, world. say bye    ");
     s.erase(remove_if(s.begin(),s.end(),
         bind2nd(equal_to<char>(), ' ')),
     s.end());

The code above removes every space, yielding hello,world.saybye.

-----------------------------------------------------------------------
replace
string provides replace itself, but it is not substring-oriented. For example, the operation we use most, replacing one substring with another, takes a little composition:

     string s("hello, world");
     string sub("ello, ");
     s.replace(s.find(sub), sub.size(), "appy ");
     cout << s << endl;

The output is happy world. Note that the original substring and its replacement do not have to be the same length.

-----------------------------------------------------------------------
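startwith, endwith
These two really are used all the time, but if you look closely at string's interface you will find there is no need to provide them specially; the existing interface does the job well: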

     string s("hello, world");
     string head("hello");
     string tail("ld");
     bool startwith = s.compare(0, head.size(), head) == 0;
     cout << boolalpha << startwith << endl;
     bool endwith = s.compare(s.size() - tail.size(), tail.size(), tail) == 0;
     cout << boolalpha << endwith << endl;

Of course, it is not as convenient as s.startwith("hello").

------------------------------------------------------------------------
toint, todouble, tobool...
This is an old chestnut too. Either the C way or the C++ way works, and each has its own character:

     string s("123");
     int i = atoi(s.c_str());
     cout << i << endl;

     int ii;
    stringstream(s) >> ii;
     cout << ii << endl;

     string sd("12.3");
     double d = atof(sd.c_str());
     cout << d << endl;

     double dd;
    stringstream(sd) >> dd;
     cout << dd << endl;

     string sb("true");
     bool b;
    stringstream(sb) >> boolalpha >> b;
     cout << boolalpha << b << endl;

The C approach is concise, with assignment and conversion done in one statement, whereas the C++ approach is more general.

------------------------------------------------------------------------
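split
This one is a real nuisance. What we would most like is an interface like s.split(vect, ','). Doing it with STL algorithms is somewhat difficult, so let us start simple: if the separators are whitespace such as spaces, tabs, and newlines, this is enough: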

     string s("hello world, bye.");
     stringstream ss(s);   // istream_iterator needs an lvalue stream, not a temporary
     vector<string> vect;
     vect.assign(
         istream_iterator<string>(ss),
         istream_iterator<string>()
     );

Note, though, that if s is large there is an efficiency concern, because stringstream makes its own copy of the string.

------------------------------------------------------------------------
concat
How do you concatenate all the strings in a container of strings? I hope your answer is not a hand-coded loop. Isn't this better?

     vector<string> vect;
     vect.push_back("hello");
     vect.push_back(", ");
     vect.push_back("world");

     cout << accumulate(vect.begin(), vect.end(), string(""));

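There is, however, considerable room for optimization: accumulate builds the result through repeated operator+ calls, which may copy the growing string at every step.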

-------------------------------------------------------------------------

reverse
Actually I rather doubt anyone really needs to reverse a string, but doing so is certainly easy:

  std::reverse(s.begin(), s.end());

The above reverses in place. If you need the reversed result in another string, it is just as simple:

   s1.assign(s.rbegin(), s.rend());

The efficiency is quite good too.

-------------------------------------------------------------------------

Parsing a file extension
The wordier version:

     std::string filename("hello.exe");

     std::string::size_type pos = filename.rfind('.');
     std::string ext = filename.substr(pos == std::string::npos ? filename.length() : pos + 1);

Just two lines. What about merging them into one? That is possible too:

     std::string ext = filename.substr(filename.rfind('.') == std::string::npos ? filename.length() : filename.rfind('.') + 1);

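I know, rfind is executed twice. But first, you can hope the compiler optimizes it away, and second, extensions are usually short, so even if it runs one extra time the difference should be tiny.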

STL algorithms

distance
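We often want to find the position of a value in a vector, a list, or some other container. find on its own does not give you an index, so some people resort to a hand-written loop, and a primitive one at that: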

for ( int i = 0; i < vect.size(); ++i)
     if ( vect[i] == value ) break;

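If the compiler treats i as part of the for scope, you also have to move the declaration of i outside the loop. Is this really necessary? Take a look at this: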

     int dist =
         distance(col.begin(),
             find(col.begin(), col.end(), 5));

Here col can be many kinds of container: list, vector, deque... Of course, this is for the case where you are sure 5 is in col. If you are not sure, add a check:

     int dist = -1;   // stays -1 if 5 is not found
     list<int>::iterator pos = find(col.begin(), col.end(), 5);
     if ( pos != col.end() )
         dist = distance(col.begin(), pos);

I think this is still better than a hand-written loop.

--------------------------------------------------------------------------
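max, min
These have direct algorithm support. The complexity is of course O(n), for unsorted containers. If the container is sorted... well, my friend, do you need an algorithm at all?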

max_element(col.begin(), col.end());
min_element(col.begin(), col.end());

Note that what is returned is an iterator. If all you care about is the value, fine:

*max_element(col.begin(), col.end());
*min_element(col.begin(), col.end());

max_element and min_element both compare with less (operator<) by default, and both also accept a binary predicate. If you are bored enough, you can even use max_element as min_element, or vice versa:

*max_element(col.begin(), col.end(), greater<int>()); // returns the minimum!
*min_element(col.begin(), col.end(), greater<int>()); // returns the maximum

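Of course that is not their intended purpose; the point is to let you use them in more specialized situations, for example when what you want to compare is a member of each element, or the return value of a member function. For example: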

#include <iostream>
#include <list>
#include <algorithm>
#include <string>
#include <boost/bind.hpp>

using namespace boost;
using namespace std;

struct Person
{
     Person(const string& _name, int _age)
         : name(_name), age(_age)
     {}
     int age;
     string name;
};

int main()
{
     list<Person> col;
     list<Person>::iterator pos;

     col.push_back(Person("Tom", 10));
     col.push_back(Person("Jerry", 12));
     col.push_back(Person("Mickey", 9));

     Person eldest =
         *max_element(col.begin(), col.end(),
             bind(&Person::age, _1) < bind(&Person::age, _2));//>=1.33

     cout << eldest.name;
}

The output is Jerry. This uses boost.bind; forgive me for not knowing how to write it with bind2nd and mem_fun, and I do not want to know...

-------------------------------------------------------------------------
copy_if
That's right, the STL (before C++11 added std::copy_if) simply has no copy_if, which is why we need this:

template<typename InputIterator, typename OutputIterator, typename Predicate>
OutputIterator copy_if(
     InputIterator begin, InputIterator end, OutputIterator destBegin, Predicate p)
{
     while (begin != end)
     {
         if (p(*begin))*destBegin++ = *begin;
         ++begin;
     }
     return destBegin;
}

Keeping it in your own toolbox is a wise choice.

------------------------------------------------------------------------
Idiom: erase(iter++)
If you want to remove certain elements from a list, be very careful (the code below is wrong!!!):

#include <iostream>
#include <algorithm>
#include <iterator>
#include <list>

int main()
{
     int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
     std::list<int> lst(arr, arr + 10);

     for ( std::list<int>::iterator iter = lst.begin();
           iter != lst.end(); ++iter)
         if ( *iter % 2 == 0 )
            lst.erase(iter);

     std::copy(lst.begin(), lst.end(),
         std::ostream_iterator<int>(std::cout, " "));
}

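Once iter has been erased it is invalid, yet ++iter is still performed afterwards, so the behavior is undefined! If you do not want to resort to remove_if, the idiomatic choice is: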

#include <iostream>
#include <algorithm>
#include <iterator>
#include <list>

int main()
{
     int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
     std::list<int> lst(arr, arr + 10);

     for ( std::list<int>::iterator iter = lst.begin();
           iter != lst.end(); )
         if ( *iter % 2 == 0 )
            lst.erase(iter++);
         else
             ++iter;

     std::copy(lst.begin(), lst.end(),
         std::ostream_iterator<int>(std::cout, " "));
}

However, the code above cannot be used for vector, string, or deque, because for those containers erase invalidates not only iter but also every iterator after it!

-------------------------------------------------------------------------
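The erase(remove...) idiom
The loop above is so hard to write, so non-generic, and so hard to understand that it is better to use an STL algorithm. But note that remove_if alone is useless; you must use the erase(remove...) idiom: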

#include <iostream>
#include <algorithm>
#include <iterator>
#include <list>
#include <functional>
#include <boost/bind.hpp>

int main()
{
     int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
     std::list<int> lst(arr, arr + 10);

     lst.erase(remove_if(lst.begin(), lst.end(),
         boost::bind(std::modulus<int>(), _1, 2) == 0),
         lst.end()
     );

     std::copy(lst.begin(), lst.end(),
         std::ostream_iterator<int>(std::cout, " "));
}

Of course, this relies on boost.bind so that we do not have to write yet another throwaway functor.

Simple common sense: about streams
Reading one line from a file

Easy, this will do:

ifstream ifs("input.txt");
char buf[1000];

ifs.getline(buf, sizeof buf);

string input(buf);

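Of course there is nothing wrong with this, but it involves unnecessary fuss and copying. Moreover, if a line exceeds 1000 characters you need a loop and even more cumbersome buffer management. Isn't the following simpler?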

string input;
input.reserve(1000);
ifstream ifs("input.txt");
getline(ifs, input);

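It is not only simpler but also safer, because the free function getline takes care of hassles such as the buffer running out. If you do not want allocations to happen too often, just reserve a bit more space.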

That is what "simple common sense" means: a lot of things are already there, I just never used them.

---------------------------------------------------------------------------

Reading an entire file into a string at once

I hope your answer is not this:

string input;
while( !ifs.eof() )
{
     string line;
     getline(ifs, line);
     input.append(line).append(1, '\n');
}

Sure, nothing wrong, it works. But isn't the approach below more in the spirit of C++?

string input(
istreambuf_iterator<char>(ifs.rdbuf()),
istreambuf_iterator<char>()
);

Likewise, allocating space up front may bring a performance benefit:

string input;
input.reserve(10000);
input.assign(
istreambuf_iterator<char>(ifs.rdbuf()),
istreambuf_iterator<char>()
);

Simple, isn't it? Yet these are facts we often overlook.
One addition: doing it this way is problematic:

     string input;
     input.assign(
         istream_iterator<char>(ifs),
         istream_iterator<char>()
     );

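Because it skips all whitespace, you end up with a string of nothing but non-whitespace characters. Finally, if all you want is to read the contents of a file into another stream, nothing is faster than this: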

fstream fs("temp.txt");
cout << fs.rdbuf();

So if you need to copy a file by hand, this is the best way (short of using the operating system's API):

    ifstream ifs("in.txt");
    ofstream ofs("out.txt");
    ofs << ifs.rdbuf();

-------------------------------------------------------------------------

Options for opening a file

ios::in     Open file for reading
ios::out    Open file for writing
ios::ate    Initial position: end of file
ios::app    Every output is appended at the end of file
ios::trunc  If the file already existed it is erased
ios::binary Binary mode

-------------------------------------------------------------------------

And the ios flags

flag                    effect if set
ios_base::boolalpha     input/output bool objects as alphabetic names (true, false).
ios_base::dec           input/output integer in decimal base format.
ios_base::fixed         output floating point values in fixed-point notation.
ios_base::hex           input/output integer in hexadecimal base format.
ios_base::internal      the output is filled at an internal point enlarging the output up to the field width.
ios_base::left          the output is filled at the end enlarging the output up to the field width.
ios_base::oct           input/output integer in octal base format.
ios_base::right         the output is filled at the beginning enlarging the output up to the field width.
ios_base::scientific    output floating-point values in scientific notation.
ios_base::showbase      output integer values preceded by the numeric base.
ios_base::showpoint     output floating-point values including always the decimal point.
ios_base::showpos       output non-negative numeric preceded by a plus sign (+).
ios_base::skipws        skip leading whitespaces on certain input operations.
ios_base::unitbuf       flush output after each inserting operation.
ios_base::uppercase     output uppercase letters replacing certain lowercase letters.

Three other constants are also defined that can be used as masks:

constant                value
ios_base::adjustfield   left | right | internal
ios_base::basefield     dec | oct | hex
ios_base::floatfield    scientific | fixed

--------------------------------------------------------------------------

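Parsing a string with the delimiter I want, and reading data from a stream

This used to be a topic that took quite some effort, and its frequency of use made it all the more annoying, but getline actually does a decent job: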

     getline(cin, s, ';');
     while ( s != "quit" )
     {
         cout << s << endl;
         getline(cin, s, ';');
     }

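Simple, right? Note, however, that since getline here treats only ; as the delimiter, you need to type ;quit; to end the input; otherwise getline reads the surrounding spaces and newlines into s. Of course, this can be handled in code.

Likewise, for simple string parsing we hardly need to bring in anything like a Tokenizer: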

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
     string s("hello,world, this is a sentence; and a word, end.");
     stringstream ss(s);

     for ( ; ; )
     {
         string token;
         getline(ss, token, ',');
         if ( ss.fail() ) break;

         cout << token << endl;
     }
}

Output:

hello
world
 this is a sentence; and a word
 end.

Pretty, isn't it? The drawback of doing it this way is that only a single character can serve as the delimiter.

--------------------------------------------------------------------------

Sending what would normally go to the screen to a file, without changing cout to fs everywhere

#include <iostream>
#include <fstream>

using namespace std;

int main()
{
     ofstream outf("out.txt");
     streambuf *strm_buf=cout.rdbuf();
     cout.rdbuf(outf.rdbuf());
     cout<<"write something to file"<<endl;
     cout.rdbuf(strm_buf);    // restore the original buffer
     cout<<"display something on screen"<<endl;
     return 0;
}

Output on the screen:

display something on screen

Output in the file:

write something to file

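In other words, just changing an ostream's rdbuf is enough to redirect it. This trick does not work for fstream and stringstream, however, since those classes declare their own rdbuf() that takes no argument.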

--------------------------------------------------------------------------

About istream_iterator and ostream_iterator

The classic ostream_iterator example is output through copy:

#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>
#include <vector>
#include <iterator>

using namespace std;

int main()
{
     vector<int> vect;
     for ( int i = 1; i <= 9; ++i )
         vect.push_back(i);

     copy(vect.begin(), vect.end(),
         ostream_iterator<int>(cout, " ")
     );
     cout << endl;

     ostream_iterator<double> os_iter(cout, " ~ ");
     *os_iter = 1.0;
     os_iter++;
     *os_iter = 2.0;
     *os_iter = 3.0;
}

Output:

1 2 3 4 5 6 7 8 9
1 ~ 2 ~ 3 ~

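Clearly, the role of ostream_iterator is to allow iterator operations on a stream, so that algorithms can be applied to streams; this is also the essence of the STL. Combined with the earlier "reading a file" material, we get the most convenient way to display a file: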

     copy(istreambuf_iterator<char>(ifs.rdbuf()),
          istreambuf_iterator<char>(),
          ostreambuf_iterator<char>(cout)
     );

Likewise, if you use the following statement, the output you get will contain no whitespace:

     copy(istream_iterator<char>(ifs),
          istream_iterator<char>(),
          ostream_iterator<char>(cout)
     );

That is most likely not what you want. What if you insist on using istream_iterator rather than istreambuf_iterator? There is still a way:

     copy(istream_iterator<char>(ifs >> noskipws),
          istream_iterator<char>(),
          ostream_iterator<char>(cout)
     );

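But this is not the recommended approach; it is considerably less efficient than the first one.
If a file temp.txt has the contents below, then my way of reading data from a file into a vector should impress you.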

12345 234 567
89 10

Program:

#include <iostream>
#include <fstream>
#include <algorithm>
#include <vector>
#include <iterator>

using namespace std;

int main()
{
     ifstream ifs("temp.txt");

     vector<int> vect;
     vect.assign(istream_iterator<int>(ifs),
         istream_iterator<int>()
     );

     copy(vect.begin(), vect.end(), ostream_iterator<int>(cout, " "));
}

Output:

12345 234 567 89 10

Cool, isn't it? Chores like detecting end of file and moving the file pointer are all handled by istream_iterator.

-----------------------------------------------------------------------

Other algorithms combined with iterators

Counting the lines in a file:

     int line_count =
         count(istreambuf_iterator<char>(ifs.rdbuf()),
               istreambuf_iterator<char>(),
               '\n');

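Strictly speaking, this counts the newline characters in the file. By the same token, you can count any character in the file, or the occurrences of some token: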

     int token_count =
         count(istream_iterator<string>(ifs),
               istream_iterator<string>(),
               "#include");

Note that the code above counts "#include" as a standalone token; if it is joined to other characters, it does not count.

------------------------------------------------------------------------
Manipulator

What is a manipulator? Put simply, it is a function that takes a stream as its argument and returns a stream, such as the noskipws used above, which is defined like this:

   inline ios_base&
   noskipws(ios_base& __base)
   {
     __base.unsetf(ios_base::skipws);
     return __base;
   }

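Here it uses the more general ios_base. Knowing this, you probably no longer feel any dread about writing a manipulator yourself. The following rather pointless manipulator ignores all input up to the first semicolon the stream encounters (including that semicolon):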

template <class charT, class traits>
inline std::basic_istream<charT, traits>&
ignoreToSemicolon (std::basic_istream<charT, traits>& s)
{
     // needs <limits>; with the streamsize maximum, ignore() has no character limit
     s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(';'));
     return s;
}

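Note, however, that it does not skip later semicolons, because ignore is executed only once. More generally, a manipulator can also take arguments. Below is a general version of ignoreToSemicolon that takes one argument; the stream ignores all input before the first occurrence of that argument. It is slightly more work to write: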

struct IgnoreTo {
     char ignoreTo;
     IgnoreTo(char c) : ignoreTo(c)
     {}
};

std::istream& operator >> (std::istream& s, const IgnoreTo& manip)
{
     // streamsize maximum means "no limit on the number of characters skipped"
     s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(manip.ignoreTo));
     return s;
}

But the usage is much the same:

     copy(istream_iterator<char>(ifs >> noskipws >> IgnoreTo(';')),
          istream_iterator<char>(),
          ostream_iterator<char>(cout)
     );

The effect is the same as ignoreToSemicolon.

When introducing StdExt, I mentioned that the STL is well designed, but that the following areas are still under-designed (or missing):

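We have already said a great deal about memory management. Here the focus is on string processing and text processing. This is the first article in the series "A Complete Reference to String Processing".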

History

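String and text processing is a long-standing and rather complex topic. Everything from simple string comparison (compare) and concatenation (concat) to complex text editing, regular expressions, and parsing HTML text falls within its scope.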

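In the C era, the C library provided string functions based on the char* data type, typified by strlen, strcpy, and strcat. Primitive and error-prone are the hallmarks of this style of string handling. In addition, strcat is not efficient (Borland introduced strecpy to address this; a generalized version of strecpy is in fact what later became std::copy in the STL), and substring search (strstr) also used the most primitive approach.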

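The arrival of the STL string (basic_string) improved the situation to some extent. At least C++ programmers now had a string class with a "friendly" interface. However, string is arguably the most controversial class in the STL (explained in detail below). These controversies at least show that the STL string class has design flaws.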

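SGI STL introduced the rope class, a heavyweight string class. The word rope literally means a thick cord, while string literally means a thin thread. A rope is thus a heavyweight string; the name is vivid and apt.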

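When the StdExt library began to consider string processing support, I introduced the following four classes: std::String / std::StringBuilder / std::TextPool / std::Rope. Of these, std::String and std::StringBuilder are really a split of the STL string class's functionality. std::String is an immutable string, while std::StringBuilder handles modification. As you will recognize, the String/StringBuilder concept comes from Java; I have always felt that the design of Java's string classes is much more reasonable than C++'s string, which lumps the two together. std::TextPool / std::Rope are heavyweight string implementations for handling huge strings.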

Flaws of the STL string (basic_string)

In summary, the main points of contention about the STL string class are:

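Too many interfaces, with conventions that are not well aligned with the other STL containers. For example, string::find works with indices rather than using iterators as positions, unlike the other containers.
Memory fragmentation. Overly frequent construction and destruction of strings leads to severe fragmentation of system memory.
Copy-On-Write and thread safety. string (basic_string) is based on Copy-On-Write because string assignment was designed to be cheap. But once thread safety is taken into account, Copy-On-Write spends a great deal of time on locking overhead. Some newer STL implementations (such as SGI STL) have abandoned the Copy-On-Write string implementation.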

A survey of StdExt's string classes: String/StringBuilder/TextPool/Rope

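Why do we need so many string classes? For one reason: the environments in which strings are processed are complex and call for solutions tailored to each case. Expecting a single string class to work everywhere is unrealistic.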

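In terms of the string sizes supported, String/StringBuilder focus on small strings (StringBuilder in particular is bound to hit a performance bottleneck with large strings), while TextPool and Rope focus on huge strings.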

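In terms of implementation, String/StringBuilder use linear memory, whereas the strings held by TextPool and Rope are not physically contiguous; they are logical strings.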

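In terms of supported operations, String is an immutable string. StringBuilder/TextPool mainly support overwriting (set) and appending (append), while insertion (insert) is not recommended; in terms of scalability, TextPool is better than StringBuilder. Rope focuses on optimizing complex string-level operations such as taking substrings, inserting, and deleting, but modifying or fetching a single character costs slightly more (compared with String/StringBuilder/TextPool).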

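VisualFC originally started as a visual development environment for WinxGui. Development began in May 2007; in October it was renamed VisualFC and officially supported WTL and WinxGUI. VisualFC itself is developed with WTL and uses some components of CODE::BLOCKS and WTLHelper, with core support provided by vfc_core.dll. It currently provides plug-ins for the VC60/EVC4.0 and VS2005 development environments.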