On Copy-on-Write (COW)

JKay_Wong

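Many older STL implementations (GNU libstdc++ before GCC 5 introduced its C++11 ABI, for example) implement string with COW for efficiency; C++11 and later effectively rule this design out for std::string, so the outputs shown here come from such an older library. With COW, several string objects share one data buffer: copy construction, assignment, and similar operations do not copy the buffer. Only when an object needs to modify a buffer that is currently shared with other objects does it allocate new space, copy the buffer, and modify its own copy. Likewise, a destructor does not free the buffer while other objects still share it.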

For example:

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string test1 = "hello";
    string test2(test1);

    printf("test1:%p test2:%p\n", test1.c_str(), test2.c_str());
}

Output:
test1:0x90a9014 test2:0x90a9014

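c_str() returns a read-only pointer to the string's current contents; the pointer is only temporary and may become invalid once the string is modified.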

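The two addresses are equal, so the two strings share the same buffer.
When does the data get copied? When the string's value is about to be modified, of course.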

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string test1 = "hello";
    string test2(test1);

    printf("test1:%p test2:%p\n", test1.c_str(), test2.c_str());
    test2[0] = 'w';
    printf("test1:%p test2:%p\n", test1.c_str(), test2.c_str());
}

Output:
test1:0x9e85014 test2:0x9e85014
test1:0x9e85014 test2:0x9e8502c

The address of test2's buffer has changed: the write made test2 take a private copy.
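Going one step further, how does the library know that the program is about to modify the buffer, so that it can allocate new space?
Programs usually access and modify the data through operator[] and iterators. It is natural to assume that copying is triggered only when the result is used as an lvalue (written to), and not when it is used as an rvalue (merely read). In practice that is not what happens, because operator[] returns a reference and cannot tell whether the caller will later write through it: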

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string test1 = "hello";
    string test2(test1);

    printf("test1:%p   test2:%p\n", test1.c_str(), test2.c_str());
    printf("test1:%p   test2:%p\n", &test1[0], &test2[0]);
}

Output:
test1:0x8a4a014 test2:0x8a4a014
test1:0x8a4a014 test2:0x8a4a02c

test2's address changed again, even though nothing was written to it.
Here is the relevant libstdc++ source:

      const_reference
      operator[] (size_type __pos) const
      {
        _GLIBCXX_DEBUG_ASSERT(__pos <= size());
        return _M_data()[__pos];
      }

      reference
      operator[](size_type __pos)
      {
        _GLIBCXX_DEBUG_ASSERT(__pos < size());
        _M_leak();
        return _M_data()[__pos];
      }

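In other words, whether a write might happen is decided by the constness of the object, not by how the result is used: on a const string, operator[] never copies; on a non-const string, operator[] always calls _M_leak(), which makes a private copy whenever the buffer is shared.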
Now look at this:

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string test1 = "hello";
    string test2(test1);

    printf("test1:%p test2:%p\n", test1.c_str(), test2.c_str());
    const string &test3 = test1;
    const string &test4 = test2;
    printf("test1:%p test2:%p\n", &test3[0], &test4[0]);
}

Output:
test1:0x8c62014 test2:0x8c62014
test1:0x8c62014 test2:0x8c62014

Of course, writing it this way is awkward: why should we have to introduce two const references?
This is more natural:

#include <stdio.h>
#include <string>
using namespace std;

void proc(const string& test1, const string& test2)
{
    printf("test1:%p test2:%p\n", &test1[0], &test2[0]);
}

int main()
{
    string test1 = "hello";
    string test2(test1);

    printf("test1:%p test2:%p\n", test1.c_str(), test2.c_str());
    proc(test1, test2);
}

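In other words, be strict about whether data is const: if a function does not modify an argument, pass it as const. Good habits like this improve code quality.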
A string and a char * cannot share a buffer, so in C++ use raw pointers as sparingly as possible; mixing the two styles gives the worst efficiency.

Reposted from http://bbs.chinaunix.net/viewthread.php?tid=834292