C++ vector: STL Implementation Explained


cyber19


Article link: https://blog.csdn.net/matrix_zzl/article/details/78577292


Preface
STL library location
vector code analysis
Summary of gdb output for vector variables

0. Preface

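When I first started debugging with gdb, I found it very awkward. First, it lacks the powerful features of a graphical IDE such as Visual Studio. In an IDE you can set breakpoints while stepping through code, and breakpoints are saved when you exit. You can also inspect the current variables at any time, and STL variables are displayed in a friendly way. Second, gdb's output is sometimes hard to understand. For STL containers, complex class objects, and smart pointers, gdb prints only the raw member data and does no formatting. For example, this is how plain gdb prints a vector variable: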

p v  # v is a vector variable initialized as vector<int> v(10, 1); printing it directly gives:
$1 = {
  <_Vector_base<int, std::allocator<int> >> = {
    _M_impl = {
      <allocator<int>> = {
        <new_allocator<int>> = {<No data fields>}, <No data fields>},
      members of _Vector_base<int, std::allocator<int> >::_Vector_impl:
      _M_start = 0x100501ef0,
      _M_finish = 0x100501f18,
      _M_end_of_storage = 0x100501f18
    }
  }, <No data fields>}

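As you can see, the vector object itself has no data members. Its base class _Vector_base holds a single member, _M_impl, which in turn contains _M_start, _M_finish, and _M_end_of_storage. From this it is easy to infer that the vector's elements live in a block of heap memory:

- _M_start points to the beginning of the allocated block.
- _M_finish marks the end of the valid elements.
- _M_end_of_storage points to the end of the allocated block.

Because a vector grows dynamically, the allocated block may be larger than the part in use, which leaves room for future growth. In this example, _M_finish - _M_start = 0x28 = 40 bytes, exactly 10 ints. No extra space has been reserved, so _M_finish equals _M_end_of_storage. Viewing the vector's contents takes a more involved gdb expression: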

print *(v._M_impl._M_start)@v.size()

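So using gdb well does require a little knowledge of how the various STL containers are laid out in memory. STL containers are also less complicated than one might expect. They are hard to read mainly because the library wraps everything in advanced templates and macros. This article uses the standard library's vector implementation as its example, since vector is a fairly comprehensive case. Other STL containers such as string and map can be analyzed the same way. For the corresponding gdb setup, see 《gdb调试配置》 (gdb Debugging Configuration).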

1. STL library location

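STL implementations differ a lot from version to version. Well-known ones include HP STL, PJ STL, and SGI STL, the version analyzed in Hou Jie's 《STL源码剖析》 (The Annotated STL Sources). The STL is broadly divided into six parts: containers, iterators, allocators, adapters, algorithms, and functors. Installing a C++ toolchain also installs its STL headers. On macOS, the vector header pulled in by #include <vector> will be one of the following: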

/usr/include/c++/4.2.1/vector # mac/linux # system STL location
/usr/local/Cellar/gcc/6.4.0/include/c++/6.4.0/vector # STL bundled with a separately installed GNU gcc
/Library/Developer/CommandLineTools/usr/include/c++/v1/vector # macOS: STL (libc++) bundled with clang from Xcode

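The third is the STL that ships with the clang compiler, and that file itself contains the definitions. When you compile with clang, this version of vector is used. Its member variables are __begin_, __end_, and __end_cap_, which differ from the ones above. Printed in gdb, it looks like this: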

$1 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = {
    <std::__1::__vector_base_common<true>> = {<No data fields>},
    members of std::__1::__vector_base<int, std::__1::allocator<int> >:
    __begin_ = 0x100300d00,
    __end_ = 0x100300d28,
    __end_cap_ = {
      <std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>> = {
        <std::__1::allocator<int>> = {<No data fields>},
        members of std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>:
        __first_ = 0x100300d28
      }, <No data fields>}
  }, <No data fields>}

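Files 1 and 2 are only wrapper headers that include the real implementation files. vector is defined in /usr/include/c++/4.2.1/bits/stl_vector.h or in /usr/local/Cellar/gcc/6.4.0/include/c++/6.4.0/bits/stl_vector.h. The two implementations are much the same, and this article uses the former. The wrapper header itself consists mainly of includes like these: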

#include <bits/stl_algobase.h>
#include <bits/allocator.h>
#include <bits/stl_construct.h>
#include <bits/stl_uninitialized.h>
#include <bits/stl_vector.h>
#include <bits/stl_bvector.h> 
#include <bits/range_access.h>

2. vector code analysis

The source of bits/stl_vector.h consists mainly of two parts. The code below has been reformatted, and irrelevant parts have been removed.

1) The class template _Vector_base

template<typename _Tp, typename _Alloc>
struct _Vector_base {
  typedef typename _Alloc::template rebind<_Tp>::other _Tp_alloc_type;
  struct _Vector_impl : public _Tp_alloc_type {
    _Tp*           _M_start;
    _Tp*           _M_finish;
    _Tp*           _M_end_of_storage;
    _Vector_impl(_Tp_alloc_type const& __a): _Tp_alloc_type(__a), _M_start(0), _M_finish(0), _M_end_of_storage(0){ } // constructor: all three pointers start out null
  };

public:
  typedef _Alloc allocator_type;
  _Tp_alloc_type& 
  _M_get_Tp_allocator() { return *static_cast<_Tp_alloc_type*>(&this->_M_impl); }

  const _Tp_alloc_type& 
  _M_get_Tp_allocator() const { return *static_cast<const _Tp_alloc_type*>(&this->_M_impl); }

  allocator_type get_allocator() const { return allocator_type(_M_get_Tp_allocator()); }
  _Vector_base(const allocator_type& __a): _M_impl(__a){ }
  _Vector_base(size_t __n, const allocator_type& __a): _M_impl(__a) { // allocate raw memory for __n objects via _M_impl and set the pointers (no elements constructed yet)
      if (__n) {
        this->_M_impl._M_start = this->_M_allocate(__n);
        this->_M_impl._M_finish = this->_M_impl._M_start;
        this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
      }
  }
  ~_Vector_base() { _M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage
          - this->_M_impl._M_start); } // free all memory allocated through _M_impl (the elements were already destroyed by ~vector)

public:
  _Vector_impl _M_impl;

  _Tp* _M_allocate(size_t __n) { return _M_impl.allocate(__n); }
  void _M_deallocate(_Tp* __p, size_t __n){
    if (__p)
      _M_impl.deallocate(__p, __n);
  }
};

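The class template _Vector_base takes two template parameters: the element type and the allocator type. It defines a single data member, _M_impl, whose type _Vector_impl inherits from the allocator type _Tp_alloc_type. Through that base class, _M_impl gets the memory-management functions allocate and deallocate, which acquire and release memory. _Vector_base is essentially a thin wrapper around _M_impl. Note that _Vector_base only allocates and frees raw memory. Constructing and destroying elements is left to vector. One declaration here is somewhat hard to understand: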

typedef typename _Alloc::template rebind<_Tp>::other _Tp_alloc_type;

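To understand it, you need to know that class allocator in allocator.h defines a member class template rebind, and rebind in turn declares typedef allocator<_Tp1> other. Hence _Alloc::template rebind<_Tp>::other is allocator<_Tp>, and that is the base class of _M_impl's type. rebind exists so that a container can obtain an allocator for a type other than the one it was given. A list, for example, must allocate nodes rather than elements. For vector, _Alloc is normally already allocator<_Tp>, so the rebind changes nothing.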

template<typename _Tp1> struct rebind { typedef allocator<_Tp1> other; };

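The combination typedef typename looks odd, and I had rarely run into it in earlier C++ work. The rule is this: whenever you refer to a type nested inside a class that depends on a template parameter (a so-called dependent name), you must prefix it with typename so that the compiler, g++ included, can parse the code. There is some history behind this. Recommended reading: http://feihu.me/blog/2014/the-origin-and-usage-of-typename/. The classic example: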

template <class T>
void foo() {
    T::iterator * iter;  // without typename: a pointer declaration, or a multiplication?
    // ...
}

struct ContainsAType {
    struct iterator { /*...*/ };
    // ...
};

struct ContainsAnotherType {
    static int iterator;
    // ...
};

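Without typename, T::iterator * iter is ambiguous:

- For foo<ContainsAType>(), it would declare a pointer named iter.
- For foo<ContainsAnotherType>(), it would multiply the static member ContainsAnotherType::iterator by iter.

The compiler has to parse the template before it knows T, so the standard resolves the ambiguity with a fixed rule: a name that depends on a template parameter, such as T::iterator, is assumed not to be a type. As written, the statement is therefore parsed as a multiplication. foo<ContainsAType>() fails to compile even though ContainsAType::iterator really is a type.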

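C++ introduced typename precisely to tell the compiler that iterator names a type rather than a variable. Writing typename T::iterator * iter removes the ambiguity. Separately, typename is also recommended when declaring template parameters, because it states the intent more clearly than class. The two keywords are interchangeable there. class survives mainly because older code, written before typename existed, used it, and it is kept for compatibility.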

2) The vector class template

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc> {
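  // type aliases, so that the code does not get too long
  typedef typename _Alloc::value_type     _Alloc_value_type;
  typedef _Vector_base<_Tp, _Alloc>       _Base;
  typedef vector<_Tp, _Alloc>             vector_type;
  typedef typename _Base::_Tp_alloc_type  _Tp_alloc_type;
  typedef _Tp                             value_type;
  // (allocator_type, size_type, pointer, reference, iterator and other typedefs omitted in this excerpt)

  // no data members of its own, only member functions
public:
  // constructors
  explicit vector(const allocator_type& __a = allocator_type()) : _Base(__a) { } // default constructor: all three pointers are null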

  vector(size_type __n, const value_type& __value = value_type(), const allocator_type& __a = allocator_type()): _Base(__n, __a) {
    // Constructing with vector<Tp> v(10, some_Tp_value):
    // 1) _Base uses the allocator to obtain raw memory for __n objects and has already set _M_end_of_storage.
    // 2) __uninitialized_fill_n_a copy-constructs __value into each of the __n slots (the element's copy
    //    constructor runs via placement new; for trivial types this may reduce to a plain fill),
    //    and _M_finish is then set to the end of the filled range.
    std::__uninitialized_fill_n_a(this->_M_impl._M_start, __n, __value, _M_get_Tp_allocator());
    this->_M_impl._M_finish = this->_M_impl._M_start + __n;
  }

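  vector(const vector& __x): _Base(__x.size(), __x._M_get_Tp_allocator()) {
    // Constructing with vector<Tp> v(other):
    // 1) _Base allocates memory for __x.size() elements (the new capacity equals the source's size, not its capacity).
    // 2) __uninitialized_copy_a copy-constructs every element of __x into the new memory. The new vector owns
    //    its own buffer, so this is not a shallow copy of the pointers; how deeply each element is copied
    //    depends on Tp's copy constructor.
    this->_M_impl._M_finish = std::__uninitialized_copy_a(__x.begin(), __x.end(), this->_M_impl._M_start, _M_get_Tp_allocator());
  }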

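  ~vector() {
    // 1) std::_Destroy calls the element type's destructor on every _Tp object in [_M_start, _M_finish).
    // 2) The memory itself is then released through the allocator by the base-class destructor ~_Vector_base.
    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish, _M_get_Tp_allocator());
  }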

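  // Simple, frequently used functions. They are defined inside the class body, so they are implicitly inline; whether the compiler actually inlines them depends on the optimization options and the function's complexity. More complex functions are defined in bits/vector.tcc.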
  iterator begin() { return iterator(this->_M_impl._M_start); }
  iterator end() { return iterator(this->_M_impl._M_finish); }
  size_type size() const { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }
  size_type capacity() const { return size_type(this->_M_impl._M_end_of_storage - this->_M_impl._M_start); }
  bool empty() const { return begin() == end(); }
  reference operator[](size_type __n) { return *(this->_M_impl._M_start + __n); }


  // 1) If inserting at end() and _M_finish != _M_end_of_storage, construct __x directly at _M_finish, then ++_M_finish.
  // 2) Otherwise call _M_insert_aux, which does not necessarily allocate a larger block.
  iterator insert(iterator __position, const value_type& __x) {
    const size_type __n = __position - begin();
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage && __position == end()) {
      this->_M_impl.construct(this->_M_impl._M_finish, __x);
      ++this->_M_impl._M_finish;
    } else
      _M_insert_aux(__position, __x);
    return iterator(this->_M_impl._M_start + __n);
  }

  // 1) If _M_end_of_storage has not been reached, construct __x at _M_finish and ++_M_finish.
  // 2) If it has been reached, the current block is full, so _M_insert_aux is called to allocate a larger one.
  void push_back(const value_type& __x) {
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
      this->_M_impl.construct(this->_M_impl._M_finish, __x);
      ++this->_M_impl._M_finish;
    } else
      _M_insert_aux(end(), __x);
  }

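  // 1) If __new_size < size(), erase the elements with indices [__new_size, size()); their destructors run,
  //    and _M_end_of_storage does not move (capacity is unchanged).
  // 2) If __new_size >= size(), insert __new_size - size() copies of __x after end(); if that exceeds
  //    _M_end_of_storage, a larger block is allocated and the elements are copied over.
  void resize(size_type __new_size, value_type __x = value_type()) {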
    if (__new_size < size())
      _M_erase_at_end(this->_M_impl._M_start + __new_size);
    else
      insert(end(), __new_size - size(), __x);
  }

  // 1) If __n exceeds max_size(), i.e. size_t(-1)/sizeof(value_type), throw length_error.
  // 2) If __n > capacity(), allocate __n * sizeof(value_type) bytes, copy the old elements into the new block,
  //    destroy the old elements, free the old block, and reset the pointers.
  // 3) If __n <= capacity(), do nothing.
  void reserve(size_type __n) {
    if (__n > this->max_size()) __throw_length_error(__N("vector::reserve"));
    if (this->capacity() < __n) {
      const size_type __old_size = size();
      pointer __tmp = _M_allocate_and_copy(__n, this->_M_impl._M_start, this->_M_impl._M_finish);
      std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
                    _M_get_Tp_allocator());
      _M_deallocate(this->_M_impl._M_start,
                    this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
      this->_M_impl._M_start = __tmp;
      this->_M_impl._M_finish = __tmp + __old_size;
      this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
    }
  }
  ...
};

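vector inherits (protected) from the class template _Vector_base and uses std::allocator as its default allocator, though users can plug in a different one. It then implements the various member functions. The longer member functions are analyzed separately below. Note that implementations may differ between versions.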

_M_insert_aux(__position, __x), which is called by insert and push_back, works as follows:

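1) It first checks again whether _M_finish equals _M_end_of_storage. If not, there is spare capacity:

- It shifts [__position, _M_finish) to [__position+1, _M_finish+1) by copying backward.
- It assigns __x to *__position and increments _M_finish.

Copying backward is required because the source and destination ranges overlap. Working from the end toward the front never overwrites an element before it has been moved, so no temporary buffer for the range is needed. libstdc++ 4.x first copy-constructs the last element into the uninitialized slot at _M_finish. It also assigns from a local copy of __x, in case __x refers to an element of the vector itself.

2) If _M_finish equals _M_end_of_storage, the storage has to grow:

- The new capacity is 1 element if the old size is 0, and double the old size otherwise.
- To guard against overflow, the capacity is capped at max_size(), i.e. size_t(-1) / sizeof(value_type) elements. If the vector is already at max_size(), a length_error exception is thrown.
- A new block is allocated and the elements before __position are copied into it.
- __x is constructed in place, and the remaining elements are copied after it.
- The old elements are destroyed and the old block is freed.

This is why a reallocation invalidates all iterators, pointers, and references into the vector.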

3. Summary of gdb output for vector variables

Here is the example from the preface again, with annotations:

$1 = {
  ###### base class _Vector_base ######
  <_Vector_base<int, std::allocator<int> >> = {
    # the only member of _Vector_base: _M_impl
    _M_impl = {
      ###### the base class of _M_impl's type is allocator<int> ######
      <allocator<int>> = {
        <new_allocator<int>> = {<No data fields>}, <No data fields>},
      ###### the three pointer members of _M_impl ######
      members of _Vector_base<int, std::allocator<int> >::_Vector_impl:
      _M_start = 0x100501ef0,
      _M_finish = 0x100501f18,
      _M_end_of_storage = 0x100501f18
    }
  ###### vector itself: No data fields, i.e. no data members of its own ######
  }, <No data fields>}

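For std::string, the same kind of analysis applies. To print its contents in gdb, use: p s._M_dataplus._M_p
For a shared_ptr, print the managed object with: p *(s_p._M_ptr)
It is easy to see that here sizeof(vector) = 24. On a 64-bit system a vector needs 3 pointers, so 8*3 = 24. std::allocator is an empty class and, as a base of _Vector_impl, takes up no space.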
