Reading the STL Source Code: allocator

License: CC 4.0 BY-SA

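This article walks through what the STL allocator does and how it is implemented, including the key functions allocate, deallocate and address, and how the traits technique is used for compile-time type dispatch. It also covers SGI's design philosophy and implementation details for allocating and releasing memory, including how the first-level and second-level allocators work.

These notes follow Hou Jie's 《STL源码剖析》 and record the questions I ran into while reading the source, as a way of organizing my own thinking.

The allocator is the part of the STL that users least need to know about, yet the whole library depends on it: vector, set, list and every other container need storage to hold their elements, and the allocator is what provides that storage.

First, the definition of allocator: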

template <class _Tp>
class allocator {
  typedef alloc _Alloc;          // The underlying allocator.
public:
  typedef size_t     size_type;
  typedef ptrdiff_t  difference_type;
  typedef _Tp*       pointer;
  typedef const _Tp* const_pointer;
  typedef _Tp&       reference;
  typedef const _Tp& const_reference;
  typedef _Tp        value_type;

  template <class _Tp1> struct rebind {
    typedef allocator<_Tp1> other;
  };

  allocator() __STL_NOTHROW {}
  allocator(const allocator&) __STL_NOTHROW {}
  template <class _Tp1> allocator(const allocator<_Tp1>&) __STL_NOTHROW {}
  ~allocator() __STL_NOTHROW {}

  pointer address(reference __x) const { return &__x; }
  const_pointer address(const_reference __x) const { return &__x; }

  // __n is permitted to be 0.  The C++ standard says nothing about what
  // the return value is when __n == 0.
  _Tp* allocate(size_type __n, const void* = 0) {
    return __n != 0 ? static_cast<_Tp*>(_Alloc::allocate(__n * sizeof(_Tp))) 
                    : 0;
  }

  // __p is not permitted to be a null pointer.
  void deallocate(pointer __p, size_type __n)
    { _Alloc::deallocate(__p, __n * sizeof(_Tp)); }

  size_type max_size() const __STL_NOTHROW 
    { return size_t(-1) / sizeof(_Tp); }

  void construct(pointer __p, const _Tp& __val) { new(__p) _Tp(__val); }
  void destroy(pointer __p) { __p->~_Tp(); }
};

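The key member functions of allocator are allocate, deallocate and address, which between them cover the basic memory operations. The early HP version of the STL simply wrapped C++'s new and delete; SGI chose not to do that for performance reasons and uses its own alloc instead (the _Alloc typedef in the class above). In C++, creating and destroying a heap object each consist of two steps: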

class Foo {};
Foo *pf = new Foo(); // first allocates storage, then calls the constructor to build the object
delete pf;   // first calls the destructor, then releases the object's storage

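To divide the work cleanly, SGI separates these steps: obtaining and releasing storage is kept apart from constructing and destroying objects. Storage is obtained by alloc::allocate() and released by alloc::deallocate(); objects are constructed by ::construct() and destroyed by ::destroy().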

1. construct() and destroy()

The code for construct() and destroy() is as follows:

template <class _T1, class _T2>
inline void _Construct(_T1* __p, const _T2& __value) {
  new ((void*) __p) _T1(__value);// MARK 1
}

template <class _T1>
inline void _Construct(_T1* __p) {
  new ((void*) __p) _T1();
}

template <class _Tp>
inline void _Destroy(_Tp* __pointer) {
  __pointer->~_Tp();
}

template <class _ForwardIterator>
void
__destroy_aux(_ForwardIterator __first, _ForwardIterator __last, __false_type)
{
  for ( ; __first != __last; ++__first)
    destroy(&*__first);
}

template <class _ForwardIterator> 
inline void __destroy_aux(_ForwardIterator, _ForwardIterator, __true_type) {}

template <class _ForwardIterator, class _Tp>
inline void 
__destroy(_ForwardIterator __first, _ForwardIterator __last, _Tp*)
{
  typedef typename __type_traits<_Tp>::has_trivial_destructor
          _Trivial_destructor; // MARK2
  __destroy_aux(__first, __last, _Trivial_destructor());
}

template <class _ForwardIterator>
inline void _Destroy(_ForwardIterator __first, _ForwardIterator __last) {
  __destroy(__first, __last, __VALUE_TYPE(__first));
}

inline void _Destroy(char*, char*) {}
inline void _Destroy(int*, int*) {}
inline void _Destroy(long*, long*) {}
inline void _Destroy(float*, float*) {}
inline void _Destroy(double*, double*) {}
#ifdef __STL_HAS_WCHAR_T
inline void _Destroy(wchar_t*, wchar_t*) {}
#endif /* __STL_HAS_WCHAR_T */

// --------------------------------------------------
// Old names from the HP STL.

template <class _T1, class _T2>
inline void construct(_T1* __p, const _T2& __value) {
  _Construct(__p, __value);
}

template <class _T1>
inline void construct(_T1* __p) {
  _Construct(__p);
}

template <class _Tp>
inline void destroy(_Tp* __pointer) {
  _Destroy(__pointer);
}

template <class _ForwardIterator>
inline void destroy(_ForwardIterator __first, _ForwardIterator __last) {
  _Destroy(__first, __last);
}

Two points in this code deserve a closer look:

MARK1:

template <class _T1, class _T2>
inline void _Construct(_T1* __p, const _T2& __value) {
  new ((void*) __p) _T1(__value);// MARK 1
}

The expression new ((void*) __p) _T1(__value); is C++ placement new: it constructs a _T1 object in the memory that __p points to. Placement new is an overload of operator new with the following prototype:

void *operator new( size_t, void *p ) throw()  { return p; }

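It helps to distinguish new, operator new and placement new. The new and delete expressions allocate and release heap memory and also construct or destroy the object; the expressions themselves cannot be overloaded. To change how memory is obtained, you overload operator new, which can be overloaded much like operator+. The global void* operator new(size_t) can also be replaced, but doing so affects every allocation in the program, so operator new is usually overloaded inside a class. If a class does not overload operator new, the global ::operator new performs the heap allocation. operator new[], operator delete and operator delete[] can be overloaded in the same way, and if you overload one of them it is best to overload the other three as well. Placement new is one particular overload of operator new, and it is rarely used directly. An ordinary new expression cannot create an object in memory that has already been allocated; placement new can, whether that memory is on the stack or on the heap. The void* p in the prototype is the start address of that pre-allocated buffer.
An ordinary new expression has to search the heap for a large enough free region, which can be slow, and it can fail with an out-of-memory exception. Placement new avoids both problems: the object is constructed in a buffer prepared in advance, so no allocation takes place at construction time and running out of memory at that point is not possible. This makes placement new a good fit for time-critical, long-running applications that should not be interrupted.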

A small test case:

#include <iostream>
#include <new>
//using namespace std;

int main ()
{
    int p = 0;
    new(&p) int(3);
    std::cout << p <<std::endl;
}

This program prints 3, not 0.

MARK2:

template <class _ForwardIterator, class _Tp>
inline void 
__destroy(_ForwardIterator __first, _ForwardIterator __last, _Tp*)
{
  typedef typename __type_traits<_Tp>::has_trivial_destructor
          _Trivial_destructor; // MARK2
  __destroy_aux(__first, __last, _Trivial_destructor());
}

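This code uses a technique called traits. The C++ of that time offered no direct way for code to branch on the properties of a type, but traits combined with template argument deduction and overload resolution let the compiler select code according to type. The following example illustrates the technique.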

#include <iostream>

namespace namespacefx
{
struct __true_type {};
struct __false_type {};
template <class _Tp> struct __type_traits
{
    typedef __true_type this_is_dummy_member_must_be_first;
    typedef __false_type has_trivial_default_constructor;
    typedef __false_type has_trivial_copy_constructor;
    typedef __false_type has_trivial_assignment_operator;
    typedef __false_type has_trivial_destructor;
    typedef __false_type is_POD_type;
};

}

template <class T> void test_type_traits(T&t,namespacefx::__false_type)
{
    std::cout<< "false" << std::endl;
}
template <class T> void test_type_traits(T&t,namespacefx::__true_type)
{
    std::cout<< "true" << std::endl;
}
template <typename T> void test_type_traits(T&t)
{
    typedef typename namespacefx::__type_traits<T>::is_POD_type is_POD_type;
    namespacefx::__true_type tt;
    test_type_traits(t,typename namespacefx::__type_traits<T>::is_POD_type());
}
class A
{
public:
};
class B
{
public:
};
namespace namespacefx
{
    template<> struct __type_traits<A>
    {
        typedef __true_type  is_POD_type;
    };
    template <>
    struct __type_traits<B> {
        typedef __false_type is_POD_type;
    };
}
int main()
{
     A a;
     B b;
     test_type_traits(a);
     test_type_traits(b);
     return 0;
}

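The namespace namespacefx defines two structs, struct __true_type {}; and struct __false_type {};. They have no data members; they exist only as tag types that let overload resolution tell the two cases apart. The struct template __type_traits uses typedefs to record the properties of a type. is_POD_type says whether the type is plain old data, such as the built-in types float and double; the STL later uses this property to choose between different strategies and improve efficiency.

The test calls test_type_traits(a);. At compile time the compiler first decides which overload to use. Because the call passes a single argument, it selects template <typename T> void test_type_traits(T&t) and deduces T as A. In the function body, consider this line: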

typedef typename namespacefx::__type_traits<T>::is_POD_type is_POD_type;

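T has been deduced as class A, so the compiler needs __type_traits<A>. It would normally instantiate the primary template, template <class _Tp> struct __type_traits, but we have supplied explicit specializations (the technique known as template specialization):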

  template<> struct __type_traits<A>
    {
        typedef __true_type  is_POD_type;
    };
    template <>
    struct __type_traits<B> {
        typedef __false_type is_POD_type;
    };

Therefore, in typedef typename namespacefx::__type_traits<T>::is_POD_type is_POD_type; the name is_POD_type refers to typedef __true_type is_POD_type;. In other words, inside template <typename T> void test_type_traits(T&t), the call test_type_traits(t,typename namespacefx::__type_traits<T>::is_POD_type()); is really the call

test_type_traits(t,__true_type());, and overload resolution therefore selects this function:

template <class T> void test_type_traits(T&t,namespacefx::__true_type)
{
    std::cout<< "true" << std::endl;
}

So the first line of output is true and, by the same reasoning, the second is false. For more on template specialization, see http://blog.csdn.net/flymu0808/article/details/38070045.

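2. Memory allocation and deallocation

SGI's design for allocating and releasing memory follows these principles:

Request memory from the system heap
Take multithreading into account
Handle the out-of-memory case
Address the fragmentation that many small blocks can cause

For simplicity, the multithreaded handling is not discussed here.

To deal with the fragmentation that small blocks can cause, SGI designed a two-level allocator. The first level uses malloc and free directly. The second level chooses a strategy by size: a request larger than 128 bytes is passed to the first-level allocator, while a request of 128 bytes or less is served from a memory pool. By default SGI uses the second-level allocator, __default_alloc_template.

Let's look at the first-level allocator first.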

template <int __inst>
class __malloc_alloc_template {

private:
// handle out-of-memory (oom) situations
  static void* _S_oom_malloc(size_t);
  static void* _S_oom_realloc(void*, size_t);

#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
  static void (* __malloc_alloc_oom_handler)();
#endif

public:

  static void* allocate(size_t __n)
  {
    void* __result = malloc(__n);// use malloc directly; _S_oom_malloc is called only if it fails
    if (0 == __result) __result = _S_oom_malloc(__n);
    return __result;
  }

  static void deallocate(void* __p, size_t /* __n */)
  {
    free(__p);// release the memory directly
  }

  static void* reallocate(void* __p, size_t /* old_sz */, size_t __new_sz)
  {
    void* __result = realloc(__p, __new_sz);
    if (0 == __result) __result = _S_oom_realloc(__p, __new_sz);
    return __result;
  }

  static void (* __set_malloc_handler(void (*__f)()))()
  {
    void (* __old)() = __malloc_alloc_oom_handler;
    __malloc_alloc_oom_handler = __f;
    return(__old);
  }

};

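As the code shows, this part mostly just calls the C library's allocation functions. Next comes the OOM handling, starting with the handler pointer:

template <int __inst>
void (* __malloc_alloc_template<__inst>::__malloc_alloc_oom_handler)() = 0;

__malloc_alloc_oom_handler is a function pointer of type void (*)(), initialized to 0.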

template <int __inst>
void*
__malloc_alloc_template<__inst>::_S_oom_malloc(size_t __n)
{
    void (* __my_malloc_handler)();
    void* __result;

    for (;;) {// keep trying until an allocation succeeds
        __my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == __my_malloc_handler) { __THROW_BAD_ALLOC; }// no handler installed: throw
        (*__my_malloc_handler)();// call the handler, which is expected to free some memory
        __result = malloc(__n);// retry the allocation
        if (__result) return(__result);
    }
}

template <int __inst>
void* __malloc_alloc_template<__inst>::_S_oom_realloc(void* __p, size_t __n)
{
    void (* __my_malloc_handler)();
    void* __result;


    for (;;) {
        __my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == __my_malloc_handler) { __THROW_BAD_ALLOC; }
        (*__my_malloc_handler)();
        __result = realloc(__p, __n);
        if (__result) return(__result);
    }
}

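Both functions follow the same idea: call the handler so that it can release memory, retry the allocation, and repeat until it succeeds. If no handler has been installed, they throw instead. It is a rather brute-force approach.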

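The second-level allocator __default_alloc_template

The second-level allocator applies a few strategies to keep small allocations from fragmenting memory.

SGI's strategy: a request larger than 128 bytes goes straight to the first-level allocator, while a request of 128 bytes or less is served from a memory pool. SGI rounds every small request up to a multiple of 8 bytes and maintains 16 free lists, which manage blocks of 8, 16, 24, ... up to 128 bytes. A free-list node looks like this: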

 union _Obj {
        union _Obj* _M_free_list_link;
        char _M_client_data[1];    /* The client sees this.        */
  };

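This structure is clever: because it is a union, the free list needs no extra storage. While a block is in use by the client, it is viewed through the char member and the whole block holds client data. While the block is free, its first bytes hold _M_free_list_link, and these pointers chain the free blocks together into a linked list.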

    enum {_ALIGN = 8};
    enum {_MAX_BYTES = 128};
    enum {_NFREELISTS = 16}; // _MAX_BYTES/_ALIGN

These are the basic constants: 8-byte alignment, a maximum small-block size of 128 bytes, and 16 free lists.

template <bool __threads, int __inst>
typename __default_alloc_template<__threads, __inst>::_Obj* __STL_VOLATILE
__default_alloc_template<__threads, __inst> ::_S_free_list[
# if defined(__SUNPRO_CC) || defined(__GNUC__) || defined(__HP_aCC)
    _NFREELISTS
# else
    __default_alloc_template<__threads, __inst>::_NFREELISTS
# endif
] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };// all 16 free lists start out empty

Rounding a request up to the aligned size:

static size_t
  _S_round_up(size_t __bytes) 
    { return (((__bytes) + (size_t) _ALIGN-1) & ~((size_t) _ALIGN - 1)); }

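This function is neat: adding _ALIGN - 1 and then masking with &~ rounds up to the next multiple of 8 without a division, which works because _ALIGN is a power of two.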

allocate() hands out blocks, as follows:

static void* allocate(size_t __n)
  {
    void* __ret = 0;

    if (__n > (size_t) _MAX_BYTES) {// requests larger than 128 bytes go to the first-level allocator
      __ret = malloc_alloc::allocate(__n);
    }
    else {
      _Obj* __STL_VOLATILE* __my_free_list
          = _S_free_list + _S_freelist_index(__n); // select the free list that serves the rounded-up size of __n
      // Acquire the lock here with a constructor call.
      // This ensures that it is released in exit or during stack
      // unwinding.
#     ifndef _NOTHREADS
      /*REFERENCED*/
      _Lock __lock_instance;
#     endif
      _Obj* __RESTRICT __result = *__my_free_list;
      if (__result == 0)
        __ret = _S_refill(_S_round_up(__n));// this free list is empty: refill it from the memory pool
      else {
        *__my_free_list = __result -> _M_free_list_link;// unlink the first free block and return it
        __ret = __result;
      }
    }

    return __ret;
  };

This operation is illustrated in the figure below.

When a free list is empty, _S_refill is called to replenish it. The function looks like this:

template <bool __threads, int __inst>
void*
__default_alloc_template<__threads, __inst>::_S_refill(size_t __n)
{
    int __nobjs = 20;
    char* __chunk = _S_chunk_alloc(__n, __nobjs);// obtain storage from the pool; __nobjs is passed by reference and may come back smaller
    _Obj* __STL_VOLATILE* __my_free_list;
    _Obj* __result;
    _Obj* __current_obj;
    _Obj* __next_obj;
    int __i;

    if (1 == __nobjs) return(__chunk);// only one block was obtained: return it to the caller, nothing is left for the free list
    __my_free_list = _S_free_list + _S_freelist_index(__n);// locate the free list for __n; the remaining blocks are linked into it below

    /* Build free list in chunk */
      __result = (_Obj*)__chunk;
      *__my_free_list = __next_obj = (_Obj*)(__chunk + __n);// block 0 goes to the caller, so the free list starts at __chunk + __n (__n is already rounded up)
      for (__i = 1; ; __i++) {
        __current_obj = __next_obj;
        __next_obj = (_Obj*)((char*)__next_obj + __n);
        if (__nobjs - 1 == __i) {
            __current_obj -> _M_free_list_link = 0;// the last node terminates the list
            break;
        } else {
            __current_obj -> _M_free_list_link = __next_obj;
        }
      }
    return(__result);
}

This function calls _S_chunk_alloc, which takes storage from the memory pool and replenishes the pool when it runs low.

template <bool __threads, int __inst>
char*
__default_alloc_template<__threads, __inst>::_S_chunk_alloc(size_t __size, 
                                                            int& __nobjs)
{
    char* __result;
    size_t __total_bytes = __size * __nobjs;
    size_t __bytes_left = _S_end_free - _S_start_free;// bytes still available in the memory pool

    if (__bytes_left >= __total_bytes) {// the pool can satisfy the whole request
        __result = _S_start_free;
        _S_start_free += __total_bytes;
        return(__result);
    } else if (__bytes_left >= __size) {// not enough for all __nobjs blocks, but at least one fits: hand out as many as fit
        __nobjs = (int)(__bytes_left/__size);
        __total_bytes = __size * __nobjs;
        __result = _S_start_free;
        _S_start_free += __total_bytes;
        return(__result);
    } else {// the pool cannot supply even one block: salvage the leftover, then refill the pool
        size_t __bytes_to_get = 
	  2 * __total_bytes + _S_round_up(_S_heap_size >> 4);// twice the request plus an extra amount that grows with the heap obtained so far
        // Try to make use of the left-over piece.
        if (__bytes_left > 0) {
            _Obj* __STL_VOLATILE* __my_free_list =
                        _S_free_list + _S_freelist_index(__bytes_left);// put the leftover piece on the matching free list

            ((_Obj*)_S_start_free) -> _M_free_list_link = *__my_free_list;
            *__my_free_list = (_Obj*)_S_start_free;
        }
        _S_start_free = (char*)malloc(__bytes_to_get);// ask the system heap for a new pool
        if (0 == _S_start_free) {// malloc failed
            size_t __i;
            _Obj* __STL_VOLATILE* __my_free_list;
	    _Obj* __p;
            // Try to make do with what we have.  That can't
            // hurt.  We do not try smaller requests, since that tends
            // to result in disaster on multi-process machines.
            // In other words: search the free lists for an unused block of at least __size bytes.
            for (__i = __size;
                 __i <= (size_t) _MAX_BYTES;
                 __i += (size_t) _ALIGN) {
                __my_free_list = _S_free_list + _S_freelist_index(__i);
                __p = *__my_free_list;
                if (0 != __p) {// found an unused block: take it as the new pool and recurse to carve it up
                    *__my_free_list = __p -> _M_free_list_link;
                    _S_start_free = (char*)__p;
                    _S_end_free = _S_start_free + __i;
                    return(_S_chunk_alloc(__size, __nobjs));
                    // Any leftover piece will eventually make it to the
                    // right free list.
                }
            }
	    _S_end_free = 0;	// In case of exception.
            _S_start_free = (char*)malloc_alloc::allocate(__bytes_to_get);// last resort: the first-level allocator runs the OOM handler or throws
            // This should either throw an
            // exception or remedy the situation.  Thus we assume it
            // succeeded.
        }
        _S_heap_size += __bytes_to_get;
        _S_end_free = _S_start_free + __bytes_to_get;// the pool has been refilled; update its bounds and recurse to hand out blocks
        return(_S_chunk_alloc(__size, __nobjs));
    }
}

That covers allocation. The deallocation function is as follows:

 static void deallocate(void* __p, size_t __n)
  {
    if (__n > (size_t) _MAX_BYTES)
      malloc_alloc::deallocate(__p, __n);
    else {
      _Obj* __STL_VOLATILE*  __my_free_list
          = _S_free_list + _S_freelist_index(__n);
      _Obj* __q = (_Obj*)__p;


      // acquire lock
#       ifndef _NOTHREADS
      /*REFERENCED*/
      _Lock __lock_instance;
#       endif /* _NOTHREADS */
      __q -> _M_free_list_link = *__my_free_list;// push the block back onto the front of its free list
      *__my_free_list = __q;
      // lock is released here
    }
  }

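Simple and cheap: it is essentially the inverse of allocate, and its diagram is the reverse of the figure above.

Overall, this allocator is elegantly designed and much of it is beautifully written. The use of a union, the alignment handling, the traits technique and the way storage is carved up are all worth studying.

References: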

http://hubeihuyanwei.blog.163.com/blog/static/28205284201171722359640/

http://blog.csdn.net/budTang/archive/2008/05/06/2397013.aspx