Analyzing the malloc/free Implementation in Glibc

This article was written and published with Zhihu On VSCode.

malloc and free internally rely on the GNU allocator (ptmalloc2), which is derived from Doug Lea's allocator: http://gee.cs.oswego.edu/dl/html/malloc.html.

This post looks at how the allocator manages memory. It sits as a layer between the operating system and the user, acting as a memory pool. The source analysis below is based on a glibc snapshot on GitHub: https://github.com/sploitfun/lsploits.

The overall design idea is this:

The allocator first requests a large contiguous region of memory on the heap, and that region is carved into pieces. User requests are classified by size, and the search works its way down level by level: fastbins -> unsorted bin -> small bins -> large bins. If nothing fits, the request falls through to the top chunk, which can either be extended with sbrk or backed directly by mmap.

Now let's dig into the details.

First, three concepts: arena, bins, and chunk. Knowing them will make the source code much easier to follow.

(Figure: the relationship between arena, bins (free list), and chunks)

This figure captures the relationship between the three quite well. The arena is a region allocated on the heap; the Free List in the middle holds the bins, an array of pointers to chunks. The bins fall into three groups, each holding chunks of different sizes. This is similar to the allocator design I described in an earlier post: an array manages pointers, each pointer is the head of a linked list, and freeing or allocating a block amounts to inserting into, removing from, and searching those lists.

malloc looks at the requested size to decide whether the heap region (arena) should be obtained with sbrk or with mmap; this cuts down the number of later system calls asking the OS for more memory.
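As an aside, the size threshold that steers large requests to mmap can be tuned from user code through mallopt (this is glibc-specific; the default threshold is about 128 KiB, and whether a particular allocation really ended up mmap-backed is best checked with strace or malloc_stats()). A minimal illustration:

#include <malloc.h>
#include <stdlib.h>

int main (void)
{
  /* Lower the threshold: requests of 64 KiB and above should now be served
     by mmap instead of being carved out of the heap / top chunk.  */
  mallopt (M_MMAP_THRESHOLD, 64 * 1024);

  void *big   = malloc (256 * 1024);   /* expected to be mmap-backed        */
  void *small = malloc (100);          /* served from an arena              */

  free (big);                          /* munmap'ed straight back to the OS */
  free (small);                        /* kept cached in the allocator bins */
  return 0;
}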

The three concepts in a bit more detail:

arena: a heap region obtained for a thread via the sbrk or mmap system call. By owning thread there are two kinds:

  • main arena: the arena created by the main thread;

  • thread arena: an arena created by a child thread;

chunk: a small, logically delimited block of memory. By role there are four kinds:

  • Allocated chunk: a block handed out to the user and not yet freed;

  • Free chunk: a block the user has freed;

  • Top chunk

  • Last Remainder chunk

bin: an array of pointers holding the list heads of the free-chunk lists. By the kind of list it holds there are four types:

  • Fast bin
  • Unsorted bin
  • Small bin
  • Large bin

Now for the core data structures:

malloc_state – the arena header. A single thread arena can contain multiple heaps, but there is only one arena header for all of them. The arena header holds the bins, the top chunk, the last remainder chunk, and so on.

malloc_chunk – the chunk header. A heap is divided into many chunks according to user requests, and each chunk has its own chunk header.

malloc_state is a bit like a database's metadata, while malloc_chunk describes the memory blocks themselves.

This is the structure that manages an arena. It mainly contains the bins, top (the top chunk), and the linked list of free arenas; skim it first to get an overall picture.

/*
   have_fastchunks indicates that there are probably some fastbin chunks.
   It is set true on entering a chunk into any fastbin, and cleared early in
   malloc_consolidate.  The value is approximate since it may be set when there
   are no fastbin chunks, or it may be clear even if there are fastbin chunks
   available.  Given it's sole purpose is to reduce number of redundant calls to
   malloc_consolidate, it does not affect correctness.  As a result we can safely
   use relaxed atomic accesses.
 */
struct malloc_state
{
    /* Serialize access.  */
    __libc_lock_define (, mutex);

    /* Flags (formerly in max_fast).  */
    int flags;

    /* Set if the fastbin chunks contain recently inserted free blocks.  */
    /* Note this is a bool but not all targets support atomics on booleans.  */
    int have_fastchunks;

    /* Fastbins */
    mfastbinptr fastbinsY[NFASTBINS];

    /* Base of the topmost chunk -- not otherwise kept in a bin */
    mchunkptr top;

    /* The remainder from the most recent split of a small request */
    mchunkptr last_remainder;

    /* Normal bins packed as described above */
    mchunkptr bins[NBINS * 2 - 2];

    /* Bitmap of bins */
    unsigned int binmap[BINMAPSIZE];

    /* Linked list */
    struct malloc_state *next;

    /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
    struct malloc_state *next_free;

    /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
    INTERNAL_SIZE_T attached_threads;

    /* Memory allocated from the system in this arena.  */
    INTERNAL_SIZE_T system_mem;
    INTERNAL_SIZE_T max_system_mem;
};

Malloc

First, a figure to get an overall sense of the allocation process:

(Figure: overview of the malloc allocation path)

Overall structure of the malloc function

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

  arena_lookup (ar_ptr); // arena lookup logic

  arena_lock (ar_ptr, bytes);
  if (!ar_ptr)
    return 0;

  victim = _int_malloc (ar_ptr, bytes);
  if (!victim)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      if (__builtin_expect (ar_ptr != NULL, 1))
        {
          victim = _int_malloc (ar_ptr, bytes);
          (void) mutex_unlock (&ar_ptr->mutex);
        }
    }
  else
    (void) mutex_unlock (&ar_ptr->mutex);
  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}

Arena

Let's first look at where arenas are handled. The allocator searches the global arena list first; if it finds a reusable arena it reuses it, otherwise it creates a new one.

arena_lookup does the searching, and arena_lock takes the mutex on that region. arena_lookup walks the global arena linked list, but it may not find a reusable arena; in that case the pointer arena_lock receives is NULL, and it creates a new arena itself.

#define arena_lock(ptr, size) do {                    \
      if (ptr)                                        \
        (void) mutex_lock (&ptr->mutex);              \
      else                                            \
        ptr = arena_get2 (ptr, (size), NULL);         \
  } while (0)

arena_get2 (parts elided by the ellipses) contains the _int_new_arena logic.

.......
      if (__glibc_unlikely (n <= narenas_limit - 1))
        {
          if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
            goto repeat;
          a = _int_new_arena (size);
          if (__glibc_unlikely (a == NULL))
            catomic_decrement (&narenas);
        }
      else
        a = reused_arena (avoid_arena);
........

_int_new_arena creates a new arena, and the allocation is done with mmap. Following the call down into new_heap (it goes through several layers of wrappers, so only the core code is shown), you can see MMAP (mmap) being called to obtain the heap memory.

..............
  if (aligned_heap_area)
    {
      p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                          MAP_NORESERVE);
      aligned_heap_area = NULL;
      if (p2 != MAP_FAILED && ((unsigned long) p2 & (HEAP_MAX_SIZE - 1)))
        {
          __munmap (p2, HEAP_MAX_SIZE);
          p2 = MAP_FAILED;
        }
    }

  if (p2 == MAP_FAILED)
    {
      p1 = (char *) MMAP (0, HEAP_MAX_SIZE << 1, PROT_NONE, MAP_NORESERVE);
.......

To summarize:

To enable reuse, the allocator first looks for a reusable arena on a list; if none is found, it creates one with mmap.

_int_malloc

Fast Bins

If the bins are a database, the fast bins are its cache. They allocate and free quickly because each one is a LIFO linked list: every insertion, removal, and lookup happens at the head of the list.
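A quick way to observe this LIFO behaviour from user code (glibc-specific and not guaranteed; on newer glibc a per-thread tcache sits in front of the fastbins, but the effect is the same):

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  void *a = malloc (32);
  printf ("first  malloc(32): %p\n", a);
  free (a);                  /* the 32-byte chunk is pushed onto a fastbin/tcache */

  void *b = malloc (32);     /* usually pops that same chunk straight back off    */
  printf ("second malloc(32): %p\n", b);
  free (b);
  return 0;
}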

Now for _int_malloc. This is the main allocation routine, and it dispatches on the requested size: fastbins first, then the bins.

  /*
     If the size qualifies as a fastbin, first check corresponding bin.
     This code is safe to execute even if av is not yet initialized, so we
     can try it without checking, which saves some time on this fast path.
   */

  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp = *fb;
      do
        {
          victim = pp;
          if (victim == NULL)
            break;
        }
      while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
             != victim);
      if (victim != 0)
        {
          if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
            {
              errstr = "malloc(): memory corruption (fast)";
            errout:
              malloc_printerr (check_action, errstr, chunk2mem (victim));
              return NULL;
            }
          check_remalloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }
    }

The arena contains the fastbinsY array with ten entries; each entry acts as a list head recording a pointer to the corresponding chunks.

If the bin at that index holds a chunk, it is taken out and the list head is advanced to the next chunk (if there is one), found through the fd pointer.

So every chunk on a given fastbin list has the same size, and the list is last-in, first-out (LIFO).

Only if the fastbin at the computed index actually has an element does the code go on to operate on it.
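For reference, the index is derived from the (already padded) chunk size; in this version of malloc.c the helpers look roughly like this (quoted from memory, so treat the exact form as approximate):

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)

#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])

/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)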

Pay particular attention to mchunkptr pp: the list pointer taken out of the fastbin at the matching index is assigned to pp, and the structure pp points to is the central malloc_chunk.

/* Forward declarations.  */
struct malloc_chunk;
typedef struct malloc_chunk* mchunkptr;
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

(Figure: layout of a free chunk)

From this structure we can see that we are dealing with a doubly linked list; fd_nextsize/bk_nextsize are auxiliary pointers used by the large bins to skip quickly over runs of chunks of the same size.

The final call to chunk2mem also matters: it converts the malloc_chunk structure into the memory pointer the user actually gets, and returns it.
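The conversion is just a fixed offset past the chunk header; as far as I remember, the pair of macros in malloc.c is roughly:

/* conversion from malloc headers to user pointers, and back */
#define chunk2mem(p)   ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr) ((char *) (mem) - 2 * SIZE_SZ))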

The defining property of fast bins is that there is no coalescing: two adjacent free chunks are never merged. That can worsen fragmentation, but it makes freeing much faster.

(Figure: fastbin lists)

Bins

First, an overview of the bins:

Memory freed by the user is not necessarily returned to the system right away. Instead, ptmalloc keeps track of the free chunks in the heap, and when the next allocation request arrives it first tries to hand out one of those free chunks, avoiding frequent system calls and lowering the cost of allocation. ptmalloc links free chunks of similar sizes into doubly linked lists; each such list is called a bin. ptmalloc maintains 128 bins in total and stores them in an array.

(Figure: the bin array and its index ranges)
small bins(2-63)

Now into the bins array. As the earlier figure shows, the bins split into small bins, the unsorted bin, and large bins. Chunks smaller than 512 bytes are called small chunks, and the bins holding them are called small bins. Small bins are faster than large bins at allocating and freeing. There is only one unsorted bin list, while there are many small bins and large bins. Within a single small bin every chunk has the same size, and adjacent small bins differ by 8 bytes.
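For reference, the small-bin boundary and index computation in this malloc.c look roughly like the following (note that on a 64-bit build, where MALLOC_ALIGNMENT is 16, the small-bin range actually extends to 1024 bytes with 16-byte spacing; the 512-byte / 8-byte figures above are the 32-bit numbers):

#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)

#define smallbin_index(sz) \
  ((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3)) \
   + SMALLBIN_CORRECTION)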

The source handles the small-bin case first:

if (in_smallbin_range (nb))
{
    idx = smallbin_index (nb);
    bin = bin_at (av, idx);

    if ((victim = last (bin)) != bin)
    {
        if (victim == 0) /* initialization check */
            malloc_consolidate (av);
        else
        {
            bck = victim->bk;
            if (__glibc_unlikely (bck->fd != victim))
            {
                errstr = "malloc(): smallbin double linked list corrupted";
                goto errout;
            }
            set_inuse_bit_at_offset (victim, nb);
            bin->bk = bck;
            bck->fd = bin;

            if (av != &main_arena)
                victim->size |= NON_MAIN_ARENA;
            check_malloced_chunk (av, victim, nb);
            void *p = chunk2mem (victim);
            alloc_perturb (p, bytes);
            return p;
        }
    }
}

malloc_consolidate takes the chunks sitting in the fastbins and merges them back into the bins, reducing memory fragmentation.

The overall idea is to compute the small-bin index, then take the corresponding list out of the array.

The chunk we take is victim. Because this is a doubly linked list, taking a chunk out means fixing up both bk and fd. The two statements bin->bk = bck and bck->fd = bin are exactly what detaches victim from the list.
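Elsewhere in malloc.c the same pointer surgery is wrapped up in the unlink macro. A simplified sketch of the idea (my own illustration, omitting glibc's corruption checks and the fd_nextsize/bk_nextsize maintenance done for large bins):

/* Simplified sketch: detach chunk P from its doubly linked bin list.
   The real unlink also verifies FD->bk == P and BK->fd == P first.  */
#define UNLINK_SKETCH(P, BK, FD)  \
  do {                            \
    FD = (P)->fd;                 \
    BK = (P)->bk;                 \
    FD->bk = BK;                  \
    BK->fd = FD;                  \
  } while (0)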

unsorted bins(1)

When small chunks and large chunks are freed, they are not put straight into their own bins; they go into the unsorted bin instead. This lets the allocator reuse recently freed chunks directly, removing the time spent finding the right bin and so speeding up both allocation and deallocation. (In other words, already-freed chunks are reused as aggressively as possible before the small bins and large bins are searched again.)

This mechanism is malloc's buffer for freed blocks. If a chunk freed by the user is larger than max_fast, it ought to end up in the bins, but in fact it is first placed into the unsorted bin. During malloc, if no suitable chunk is found in the fastbins, ptmalloc first looks for a fit in the unsorted bin, and only then searches the regular bins. If the unsorted bin cannot satisfy the request, malloc moves its chunks into the proper bins and continues the search there. So the unsorted bin is purely an optimization of malloc: it is a single list, and anything freed that is not fastbin-sized is chained here first, simply for speed. Since it does not add much to understanding the core design, I will not analyze its source code for now.

Its distinguishing feature is that it puts no size requirement on its chunks, whereas every other bin does. Why move chunks on to the large bins / small bins at all? Essentially because the unsorted bin itself does no merging; the chunks have to be moved into their designated bin areas before those fragments can be consolidated.
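For reference, the unsorted bin is simply bin 1 of the bins array, reached through the same bin_at helper the small and large bins use; from memory, the macros look roughly like this:

/* The otherwise unindexable 1-bin is used to hold unsorted chunks.  */
#define unsorted_chunks(M)   (bin_at (M, 1))

/* addressing -- note that bin_at(0) does not exist */
#define bin_at(m, i) \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))  \
             - offsetof (struct malloc_chunk, fd))

/* analog of ++bin */
#define next_bin(b)  ((mbinptr) ((char *) (b) + (sizeof (mchunkptr) << 1)))

/* first/last chunk on a bin list */
#define first(b)     ((b)->fd)
#define last(b)      ((b)->bk)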

Large Bins(64-126)

Both the small bins and the large bins coalesce adjacent free chunks. That reduces fragmentation, but it slows down free, which is exactly the gap fast bins fill.

Having looked at the small bins, let's look at the large bins. Large bins are ordered bins: within each bin the chunks are sorted by size, whereas every chunk in a given small bin has the same size. As the code shows, if nothing suitable was found in the unsorted bin, the search moves on to the large bins or the small bins. At that point chunks are taken off the unsorted bin and inserted into the large or small bins, and because a large bin is a sorted list of chunks, the insertion has to preserve that order.

......................
          /* place chunk in bin */

          if (in_smallbin_range (size))
            {
              victim_index = smallbin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;
            }
          else
            {
              victim_index = largebin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;

              /* maintain large bins in sorted order */
              if (fwd != bck)
                {
                  /* Or with inuse bit to speed comparisons */
                  size |= PREV_INUSE;
                  /* if smaller than smallest, bypass loop below */
                  assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
                  if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
                    {
                      fwd = bck;
                      bck = bck->bk;

                      victim->fd_nextsize = fwd->fd;
                      victim->bk_nextsize = fwd->fd->bk_nextsize;
                      fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
                    }
                  else
                    {
                      assert ((fwd->size & NON_MAIN_ARENA) == 0);
                      while ((unsigned long) size < fwd->size)
                        {
                          fwd = fwd->fd_nextsize;
                          assert ((fwd->size & NON_MAIN_ARENA) == 0);
                        }

                      if ((unsigned long) size == (unsigned long) fwd->size)
                        /* Always insert in the second position.  */
                        fwd = fwd->fd;
                      else
                        {
                          victim->fd_nextsize = fwd;
                          victim->bk_nextsize = fwd->bk_nextsize;
                          fwd->bk_nextsize = victim;
                          victim->bk_nextsize->fd_nextsize = victim;
                        }
                      bck = fwd->bk;
                    }
                }
              else
                victim->fd_nextsize = victim->bk_nextsize = victim;
            }

          mark_bin (av, victim_index);
          victim->bk = bck;
          victim->fd = fwd;
          fwd->bk = victim;
          bck->fd = victim;
......................

The logic here runs when the fastbins and the unsorted bin cannot satisfy the request: all unsorted chunks are taken off and placed into their bins, and then a new round of searching begins, this time in the large bins.

Getting every detail straight here is hard. The main idea is to search the ordered lists of the large bins for a chunk, and once one is found, perform the usual split.

      /*
         Search for a chunk by scanning bins, starting with next largest
         bin. This search is strictly by best-fit; i.e., the smallest
         (with ties going to approximately the least recently used) chunk
         that fits is selected.

         The bitmap avoids needing to check that most blocks are nonempty.
         The particular case of skipping all bins during warm-up phases
         when no chunks have been returned yet is faster than it might look.
       */

      ++idx;
      bin = bin_at (av, idx);
      block = idx2block (idx);
      map = av->binmap[block];
      bit = idx2bit (idx);

      for (;; )
        {
          /* Skip rest of block if there are no more set bits in this block.  */
          if (bit > map || bit == 0)
            {
              do
                {
                  if (++block >= BINMAPSIZE) /* out of bins */
                    goto use_top;
                }
              while ((map = av->binmap[block]) == 0);

              bin = bin_at (av, (block << BINMAPSHIFT));
              bit = 1;
            }

          /* Advance to bin with set bit. There must be one. */
          while ((bit & map) == 0)
            {
              bin = next_bin (bin);
              bit <<= 1;
              assert (bit != 0);
            }

          /* Inspect the bin. It is likely to be non-empty */
          victim = last (bin);

          /*  If a false alarm (empty bin), clear the bit. */
          if (victim == bin)
            {
              av->binmap[block] = map &= ~bit; /* Write through */
              bin = next_bin (bin);
              bit <<= 1;
            }

          else
            {
              size = chunksize (victim);

              /*  We know the first chunk in this bin is big enough to use. */
              assert ((unsigned long) (size) >= (unsigned long) (nb));

              remainder_size = size - nb;

              /* unlink */
              unlink (victim, bck, fwd);

              /* Exhaust */
              if (remainder_size < MINSIZE)
                {
                  set_inuse_bit_at_offset (victim, size);
                  if (av != &main_arena)
                    victim->size |= NON_MAIN_ARENA;
                }

              /* Split */
              else
                {
                  remainder = chunk_at_offset (victim, nb);

                  /* We cannot assume the unsorted list is empty and therefore
                     have to perform a complete insert here.  */
                  bck = unsorted_chunks (av);
                  fwd = bck->fd;
	  if (__glibc_unlikely (fwd->bk != bck))
                    {
                      errstr = "malloc(): corrupted unsorted chunks 2";
                      goto errout;
                    }
                  remainder->bk = bck;
                  remainder->fd = fwd;
                  bck->fd = remainder;
                  fwd->bk = remainder;

                  /* advertise as last remainder */
                  if (in_smallbin_range (nb))
                    av->last_remainder = remainder;
                  if (!in_smallbin_range (remainder_size))
                    {
                      remainder->fd_nextsize = NULL;
                      remainder->bk_nextsize = NULL;
                    }
                  set_head (victim, nb | PREV_INUSE |
                            (av != &main_arena ? NON_MAIN_ARENA : 0));
                  set_head (remainder, remainder_size | PREV_INUSE);
                  set_foot (remainder, remainder_size);
                }
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }
        }

Notice the remainder chunk appearing in the code. ptmalloc looks for the chunk whose size is closest to the user's request; once found, that chunk is split into two pieces:

  • User chunk (the requested size): returned to the user;
  • Remainder chunk (whatever is left): added to the unsorted bin.

There is another nice detail here: ptmalloc uses a bitmap (binmap) to speed up the scan. If the largest chunk in the matching bin is smaller than the request, the allocator scans the binmap to find the smallest non-empty bin that is big enough. If such a bin exists, a suitable chunk is taken from it and split for the user; otherwise the top chunk is used to serve the request.
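The binmap mentioned above is a small bit array over the 128 bins, one bit per bin; from memory, the helper macros in malloc.c are roughly:

/* Conservatively use 32 bits per map word, even if on 64bit system */
#define BINMAPSHIFT      5
#define BITSPERMAP       (1U << BINMAPSHIFT)
#define BINMAPSIZE       (NBINS / BITSPERMAP)

#define idx2block(i)     ((i) >> BINMAPSHIFT)
#define idx2bit(i)       ((1U << ((i) & ((1U << BINMAPSHIFT) - 1))))

#define mark_bin(m, i)    ((m)->binmap[idx2block (i)] |= idx2bit (i))
#define unmark_bin(m, i)  ((m)->binmap[idx2block (i)] &= ~(idx2bit (i)))
#define get_binmap(m, i)  ((m)->binmap[idx2block (i)] & idx2bit (i))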

Top Chunk

It is also possible that not even the large bins can satisfy the request; then the top chunk has to finish the job. As mentioned earlier, ptmalloc allocates a fairly large region of free memory up front (the heap) and serves requests by managing it. Because memory is handed out from low addresses to high, there is always a free chunk at the top of the free region, called the top chunk. When the bins and fastbins cannot satisfy an allocation, ptmalloc carves a piece out of the top chunk; if the top chunk itself is not large enough, ptmalloc grows it appropriately (which grows the heap) to satisfy the request. In practice the top chunk is only ever considered after the fastbins and bins, so no matter how large it is, it is never placed into a fastbin or a bin.

The use_top label below is where the top chunk, the memory at the very top of the allocated heap region, is used. The best-fit search rule still applies.

    use_top:
      /*
         If large enough, split off the chunk bordering the end of memory
         (held in av->top). Note that this is in accord with the best-fit
         search rule.  In effect, av->top is treated as larger (and thus
         less well fitting) than any other available chunk since it can
         be extended to be as large as necessary (up to system
         limitations).

         We require that av->top always exists (i.e., has size >=
         MINSIZE) after initialization, so if it would otherwise be
         exhausted by current request, it is replenished. (The main
         reason for ensuring it exists is that we may need MINSIZE space
         to put in fenceposts in sysmalloc.)
       */

      victim = av->top;
      size = chunksize (victim);

      if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
        {
          remainder_size = size - nb;
          remainder = chunk_at_offset (victim, nb);
          av->top = remainder;
          set_head (victim, nb | PREV_INUSE |
                    (av != &main_arena ? NON_MAIN_ARENA : 0));
          set_head (remainder, remainder_size | PREV_INUSE);

          check_malloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }

      /* When we are using atomic ops to free fast chunks we can get
         here for all block sizes.  */
      else if (have_fastchunks (av))
        {
          malloc_consolidate (av);
          /* restore original bin index */
          if (in_smallbin_range (nb))
            idx = smallbin_index (nb);
          else
            idx = largebin_index (nb);
        }
.................
    }

However, when the chunk to allocate is large enough and neither the fastbins nor the bins, nor even the top chunk, can satisfy it, ptmalloc uses mmap to map pages directly into the process's address space. A chunk allocated this way is unmapped when freed, handing the memory straight back to the system rather than to the allocator; referencing such a region afterwards causes a segmentation fault. Chunks like this are never held in any bin.

So for fairly large requests, the allocator first tries the top chunk, and only if that is still not enough does it call sysmalloc (which decides internally whether to grow the top chunk with sbrk or to map a new sub-heap with mmap).

      /*
         Otherwise, relay to handle system-dependent cases
       */
      else
        {
          void *p = sysmalloc (nb, av);
          if (p != NULL)
            alloc_perturb (p, bytes);
          return p;
        }
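The sbrk-versus-mmap decision mentioned above is made near the top of sysmalloc; paraphrasing the same source from memory, the check is roughly:

  /*
     If have mmap, and the request size meets the mmap threshold, and
     the system supports mmap, and there are few enough currently
     allocated mmapped regions, try to directly map this request
     rather than expanding top.
   */
  if ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
      && (mp_.n_mmaps < mp_.n_mmaps_max))
    {
      char *mm;           /* return value from mmap call */
      ...
    }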

Free

void
__libc_free (void *mem)
{
  mstate ar_ptr;
  mchunkptr p;                          /* chunk corresponding to mem */

  void (*hook) (void *, const void *)
    = atomic_forced_read (__free_hook);
  if (__builtin_expect (hook != NULL, 0))
    {
      (*hook)(mem, RETURN_ADDRESS (0));
      return;
    }

  if (mem == 0)                              /* free(0) has no effect */
    return;

  p = mem2chunk (mem);

  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      /* see if the dynamic brk/mmap threshold needs adjusting */
      if (!mp_.no_dyn_threshold
          && p->size > mp_.mmap_threshold
          && p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
        {
          mp_.mmap_threshold = chunksize (p);
          mp_.trim_threshold = 2 * mp_.mmap_threshold;
          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                      mp_.mmap_threshold, mp_.trim_threshold);
        }
      munmap_chunk (p);
      return;
    }

  ar_ptr = arena_for_chunk (p);
  _int_free (ar_ptr, p, 0);
}
libc_hidden_def (__libc_free)

What free does is fairly simple: it first converts the user pointer back into the internal chunk representation; if the region was mmapped, it simply unmaps it; otherwise it locates the owning arena and releases the chunk there.

_int_free below is the core chunk-freeing function. For lack of time I won't go through it in fine detail; here is the rough flow.

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr *fb;             /* associated fastbin */
  mchunkptr nextchunk;         /* next contiguous chunk */
  INTERNAL_SIZE_T nextsize;    /* its size */
  int nextinuse;               /* true if nextchunk is used */
  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
  mchunkptr bck;               /* misc temp for linking */
  mchunkptr fwd;               /* misc temp for linking */

  const char *errstr = NULL;
  int locked = 0;

  size = chunksize (p);

  /* Little security check which won't hurt performance: the
     allocator never wrapps around at the end of the address space.
     Therefore we can exclude some size values which might appear
     here by accident or by "design" from some intruder.  */
  if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
      || __builtin_expect (misaligned_chunk (p), 0))
    {
      errstr = "free(): invalid pointer";
    errout:
      if (!have_lock && locked)
        (void) mutex_unlock (&av->mutex);
      malloc_printerr (check_action, errstr, chunk2mem (p));
      return;
    }
  /* We know that each chunk is at least MINSIZE bytes in size or a
     multiple of MALLOC_ALIGNMENT.  */
  if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
    {
      errstr = "free(): invalid size";
      goto errout;
    }

  check_inuse_chunk(av, p);

  /*
    If eligible, place chunk on a fastbin so it can be found
    and used quickly in malloc.
  */

  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())

#if TRIM_FASTBINS
      /*
	If TRIM_FASTBINS set, don't place chunks
	bordering top into fastbins
      */
      && (chunk_at_offset(p, size) != av->top)
#endif
      ) {

    if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
	|| __builtin_expect (chunksize (chunk_at_offset (p, size))
			     >= av->system_mem, 0))
      {
	/* We might not have a lock at this point and concurrent modifications
	   of system_mem might have let to a false positive.  Redo the test
	   after getting the lock.  */
	if (have_lock
	    || ({ assert (locked == 0);
		  mutex_lock(&av->mutex);
		  locked = 1;
		  chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
		    || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
	      }))
	  {
	    errstr = "free(): invalid next size (fast)";
	    goto errout;
	  }
	if (! have_lock)
	  {
	    (void)mutex_unlock(&av->mutex);
	    locked = 0;
	  }
      }

    free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

    set_fastchunks(av);
    unsigned int idx = fastbin_index(size);
    fb = &fastbin (av, idx);

    /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
    mchunkptr old = *fb, old2;
    unsigned int old_idx = ~0u;
    do
      {
	/* Check that the top of the bin is not the record we are going to add
	   (i.e., double free).  */
	if (__builtin_expect (old == p, 0))
	  {
	    errstr = "double free or corruption (fasttop)";
	    goto errout;
	  }
	/* Check that size of fastbin chunk at the top is the same as
	   size of the chunk that we are adding.  We can dereference OLD
	   only if we have the lock, otherwise it might have already been
	   deallocated.  See use of OLD_IDX below for the actual check.  */
	if (have_lock && old != NULL)
	  old_idx = fastbin_index(chunksize(old));
	p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);

    if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
      {
	errstr = "invalid fastbin entry (free)";
	goto errout;
      }
  }

  /*
    Consolidate other non-mmapped chunks as they arrive.
  */

  else if (!chunk_is_mmapped(p)) {
    if (! have_lock) {
      (void)mutex_lock(&av->mutex);
      locked = 1;
    }

    nextchunk = chunk_at_offset(p, size);

    /* Lightweight tests: check whether the block is already the
       top block.  */
    if (__glibc_unlikely (p == av->top))
      {
	errstr = "double free or corruption (top)";
	goto errout;
      }
    /* Or whether the next chunk is beyond the boundaries of the arena.  */
    if (__builtin_expect (contiguous (av)
			  && (char *) nextchunk
			  >= ((char *) av->top + chunksize(av->top)), 0))
      {
	errstr = "double free or corruption (out)";
	goto errout;
      }
    /* Or whether the block is actually not marked used. */
    if (__glibc_unlikely (!prev_inuse(nextchunk)))
      {
	errstr = "double free or corruption (!prev)";
	goto errout;
      }

    nextsize = chunksize(nextchunk);
    if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
	|| __builtin_expect (nextsize >= av->system_mem, 0))
      {
	errstr = "free(): invalid next size (normal)";
	goto errout;
      }

    free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

    /* consolidate backward */
    if (!prev_inuse(p)) {
      prevsize = p->prev_size;
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      unlink(p, bck, fwd);
    }

    if (nextchunk != av->top) {
      /* get and clear inuse bit */
      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

      /* consolidate forward */
      if (!nextinuse) {
	unlink(nextchunk, bck, fwd);
	size += nextsize;
      } else
	clear_inuse_bit_at_offset(nextchunk, 0);

      /*
	Place the chunk in unsorted chunk list. Chunks are
	not placed into regular bins until after they have
	been given one chance to be used in malloc.
      */

      bck = unsorted_chunks(av);
      fwd = bck->fd;
      if (__glibc_unlikely (fwd->bk != bck))
	{
	  errstr = "free(): corrupted unsorted chunks";
	  goto errout;
	}
      p->fd = fwd;
      p->bk = bck;
      if (!in_smallbin_range(size))
	{
	  p->fd_nextsize = NULL;
	  p->bk_nextsize = NULL;
	}
      bck->fd = p;
      fwd->bk = p;

      set_head(p, size | PREV_INUSE);
      set_foot(p, size);

      check_free_chunk(av, p);
    }

    /*
      If the chunk borders the current high end of memory,
      consolidate into top
    */

    else {
      size += nextsize;
      set_head(p, size | PREV_INUSE);
      av->top = p;
      check_chunk(av, p);
    }

    /*
      If freeing a large space, consolidate possibly-surrounding
      chunks. Then, if the total unused topmost memory exceeds trim
      threshold, ask malloc_trim to reduce top.

      Unless max_fast is 0, we don't know if there are fastbins
      bordering top, so we cannot tell for sure whether threshold
      has been reached unless fastbins are consolidated.  But we
      don't want to consolidate on each free.  As a compromise,
      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
      is reached.
    */

    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
      if (have_fastchunks(av))
	malloc_consolidate(av);

      if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
	if ((unsigned long)(chunksize(av->top)) >=
	    (unsigned long)(mp_.trim_threshold))
	  systrim(mp_.top_pad, av);
#endif
      } else {
	/* Always try heap_trim(), even if the top chunk is not
	   large, because the corresponding heap might go away.  */
	heap_info *heap = heap_for_ptr(top(av));

	assert(heap->ar_ptr == av);
	heap_trim(heap, mp_.top_pad);
      }
    }

    if (! have_lock) {
      assert (locked);
      (void)mutex_unlock(&av->mutex);
    }
  }
  /*
    If the chunk was allocated via mmap, release via munmap().
  */

  else {
    munmap_chunk (p);
  }
}

First the size of the chunk being freed is computed, and the chunk's bin is looked up among the fast bins; since every fast bin corresponds to one size, the right list can be found from the size alone. Beyond that, the main work is examining the neighbouring chunks to see whether there is an opportunity to coalesce.

The coalesced chunk is inserted into the unsorted bin's circular doubly linked list, and (when the freed size crosses FASTBIN_CONSOLIDATION_THRESHOLD) malloc_consolidate() is called to merge the free chunks in the fastbins into the unsorted bin as well.

As for why free does not need to be told the size of the block, I wrote a short article about that earlier: https://zhuanlan.zhihu.com/p/145953980.
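The short version is that the size sits in the chunk header just in front of the user pointer, so free can recover it through mem2chunk; glibc even exposes the (padded) recorded size via malloc_usable_size, for example:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  void *p = malloc (100);
  if (p == NULL)
    return 1;

  /* On glibc this prints a value >= 100 (typically 104 on 64-bit): the
     request is padded up to the chunk size stored in the hidden header.  */
  printf ("requested 100, usable %zu\n", malloc_usable_size (p));

  free (p);   /* the size comes from the chunk header, not from the caller */
  return 0;
}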

That wraps up this look at glibc's malloc/free. The details can be dug into much further, and that will have to wait for another time, but we now have a fairly clear picture of malloc's overall design.

References:

https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/

https://blog.csdn.net/maokelong95/article/details/52006379
