GLIBC: heap basic1

section 0 preface

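Heap implementations differ between glibc versions, so this article uses glibc-2.27 on 64-bit throughout.

It covers how tcache bins are implemented and how they can be attacked.

Previous article: insomnia, GLIBC: heap basic0 (zhuanlan.zhihu.com)

Both demos in the previous article ran into the tcache mechanism. It was introduced in glibc-2.26 to improve performance, at the cost of security: in chasing speed, some necessary checks were left out, which opened up a range of security problems.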

section I tcache basic

The previous article mentioned the tcache_perthread_struct structure used by tcache:

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

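We observed that this structure is stored in the first chunk of the heap after initialization, and that the chunk's size is 0x250.

The first field, counts, is a char array. A char is enough because each tcache bin's singly linked list holds at most 7 entries.

The second field is an array of tcache_entry pointers; each element is the head of a singly linked list.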

/* We overlay this structure on the user-data portion of a chunk when
   the chunk is stored in the per-thread cache.  */
typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

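Both counts and entries have TCACHE_MAX_BINS elements, which is 64, and the two fields correspond one to one. The size of the structure is therefore size = 1*0x40 + 8*0x40 = 0x240. Stored in a chunk, with the chunk metadata added and the total rounded up to the alignment, that becomes 0x250, which matches the earlier observation.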

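As with the fastbin array, every chunk on a given entries list has the same size. Unlike the other bins, though, the fd (next) pointer of a chunk here points to the user_data part of the next chunk in the list, not to its chunk header.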

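As noted, tcache_perthread_struct lives in the first chunk of the heap, so it is initialized on the first call to malloc. Incidentally, the heap itself is only fully initialized after that first malloc call.

So the first time we call, say, a = malloc(0x10);, the heap segment is initialized, and inspecting it shows 3 chunks: first the size=0x250 chunk holding tcache_perthread_struct, second the chunk a we requested, and third the top chunk. All of this can be seen in the demos from the previous article.

Calling malloc actually calls glibc's __libc_malloc() function. Its source: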

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);//wow :P
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);//chunk size to tcache index: compute the index
       //#define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
       //This macro also shows which index each chunk size maps to: one index per 16-byte step.
  MAYBE_INIT_TCACHE ();   //################################### <== initialization!

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins        /* mp_ : There is only one instance of the malloc
                                         parameters. Not documented further, but it looks
                                         like a global struct of heap parameters; its
                                         definition is at the end of this article. */
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */ /* this is just 64 */
      && tcache                                /* set up by the ##...## init call above */
      && tcache->entries[tc_idx] != NULL)      /* the bin must be non-empty */
    {
      return tcache_get (tc_idx); // get!
    }
  DIAG_POP_NEEDS_COMMENT;
#endif
/* what follows is the non-tcache malloc path, not covered here */
............................................../* omitted */
}
libc_hidden_def (__libc_malloc)

This is the macro that calls the initialization function:

# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();

Next, tcache_init():

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes);//get the arena pointer into ar_ptr
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)//we have an arena, but getting the chunk failed
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;//this is our first chunk, the 0x250 one
      memset (tcache, 0, sizeof (tcache_perthread_struct));//zero it
    }

}

With tcache_init() having initialized the per-thread variable tcache, we go back to __libc_malloc(), where what matters is essentially this statement:

return tcache_get (tc_idx);

Let's look at it:

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);  /* these two weak checks are all there is */
  tcache->entries[tc_idx] = e->next;     /* this line is what makes attacks easy */
  --(tcache->counts[tc_idx]);
  return (void *) e; //return the chunk
}

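Next, free, which in glibc is __libc_free(). The tcache work is not done there directly, though. The actual implementation, _int_free(), checks whether tcache is enabled and then acts on it. Since _int_free is very long, only the tcache-related part is shown here: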

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
..........
  size = chunksize (p);
..........
/* omitted */
#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
	&& tc_idx < mp_.tcache_bins
	&& tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif
....................
/* omitted */
}

After these simple checks, it calls:

tcache_put (p, tc_idx);

Following it in:

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk); //convert to a pointer to user_data
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;                          //new list head
  ++(tcache->counts[tc_idx]);
}

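That covers how tcache bins are managed.

As for the inconsistency between malloc and free in how tcache is implemented, with one doing its tcache work at the __libc_* stage and the other only at the _int_* stage, I don't have a good explanation yet and will add one later. If you read this and have a reasonable explanation, please share it with me. Thanks!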

section II tcache attack

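These attacks are essentially built on tcache_get() and tcache_put(), and nearly all of them are arbitrary address writes caused by a UAF bug.

Say we have just freed chunk A, which was put at the head of the singly linked list at index=a in the tcache bins. We then use the UAF bug to overwrite chunk A's fd pointer with a chosen target value.

If we now request another chunk with index = a, tcache_get() executes the statement: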

tcache->entries[tc_idx] = e->next;

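which assigns the next field (that is, the fd field) of the current head, chunk A, to tcache->entries[a]. This malloc therefore returns a new pointer to chunk A.

At this point the tcache bin, i.e. entries, holds our forged target value. If we request a chunk with index = a once more, we get back the memory region that target points to.

That gives us one arbitrary address write.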

Double free and house of spirit both work on a similar principle. The attack target is always this pointer.

The only check we have to get past is in the get stage, and it is barely a check at all... doge

assert (tc_idx < TCACHE_MAX_BINS);

That is, when malloc is called we pass in a size argument, the chunk size is computed from it, and tc_idx is then computed by this macro:

#define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

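Combined with the fact that tcache_perthread_struct stores 0x40 entries, this tells us that tidx=0 corresponds to size 0x20 and tidx=0x3F corresponds to size 0x410. So the chunk size that tcache can store is at most 0x410, and the user data is at most 0x408 bytes, where the last 8 bytes come from reusing the next chunk's prev_size field.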

section III demo

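When we hijack the head of some list in the tcache bins, what requirements does the target address we set have to meet?

First, see where the heap segment is:

Then look at the chunks we allocated:

As described earlier, the first chunk, of size 0x250, holds the 0x240-byte tcache_perthread_struct. Since the program has just started and has only allocated 2 small chunks, the structure should be all zeros.

Run the program up to just before chunk c is hijacked.

We set the fd field of the hijacked chunk c to 0x602100. Large stretches of memory before and after this address are all zeros.

The first malloc now returns chunk g, which actually points to chunk c, and the list in the tcache bin now points to our target address.

Trace this malloc:

The register state at this point:

We can see that rsi points to tcache_perthread_struct, so *(rsi+0x40) is the first list (index=0). The code checks whether it is empty, and then checks whether rax is greater than 0x3f, the maximum index. So rax should be holding the index. Looking further up to confirm:

rax originally holds the chunk size, 0x20. 0x11 is subtracted from it, and the result is shifted right by 4 bits.

This gives index=0.

This agrees with the csize2tidx(x) macro shown earlier. We then get a chunk at the target address: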

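So as long as the argument passed in is fine, the get side performs no checks at all.

But hijacking target to some bogus value such as 0xdeadbeef will not work. Before that chunk can be handed out by malloc, a qword has to be read from 0xdeadbeef (the fd pointer of the head chunk) to become the new head of the list in the tcache bin. Dereferencing 0xdeadbeef therefore crashes immediately.

This rarely comes up in practice, though: who would want to write to 0xdeadbeef?

So, to sum up: tcache attacks really have almost no requirements.

The mp_ struct: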

/* There is only one instance of the malloc parameters.  */

static struct malloc_par mp_ =
{
  .top_pad = DEFAULT_TOP_PAD,
  .n_mmaps_max = DEFAULT_MMAP_MAX,
  .mmap_threshold = DEFAULT_MMAP_THRESHOLD,
  .trim_threshold = DEFAULT_TRIM_THRESHOLD,
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
  .arena_test = NARENAS_FROM_NCORES (1)
#if USE_TCACHE
  ,
  .tcache_count = TCACHE_FILL_COUNT,
  .tcache_bins = TCACHE_MAX_BINS,
  .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
  .tcache_unsorted_limit = 0 /* No limit.  */
#endif
};

Reference: https://ctf-wiki.github.io/ctf-wiki/pwn/linux/glibc-heap/implementation/tcache-zh/