Lua table rehash

ball32109

Article link: https://blog.csdn.net/ball32109/article/details/44906403

First, the implementation of rehash:

static void rehash (lua_State *L, Table *t, const TValue *ek) {
  int nasize, na;
  int nums[MAXBITS+1];  /* nums[i] = number of keys with 2^(i-1) < k <= 2^i */
  int i;
  int totaluse;
  for (i=0; i<=MAXBITS; i++) nums[i] = 0;  /* reset counts */
  nasize = numusearray(t, nums);  /* count keys in array part */
  totaluse = nasize;  /* all those keys are integer keys */
  totaluse += numusehash(t, nums, &nasize);  /* count keys in hash part */
  /* count extra key */
  nasize += countint(ek, nums);
  totaluse++;
  /* compute new size for array part */
  na = computesizes(nums, &nasize);
  /* resize the table to new computed sizes */
  luaH_resize(L, t, nasize, totaluse - na);
}

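nums[MAXBITS+1]: rehash first goes through both the array part and the hash part of the table and counts every positive integer key by power-of-two slice. nums[i] holds the number of keys k with 2^(i-1) < k <= 2^i, and nums[0] counts only k = 1. For example, nums[1] = 1 and nums[4] = 12 mean that there is one key with 1 < k <= 2^1 (that is, k = 2) and twelve keys with 2^3 < k <= 2^4.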

Next comes numusearray. As its name suggests, it counts how many integer keys are present (non-nil) in the array part:

static int numusearray (const Table *t, int *nums) {
  int lg;
  int ttlg;  /* 2^lg */
  int ause = 0;  /* summation of `nums' */
  int i = 1;  /* count to traverse all array keys */
  for (lg=0, ttlg=1; lg<=MAXBITS; lg++, ttlg*=2) {  /* for each slice */
    int lc = 0;  /* counter */
    int lim = ttlg;
    if (lim > t->sizearray) {
      lim = t->sizearray;  /* adjust upper limit */
      if (i > lim)
        break;  /* no more elements to count */
    }
    /* count elements in range (2^(lg-1), 2^lg] */
    for (; i <= lim; i++) {
      if (!ttisnil(&t->array[i-1]))
        lc++;
    }
    nums[lg] += lc;
    ause += lc;
  }
  return ause;
}

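numusearray counts the keys in each slice of the array part.

lim starts as 2^lg (that is, ttlg). If lim is larger than the array size, it is clamped to t->sizearray. If i is already past that clamped limit, the loop stops. The reason is that i is a running index: it starts at 1 and has already walked through every slot counted in the earlier slices, so nothing is left to count.

ause is the number of integer keys in the array part. This is simply the number of non-nil slots in the array part.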

static int numusehash (const Table *t, int *nums, int *pnasize) {
  int totaluse = 0;  /* total number of elements */
  int ause = 0;  /* summation of `nums' */
  int i = sizenode(t);
  while (i--) {
    Node *n = &t->node[i];
    if (!ttisnil(gval(n))) {
      ause += countint(gkey(n), nums);
      totaluse++;
    }
  }
  *pnasize += ause;
  return totaluse;
}

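numusehash walks the hash part. For every non-nil entry it increments totaluse. It uses countint to count the keys that are integer-valued numbers; numbers with a fractional part are excluded, as the arrayindex call inside countint shows. The integer-key count is added to *pnasize, and the total is returned: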

static int countint (const TValue *key, int *nums) {
  int k = arrayindex(key);
  if (0 < k && k <= MAXASIZE) {  /* is `key' an appropriate array index? */
    nums[luaO_ceillog2(k)]++;  /* count as such */
    return 1;
  }
  else
    return 0;
}

static int arrayindex (const TValue *key) {
  if (ttisnumber(key)) {
    lua_Number n = nvalue(key);
    int k;
    lua_number2int(k, n);
    if (luai_numeq(cast_num(k), n))
      return k;
  }
  return -1;  /* `key' did not match some condition */
}

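As its name suggests, arrayindex decides whether a key is eligible for the array part. Not every number key qualifies. A number with a fractional part, such as 1.5, cannot be an array index, and that is what the luai_numeq(cast_num(k), n) check is for: it converts the number to an int and back, and accepts the key only if the value is unchanged.

arrayindex returns the key as an int, or -1 if the key is not an integer-valued number.

countint then calls luaO_ceillog2(k), which gives the smallest i with k <= 2^i, and increments nums[i]. Keys outside the range 1..MAXASIZE are not counted.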

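Back to rehash.

It first counts the keys in the array part. Judging by its name, nasize stands for "number array size", meaning the number of integer keys in the array part.

In fact, nasize always equals the sum of all entries in nums. In Lua terms, just for illustration: local size = 0; for k, v in pairs(nums) do size = size + v end, after which nasize == size.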

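Next it counts the hash part, and finally ek. ek is the extra key: the key that was being inserted when the table ran out of free slots and triggered the rehash.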

Now let's look at the state of the temporaries just before computesizes is called.

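nasize is the number of integer keys across the array part, the hash part, and the pending key ek. totaluse is the number of all live keys, whatever their type, across the array part, the hash part, and ek. Therefore totaluse >= nasize.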

static int computesizes (int nums[], int *narray) {
  int i;
  int twotoi;  /* 2^i */
  int a = 0;  /* number of elements smaller than 2^i */
  int na = 0;  /* number of elements to go to array part */
  int n = 0;  /* optimal size for array part */
  for (i = 0, twotoi = 1; twotoi/2 < *narray; i++, twotoi *= 2) {
    if (nums[i] > 0) {
      a += nums[i];
      if (a > twotoi/2) {  /* more than half elements present? */
        n = twotoi;  /* optimal size (till now) */
        na = a;  /* all elements smaller than n will go to array part */
      }
    }
    if (a == *narray) break;  /* all elements already counted */
  }
  *narray = n;
  lua_assert(*narray/2 <= na && na <= *narray);
  return na;
}

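computesizes uses the slice counts in nums to work out the array size after rehash. It adds up nums from the smallest slice upward, so at step i the running total a is the number of integer keys k with k <= 2^i.

The test is whether a is greater than 2^i / 2 = 2^(i-1), meaning a candidate array of size 2^i would be more than half full. If so, the candidate is recorded: n = 2^i becomes the new array size, and na = a is the number of keys that will go into the array part. A larger candidate that also passes this test overwrites an earlier one, so the final n is the largest candidate that passes. If a == *narray, every integer key has been counted and the loop stops early. As the Lua snippet above shows, the sum of all nums entries is *narray, which is the nasize passed in from rehash.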

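I can't say whether the rule in computesizes comes from some mathematical principle or is just an empirical choice, but it is quite reasonable. What it does guarantee, as the lua_assert at the end states, is that at least half of the new array part is used: *narray/2 <= na <= *narray.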

void luaH_resize (lua_State *L, Table *t, int nasize, int nhsize) {
  int i;
  int oldasize = t->sizearray;
  int oldhsize = t->lsizenode;
  Node *nold = t->node;  /* save old hash ... */
  if (nasize > oldasize)  /* array part must grow? */
    setarrayvector(L, t, nasize);
  /* create new hash part with appropriate size */
  setnodevector(L, t, nhsize);
  if (nasize < oldasize) {  /* array part must shrink? */
    t->sizearray = nasize;
    /* re-insert elements from vanishing slice */
    for (i=nasize; i<oldasize; i++) {
      if (!ttisnil(&t->array[i]))
        luaH_setint(L, t, i + 1, &t->array[i]);
    }
    /* shrink array */
    luaM_reallocvector(L, t->array, oldasize, nasize, TValue);
  }
  /* re-insert elements from hash part */
  for (i = twoto(oldhsize) - 1; i >= 0; i--) {
    Node *old = nold+i;
    if (!ttisnil(gval(old))) {
      /* doesn't need barrier/invalidate cache, as entry was
         already present in the table */
      setobjt2t(L, luaH_set(L, t, gkey(old)), gval(old));
    }
  }
  if (!isdummy(nold))
    luaM_freearray(L, nold, cast(size_t, twoto(oldhsize))); /* free old array */
}

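Finally, luaH_resize. nasize is the length of the new array part. totaluse - na is the number of all live keys (array part, hash part, and ek) minus the number that will go into the new array part; in other words, it is the number of keys that go into the hash part. I won't explain this function in detail, because the names of the functions it calls make its intent clear enough.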