Lua 4.0: Strings

weixin_33958366

Original link: https://my.oschina.net/xhan/blog/492931

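This section looks at the string-related code.

The TString data structure is shown below. As you can see, TString is used not only for strings but also for user-defined data (userdata).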

/*
** String headers for string table
*/
/*
** most `malloc' libraries allocate memory in blocks of 8 bytes. TSPACK
** tries to make sizeof(TString) a multiple of this granularity, to reduce
** waste of space.
*/
#define TSPACK	((int)sizeof(int))
typedef struct TString {
  union {
    struct {  /* for strings */
      unsigned long hash;
      int constindex;  /* hint to reuse constants */
    } s;
    struct {  /* for userdata */
      int tag;
      void *value;
    } d;
  } u;
  size_t len;
  struct TString *nexthash;  /* chain for hash table */
  int marked;
  char str[TSPACK];   /* variable length string!! must be the last field! */
} TString;

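As its comment explains, TSPACK sizes the trailing array so that sizeof(TString) tends to be a multiple of the allocator's block granularity, which reduces wasted space.

TString contains a union whose two structs describe a string (s) and a userdata value (d) respectively.

len is the length of the string.

nexthash points to the next TString in the same hash-table chain.

marked is the garbage-collection mark.

str holds the string's bytes.

This layout is a variable-length trailing array, the technique that C99 later standardized as the flexible array member: the memory that follows the struct's fixed fields is addressed through its last member, str.

The allocator can therefore reserve as much memory for str as a given string needs.

The effect is as if the str array were sized dynamically.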
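
To make the trailing-array idea concrete, here is a minimal sketch. MiniString and mini_new are hypothetical names, not Lua code, and the sketch uses the C99 flexible array member (char str[]) instead of Lua 4.0's str[TSPACK], which predates that spelling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for TString. */
typedef struct MiniString {
  size_t len;
  char str[];  /* flexible array member: must be the last field */
} MiniString;

/* Allocate header and characters in one block: l bytes plus an ending 0. */
MiniString *mini_new (const char *s, size_t l) {
  MiniString *ms;
  assert(s != NULL);
  ms = (MiniString *)malloc(sizeof(MiniString) + l + 1);
  if (ms == NULL) return NULL;
  ms->len = l;
  memcpy(ms->str, s, l);
  ms->str[l] = 0;  /* ending 0 */
  return ms;
}
```

A caller would write something like mini_new("hello", 5) and later release the result with a single free, since the header and the characters share one allocation.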

The code for strings (and for userdata) lives in lstring.h and lstring.c.

#define sizestring(l)   ((long)sizeof(TString) + ((long)(l+1)-TSPACK)*(long)sizeof(char))

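The sizestring macro computes how many bytes a TString needs for a string of length l. The struct already contains TSPACK characters, so only l+1-TSPACK more are added (the +1 is for the terminating 0).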
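
A quick way to check this arithmetic is to evaluate the macro for a few lengths. The sketch below copies TSPACK, TString and sizestring from the listings above; bytes_for_length is a hypothetical wrapper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TSPACK ((int)sizeof(int))

typedef struct TString {
  union {
    struct { unsigned long hash; int constindex; } s;  /* for strings */
    struct { int tag; void *value; } d;                /* for userdata */
  } u;
  size_t len;
  struct TString *nexthash;
  int marked;
  char str[TSPACK];  /* variable length; must be the last field */
} TString;

#define sizestring(l) ((long)sizeof(TString) + ((long)(l+1)-TSPACK)*(long)sizeof(char))

/* Hypothetical wrapper: bytes to allocate for a string of length l. */
long bytes_for_length (size_t l) {
  long n = sizestring(l);
  assert(n > 0);  /* even the empty string needs the header and an ending 0 */
  return n;
}
```

For l = TSPACK-1 the result is exactly sizeof(TString), because the built-in array already has room for those characters and the ending 0. Each additional character costs one more byte.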

Now for the code itself.

/*
** type equivalent to TString, but with maximum alignment requirements
*/
union L_UTString {
  TString ts;
  union L_Umaxalign dummy;  /* ensures maximum alignment for `local' udata */
};

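This union is equivalent to TString but carries the maximum alignment requirement, so a TString placed in it is aligned strictly enough for any basic type.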
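
The alignment trick can be tried in isolation. This is only a sketch: the post does not show the definition of L_Umaxalign, so the MaxAlign union below is an assumption about what such a type typically contains, and Header and aligned_header_alignment are hypothetical names. It needs a C11 compiler for _Alignof and static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for L_Umaxalign: a union of the widest basic types. */
union MaxAlign { double d; char *s; long l; };

/* Small header type standing in for TString. */
typedef struct Header { int tag; char str[4]; } Header;

/* Same storage as Header, but aligned for every member of MaxAlign. */
union AlignedHeader {
  Header h;
  union MaxAlign dummy;  /* never used; only raises the alignment */
};

/* A union is at least as strictly aligned as each of its members. */
static_assert(_Alignof(union AlignedHeader) >= _Alignof(double),
              "dummy member must raise the alignment");

size_t aligned_header_alignment (void) {
  return _Alignof(union AlignedHeader);
}
```

On a typical platform Header alone only needs int alignment, while the union takes on the alignment of double, a pointer or long, whichever is largest.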

Next, initialization and cleanup.

void luaS_init (lua_State *L) {
  L->strt.hash = luaM_newvector(L, 1, TString *);
  L->udt.hash = luaM_newvector(L, 1, TString *);
  L->nblocks += 2*sizeof(TString *);
  L->strt.size = L->udt.size = 1;
  L->strt.nuse = L->udt.nuse = 0;
  L->strt.hash[0] = L->udt.hash[0] = NULL;
}

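luaS_init initializes the two string tables of the Lua state L: strt for strings and udt for userdata.

It adds the memory used by the two bucket arrays to L->nblocks.

It sets the initial size of each table to 1 and the number of used entries, nuse, to 0.

It sets the single bucket of each table to NULL.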

void luaS_freeall (lua_State *L) {
  LUA_ASSERT(L->strt.nuse==0, "non-empty string table");
  L->nblocks -= (L->strt.size + L->udt.size)*sizeof(TString *);
  luaM_free(L, L->strt.hash);
  LUA_ASSERT(L->udt.nuse==0, "non-empty udata table");
  luaM_free(L, L->udt.hash);
}

luaS_freeall releases the tables, the reverse of luaS_init. Both tables must be empty when it is called.

Allocating a TString

TString *luaS_new (lua_State *L, const char *str) {
  return luaS_newlstr(L, str, strlen(str));
}

luaS_new takes a char* string and returns the corresponding TString.

Note that the third argument passed to luaS_newlstr is simply the length of str.

TString *luaS_newfixed (lua_State *L, const char *str) {
  TString *ts = luaS_new(L, str);
  if (ts->marked == 0) ts->marked = FIXMARK;  /* avoid GC */
  return ts;
}

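luaS_newfixed calls luaS_new and, if the returned TString is still unmarked, sets marked to FIXMARK so that the garbage collector never frees it.

FIXMARK has a companion macro, RESERVEDMARK, which is used for reserved words.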

/*
** any TString with mark>=FIXMARK is never collected.
** Marks>=RESERVEDMARK are used to identify reserved words.
*/
#define FIXMARK 2
#define RESERVEDMARK	3

This can be seen in luaX_init, which sets up the lexer:

void luaX_init (lua_State *L) {
  int i;
  for (i=0; i<NUM_RESERVED; i++) {
    TString *ts = luaS_new(L, token2string[i]);
    ts->marked = (unsigned char)(RESERVEDMARK+i);  /* reserved word */
  }
}

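The function calls luaS_new for each reserved word and sets marked on the returned TString to RESERVEDMARK plus the word's index.

Later, the reserved word can be recovered from this marked value.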
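
The encoding and its inverse can be sketched as follows. The keyword list here is a shortened, hypothetical stand-in for token2string, and mark_for_keyword, keyword_index, keyword_name and is_never_collected are illustrative names, not functions from the Lua source.

```c
#include <assert.h>
#include <stddef.h>

#define FIXMARK 2
#define RESERVEDMARK 3

/* Hypothetical, shortened keyword list standing in for token2string. */
static const char *const keywords[] = {"and", "break", "do", "else"};
#define NUM_KEYWORDS ((int)(sizeof(keywords)/sizeof(keywords[0])))

/* Mark assigned to the i-th keyword, following luaX_init. */
int mark_for_keyword (int i) {
  assert(i >= 0 && i < NUM_KEYWORDS);
  return RESERVEDMARK + i;
}

/* Keyword index encoded in a mark, or -1 for an ordinary string. */
int keyword_index (int marked) {
  if (marked < RESERVEDMARK) return -1;
  return marked - RESERVEDMARK;
}

/* Keyword text for a mark, or NULL if the mark is not a keyword. */
const char *keyword_name (int marked) {
  int i = keyword_index(marked);
  return (i >= 0 && i < NUM_KEYWORDS) ? keywords[i] : NULL;
}

/* Strings with mark >= FIXMARK are never collected. */
int is_never_collected (int marked) {
  return marked >= FIXMARK;
}
```

Since RESERVEDMARK is greater than FIXMARK, every reserved word is also exempt from garbage collection without any extra flag.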

luaS_new delegates the actual work to luaS_newlstr:

TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
  unsigned long h = hash_s(str, l);
  int h1 = h & (L->strt.size-1);
  TString *ts;
  for (ts = L->strt.hash[h1]; ts; ts = ts->nexthash) {
    if (ts->len == l && (memcmp(str, ts->str, l) == 0))
      return ts;
  }
  /* not found */
  ts = (TString *)luaM_malloc(L, sizestring(l));
  ts->marked = 0;
  ts->nexthash = NULL;
  ts->len = l;
  ts->u.s.hash = h;
  ts->u.s.constindex = 0;
  memcpy(ts->str, str, l);
  ts->str[l] = 0;  /* ending 0 */
  L->nblocks += sizestring(l);
  newentry(L, &L->strt, ts, h1);  /* insert it on table */
  return ts;
}

The function first computes the hash of str, using this algorithm:

static unsigned long hash_s (const char *s, size_t l) {
  unsigned long h = l;  /* seed */
  size_t step = (l>>5)|1;  /* if string is too long, don't hash all its chars */
  for (; l>=step; l-=step)
    h = h ^ ((h<<5)+(h>>2)+(unsigned char)*(s++));
  return h;
}

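The hash determines the string's position in the string hash table. L->strt.size is always a power of 2.

The line int h1 = h & (L->strt.size-1); therefore maps the hash to a valid index, because for a power-of-2 size the mask gives the same result as h % size.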
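
The hash and the masking step can be tried on their own. In this sketch hash_s is copied from the listing above (without static, so it can be called directly), while bucket_index is a hypothetical helper that wraps the mask.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from lstring.c as quoted above. */
unsigned long hash_s (const char *s, size_t l) {
  unsigned long h = l;  /* seed */
  size_t step = (l>>5)|1;  /* if string is too long, don't hash all its chars */
  for (; l>=step; l-=step)
    h = h ^ ((h<<5)+(h>>2)+(unsigned char)*(s++));
  return h;
}

/* Hypothetical helper: map a hash to a bucket; size must be a power of 2. */
int bucket_index (unsigned long h, int size) {
  assert(size > 0 && (size & (size - 1)) == 0);
  return (int)(h & (unsigned long)(size - 1));
}
```

For example, hash_s("a", 1) starts from the seed 1 and computes 1 ^ (32 + 0 + 97) = 128, so in a table of 8 buckets the string lands in bucket 128 & 7 = 0.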

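The for loop searches the bucket's chain for the string and returns it if found.

Otherwise a new entry is added.

The function fills in the TString fields and inserts the entry into the hash table through newentry.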

static void newentry (lua_State *L, stringtable *tb, TString *ts, int h) {
  ts->nexthash = tb->hash[h];  /* chain new entry */
  tb->hash[h] = ts;
  tb->nuse++;
  if (tb->nuse > (lint32)tb->size && tb->size < MAX_INT/2)  /* too crowded? */
    luaS_resize(L, tb, tb->size*2);
}

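After inserting the entry, newentry checks whether the table is too crowded, that is, whether it now holds more entries than buckets (nuse > size). If so, and size is still below MAX_INT/2, the table is grown.

Growing doubles the current size, which keeps the table size a power of 2.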

void luaS_resize (lua_State *L, stringtable *tb, int newsize) {
  TString **newhash = luaM_newvector(L, newsize, TString *);
  int i;
  for (i=0; i<newsize; i++) newhash[i] = NULL;
  /* rehash */
  for (i=0; i<tb->size; i++) {
    TString *p = tb->hash[i];
    while (p) {  /* for each node in the list */
      TString *next = p->nexthash;  /* save next */
      unsigned long h = (tb == &L->strt) ? p->u.s.hash : IntPoint(p->u.d.value);
      int h1 = h&(newsize-1);  /* new position */
      LUA_ASSERT(h%newsize == (h&(newsize-1)),
                    "a&(x-1) == a%x, for x power of 2");
      p->nexthash = newhash[h1];  /* chain it in new position */
      newhash[h1] = p;
      p = next;
    }
  }
  luaM_free(L, tb->hash);
  L->nblocks += (newsize - tb->size)*sizeof(TString *);
  tb->size = newsize;
  tb->hash = newhash;
}

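luaS_resize first allocates the new bucket array and sets every bucket to NULL.

It then relinks every entry of the old table into the new one.

Note this line: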

unsigned long h = (tb == &L->strt) ? p->u.s.hash : IntPoint(p->u.d.value);

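luaS_resize is also used for the userdata table, so this line checks which table was passed in: strings use the hash cached in u.s.hash, while userdata entries are hashed by their value pointer.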
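
Putting the pieces together, the lookup-or-insert logic of luaS_newlstr, the chaining in newentry and the doubling in luaS_resize can be sketched as a small standalone intern table. All names here (Node, Table, table_init, table_intern and so on) are hypothetical. Memory accounting, userdata, GC marks and the MAX_INT/2 guard are left out, and allocation failure simply aborts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
  struct Node *next;   /* chain within one bucket */
  unsigned long hash;  /* cached, so resizing never rereads the bytes */
  size_t len;
  char str[];          /* C99 flexible array member */
} Node;

typedef struct Table {
  Node **buckets;
  int size;  /* number of buckets, always a power of 2 */
  int nuse;  /* number of stored strings */
} Table;

/* Same algorithm as hash_s in lstring.c. */
static unsigned long hash_bytes (const char *s, size_t l) {
  unsigned long h = l;
  size_t step = (l>>5)|1;
  for (; l>=step; l-=step)
    h = h ^ ((h<<5)+(h>>2)+(unsigned char)*(s++));
  return h;
}

/* Allocate n empty buckets; abort on failure to keep the sketch short. */
static Node **new_buckets (int n) {
  Node **b = (Node **)malloc((size_t)n * sizeof(Node *));
  int i;
  if (b == NULL) abort();
  for (i = 0; i < n; i++) b[i] = NULL;
  return b;
}

void table_init (Table *t) {
  t->buckets = new_buckets(1);
  t->size = 1;
  t->nuse = 0;
}

/* Relink every node into a new bucket array of newsize entries. */
static void table_resize (Table *t, int newsize) {
  Node **nb;
  int i;
  assert(newsize > 0 && (newsize & (newsize - 1)) == 0);
  nb = new_buckets(newsize);
  for (i = 0; i < t->size; i++) {
    Node *p = t->buckets[i];
    while (p) {
      Node *next = p->next;  /* save next */
      int h1 = (int)(p->hash & (unsigned long)(newsize - 1));
      p->next = nb[h1];  /* chain it in new position */
      nb[h1] = p;
      p = next;
    }
  }
  free(t->buckets);
  t->buckets = nb;
  t->size = newsize;
}

/* Return the unique node for these bytes, creating it on first use. */
Node *table_intern (Table *t, const char *s, size_t l) {
  unsigned long h = hash_bytes(s, l);
  int h1 = (int)(h & (unsigned long)(t->size - 1));
  Node *n;
  for (n = t->buckets[h1]; n; n = n->next)
    if (n->len == l && memcmp(s, n->str, l) == 0) return n;  /* found */
  n = (Node *)malloc(sizeof(Node) + l + 1);
  if (n == NULL) abort();
  n->hash = h;
  n->len = l;
  memcpy(n->str, s, l);
  n->str[l] = 0;  /* ending 0 */
  n->next = t->buckets[h1];  /* chain new entry */
  t->buckets[h1] = n;
  if (++t->nuse > t->size) table_resize(t, t->size * 2);  /* too crowded? */
  return n;
}
```

Because equal strings always yield the same node, and nodes are relinked rather than copied during a resize, comparing two interned strings reduces to comparing their pointers.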

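The userdata functions luaS_createudata and luaS_newudata work much like their string counterparts and are not covered here.

That concludes the analysis of the string code.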

----------------------------------------

Open questions so far:

> Function prototype optimization: luaU_optchunk

> Printing function prototypes: luaU_printchunk

> Dumping function prototypes: luaU_dumpchunk

----------------------------------------

Reposted from: https://my.oschina.net/xhan/blog/492931
