Lua Source Code Study Notes (2): Strings

ChiLi_Lin

Article link: https://blog.csdn.net/l773575310/article/details/94358412


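Reference book: 《Lua设计与实现》 (Lua Design and Implementation)

GitHub repository accompanying the book: https://github.com/lichuang/Lua-Source-Internal

Summary of Lua string source analysis: https://blog.csdn.net/boyxiaolong/article/details/24104543

Lua 5.2.3 source reading (02), the TString string object: https://www.cnblogs.com/zerozero/p/4190300.html

Lua version: 5.3.5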

Overview

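Strings in Lua are interned (internalization): a variable that holds a string does not hold a copy of the string data, only a reference to a shared string object. In Lua 5.3 this applies to short strings (at most LUAI_MAXSHORTLEN bytes): whenever one is created, the VM first checks whether an identical string already exists, reuses it if so, and creates a new object otherwise. Long strings always get a new object, as the creation code further down shows.
Once a string has been created, its contents never change.

Changing a string variable does not affect the original string data:

a = "1"			-- variable a refers to the string "1"
a = a.."2"		-- variable a now refers to the string "12"; the earlier string "1" is left for the GC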


The Lua virtual machine keeps its interned strings in a single hash table whose buckets are linked lists.
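To make the idea of interning concrete, here is a minimal sketch in C. It is not Lua's code: the names Node, NBUCKETS, toy_hash and intern are made up for illustration, the table has a fixed size, and the hash function is a toy one rather than luaS_hash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8  /* hypothetical fixed size; must be a power of two */

typedef struct Node {
  struct Node *next;  /* next string in the same bucket */
  size_t len;         /* number of bytes, excluding the final '\0' */
  char data[];        /* string bytes follow the header */
} Node;

static Node *buckets[NBUCKETS];

/* Toy hash for illustration only; this is NOT Lua's luaS_hash. */
static unsigned int toy_hash(const char *s, size_t len) {
  unsigned int h = (unsigned int)len;
  for (size_t i = 0; i < len; i++)
    h = h * 31u + (unsigned char)s[i];
  return h;
}

/* Return the unique node holding these bytes, creating it on first use. */
static const Node *intern(const char *s, size_t len) {
  assert(s != NULL);
  Node **list = &buckets[toy_hash(s, len) & (NBUCKETS - 1)];
  for (Node *n = *list; n != NULL; n = n->next) {
    if (n->len == len && memcmp(n->data, s, len) == 0)
      return n;  /* already interned: reuse the existing object */
  }
  Node *n = malloc(sizeof(Node) + len + 1);
  if (n == NULL)
    return NULL;
  memcpy(n->data, s, len);
  n->data[len] = '\0';
  n->len = len;
  n->next = *list;  /* push onto the head of the bucket's chain */
  *list = n;
  return n;
}
```

Calling intern("hello", 5) twice hands back the same pointer both times, while intern("world", 5) yields a different one.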

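Advantages

Conventional strings:
- Comparison: byte by byte, so the cost grows linearly with the string length.
Lua strings:
- Comparison: two interned (short) strings are equal exactly when they are the same object, so comparing the two pointers is enough; neither the bytes nor the hash values need to be examined.
- Space: identical strings share a single copy.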
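The difference in comparison cost can be sketched as follows. This is only an illustration under assumptions: Str, Kind, eq_short, eq_long and str_equal are hypothetical names, and the long-string branch (same object, or same length and same bytes) is how a non-interned string has to be compared in general, not a copy of Lua's function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { KIND_SHORT, KIND_LONG } Kind;  /* hypothetical variant tags */

typedef struct Str {
  Kind kind;
  size_t len;
  const char *bytes;
} Str;

/* Interned strings: equal contents imply the same object, so one
   pointer comparison decides equality in constant time. */
static int eq_short(const Str *a, const Str *b) {
  assert(a->kind == KIND_SHORT && b->kind == KIND_SHORT);
  return a == b;
}

/* Non-interned strings: same object, or same length and same bytes. */
static int eq_long(const Str *a, const Str *b) {
  assert(a->kind == KIND_LONG && b->kind == KIND_LONG);
  return a == b ||
         (a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0);
}

static int str_equal(const Str *a, const Str *b) {
  if (a->kind != b->kind)
    return 0;  /* a short and a long string can never be equal */
  return a->kind == KIND_SHORT ? eq_short(a, b) : eq_long(a, b);
}
```

Note that eq_short is only correct because an interner guarantees that two distinct short-string objects never hold the same bytes.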

String implementation

// lobject.h

/*
** Header for string value; string bytes follow the end of this structure
** (aligned according to 'UTString'; see next).
*/
typedef struct TString {
  CommonHeader;
  lu_byte extra;  /* reserved words for short strings; "has hash" for longs */
  lu_byte shrlen;  /* length for short strings */
  unsigned int hash;
  union {
    size_t lnglen;  /* length for long strings */
    struct TString *hnext;  /* linked list for hash table */
  } u;
} TString;


/*
** Ensures that address after this type is always fully aligned.
*/
typedef union UTString {
  L_Umaxalign dummy;  /* ensures maximum alignment for strings */
  TString tsv;
} UTString;

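L_Umaxalign: a union whose only job is to force the strictest alignment among its members (lua_Number, which is double by default, plus void *, lua_Integer and long), so that the string bytes following the header start at a fully aligned address that the CPU can read efficiently.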
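The "header followed by the bytes" layout can be sketched like this. All names here (Header, MaxAlign, UHeader, total_size, payload, new_str) are hypothetical stand-ins for TString, L_Umaxalign, UTString, sizelstring and getstr; the sketch only assumes that the payload starts sizeof(union) bytes after the start of the object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Header {  /* hypothetical stand-in for TString */
  unsigned char shrlen;
  unsigned int hash;
} Header;

typedef union MaxAlign {  /* stand-in for L_Umaxalign */
  double d;
  void *p;
  long l;
  long long ll;
} MaxAlign;

typedef union UHeader {  /* stand-in for UTString */
  MaxAlign dummy;        /* pads the header up to maximum alignment */
  Header h;
} UHeader;

/* Total allocation: padded header, the bytes, and a terminating '\0'. */
static size_t total_size(size_t len) {
  return sizeof(UHeader) + len + 1;
}

/* The string bytes start right after the padded header. */
static char *payload(Header *h) {
  return (char *)h + sizeof(UHeader);
}

static Header *new_str(const char *s, size_t len) {
  assert(s != NULL && len <= 255);  /* shrlen is a single byte */
  Header *h = malloc(total_size(len));
  if (h == NULL)
    return NULL;
  h->shrlen = (unsigned char)len;
  h->hash = 0;
  memcpy(payload(h), s, len);
  payload(h)[len] = '\0';
  return h;
}
```

Because sizeof(UHeader) is a multiple of the union's alignment, the payload address is aligned whenever the allocation itself is.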

// llimits.h

/* type to ensure maximum alignment */
#if defined(LUAI_USER_ALIGNMENT_T)
typedef LUAI_USER_ALIGNMENT_T L_Umaxalign;
#else
typedef union {
  lua_Number n;
  double u;
  void *s;
  lua_Integer i;
  long l;
} L_Umaxalign;
#endif

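extra: for short strings, a non-zero value marks a reserved word (a language keyword). Such strings are never reclaimed by the GC and stay alive for the whole lifetime of the state. For long strings, the same field records whether the hash has been computed yet (see the comment in the struct above).

// lstring.h
// Tests whether a string is a reserved word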

/*
** test whether a string is a reserved word
*/
#define isreserved(s)	((s)->tt == LUA_TSHRSTR && (s)->extra > 0)
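A small sketch of how such a tag can work. As far as I recall, Lua's lexer initialisation stores "keyword index + 1" in extra, so that 0 means "not reserved"; treat that as an assumption to verify against llex.c. The names Str, keywords, mark_reserved and is_reserved are made up, and the keyword list is only a subset.

```c
#include <assert.h>
#include <string.h>

typedef struct Str {
  const char *bytes;
  unsigned char extra;  /* 0: ordinary string; >0: keyword index + 1 */
} Str;

/* Hypothetical subset of the keyword list. */
static const char *const keywords[] = { "and", "break", "do" };
#define NUM_KEYWORDS (sizeof(keywords) / sizeof(keywords[0]))

/* Tag s with (index + 1) if it is a keyword, otherwise with 0. */
static void mark_reserved(Str *s) {
  assert(s != NULL && s->bytes != NULL);
  s->extra = 0;
  for (size_t i = 0; i < NUM_KEYWORDS; i++) {
    if (strcmp(s->bytes, keywords[i]) == 0) {
      s->extra = (unsigned char)(i + 1);
      return;
    }
  }
}

/* Same test as the isreserved macro, minus the type-tag check. */
static int is_reserved(const Str *s) {
  return s->extra > 0;
}
```

Storing the index rather than a plain flag lets a lexer turn an identifier into its keyword token with a single field read.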

Hash

// lstate.h: the string table that holds all interned strings, indexed by hash

/*
** 'global state', shared by all threads of this state
*/
typedef struct global_State {
  ...
  stringtable strt;  /* hash table for strings */
  ...
} global_State;


typedef struct stringtable {
  TString **hash;
  int nuse;  /* number of elements */
  int size;
} stringtable;

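When a short TString is created, its hash is computed first; the hash modulo the table size (lmod(h, size)) is the index into the strt.hash array. If that bucket already holds entries, the new string is linked into the bucket's list.
When the amount of data grows very large, each bucket receives many entries and a lookup degrades into a linear scan. Lua accounts for this with a rehash step: once the number of strings reaches the number of buckets, the bucket array is enlarged, which lowers the number of entries per bucket. This happens in the function luaS_resize.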
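The bucket indexing and the grow-and-rehash step can be sketched in isolation. This is a simplified illustration, not Lua's code: Node, Table, bucket_index, table_grow and table_insert are hypothetical names, plain realloc replaces Lua's allocator, and the table size is assumed to be a power of two so that the modulo becomes a bit mask.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
  struct Node *next;  /* chain within one bucket */
  unsigned int hash;  /* cached hash, reused when rehashing */
} Node;

typedef struct Table {
  Node **bucket;
  int size;  /* number of buckets, a power of two */
  int nuse;  /* number of stored nodes */
} Table;

/* For a power-of-two size, h % size equals h & (size - 1). */
static int bucket_index(unsigned int h, int size) {
  assert(size > 0 && (size & (size - 1)) == 0);
  return (int)(h & (unsigned int)(size - 1));
}

/* Grow to newsize buckets and redistribute every node.
   Returns 0 if the reallocation fails, 1 otherwise. */
static int table_grow(Table *t, int newsize) {
  assert(newsize >= t->size);
  Node **nb = realloc(t->bucket, (size_t)newsize * sizeof(Node *));
  if (nb == NULL)
    return 0;
  t->bucket = nb;
  for (int i = t->size; i < newsize; i++)
    nb[i] = NULL;  /* new buckets start empty */
  for (int i = 0; i < t->size; i++) {
    Node *p = nb[i];
    nb[i] = NULL;
    while (p != NULL) {
      Node *next = p->next;  /* save next before relinking */
      int j = bucket_index(p->hash, newsize);
      p->next = nb[j];  /* insert at the head of the new chain */
      nb[j] = p;
      p = next;
    }
  }
  t->size = newsize;
  return 1;
}

/* Double the table once there are as many nodes as buckets. */
static int table_insert(Table *t, Node *n) {
  if (t->nuse >= t->size && !table_grow(t, t->size * 2))
    return 0;
  int j = bucket_index(n->hash, t->size);
  n->next = t->bucket[j];
  t->bucket[j] = n;
  t->nuse++;
  return 1;
}
```

Caching the hash in each node is what makes the rehash cheap: no string bytes are read again, only the stored hash is masked with the new size.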

Creating strings

// lstring.c

/*
** new string (with explicit length)
*/
TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
  if (l <= LUAI_MAXSHORTLEN)  /* short string? see internshrstr below */
    return internshrstr(L, str, l);
  else {
    TString *ts;
    if (l >= (MAX_SIZE - sizeof(TString))/sizeof(char))
      luaM_toobig(L);			/* too big: raise an error */
    ts = luaS_createlngstrobj(L, l);		/* create a long string object */
    memcpy(getstr(ts), str, l * sizeof(char));
    return ts;
  }
}

Short strings

// llimits.h

/*
** Maximum length for short strings, that is, strings that are
** internalized. (Cannot be smaller than reserved words or tags for
** metamethods, as these strings must be internalized;
** #("function") = 8, #("__newindex") = 10.)
*/
#if !defined(LUAI_MAXSHORTLEN)
#define LUAI_MAXSHORTLEN	40
#endif


/*
** checks whether short string exists and reuses it or creates a new one
*/
static TString *internshrstr (lua_State *L, const char *str, size_t l) {
  TString *ts;
  global_State *g = G(L);
  unsigned int h = luaS_hash(str, l, g->seed);
  TString **list = &g->strt.hash[lmod(h, g->strt.size)];
  lua_assert(str != NULL);  /* otherwise 'memcmp'/'memcpy' are undefined */
  for (ts = *list; ts != NULL; ts = ts->u.hnext) {
    if (l == ts->shrlen &&
        (memcmp(str, getstr(ts), l * sizeof(char)) == 0)) {
      /* found! */
      if (isdead(g, ts))  /* dead (but not collected yet)? */
        changewhite(ts);  /* resurrect it */
      return ts;
    }
  }
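  /* if the number of strings has reached the number of buckets and the
     bucket count is at most MAX_INT/2, double the table */
  if (g->strt.nuse >= g->strt.size && g->strt.size <= MAX_INT/2) {
    luaS_resize(L, g->strt.size * 2);			/* see luaS_resize below */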
    list = &g->strt.hash[lmod(h, g->strt.size)];  /* recompute with new size */
  }
  ts = createstrobj(L, l, LUA_TSHRSTR, h);
  memcpy(getstr(ts), str, l * sizeof(char));
  ts->shrlen = cast_byte(l);
  ts->u.hnext = *list;
  *list = ts;
  g->strt.nuse++;
  return ts;
}


/*
** resizes the string table
*/
void luaS_resize (lua_State *L, int newsize) {
  int i;
  stringtable *tb = &G(L)->strt;			/* the global string table */
  if (newsize > tb->size) {  /* grow table if needed: reallocate the bucket array */
    luaM_reallocvector(L, tb->hash, tb->size, newsize, TString *);
    for (i = tb->size; i < newsize; i++)
      tb->hash[i] = NULL;
  }
  for (i = 0; i < tb->size; i++) {  /* rehash: recompute each string's bucket */
    TString *p = tb->hash[i];
    tb->hash[i] = NULL;
    while (p) {  /* for each node in the list: push it onto the head of its new bucket's chain */
      TString *hnext = p->u.hnext;  /* save next */
      unsigned int h = lmod(p->hash, newsize);  /* new position */
      p->u.hnext = tb->hash[h];  /* chain it */
      tb->hash[h] = p;
      p = hnext;
    }
  }
  if (newsize < tb->size) {  /* shrink table if needed */
    /* vanishing slice should be empty */
    lua_assert(tb->hash[newsize] == NULL && tb->hash[tb->size - 1] == NULL);
    luaM_reallocvector(L, tb->hash, tb->size, newsize, TString *);
  }
  tb->size = newsize;
}


/*
** creates a new string object
*/
static TString *createstrobj (lua_State *L, size_t l, int tag, unsigned int h) {
  TString *ts;
  GCObject *o;
  size_t totalsize;  /* total size of TString object */
  totalsize = sizelstring(l);
  o = luaC_newobj(L, tag, totalsize);
  ts = gco2ts(o);
  ts->hash = h;
  ts->extra = 0;
  getstr(ts)[l] = '\0';  /* ending 0 */
  return ts;
}


TString *luaS_createlngstrobj (lua_State *L, size_t l) {
  TString *ts = createstrobj(L, l, LUA_TLNGSTR, G(L)->seed);
  ts->u.lnglen = l;
  return ts;
}
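Note that luaS_createlngstrobj stores the seed in the hash field instead of a real hash. The comment on extra in the struct ("has hash" for longs) suggests the hash of a long string is computed lazily, on first use. A minimal sketch of that pattern, with hypothetical names (LongStr, toy_hash, long_hash, hash_calls) and a toy hash function instead of luaS_hash:

```c
#include <assert.h>
#include <stddef.h>

typedef struct LongStr {
  unsigned char has_hash;  /* plays the role of 'extra' for long strings */
  unsigned int hash;       /* holds the seed until the hash is computed */
  size_t len;
  const char *bytes;
} LongStr;

static int hash_calls = 0;  /* instrumentation for this example only */

/* Toy seeded hash for illustration only; this is NOT Lua's luaS_hash. */
static unsigned int toy_hash(const char *s, size_t len, unsigned int seed) {
  unsigned int h = seed ^ (unsigned int)len;
  for (size_t i = 0; i < len; i++)
    h = h * 31u + (unsigned char)s[i];
  hash_calls++;
  return h;
}

/* Compute the hash on first request and cache it afterwards. */
static unsigned int long_hash(LongStr *s) {
  assert(s != NULL && s->bytes != NULL);
  if (!s->has_hash) {
    s->hash = toy_hash(s->bytes, s->len, s->hash);  /* seed -> hash */
    s->has_hash = 1;
  }
  return s->hash;
}
```

The benefit is that a long string that is never used as a table key never pays for hashing at all.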

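From https://www.cnblogs.com/zerozero/p/4190300.html:

luaS_newlstr and luaS_new differ only in their parameters: luaS_newlstr takes an explicit length, while luaS_new takes a '\0'-terminated C string.

String creation depends on the length. A string of at most LUAI_MAXSHORTLEN bytes is hashed and interned in a hash table, so an existing equal string is reused; a string longer than LUAI_MAXSHORTLEN always gets a new TString object. The two kinds also hang off different places for the GC: the first lives in the string hash table of global_State, the second in global_State's list of all collectable objects, allgc. (That placement describes Lua 5.2.3, the version the quoted article covers. In the 5.3.5 code above, createstrobj calls luaC_newobj for both kinds, and a short string is additionally chained into the string table through u.hnext.)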
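The practical difference between an explicit length and a '\0'-terminated argument shows up with embedded zero bytes. A minimal sketch, where Buf, make_l and make_z are hypothetical stand-ins for the two constructors and a fixed 64-byte buffer replaces real allocation:

```c
#include <assert.h>
#include <string.h>

typedef struct Buf {
  size_t len;
  char data[64];  /* fixed capacity, enough for this example */
} Buf;

/* Explicit-length variant: copies exactly len bytes, zeros included. */
static Buf make_l(const char *s, size_t len) {
  Buf b;
  assert(s != NULL && len < sizeof b.data);
  memcpy(b.data, s, len);
  b.data[len] = '\0';
  b.len = len;
  return b;
}

/* C-string variant: the length is whatever strlen reports,
   so the string ends at the first '\0'. */
static Buf make_z(const char *s) {
  return make_l(s, strlen(s));
}
```

Given the five bytes "ab\0cd", make_l keeps all five, while make_z stops at the embedded zero and keeps only "ab".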
