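This post covers only one thing: using Lua's # operator to get the length of a table.
The official explanation, from the Lua reference manual: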
3.4.6 – The Length Operator
The length operator is denoted by the unary prefix operator #. The length of a string is its number of bytes (that is, the usual meaning of string length when each character is one byte).
A program can modify the behavior of the length operator for any value but strings through the __len metamethod (see §2.4).
Unless a __len metamethod is given, the length of a table t is only defined if the table is a sequence, that is, the set of its positive numeric keys is equal to {1..n} for some non-negative integer n. In that case, n is its length. Note that a table like
{10, 20, nil, 40}
is not a sequence, because it has the key 4 but does not have the key 3. (So, there is no n such that the set {1..n} is equal to the set of positive numeric keys of that table.) Note, however, that non-numeric keys do not interfere with whether a table is a sequence.
云风 (Yun Feng)'s Chinese translation of the manual says the same thing. Its key sentence, which the discussion below builds on, reads: "the length of table t is defined only when the table is a sequence. A sequence is a table whose set of positive numeric keys is equal to {1..n}, where n is a non-negative integer."
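My reading of that sentence: let n be the largest positive integer key currently in the table. # gives the correct length only if every key from 1 to n maps to a non-nil value. If even one key in that range maps to nil, the result cannot be trusted: # still returns some index i where t[i] is non-nil and t[i+1] is nil, but not necessarily the one you expect. So before using this operator, make sure the table really is a sequence, and if it isn't, be careful.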
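To make the definition concrete, here is a minimal C sketch that checks whether a table's positive integer keys form exactly {1..n}. It does not touch real Lua tables. It assumes a hypothetical model in which has[k] is true when t[k] is non-nil for 1 <= k <= cap, keys above cap are absent, and non-integer keys are simply left out, since the manual says they do not affect whether a table is a sequence. The name sequence_length is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a table's positive integer keys:
   has[k] is true when t[k] ~= nil, for 1 <= k <= cap (has[0] is unused).
   Non-integer keys are not modeled; they never affect sequence-ness. */

/* Returns true and stores n in *len when the key set is exactly {1..n};
   returns false when some key appears after the first missing one. */
static bool sequence_length(const bool *has, size_t cap, size_t *len) {
    size_t n = 0;
    while (n < cap && has[n + 1])   /* length of the unbroken prefix 1..n */
        n++;
    for (size_t k = n + 1; k <= cap; k++)
        if (has[k])                 /* a key after the first nil: not a sequence */
            return false;
    *len = n;
    return true;
}
```

For the manual's {10, 20, nil, 40}, the model is has = {false, true, true, false, true} with cap = 4, and sequence_length returns false; for {10, 20, 30} it returns true with n = 3.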
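Now let's look at how it is implemented.
Lua 5.1 and 5.2 implement this operation the same way. The code below, from ltable.c, is the 5.2 version; the 5.1 code differs only in small details, such as calling luaH_getnum instead of luaH_getint: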
static int unbound_search (Table *t, unsigned int j) {
  unsigned int i = j;  /* i is zero or a present index */
  j++;
  /* find `i' and `j' such that i is present and j is not */
  while (!ttisnil(luaH_getint(t, j))) {
    i = j;
    j *= 2;
    if (j > cast(unsigned int, MAX_INT)) {  /* overflow? */
      /* table was built with bad purposes: resort to linear search */
      i = 1;
      while (!ttisnil(luaH_getint(t, i))) i++;
      return i - 1;
    }
  }
  /* now do a binary search between them */
  while (j - i > 1) {
    unsigned int m = (i+j)/2;
    if (ttisnil(luaH_getint(t, m))) j = m;
    else i = m;
  }
  return i;
}
/*
** Try to find a boundary in table `t'. A `boundary' is an integer index
** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil).
*/
int luaH_getn (Table *t) {
  unsigned int j = t->sizearray;
  if (j > 0 && ttisnil(&t->array[j - 1])) {
    /* there is a boundary in the array part: (binary) search for it */
    unsigned int i = 0;
    while (j - i > 1) {
      unsigned int m = (i+j)/2;
      if (ttisnil(&t->array[m - 1])) j = m;
      else i = m;
    }
    return i;
  }
  /* else must find a boundary in hash part */
  else if (isdummy(t->node))  /* hash part is empty? */
    return j;  /* that is easy... */
  else return unbound_search(t, j);
}
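In Lua 5.3 only one spot changes, inside unbound_search: the overflow check now runs before j is doubled and compares against MAX_INT/2, whereas 5.1/5.2 double j first and then compare against MAX_INT: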
static int unbound_search (Table *t, unsigned int j) {
  unsigned int i = j;  /* i is zero or a present index */
  j++;
  /* find 'i' and 'j' such that i is present and j is not */
  while (!ttisnil(luaH_getint(t, j))) {
    i = j;
    if (j > cast(unsigned int, MAX_INT)/2) {  /* overflow? */
      /* table was built with bad purposes: resort to linear search */
      i = 1;
      while (!ttisnil(luaH_getint(t, i))) i++;
      return i - 1;
    }
    j *= 2;
  }
  /* now do a binary search between them */
  while (j - i > 1) {
    unsigned int m = (i+j)/2;
    if (ttisnil(luaH_getint(t, m))) j = m;
    else i = m;
  }
  return i;
}
/*
** Try to find a boundary in table 't'. A 'boundary' is an integer index
** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil).
*/
int luaH_getn (Table *t) {
  unsigned int j = t->sizearray;
  if (j > 0 && ttisnil(&t->array[j - 1])) {
    /* there is a boundary in the array part: (binary) search for it */
    unsigned int i = 0;
    while (j - i > 1) {
      unsigned int m = (i+j)/2;
      if (ttisnil(&t->array[m - 1])) j = m;
      else i = m;
    }
    return i;
  }
  /* else must find a boundary in hash part */
  else if (isdummy(t->node))  /* hash part is empty? */
    return j;  /* that is easy... */
  else return unbound_search(t, j);
}
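So 5.3 only tweaks unbound_search; the idea is the same, and the code itself is fairly simple. In luaH_getn, the first branch of the if (the last array slot is nil) and the third branch (unbound_search over the hash part) both use binary search, while the second branch simply returns sizearray when the hash part is empty. Binary search only gives a meaningful answer when its precondition holds, just as searching a pile of integers for a value by binary search only works after the integers are sorted; on unsorted data the result cannot be trusted. Here the precondition is that every key from 1 to n (n being the largest positive integer key in the table) maps to a non-nil value, so that the non-nil entries form one unbroken prefix. If that does not hold, binary search's precondition is violated and the length it returns is not reliable.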
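To see the array-part branch pick different answers on non-sequences, here is a small C sketch that copies its binary search onto a plain bool array. It is a hypothetical model, not Lua's API: slot[k - 1] stands for "t[k] is non-nil", the hash part is assumed to be empty, and array_getn is a made-up name. Real Lua results also depend on how large the array part happened to be allocated, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the array part: slot[k - 1] is true when t[k] is
   non-nil, for 1 <= k <= size. The hash part is assumed to be empty, so
   only the first two branches of luaH_getn are reproduced. */
static unsigned array_getn(const bool *slot, unsigned size) {
    unsigned j = size;
    if (j > 0 && !slot[j - 1]) {
        /* last slot is nil: binary search for a border, as luaH_getn does */
        unsigned i = 0;
        while (j - i > 1) {
            unsigned m = (i + j) / 2;
            if (!slot[m - 1]) j = m;
            else i = m;
        }
        return i;
    }
    return j;  /* last slot non-nil (or size 0) and no hash part: answer is size */
}
```

In this model {1, nil, 3, nil} yields 1, while {1, nil, 3} yields 3: one trailing nil changes which border the search lands on, which is exactly why the result is unreliable when the table is not a sequence.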
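Note: if the table also has non-integer keys, # ignores them and only reflects the positive integer keys. Also, for a table built like tt={1,2,3}; tt[1000]=8; tt[2]=nil, any later # on tt is unreliable as well.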
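Keys that live outside the array part are handled by unbound_search. The sketch below replays the 5.3 version on a hypothetical key list instead of a real Table. Everything here is an assumption for illustration: KeySet, present() and unbound_search_model are made-up names, present() stands in for !ttisnil(luaH_getint(t, k)), and INT_MAX from <limits.h> stands in for Lua's internal MAX_INT. It is not the tt table above, just a small table with the same kind of hole.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table model: its positive integer keys are keys[0..nkeys-1]. */
typedef struct { const unsigned *keys; size_t nkeys; } KeySet;

/* Stand-in for !ttisnil(luaH_getint(t, k)): linear scan of the key list. */
static bool present(const KeySet *t, unsigned k) {
    for (size_t x = 0; x < t->nkeys; x++)
        if (t->keys[x] == k) return true;
    return false;
}

/* Same steps as the 5.3 unbound_search above: double j until t[j] is nil,
   then binary search between the last present index i and j. */
static unsigned unbound_search_model(const KeySet *t, unsigned j) {
    unsigned i = j;                          /* i is zero or a present index */
    j++;
    while (present(t, j)) {
        i = j;
        if (j > (unsigned)INT_MAX / 2) {     /* overflow guard, checked before doubling */
            i = 1;
            while (present(t, i)) i++;       /* linear fallback */
            return i - 1;
        }
        j *= 2;
    }
    while (j - i > 1) {                      /* binary search between i and j */
        unsigned m = (i + j) / 2;
        if (!present(t, m)) j = m;
        else i = m;
    }
    return i;
}
```

With keys {1..5, 7, 8, 9} and j = 4 (array part of size 4), the doubling step jumps from 5 to 10 and the binary search settles on 9, so the model reports 9 despite the missing key 6; with keys {1..4, 6} it reports 4 even though key 6 exists.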
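When it is safe: in my earlier projects, some temporary tables were initialized without explicit keys (so keys start from 1 by default), e.g. tt={100,200,300,{1,2},"hello world"}; elements were later added only with table.insert, and nothing was ever removed or added in any other way. In that situation, using # to get the length works fine.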
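This rule of thumb can be phrased in terms of the source comment's "boundary": an index i where t[i] is non-nil and t[i+1] is nil, or 0 when t[1] is nil. A sequence has exactly one boundary, so every search path returns the same answer; a hole adds a second one, and then which one # reports depends on the search. The sketch below is a hypothetical C model, not Lua API: seq_append imitates appending with table.insert(t, v), count_borders counts boundaries, and the fixed capacity CAP is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define CAP 16  /* hypothetical fixed capacity for the model */

/* Hypothetical model: has[k] is true when t[k] is non-nil (1 <= k <= CAP).
   seq_append writes at len + 1, like table.insert(t, v) with one value. */
typedef struct {
    int val[CAP + 1];
    bool has[CAP + 1];
    unsigned len;
} SeqModel;

static bool seq_append(SeqModel *s, int v) {
    if (s->len == CAP) return false;   /* model is full */
    s->len++;
    s->val[s->len] = v;
    s->has[s->len] = true;
    return true;
}

/* Counts boundaries: i such that (i == 0 or t[i] ~= nil) and t[i + 1] == nil.
   Slots beyond CAP count as nil. */
static unsigned count_borders(const SeqModel *s) {
    unsigned borders = 0;
    for (unsigned i = 0; i <= CAP; i++) {
        bool here = (i == 0) || s->has[i];
        bool next = (i < CAP) && s->has[i + 1];
        if (here && !next) borders++;
    }
    return borders;
}
```

Appending five values keeps a single boundary at 5; clearing slot 2 (the analogue of tt[2]=nil in the note above) creates a second boundary at 1, and from then on # may return either.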