How Strings Are Implemented in Lua

This article is excerpted from "Lua Performance Tips", the second chapter of the book Lua Programming Gems, written by the author of the Lua language.


From Lua Performance Tips  Roberto Ierusalimschy


About strings
As with tables, it is good to know how Lua implements strings to use them more efficiently.
The way Lua implements strings differs in two important ways from what is done in most other scripting languages.
First, all strings in Lua are internalized; this means that Lua keeps a single copy of any string.
Whenever a new string appears, Lua checks whether it already has a copy of that string and, if so, reuses that copy. 
Internalization makes operations like string comparison and table indexing very fast, but it slows down string creation.
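    In other words, Lua's strings differ from those of most other scripting languages in two main respects.
First, every string is internalized: Lua stores exactly one copy of each distinct string value.
When a new string appears, Lua checks whether it already exists and, if so, reuses the existing copy.
Because equal strings share one copy, comparing two strings or using one as a table key comes down to comparing pointers in the implementation described here, which is why those operations are fast; the price is the lookup that every string creation has to perform.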
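The check-then-reuse step can be modeled in a few lines of Lua. This is only a toy sketch: the `pool` table and the `intern` function are hypothetical names introduced for illustration, and small tables stand in for string objects, because Lua's real string table lives inside the interpreter and is not visible from Lua code.

```lua
-- Toy model of internalization (hypothetical; not Lua's actual implementation).
-- `pool` maps each distinct text to the single object that represents it.
local pool = {}

local function intern(text)
  local obj = pool[text]
  if obj == nil then
    -- First time this text appears: create and remember the object.
    obj = { text = text }
    pool[text] = obj
  end
  -- Every later request for the same text returns the same object.
  return obj
end

local a = intern("hello")
local b = intern("hel" .. "lo")  -- same text, built a different way

-- Both names refer to one shared object, so an identity check is enough.
print(rawequal(a, b))  --> true
```

The sketch also shows where the creation cost comes from: every call to `intern` has to do a lookup before it can return anything.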
Second, variables in Lua never hold strings, but only references to them.
This implementation speeds up several string manipulations.
For instance, in Perl, when you write something like $x = $y, where $y contains a string, 
the assignment copies the string contents from the $y buffer into the $x buffer.
If the string is long, this becomes an expensive operation. 
In Lua, this assignment involves only copying a pointer to the actual string.
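    Second, a Lua variable never contains a string itself; it only refers to one. This design speeds up several string operations.
For example, in Perl, when you write $x = $y and $y holds a string, the assignment copies the contents of the $y buffer into the $x buffer.
For a long string, that copy is expensive. In Lua, the same assignment copies only a pointer to the actual string.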
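A rough way to observe this from Lua itself is to watch the memory in use around an assignment. The sketch below assumes that `collectgarbage("count")`, which reports the memory in use in kilobytes, is a good enough proxy; the variable names are illustrative only.

```lua
-- Build a 1 MiB string once.
local y = string.rep("x", 1024 * 1024)

local before = collectgarbage("count")  -- memory in use, in KB
local x = y                             -- copies a reference, not the contents
local after = collectgarbage("count")

-- If the assignment had copied the contents, memory would have grown
-- by about 1024 KB. It should stay essentially unchanged instead.
print(after - before)
print(x == y)  --> true
```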
This implementation with references, however, slows down a particular form of string concatenation. 
In Perl, the operations [$s = $s . "x"] and [$s .= "x"] are quite different.
In the first one, you get a copy of $s and add "x" to its end.
In the second one, the "x" is simply appended to the internal buffer kept by the $s variable.
So, the second form is independent of the string size (assuming the buffer has space for the extra text).
If you have these commands inside loops, their difference is the difference between a linear and a quadratic algorithm.
For instance, the next loop takes almost five minutes to read a 5MByte file:
$x = "";
while (<>) {
    $x = $x . $_;
}
If we change $x = $x . $_ to $x .= $_, this time goes down to 0.1 seconds!
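    Implementing strings through references does, however, slow down one form of string concatenation.
In Perl, [$s = $s . "x"] and [$s .= "x"] work in completely different ways.
[$s = $s . "x"] copies $s and then adds "x" to the end of the copy, whereas [$s .= "x"] simply appends "x" to the end of the internal buffer of $s.
So the cost of [$s .= "x"] does not depend on the size of the string (assuming the buffer has room for the extra text).
With the first form, the loop above takes about five minutes to read a 5 MB file, because every iteration copies $x before appending $_. As the loop proceeds, $x keeps growing, so each copy takes longer and most of the time is spent copying.
With the second form, [$x .= $_], reading the same file takes about 0.1 seconds.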
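The gap between linear and quadratic can be made concrete with a simple cost model. The sketch below counts bytes copied, assuming that each copying concatenation rebuilds the whole result while an in-place append copies only the new piece. It is a model, not a measurement, and the function names are hypothetical.

```lua
-- Bytes copied when every step builds a brand-new string: s = s .. piece.
-- n pieces of k bytes each; step i copies all i * k bytes of the result.
local function bytes_copied_naive(n, k)
  local total, len = 0, 0
  for _ = 1, n do
    len = len + k        -- length of the string after this step
    total = total + len  -- the whole new string is written out
  end
  return total           -- k * n * (n + 1) / 2, quadratic in n
end

-- Bytes copied when each piece is appended to a buffer exactly once.
local function bytes_copied_appended(n, k)
  return n * k           -- linear in n
end

print(bytes_copied_naive(1000, 10))     --> 5005000
print(bytes_copied_appended(1000, 10))  --> 10000
```

With only a thousand 10-byte pieces, the copying form already moves about 500 times more data, and the ratio keeps growing with the number of pieces.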
Lua does not offer the second, faster option, because its variables do not have buffers associated to them.
So, we must use an explicit buffer: a table with the string pieces does the job.
The next loop reads that same 5MByte file in 0.28 seconds. Not as fast as Perl, but quite good.
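local t = {}                      -- explicit buffer: one entry per line
for line in io.lines() do
    t[#t + 1] = line              -- store the piece; nothing is concatenated yet
end
local s = table.concat(t, "\n")   -- join all the pieces once, at the end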
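    Lua has no equivalent of Perl's [$s .= "x"], because Lua variables have no buffer attached to them.
We therefore need an explicit buffer, and a table fills that role.
With a table as the buffer, reading the 5 MB file takes 0.28 seconds: not as fast as Perl, but still quite good.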
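To try the idea without reading a file, the two approaches can be compared on a small in-memory list. This is a minimal sketch; `join_naive` and `join_buffered` are hypothetical helper names, and both are meant to produce the same string.

```lua
-- Repeated concatenation: each step creates a new, longer string.
local function join_naive(pieces)
  local s = ""
  for i = 1, #pieces do
    if i > 1 then
      s = s .. "\n"
    end
    s = s .. pieces[i]
  end
  return s
end

-- Table as an explicit buffer: collect the pieces, then join them once.
local function join_buffered(pieces)
  local t = {}
  for i = 1, #pieces do
    t[#t + 1] = pieces[i]
  end
  return table.concat(t, "\n")
end

local lines = { "first", "second", "third" }
print(join_naive(lines) == join_buffered(lines))  --> true
```

Note that `table.concat(t, "\n")` puts the separator only between pieces, so the result has no trailing newline.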
