Programming in Lua, 2nd Edition - Chapter 20: The String Library

Latest recommended article published 2010-12-02 23:31:00

蓝色歌谣

Article link: https://blog.csdn.net/HowdyHappy/article/details/6012313

Chapter 20: The String Library

20.1 Basic String Functions

string.len(s)

string.rep(s,n) -- returns s repeated n times

string.rep("a",2^20) -- returns a 1 MB string

string.lower(s)

string.sub(s,i,j)

Negative indices count from the end of the string.

s = "[in brackets]"

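print(string.sub(s, 2, -2)) --> in brackets (from the second character through the second-to-last)

The string.char and string.byte functions convert between characters and their internal numeric representations.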

print(string.char(97)) --> a

i = 99; print(string.char(i, i+1, i+2)) --> cde

print(string.byte("abc")) --> 97

print(string.byte("abc", 2)) --> 98

print(string.byte("abc", -1)) --> 99

print(string.byte("abc", 1, 3)) --> 97 98 99

A common Lua idiom

s = "abc"

t = {s:byte(1,-1)}

for i, v in ipairs(t) do

print(v) -- 97 98 99

end

print(string.char(unpack(t))) --> abc

print(string.byte("abc", 1, 2)) --> 97 98

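Unfortunately, this technique does not work for long strings (more than about 2K), because Lua limits the number of values a function can return.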
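One way around that limit is to convert the string in slices, so that no single call returns more than a few hundred values. The sketch below is not from the book; the helper name allbytes and the default slice size of 256 are assumptions chosen for illustration.

```lua
-- Hypothetical helper (not part of the string library): collect the byte
-- codes of s slice by slice, so no call returns more than `chunk` values.
function allbytes (s, chunk)
  chunk = chunk or 256            -- assumed slice size
  local t = {}
  for i = 1, #s, chunk do
    -- string.byte clips the end index to the length of the string
    local part = { string.byte(s, i, i + chunk - 1) }
    for k = 1, #part do
      t[#t + 1] = part[k]
    end
  end
  return t
end

local codes = allbytes(string.rep("ab", 5000))
print(#codes, codes[1], codes[2], codes[#codes])  --> 10000 97 98 98
```

The reverse direction can be handled the same way: call string.char on one slice of the table at a time and join the pieces with table.concat.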

string.format

print(string.format("pi = %.4f", math.pi)) --> pi = 3.1416

d = 5; m = 11; y = 1990

print(string.format("%02d/%02d/%04d", d, m, y)) --> 05/11/1990

tag, title = "h1", "a title"

print(string.format("<%s>%s</%s>", tag, title, tag))

--> <h1>a title</h1>

20.2 Pattern-Matching Functions

The most powerful functions in the string library are find, match, gsub (global substitution), and gmatch (global match). All of them are based on patterns.

string.find

s = "hello world"

i, j = string.find(s, "hello")

print(i, j) --> 1 5

print(string.sub(s, i, j)) --> hello

print(string.find(s, "world")) --> 7 11

i, j = string.find(s, "l")

print(i, j) --> 3 3

print(string.find(s, "lll")) --> nil

Print every position where "\n" occurs in a string

s = "hello\n world\n"

local t = {} -- table to store the indices

local i = 0

while true do

i = string.find(s, "\n", i+1) -- find next newline

if i == nil then break end

t[#t + 1] = i

end

for i, v in ipairs(t) do

print(v)

end

A simpler way to write this kind of loop, covered later, uses the global-match iterator string.gmatch.
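As a rough sketch of that simpler form (assuming the goal is still to collect newline positions), a position capture "()" inside string.gmatch yields each index directly. The function name newlinePositions is made up for this example.

```lua
-- Hypothetical helper: return the index of every "\n" in s.
function newlinePositions (s)
  local t = {}
  -- "()" is a position capture: it yields the index where "\n" starts
  for pos in string.gmatch(s, "()\n") do
    t[#t + 1] = pos
  end
  return t
end

local t = newlinePositions("hello\n world\n")
print(t[1], t[2])  --> 6 13
```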

string.match

string.match is similar to string.find, except that string.find returns positions while string.match returns the matched substring.

print(string.match("hello world", "hello")) --> hello

For a fixed pattern such as 'hello', string.match is pointless. Its real power shows with variable patterns:

date = "Today is 17/7/1990"

d = string.match(date, "%d+/%d+/%d+")

print(d) --> 17/7/1990

The meaning of the pattern '%d+/%d+/%d+' and more advanced uses of string.match are discussed later.

string.gsub

Global substitution

s = string.gsub("Lua is cute", "cute", "great")

print(s) --> Lua is great

s = string.gsub("all lii", "l", "x")

print(s) --> axx xii

s = string.gsub("Lua is great", "Sol", "Sun")

print(s) --> Lua is great

An optional fourth argument limits the number of substitutions:

s = string.gsub("all lii", "l", "x", 1)

print(s) --> axl lii

s = string.gsub("all lii", "l", "x", 2)

print(s) --> axx lii

Counting the spaces in a string

str = "this string have some spaces"

count = select(2, string.gsub(str, " ", " ")) -- select picks the second of the two return values

print(count)

The second return value of string.gsub is the number of substitutions made.
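The same idea generalizes to counting matches of any pattern. The sketch below wraps it in a helper whose name, countMatches, is an assumption for this example; it replaces each match with itself ("%0") so the string is left unchanged and only the count is used.

```lua
-- Hypothetical helper: count how many times `pattern` matches in s.
function countMatches (s, pattern)
  -- "%0" in the replacement stands for the whole match, so nothing changes;
  -- the second result of gsub is the number of substitutions.
  local _, n = string.gsub(s, pattern, "%0")
  return n
end

print(countMatches("this string have some spaces", " "))  --> 4
print(countMatches("a.b.c", "%."))                         --> 2
```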

string.gmatch

Printing all the words in a string

s = "ta3st$rin%%$g6__...ha..11ve some spaces"

words = {}

for w in string.gmatch(s, "%a+") do

words[#words + 1] = w

end

for i, v in ipairs(words) do

print(v)

end

Emulating how require searches for a module

function search (modname, path)

modname = string.gsub(modname, "%.", "/")

for c in string.gmatch(path, "[^;]+") do

local fname = string.gsub(c, "?", modname)

local f = io.open(fname)

if f then

f:close()

return fname

end

end

return nil -- not found

end
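To see what the two string calls in search do without touching the file system, the following sketch only builds the list of candidate file names. The function name candidates and the sample path are assumptions for illustration.

```lua
-- Hypothetical helper: list the file names that a search over `path`
-- would try for `modname`, without opening any file.
function candidates (modname, path)
  local name = string.gsub(modname, "%.", "/")   -- a.b -> a/b
  local list = {}
  for template in string.gmatch(path, "[^;]+") do
    -- "%?" is an escaped, literal question mark
    list[#list + 1] = (string.gsub(template, "%?", name))
  end
  return list
end

local list = candidates("a.b", "./?.lua;/usr/local/lua/?/init.lua")
for _, fname in ipairs(list) do
  print(fname)
end
--> ./a/b.lua
--> /usr/local/lua/a/b/init.lua
```

The extra parentheses around string.gsub drop its second result (the substitution count), so only the file name is stored.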

20.3 Patterns

s = "Deadline is 30/05/1999, firm"

date = "%d%d/%d%d/%d%d%d%d"

print(string.sub(s, string.find(s, date))) --> 30/05/1999

. all characters

%a letters

%c control characters

%d digits

%l lower-case letters

%p punctuation characters

%s space characters

%u upper-case letters

%w alphanumeric characters

%x hexadecimal digits

%z the character whose representation is 0

The upper-case version of a class is its complement; for example, %A matches every non-letter character

print(string.gsub("hello, up-down!", "%A", "."))

--> hello..up.down. 4

(4 is the second return value of gsub: the number of substitutions.)

Replace letters, digits, and underscores

print(string.gsub("@@abc123___?", "[%w_]", "."))

--> @@.........? 9

Replace the digits 0 and 1

print(string.gsub("abc0111000012345", "[01]", "."))

--> abc.........2345 9

Replace "[" and "]"

print(string.gsub("[[]]][[]]]][", "[%[%]]", "."))

--> ............ 12

Count the vowels

text = "To count the number of vowels in a text, you can write"

nvow = select(2, string.gsub(text, "[AEIOUaeiou]", ""))

print(nvow)

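Match 0 through 7: [0-7]

Match anything except 0 through 7: [^0-7]

Match anything except a newline: [^\n]

Another way to write %S: [^%s]

Depending on the current locale, '[a-z]' may differ from '%l'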

Repetition modifiers

+ 1 or more repetitions

* 0 or more repetitions

- also 0 or more repetitions (shortest match)

? optional (0 or 1 occurrence)

Match one or more letters

print(string.gsub("one, and two; and three", "%a+", "word"))

--> word, word word; word word

Match one or more digits

print(string.match("the number 1298 is even", "%d+")) --> 1298

Match "(", optional whitespace, then ")"

print(string.match("( )", "%(%s*%)")) --> ( )

Match a letter or underscore followed by any number of letters, digits, or underscores

print(string.match("_3sdwe", "[_%a][_%w]*")) --> _3sdwe

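The difference between "*" and "-" is that "-" always tries the shortest match

print(string.match("_3sdwe", "[_%w]-")) --> prints an empty line: here "[_%w]-" always matches the empty string

Match comments in C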

test = "int x; /* x */ int y; /* y */"

print(string.gsub(test, "/%*.*%*/", "<COMMENT>"))

--> int x; <COMMENT>

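Because ".*" always expands as far as it can, the first "/*" is paired with the last "*/", although the text actually contains two comments, not one.

The correct approach: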

test = "int x; /* x */ int y; /* y */"

print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))

--> int x; <COMMENT> int y; <COMMENT>

Because ".-" expands as little as possible, the first "/*" is paired with the first "*/".

An optional "-" or "+" followed by one or more digits

test = "-12 23 +1009"

print(string.gsub(test, "[+-]?%d+", "<digit>"))

Does the string start with a digit?

s = "123-fdf"

if string.find(s, "^%d") then

print(s)

end

Does the string end with a digit?

s = "-fdf123"

if string.find(s, "%d$") then

print(s)

end

An optional leading "+" or "-", then nothing but digits up to the end of the string

s = "+123"

if string.find(s, "^[+-]?%d+$") then

print(s)

end

Substrings enclosed by a balanced pair of delimiters

s = "a (enclosed (in) parentheses) line"

print(string.gsub(s, "%b()", "")) --> a line

s2 = "a {enclosed (in) parentheses} line"

print(string.gsub(s2, "%b{}", "")) --> a line

s2 = "a /enclosed (in) parentheses/ line"

print(string.gsub(s2, "%b//", "")) --> a line

20.4 Captures

Wrap whatever you want to capture in parentheses "()"

pair = "name = Anna"

key, value = string.match(pair, "(%a+)%s*=%s*(%a+)")

print(key, value) --> name Anna

date = "Today is 17/7/1990"

d, m, y = string.match(date, "(%d+)/(%d+)/(%d+)")

print(d, m, y) --> 17 7 1990

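Match a double or single quote, any characters, then a double or single quote

s = [[123"abc"]]

a, b = string.match(s, "[\"'].-[\"']")

print(a,b)

The pattern above fails on a quoted string such as "it's all right": the match stops at the apostrophe instead of at the closing double quote.

First, see what () and %1 do

s = [[123"abc"]]

a, b = string.match(s, "([\"'])(.-)%1")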

print(a,b)

Code that correctly matches "it's all right"

s = [["it's all right"]]

a, b = string.match(s, "([\"'])(.-)%1")

print(a,b)

Inside a pattern, "%1" matches a copy of the first capture
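Another small illustration of a back-reference, as a sketch only: matching a simple tag pair whose closing name must equal the opening name. The function name element and the sample strings are assumptions, and the pattern is far too naive for real markup.

```lua
-- Hypothetical helper: return the tag name and body of the first
-- <name>...</name> pair whose closing tag repeats the opening name.
function element (s)
  -- "%1" must match exactly what the first capture (%a+) matched
  return string.match(s, "<(%a+)>(.-)</%1>")
end

print(element("<b>bold</b> and <i>italic</i>"))  --> b bold
print(element("<b>text</i>"))                    --> nil
```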

A more complex example

p = "%[(=*)%[(.-)%]%1%]"

s = "a = [=[[[ something ]] ]==] ]=]; print(a)"

q, quotedPart = string.match(s, p)

print(q, quotedPart) --> = [[ something ]] ]==]

Each letter, "-", then the same letter again (the pattern has no explicit capture, so "%1" stands for the whole match)

print(string.gsub("hello Lua!", "%a", "%1-%1"))

--> h-he-el-ll-lo-o L-Lu-ua-a!

Second capture followed by first capture (swapping adjacent characters)

print(string.gsub("hello Lua", "(.)(.)", "%2%1")) --> ehll ouLa

Replace the match with its own first capture (the string is unchanged)

s = "\\command{some text}"

a = string.gsub(s, "(\\%a+{.-})", "%1")

print(a)

Convert LaTeX-style commands to XML-style tags

s = [[the \quote{task} is to \em{change} that.]]

a = string.gsub(s, "\\(%a+){(.-)}", "<%1>%2</%1>")

print(a)

Trim leading and trailing whitespace

s = " a b c "

function trim (s)

return (string.gsub(s, "^%s*(.-)%s*$", "%1"))

end

print(trim(s))

20.5 Replacements

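The third argument of string.gsub can also be a table or a function. If it is a function, gsub calls it once for every match, passing the captures as arguments, and uses the function's return value as the replacement string. If it is a table, gsub looks the table up with the first capture as the key and uses the associated value as the replacement string.

A table as the third argument of gsub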

function expand (s)

return (string.gsub(s, "$(%w+)", _G))

end

name = "Lua"; status = "great"

print(expand("$name is $status, isn't it?"))

--> Lua is great, isn't it?

print(expand("$othername is $status, isn't it?"))

--> $othername is great, isn't it?

othername is not in the _G table, so it is not replaced.

A function as the third argument of gsub
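The replacement table does not have to be _G. As a minimal sketch (the function name fill and the sample data are assumptions), any table can serve as the lookup, and names missing from it are left untouched:

```lua
-- Hypothetical helper: replace each $name in `template` with values[name].
function fill (template, values)
  -- "%$" is an escaped, literal dollar sign; the capture is the lookup key
  return (string.gsub(template, "%$(%w+)", values))
end

local data = { name = "Lua", status = "great" }
print(fill("$name is $status, $unknown", data))
--> Lua is great, $unknown
```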

function expand (s)

return (string.gsub(s, "$(%w+)", function (n)

return tostring(_G[n])

end))

end

print(expand("print = $print; a = $a"))

--> print = function: 0x8050ce0; a = nil

If the function returns nil, no substitution takes place; that cannot happen in this example, because tostring never returns nil.

Convert LaTeX-style commands to XML-style tags, allowing nested commands

function toxml (s)

s = string.gsub(s, "\\(%a+)(%b{})", function (tag, body)

body = string.sub(body, 2, -2) -- remove brackets

body = toxml(body) -- handle nested commands

return string.format("<%s>%s</%s>", tag, body, tag)

end

)

return s

end

print(toxml("\\title{The \\bold{big} example}"))
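A small sketch of that rule in action: the callback below returns a value only for words it recognizes, and returns nothing for the rest, so gsub keeps those matches as they are. The function name upperKeywords and the keyword list are assumptions for illustration.

```lua
-- Assumed sample data: the words we want to rewrite.
local keywords = { ["and"] = true, ["or"] = true, ["not"] = true }

-- Hypothetical helper: upper-case only the words listed in `keywords`.
function upperKeywords (s)
  return (string.gsub(s, "%a+", function (w)
    if keywords[w] then
      return string.upper(w)
    end
    -- falling through returns nothing, so gsub keeps the original word
  end))
end

print(upperKeywords("a and b or not c"))  --> a AND b OR NOT c
```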

--> <title>The <bold>big</bold> example</title>

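If gsub finds no match, it simply returns the original string, and the function passed as the third argument is never called.

URL encoding

URL encoding turns special characters into the form "%xx", where xx is the character's hexadecimal code, and turns spaces into "+". For example, "a+b = c" is encoded as "a%2Bb+%3D+c".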

function escape (s)

s = string.gsub(s, "[&=+%%%c]", function (c)

return string.format("%%%02X", string.byte(c))

end)

s = string.gsub(s, " ", "+")

return s

end

function encode (t)

local b = {}

for k,v in pairs(t) do

b[#b + 1] = (escape(k) .. "=" .. escape(v))

end

return table.concat(b, "&")

end

t = {name = "al", query = "a+b = c", q = "yes or no"}

print(encode(t)) --> q=yes+or+no&name=al&query=a%2Bb+%3D+c

function unescape (s)

s = string.gsub(s, "+", " ")

s = string.gsub(s, "%%(%x%x)", function (h)

return string.char(tonumber(h, 16))

end)

return s

end

print(unescape("a%2Bb+%3D+c")) --> a+b = c

cgi = {}

function decode (s)

for name, value in string.gmatch(s, "([^&=]+)=([^&=]+)") do

name = unescape(name)

value = unescape(value)

cgi[name] = value

end

end

decode("q=yes+or+no&name=al&query=a%2Bb+%3D+c")

for i in pairs(cgi) do

print(i, cgi[i])

end

---------

gmatch returns an iterator; at the start of a character set, "^" means "not". Consider the following example:

for name in string.gmatch("q=yes+or+no&name=al&query=a%2Bb+%3D+c", "[^q]+c") do

print(name)

end

--> uery=a%2Bb+%3D+c

"[^q]+c" means one or more characters other than "q", followed by the character "c".

function decode (s)

for name, value in string.gmatch(s, "([^&=]+)=([^&=]+)") do

print(name, value)

end

end

decode("q=yes+or+no&name=al&query=a%2Bb+%3D+c")

-->q yes+or+no

-->name al

-->query a%2Bb+%3D+c

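"([^&=]+)=([^&=]+)" means one or more characters that are neither "&" nor "=", then the character "=", then again one or more characters that are neither "&" nor "=".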

Tab expansion

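The empty capture "()" has a special meaning in Lua: instead of capturing text, it captures its position in the subject string:

print(string.match("hello", "()ll()")) --> 3 5

The second empty capture gives the position just after the match.

A nice use of position captures is expanding the tabs in a string:

Converting each tab in a string into a number of spaces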

function expandTabs (s, tab)

tab = tab or 8 -- tab "size" (default is 8)

local corr = 0

s = string.gsub(s, "()\t", function (p)

--local sp = tab - (p - 1 + corr)%tab

--corr = corr - 1 + sp

return string.rep(" ",tab)

end)

return s

end

s = "a\tb\tc" -- the letters in this string are separated by tabs

-- convert each tab into 1 through 8 spaces

print(expandTabs(s, 1))

print(expandTabs(s, 2))

print(expandTabs(s, 3))

print(expandTabs(s, 4))

print(expandTabs(s, 5))

print(expandTabs(s, 6))

print(expandTabs(s, 7))

print(expandTabs(s, 8))

function unexpandTabs (s, tab)

tab = tab or 8

s = expandTabs(s)

local pat = string.rep(".", tab)

s = string.gsub(s, pat, "%0\1")

s = string.gsub(s, " +\1", "\t")

s = string.gsub(s, "\1", "")

return s

end

s4 = "a b c"

print(unexpandTabs(s4, 4))

(I did not understand the original example, so I changed it.)
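For comparison, here is a sketch of a column-aware expansion built from the two lines that are commented out in expandTabs above. It assumes a single-line string; the name expandTabsToColumns is made up to avoid clashing with the simplified version. The position capture gives each tab's index in the original string, and corr tracks how much the string has grown so far, so every tab is padded only up to the next multiple of tab.

```lua
-- Sketch: expand each tab to the next tab stop (single-line input assumed).
function expandTabsToColumns (s, tab)
  tab = tab or 8                  -- tab "size" (default is 8)
  local corr = 0                  -- growth of the string so far
  return (string.gsub(s, "()\t", function (p)
    -- p - 1 + corr is the number of columns already filled before this tab
    local sp = tab - (p - 1 + corr) % tab
    corr = corr - 1 + sp          -- one tab removed, sp spaces added
    return string.rep(" ", sp)
  end))
end

print(expandTabsToColumns("a\tb\tc", 4))  --> a   b   c
print(expandTabsToColumns("ab\tc", 4))    --> ab  c
```

With a tab size of 4, "b" lands in column 5 and "c" in column 9, whatever the length of the text before each tab.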

20.6 Tricks of the Trade

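Pattern matching must be used with great care. It is not a replacement for a proper parser, and it is hard to build an industrial-quality parser out of it.

A case where the C-comment pattern goes wrong: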

test = [[char s[] = "a /* here"; /* a tricky string */]]

print(string.gsub(test, "/%*.-%*/", "<COMMENT>"))

--> char s[] = "a <COMMENT>

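Compare "(.-)%$" with "^(.-)%$": the first has a serious performance problem when the subject contains no "$", because the match is retried from every position in the string; anchoring the pattern with "^" avoids that.

This example successfully matches an empty string at the beginning of the subject: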
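A rough way to observe the difference is to time both patterns on a subject without any "$". This is only a sketch: the helper name timedFind and the subject length of 5000 are assumptions, and the actual timings depend on the machine, so no expected output is shown.

```lua
-- Hypothetical helper: run string.find and report the CPU time it used,
-- followed by the usual results of string.find.
function timedFind (subject, pattern)
  local start = os.clock()
  local i, j = string.find(subject, pattern)
  return os.clock() - start, i, j
end

local subject = string.rep("a", 5000)          -- contains no "$"
local slow = timedFind(subject, "(.-)%$")      -- retried at every position
local fast = timedFind(subject, "^(.-)%$")     -- tried at position 1 only
print(string.format("unanchored: %.4f s, anchored: %.4f s", slow, fast))
```

Both calls fail to match; the unanchored one simply does far more work before giving up.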

i, j = string.find(";$% **#$hello13", "%a*")

print(i,j) --> 1 0

Use Lua code to build a pattern that would otherwise have to be written out many times:

pattern = string.rep("[^\n]", 70) .. "[^\n]*"

Make a pattern case-insensitive

function nocase (s)

s = string.gsub(s, "%a", function (c)

return "[" .. string.lower(c) .. string.upper(c) .. "]"

end)

return s

end

print(nocase("Hi there!")) --> [hH][iI] [tT][hH][eE][rR][eE]!

Another useful application of pattern matching is pre-processing a string before the real work starts.

follows a typical string: "This is \"great\"!".

Suppose we want to encode "\"" as "\1". That gets us into trouble if the original text already contains "\1". A simple way to avoid the problem is to encode every sequence "\x" as "\ddd", where ddd is the decimal code of the character x:

Encoding the escape sequences in a string as "\ddd"

function code (s)

return (string.gsub(s, "\\(.)", function (x)

return string.format("\\%03d", string.byte(x))

end))

end

function decode (s)

return (string.gsub(s, "\\(%d%d%d)", function (d)

return "\\" .. string.char(d)

end))

end

s = [[follows a typical string: "This is \"great\"!".]]

s = code(s)

s = string.gsub(s, '".-"', string.upper)

s = decode(s)

print(s) --> follows a typical string: "THIS IS \"GREAT\"!".

print(code([[follows a typical string: "This is \"great\"!".]]))

print(decode([[follows a typical string: "This is \034great\034!"]]))

print(string.gsub([[follows a typical string: "This is \"great\"!".]], '".-"', string.upper))
