Chapter 21 The String Library

wanglang3081


The power of a raw Lua interpreter to manipulate strings is quite limited. A
program can create string literals, concatenate them, and get string lengths.
But it cannot extract substrings or examine their contents. The full power to

manipulate strings in Lua comes from its string library.

The string library exports its functions as a module called string. Since
Lua 5.1, it also exports its functions as methods of the string type (using the
metatable of that type). So, for instance, to translate a string to upper case we

can write either string.upper(s) or s:upper(). Pick your choice.

21.1 Basic String Functions

string.len(s) returns the length of the string s; it is equivalent to #s.

string.rep(s,n) (or s:rep(n)) returns the string s repeated n times. You can create a string with
1 MB (e.g., for tests) with string.rep("a",2^20).

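string.lower(s) returns a copy of s with its upper-case letters converted to lower case; string.upper(s) converts to upper case instead.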

As a typical use, if you want to sort an array of strings regardless of case, you may write something like this:
table.sort(a, function (a, b)
return a:lower() < b:lower()
end)

The call string.sub(s,i,j) extracts a piece of the string s, from the i-th to
the j-th character inclusive. In Lua, the first character of a string has index 1.

a="123456789";
print(a:sub(2,5)); -- 2345 (note:inclusive)

You can also use negative indices, which count from the end of the string: the
index -1 refers to the last character in a string, -2 to the previous one, and
so on. Therefore,

the call string.sub(s,1,j) (or s:sub(1,j)) gets a prefix of the string s with length j;

string.sub(s,j,-1) (or simply s:sub(j)) gets a suffix of the string, from the j-th character to the end;

string.sub(s,2,-2) returns a copy of the string s with the first and last characters removed:

s = "[in brackets]"
print(s:sub(2, -2)) --> in brackets

Remember that strings in Lua are immutable. Function string.sub, like
any other function in Lua, does not change the value of a string, but returns a
new string.

s = s:sub(2, -2) -- reassign the result if you want the variable to hold the new value

The string.char and string.byte functions convert between characters and
their internal numeric representations.

print(string.char(97)) --> a
i = 99; print(string.char(i, i+1, i+2)) --> cde
print(string.byte("abc")) --> 97 (in string.byte(s,i), an omitted i defaults to 1, the first character of s)
print(string.byte("abc", 2)) --> 98
print(string.byte("abc", -1)) --> 99

print(string.byte("abc", 1, 2)) --> 97 98 -- from Lua 5.1

A nice idiom is {s:byte(1,-1)}, which creates a table with the codes of all characters in s. Given this table, we can recreate the original string by calling string.char(table.unpack(t)). This technique does not work for very
long strings (say, longer than 1 MB), because Lua puts a limit on how many
values a function can return.
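One way around that limit is to convert the codes one slice at a time. The following is only a minimal sketch: the helper name bytesToString and the chunk size of 4096 are assumptions made for this illustration, not something taken from the book.

```lua
-- table.unpack exists in Lua 5.2+; in Lua 5.1 the same function is the global unpack
local unpack = table.unpack or unpack

-- Hypothetical helper: rebuild a string from a table of character codes,
-- passing at most `chunk` codes to string.char per call.
function bytesToString(t, chunk)
  chunk = chunk or 4096            -- assumed chunk size, chosen to stay small
  local parts = {}
  for i = 1, #t, chunk do
    local j = math.min(i + chunk - 1, #t)
    parts[#parts + 1] = string.char(unpack(t, i, j))
  end
  return table.concat(parts)
end

local codes = {("hello"):byte(1, -1)}
print(bytesToString(codes))        --> hello
```

Collecting the pieces in a table and joining them once with table.concat avoids repeated concatenation of a growing string.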

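The string.format function is a powerful tool for formatting strings, typically for output. It returns a formatted version of its arguments following the description given by its first argument, the format string, which is composed of regular text and directives in the style of the C printf function. A directive is the character ‘%’ plus a letter that tells how to format the argument: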

‘d’ for a decimal number, ‘x’ for hexadecimal, ‘o’ for octal, ‘f’ for a floating-point number, ‘s’ for strings, plus
some other variants. Between the ‘%’ and the letter, a directive can include other
options that control the details of the formatting, such as the number of decimal
digits of a floating-point number:

print(string.format("pi = %.4f", math.pi)) --> pi = 3.1416
d = 5; m = 11; y = 1990
print(string.format("%02d/%02d/%04d", d, m, y)) --> 05/11/1990
tag, title = "h1", "a title"
print(string.format("<%s>%s</%s>", tag, title, tag))
--> <h1>a title</h1>
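Width and alignment options are handy for fixed-width output. Here is a small sketch of that use; the function name formatRow and the column widths are illustrative assumptions.

```lua
-- Hypothetical helper: format one row of a fixed-width table.
--   %-10s  string, left-justified and padded to 10 characters
--   %8.2f  float with 2 decimal digits, right-justified in 8 characters
--   %5d    integer, right-justified in 5 characters
function formatRow(name, price, qty)
  return string.format("%-10s|%8.2f|%5d", name, price, qty)
end

print(formatRow("apple", 1.5, 3))   --> apple     |    1.50|    3
print(string.format("%x", 255))     --> ff
print(string.format("%o", 8))       --> 10
```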

21.2 Pattern-Matching Functions

The most powerful functions in the string library are find, match, gsub (Global
Substitution), and gmatch (Global Match). They all are based on patterns.

Unlike several other scripting languages, Lua uses neither POSIX regex nor
Perl regular expressions for pattern matching. The main reason for this decision
is size: a typical implementation of POSIX regular expressions takes more than
4000 lines of code. This is about the size of all Lua standard libraries together.
In comparison, the implementation of pattern matching in Lua has less than
600 lines. Of course, pattern matching in Lua cannot do all that a full POSIX
implementation does. Nevertheless, pattern matching in Lua is a powerful tool,
and includes some features that are difficult to match with standard POSIX
implementations.

The string.find function

The string.find function searches for a pattern inside a given subject string.
The simplest form of a pattern is a word, which matches only a copy of itself.
For instance, the pattern ‘hello’ will search for the substring “hello” inside the
subject string. When find finds its pattern, it returns two values: the index

where the match begins and the index where the match ends. If it does not find
a match, it returns nil:

s="hello world hello world hello world hello adfsd asf hello";
i,j=string.find(s,"hello");
print(i,j,string.sub(s,i,j));

local index,endIndex=string.find(s,"hello");
print("use the third optional parameter...");
while index do
   local str=s:sub(index,endIndex);
   print(index,endIndex,str);
   -- resume the search just after the end of the previous match
   index,endIndex=string.find(s,"hello",endIndex+1);
end

1       5       hello
use the third optional parameter...
1       5       hello
13      17      hello
25      29      hello
37      41      hello
53      57      hello
The string.find function has an optional third parameter: an index that
tells where in the subject string to start the search. This parameter is useful
when we want to process all the indices where a given pattern appears, as in the loop above.

We will see later a simpler way to write such loops, using the string.gmatch
iterator.
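The same loop can be wrapped in a reusable function. The sketch below is my own illustration (the name findAll is an assumption); it also uses the fourth parameter of string.find, which, when true, turns off pattern matching so the needle is searched as plain text.

```lua
-- Hypothetical helper: return the start index of every occurrence of `text` in `s`.
-- The fourth argument `true` requests a plain substring search, so magic
-- characters in `text` are taken literally.
function findAll(s, text)
  local positions = {}
  if text == "" then return positions end   -- an empty needle would never advance
  local i, j = string.find(s, text, 1, true)
  while i do
    positions[#positions + 1] = i
    i, j = string.find(s, text, j + 1, true) -- continue after the previous match
  end
  return positions
end

print(table.concat(findAll("hello world hello", "hello"), " "))  --> 1 13
print(table.concat(findAll("a.b.c", "."), " "))                  --> 2 4
```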

The string.match function

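The string.match function is similar to string.find, in the sense that it also searches for a pattern in a string. However, instead of returning the positions
where it found the pattern, it returns the part of the subject string that matched
the pattern: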

print(string.match("hello world", "hello")) --> hello

date = "Today is 17/7/1990"
d = string.match(date, "%d+/%d+/%d+")
print(d) --> 17/7/1990

The string.gsub function

The string.gsub function has three mandatory parameters: a subject string, a
pattern, and a replacement string. Its basic use is to substitute the replacement
string for all occurrences of the pattern inside the subject string:

s = string.gsub("Lua is cute", "cute", "great")
print(s) --> Lua is great

s = string.gsub("all lii", "l", "x")
print(s) --> axx xii

s = string.gsub("Lua is great", "Sol", "Sun")
print(s) --> Lua is great

An optional fourth parameter limits the number of substitutions to be made:
s = string.gsub("all lii", "l", "x", 1)
print(s) --> axl lii

s = string.gsub("all lii", "l", "x", 2)
print(s) --> axx lii

The string.gsub function also returns as a second result the number of times
it made the substitution. For instance, an easy way to count the number of
spaces in a string is
count = select(2, string.gsub(str, " ", " "))

Here select(2, ...) picks the second value returned by gsub and assigns it to count: gsub returns two results, first the string after substitution and second the number of substitutions made.

The string.gmatch function

The string.gmatch function returns a function that iterates over all occurrences
of a pattern in a string. For instance, the following example collects all words in
a given string s:
words = {}
for w in string.gmatch(s, "%a+") do
words[#words + 1] = w
end

As we will discuss shortly, the pattern ‘%a+’ matches sequences of one or more
alphabetic characters (that is, words).
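As a slightly larger use of the same iterator, here is a sketch that counts how often each word occurs. The function name wordFrequency and the decision to fold words to lower case are assumptions of this example.

```lua
-- Hypothetical helper: count the occurrences of each word in `text`.
-- Words are maximal runs of letters ('%a+'), compared case-insensitively.
function wordFrequency(text)
  local freq = {}
  for w in string.gmatch(text, "%a+") do
    w = w:lower()
    freq[w] = (freq[w] or 0) + 1
  end
  return freq
end

local freq = wordFrequency("The cat and the hat")
print(freq.the, freq.cat)   -- "the" occurs twice, "cat" once
```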

21.3 Patterns

You can make patterns more useful with character classes. A character class is
an item in a pattern that can match any character in a specific set. For instance,
the class %d matches any digit. Therefore, you can search for a date in the format
dd/mm/yyyy with the pattern ‘%d%d/%d%d/%d%d%d%d’:
s = "Deadline is 30/05/1999, firm"
date = "%d%d/%d%d/%d%d%d%d"
print(string.sub(s, string.find(s, date))) --> 30/05/1999

The following table lists all character classes:
. all characters
%a letters
%c control characters
%d digits
%g printable characters except spaces
%l lower-case letters
%p punctuation characters
%s space characters (whitespace; the ‘s’ does not stand for “string” here)
%u upper-case letters
%w alphanumeric characters
%x hexadecimal digits

An upper-case version of any of these classes represents the complement of the
class. For instance, ‘%A’ represents all non-letter characters:
print(string.gsub("hello, up-down!", "%A", "."))
--> hello..up.down. 4   (the 4 is the number of substitutions made)

Some characters, called magic characters, have special meanings when used
in a pattern. The magic characters are
( ) . % + - * ? [ ] ^ $

The character ‘%’ works as an escape for these magic characters (this is different from a character class: in ‘%a’ the ‘%’ does not escape the ‘a’, it turns it into the class of letters). So, ‘%.’ matches
a dot; ‘%%’ matches the character ‘%’ itself. You can use the escape ‘%’ not only
for the magic characters, but also for any non-alphanumeric character. When in
doubt, play safe and use an escape: an unneeded escape before a non-alphanumeric character does no harm.

b="c:\\treovr\\ab'c#.\\hkik\\."
print(string.match(b,"%a+'.%#")) -- not sure whether '#' is magic, so escape it: '%#' matches a literal '#'

--> ab'c#

For the Lua parser, patterns are regular strings. They have no special
treatment, following the same rules as other strings. Only the pattern functions
interpret them as patterns, and only these functions treat the ‘%’ as an escape.
To put a quote inside a pattern, you use the same techniques that you use to put
a quote inside other strings; for instance, you can escape the quote with a ‘\’,
which is the escape character for Lua. (I really like this design; Java and JavaScript, by comparison, have dedicated classes for this.)

b="c:\\treovr\\ab'c.\\hkik\\."
print(string.match(b,"%a+'")) --> ab'

A char-set allows you to create your own character classes, grouping different
classes and single characters inside square brackets.

For instance, the charset ‘[%w_]’ matches both alphanumeric characters and underscores;

the char-set ‘[01]’ matches binary digits;

and the char-set ‘[%[%]]’ matches square brackets.

To count the number of vowels in a text, you can write
nvow = select(2, string.gsub(text, "[AEIOUaeiou]", "")) -- every vowel is replaced by the empty string; we keep only the count

You can also include character ranges in a char-set, by writing the first and the
last characters of the range separated by a hyphen. I seldom use this facility,
because most useful ranges are already predefined;

for instance, ‘[0-9]’ is the same as ‘%d’, and

‘[0-9a-fA-F]’ is the same as ‘%x’.

However, if you need to find an octal digit, then you may prefer ‘[0-7]’ instead of an explicit enumeration
like ‘[01234567]’. You can get the complement of any char-set by starting it
with ‘^’: the pattern ‘[^0-7]’ finds any character that is not an octal digit and
‘[^\n]’ matches any character different from newline. But remember that you
can negate simple classes with its upper-case version: ‘%S’ is simpler than ‘[^%s]’.
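Complemented char-sets are a convenient way to split text on a delimiter. The sketch below uses ‘[^\n]+’ to break a string into lines; the function name splitLines is an assumption, and note that, because of the ‘+’, empty lines are skipped.

```lua
-- Hypothetical helper: split `text` into its non-empty lines.
-- '[^\n]+' matches a maximal run of characters other than newline.
function splitLines(text)
  local lines = {}
  for line in string.gmatch(text, "[^\n]+") do
    lines[#lines + 1] = line
  end
  return lines
end

local lines = splitLines("first\nsecond\n\nthird")
print(#lines)      --> 3
print(lines[3])    --> third
```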

You can make patterns still more useful with modifiers for repetitions and
optional parts. Patterns in Lua offer four modifiers:
+ 1 or more repetitions
* 0 or more repetitions
- 0 or more lazy repetitions
? optional (0 or 1 occurrence)

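The ‘+’ modifier matches one or more characters of the original class. It always gets the longest sequence that matches the pattern. For instance, the
pattern ‘%a+’ means one or more letters, or a word: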
print(string.gsub("one, and two; and three", "%a+", "word"))
--> word, word word; word word

The pattern ‘%d+’ matches one or more digits (an integer numeral):
print(string.match("the number 1298 is even", "%d+")) --> 1298   (without the ‘+’, only "1" would be found)

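The modifier ‘*’ is similar to ‘+’, but it also accepts zero occurrences of characters of the class. A typical use is to match optional spaces between parts of a pattern. For instance, to match an empty parenthesis pair, such as () or ( ),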
you can use the pattern ‘%(%s*%)’: the pattern ‘%s*’ matches zero or more spaces.
(Parentheses also have a special meaning in a pattern, so we must escape them.)

As another example, the pattern ‘[_%a][_%w]*’ matches identifiers in a Lua program:
a sequence starting with a letter or an underscore, followed by zero or
more underscores or alphanumeric characters.

Like ‘*’, the modifier ‘-’ also matches zero or more occurrences of characters
of the original class. However, instead of matching the longest sequence, it
matches the shortest one. Sometimes, there is no difference between ‘*’ and ‘-’,
but usually they present rather different results.

dd="abc123568asdbc";
pattern="[%a]-";
pattern2="[%a]*";
print(string.match(dd,pattern)); --> (empty string): '-' takes as few characters as possible, here zero
print(string.match(dd,pattern2)); --> abc, the longest match

For instance, suppose you want to find comments in a C program. Many people would first try
‘/%*.*%*/’ (that is, a “/*” followed by a sequence of any characters followed by
“*/”, written with the appropriate escapes). However, because the ‘.*’ expands
as far as it can, the first “/*” in the program would close only with the last “*/”:

test = "int x; /* x */ int y; /* y */"
print(string.match(test, "/%*.*%*/"))
--> /* x */ int y; /* y */   (because of ‘.*’, the text "int y;", which is not a comment, is matched too)

The pattern ‘.-’, instead, will expand the least amount necessary to find the first
“*/”, so that you get the desired result:

ee = "int x; /* x */ int y; /* y */"
for w in string.gmatch(ee,"/%*.-%*/") do
print(w);
end

/* x */
/* y */

The last modifier, ‘?’, matches an optional character. As an example, suppose
we want to find an integer in a text, where the number can contain an optional
sign. The pattern ‘[+-]?%d+’ does the job, matching numerals like “-12”, “23”,
and “+1009”. The character class ‘[+-]’ matches either a ‘+’ or a ‘-’ sign; the
following ‘?’ makes this sign optional.
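Combined with gmatch, this pattern can pull every integer out of a text. The following is a sketch under my own naming (extractIntegers is hypothetical); note that it treats any ‘+’ or ‘-’ directly before a digit as a sign, so "3-4" yields 3 and -4.

```lua
-- Hypothetical helper: collect all optionally signed integers found in `text`.
function extractIntegers(text)
  local nums = {}
  for n in string.gmatch(text, "[+-]?%d+") do
    nums[#nums + 1] = tonumber(n)
  end
  return nums
end

print(table.concat(extractIntegers("from -12 to 23, then +1009"), " "))
--> -12 23 1009
```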

Unlike some other systems, in Lua a modifier can be applied only to a
character class; there is no way to group patterns under a modifier. For instance,
there is no pattern that matches an optional word (unless the word has only one

letter). Usually you can circumvent this limitation using some of the advanced
techniques that we will see in the end of this chapter.

print(string.match("123abcabc","abc?")); --> abc   (the '?' applies only to the 'c', not to the whole word)
print(string.match("123aaabb","a")); --> a
print(string.match("123aaabb","a?")); --> (empty string): 'a?' already matches zero occurrences at position 1

If a pattern begins with a ‘^’, it will match only at the beginning of the subject
string. Similarly, if it ends with a ‘$’, it will match only at the end of the subject
string. You can use these marks both to restrict the patterns that you find and to
anchor patterns. For instance, the next test checks whether the string s starts
with a digit:

if string.find(s, "^%d") then ...

print(string.match("123abc123","^%d")); --1
print(string.match("123abc123","^%d+")); --123

This one checks whether that string represents an integer number, without any
other leading or trailing characters:

if string.find(s, "^[+-]?%d+$") then ...

print(string.match("-12.356","^[+-]?%d+%.?%d*$")); --> -12.356

The characters ‘^’ and ‘$’ are magic only when used in the beginning or end of
the pattern. Otherwise, they act as regular characters matching themselves.
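The two anchored tests can be packaged as small predicates. This is a sketch only; the names isInteger and startsWithDigit are assumptions. The last line illustrates that a ‘^’ in the middle of a pattern is an ordinary character.

```lua
-- Hypothetical predicates built on anchored patterns.
function startsWithDigit(s)
  return string.find(s, "^%d") ~= nil
end

function isInteger(s)
  -- optional sign, then one or more digits, and nothing else
  return string.find(s, "^[+-]?%d+$") ~= nil
end

print(startsWithDigit("9 lives"))   --> true
print(isInteger("-12"))             --> true
print(isInteger("12a"))             --> false
print(string.match("a^b", "a^b"))   --> a^b
```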

Another item in a pattern is ‘%b’, which matches balanced strings. We write
this item as ‘%bxy’, where x and y are any two distinct characters; the x acts as an
opening character and the y as the closing one. For instance, the pattern ‘%b()’
matches parts of the string that start with a ‘(’ and finish at the respective ‘)’:

s = "a (enclosed (in) parentheses) line"
print(string.gsub(s, "%b()", "")) --> a line

print(string.match("abc(123abc)abc123","%b()")); --(123abc)

Typically, we use this pattern as ‘%b()’, ‘%b[]’, ‘%b{}’, or ‘%b<>’, but you can use
any two distinct characters as delimiters.
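A common use of ‘%b()’ is to grab a parenthesized group that may itself contain parentheses. A minimal sketch follows; the helper name firstGroup is an assumption of this example.

```lua
-- Hypothetical helper: return the contents of the first balanced (...) group
-- in `s`, without the outer parentheses, or nil if there is none.
function firstGroup(s)
  local group = string.match(s, "%b()")
  if group then
    return group:sub(2, -2)   -- strip the enclosing '(' and ')'
  end
  return nil
end

print(firstGroup("f(a, g(b), c) + h(d)"))   --> a, g(b), c
```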

Finally, the item ‘%f[char-set]’ represents a frontier pattern (the ‘%f’ and the bracketed char-set form a single item; ‘%f’ cannot be used on its own). It matches an empty string only if the next character is in char-set but the previous one is not:
s = "the anthem is the theme"
print(s:gsub("%f[%w]the%f[%W]", "one"))
--> one anthem is one theme

The pattern ‘%f[%w]’ matches a frontier between a non-alphanumeric and an
alphanumeric character, and the pattern ‘%f[%W]’ matches a frontier between an
alphanumeric and a non-alphanumeric character. Therefore, the given pattern
matches the string “the” only as an entire word. Note that we must write the
char-set inside brackets, even when it is a single class.
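The whole-word idea generalizes to a small helper. The sketch below is my own (replaceWord is a hypothetical name) and assumes the word begins and ends with an alphanumeric character, since otherwise the ‘%w’ frontiers would not sit at its edges.

```lua
-- Hypothetical helper: replace `word` with `repl` only where it appears
-- as an entire word. Returns the new string and the number of replacements.
function replaceWord(s, word, repl)
  -- escape every non-alphanumeric character so `word` is taken literally
  local escaped = word:gsub("%W", "%%%0")
  local pattern = "%f[%w]" .. escaped .. "%f[%W]"
  -- a function replacement avoids any special meaning of '%' inside `repl`
  return s:gsub(pattern, function () return repl end)
end

print(replaceWord("the anthem is the theme", "the", "one"))
-- prints "one anthem is one theme" followed by the count 2
```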

My own example:

local kk="this is the book what I want, then I buy it, there I like the 1theroy in side it";
print(kk:gsub("the","###")); -- replaces every occurrence of the three letters "the"
print(kk:gsub("%f[%w]the%f[%W]","###")); -- replaces only the whole word "the"
-- print(kk:gsub("%fthe","###")); -- error: missing '[' after '%f' in pattern; '%f[char-set]' is a single item, so '%f' cannot be used alone
print(kk:gsub("%f[%w]the","###")); -- "the" only at the start of a word
print(kk:gsub("the%f[%W]","###")); -- "the" only at the end of a word

this is ### book what I want, ###n I buy it, ###re I like ### 1###roy in side it   5
this is ### book what I want, then I buy it, there I like ### 1theroy in side it   2
this is ### book what I want, ###n I buy it, ###re I like ### 1theroy in side it   4
this is ### book what I want, then I buy it, there I like ### 1theroy in side it   2

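In ‘%f[%w]the%f[%W]’, each ‘%f[char-set]’ expresses an opposite relation between two neighbouring characters. For ‘%f[%w]’, the next character must be alphanumeric and the previous one must not be. Take "this is the book": the character before "the" is a space (non-alphanumeric) and the ‘t’ is alphanumeric, so the frontier matches. For the trailing ‘%f[%W]’, the next character must be non-alphanumeric, so the previous one must be alphanumeric. In this way we match only the word "the", not every place where the three letters "the" occur.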

The frontier pattern was already implemented in Lua 5.1, but was undocumented.
It became official only in Lua 5.2.

s = "the anthem is the theme"

The positions before the first and after the last characters in the subject
string are treated as if they had the null character (ASCII code zero). In the
previous example, the first “the” starts with a frontier between a null character
(not in the set ‘[%w]’) and a ‘t’ (in the set ‘[%w]’). In other words, the subject is treated as if a null character came before its first character and after its last one.

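It is worth comparing the frontier pattern with the anchors: the characters ‘^’ and ‘$’ are magic only at the beginning or end of
the pattern and tie the match to the start or end of the subject, whereas ‘%f[char-set]’ marks a boundary that can occur anywhere in the subject.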

local mystr=" 123abc123 kabcthe";
print(mystr:gsub("%f[%D]abc%f[%d]","##"));

123##123 kabcthe 1

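I think the matching proceeds like this. First we need an "abc". For ‘%f[%D]’ we check that ‘a’ is in [%D], which it is, and then look at the character before ‘a’ (at the start of the subject this would be the null character); here it is ‘3’, which is not in [%D], so ‘%f[%D]abc’ matches. For ‘%f[%d]’, the character before the frontier, ‘c’, must not be in [%d], and the character after it, ‘1’, must be in [%d]; both hold. So only the first "abc" satisfies both frontiers, and the second "abc" (inside "kabcthe") does not.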

21.4 Captures

The capture mechanism allows a pattern to yank parts of the subject string that
match parts of the pattern for further use. You specify a capture by writing the
parts of the pattern that you want to capture between parentheses.

In other words, a capture returns the part of the subject that was matched by the parenthesized part of the pattern.

When a pattern has captures, the function string.match returns each captured
value as a separate result; in other words, it breaks a string into its captured
parts.

pair = "name = Anna"
key, value = string.match(pair, "(%a+)%s*=%s*(%a+)")
print(key, value) --> name Anna

The pattern ‘%a+’ specifies a non-empty sequence of letters; the pattern ‘%s*’
specifies a possibly empty sequence of spaces. So, in the example above, the
whole pattern specifies a sequence of letters, followed by a sequence of spaces,
followed by ‘=’, again followed by spaces, plus another sequence of letters. Both
sequences of letters have their patterns enclosed in parentheses, so that they
will be captured if a match occurs. Below is a similar example:

date = "Today is 17/7/1990"
d, m, y = string.match(date, "(%d+)/(%d+)/(%d+)")
print(d, m, y) --> 17 7 1990
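With gmatch, a pattern with two captures yields both values on each iteration, which makes it easy to load several pairs into a table. The following is a sketch; the name parsePairs and the choice of ‘%w+’ for keys and values are assumptions of this example.

```lua
-- Hypothetical helper: collect every "key = value" pair found in `text`.
-- Both keys and values are runs of alphanumeric characters; values stay strings.
function parsePairs(text)
  local result = {}
  for key, value in string.gmatch(text, "(%w+)%s*=%s*(%w+)") do
    result[key] = value
  end
  return result
end

local t = parsePairs("name = Anna, age=30")
print(t.name)   --> Anna
print(t.age)    --> 30
```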

As a typical use, suppose you want to find, inside a string, a
substring enclosed between single or double quotes, for example in a subject like this one:

s=" this is my substring 'abc sub string' what ,,,,'"

You could try a pattern such
as ‘["'].-["']’, that is, a quote followed by anything followed by another quote;
but you would have problems with strings like "it's all right". To solve this
problem, you can capture the first quote and use it to specify the second one:

local mym= [[then he said: "it's all right"!]];
print(mym:gsub("['\"].-[\"']","***"));

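then he said: ***s all right"! 1

As you can see, only the part from the double quote up to the apostrophe was replaced, although what we wanted to find was it's all right; the single quote in the middle causes the problem.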

s = [[then he said: "it's all right"!]]
q, quotedPart = string.match(s, "([\"'])(.-)%1") -- %1 matches a copy of the first capture: the closing quote must be the same character as the opening one
print(quotedPart) --> it's all right
print(q) --> "

To verify:

local mym= [[then he said: "it's all right"!]];
print(mym:gsub("['\"].-[\"']","***"));
print(mym:match("(said):%s*([\"'])(.-)%2"));

said " it's all right

Indeed, %2 refers to the second capture.

The book states this explicitly: In a pattern, an item like ‘%d’, where d is a single digit, matches only a copy
of the d-th capture.

A similar example is the pattern that matches long strings in Lua:
%[(=*)%[(.-)%]%1%]

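In ‘(.-)%]’, the lazy ‘.-’ stops at the nearest ‘]’ that lets the rest of the pattern match; with a greedy ‘.+’ it would run on to the last such ‘]’.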

It will match an opening square bracket followed by zero or more equal signs,
followed by another opening square bracket, followed by anything (the string
content), followed by a closing square bracket, followed by the same number of
equal signs (this is what ‘%1’ provides), followed by another closing square bracket:

p = "%[(=*)%[(.-)%]%1%]"
s = "a = [=[[[ something ]] ]==] ]=]; print(a)"
print(string.match(s, p)) --> = [[ something ]] ]==]

The first capture is the sequence of equal signs (only one sign in this example);
the second is the string content.

The third use of captured values is in the replacement string of gsub (that is, its third argument). Like
the pattern, the replacement string can also contain items like “%d”, which are
changed to the respective captures when the substitution is made. In particular,
the item “%0” is changed to the whole match. (By the way, a ‘%’ in the replacement
string must be escaped as “%%”.) As an example, the following command
duplicates every letter in a string, with a hyphen between the copies:

print(string.gsub("hello Lua!", "%a", "%0-%0")) -- e.g. the match "h" becomes "h-h"
--> h-he-el-ll-lo-o L-Lu-ua-a!

This one interchanges adjacent characters:

print(string.gsub("hello Lua", "(.)(.)", "%2%1")) --> ehll ouLa

As a more useful example, let us write a primitive format converter, which
gets a string with commands written in a LaTeX style and changes them to a
format in XML style:
\command{some text} --> <command>some text</command>

If we disallow nested commands, the following call to string.gsub does the job:
s = [[the \quote{task} is to \em{change} that.]]
s = string.gsub(s, "\\(%a+){(.-)}", "<%1>%2</%1>")
print(s)
--> the <quote>task</quote> is to <em>change</em> that.

Referring to the d-th capture with %d clearly is very useful.

Another useful example is how to trim a string:
function trim (s)
return (string.gsub(s, "^%s*(.-)%s*$", "%1")) -- good function...need to collect in my lib.
end

Note the judicious use of pattern formats. The two anchors (‘^’ and ‘$’) ensure
that we get the whole string. Because the ‘.-’ tries to expand as little as possible,
the two patterns ‘%s*’ match all spaces at both extremities. Note also that,
because gsub returns two values, we parenthesize the call to discard the extra
result (the count).

local function trim(s)
local pattern="^%s*(.-)%s*$";
-- return s:gsub(pattern,"%1"); -- without the outer parentheses this would return two results
return (s:gsub(pattern,"%1"));
end
teststr=" this is what ";
print(trim(teststr)); --> this is what

21.5 Replacements

Instead of a string, we can use either a function or a table as the third argument
to string.gsub. When invoked with a function, string.gsub calls the function
every time it finds a match; the arguments to each call are the captures, and the
value that the function returns is used as the replacement string. When invoked
with a table, string.gsub looks up the table using the first capture as the key,

and the associated value is used as the replacement string. If the result from
the call or from the table lookup is nil, gsub does not change this match.

local ll="abc 123 cde fgh def hkj";
local count=0;
local function capturedCall(captured)
count=count+1;
return captured..tostring(count);
end
llr=ll:gsub("%w+",capturedCall);
print(llr); --abc1 1232 cde3 fgh4 def5 hkj6

local function expand(s,tb)
return (s:gsub("$(%w+)",tb));
end
tbt={
name="Lua";
status="great"
};
mm="$name is very $status";
print(expand(mm,tbt)); --Lua is very great
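The rule that a nil result leaves the match unchanged can be seen with a variable that is not in the table. The sketch below (expandWithDefault is a hypothetical name) uses a function replacement so that a default can be supplied for unknown names; when no default is given, the function returns nil and the original "$name" text is kept.

```lua
-- Hypothetical helper: expand $name references using the table `vars`.
-- Unknown names are replaced by `default`; if `default` is nil, the
-- function returns nil and gsub keeps the original text of the match.
function expandWithDefault(s, vars, default)
  return (s:gsub("$(%w+)", function (name)
    local v = vars[name]
    if v == nil then return default end
    return tostring(v)
  end))
end

print(expandWithDefault("$name is $status", {name = "Lua"}, "?"))  --> Lua is ?
print(expandWithDefault("$name is $status", {name = "Lua"}))       --> Lua is $status
```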

The last example goes back to our format converter, from the previous section.
Again, we want to convert commands in LaTeX style (\example{text})
to XML style (<example>text</example>), but allowing nested commands this
time. The following function uses recursion to do the job:

local function toXML(s)
   s=s:gsub("\\(%a+)(%b{})",function (tag,body)
      body=body:sub(2,-2); -- remove the enclosing braces
      body=toXML(body); -- handle nested commands
      return string.format("<%s>%s</%s>",tag,body,tag);
   end);
   return s;
end

nn="\\title{The \\bold{big} example}";
print(toXML(nn)); --> <title>The <bold>big</bold> example</title>

URL encoding

For our next example, we use URL encoding, which is the encoding used by
HTTP to send parameters in a URL. This encoding encodes special characters
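(such as ‘=’, ‘&’, and ‘+’) as “%xx”, where xx is the character code in hexadecimal (its ASCII code).
After that, it changes spaces to ‘+’ (spaces are not written as hexadecimal codes).

For instance, it encodes the string “a+b = c” as “a%2Bb+%3D+c”: the ‘+’ (ASCII code 0x2B) becomes “%2B”, the ‘=’ (ASCII code 0x3D) becomes “%3D”, and each space becomes ‘+’.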

Finally, it writes each parameter name and parameter value
with an equal in between and appends all resulting pairs name=value with an
ampersand in between. For instance, the values

name = "al"; query = "a+b = c"; q="yes or no"
are encoded as “name=al&query=a%2Bb+%3D+c&q=yes+or+no”.

Now, suppose we want to decode this URL and store each value in a table,
indexed by its corresponding name. The following function does the basic decoding:

function unescape (s)
s = string.gsub(s, "+", " ")
s = string.gsub(s, "%%(%x%x)", function (h)
return string.char(tonumber(h, 16))
end)
return s
end

The first statement changes each ‘+’ in the string to a space. The second gsub
matches all two-digit hexadecimal numerals preceded by ‘%’ and calls an anonymous
function for each match. This function converts the hexadecimal numeral
into a number (tonumber, with base 16) and returns the corresponding character
(string.char). For instance,
print(unescape("a%2Bb+%3D+c")) --> a+b = c

To decode the pairs name=value we use gmatch. Because both names and
values cannot contain either ‘&’ or ‘=’, we can match them with the pattern
‘[^&=]+’:
cgi = {}
function decode (s)
for name, value in string.gmatch(s, "([^&=]+)=([^&=]+)") do
    name = unescape(name)
   value = unescape(value)
   cgi[name] = value
end
end
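To try the decoding on the sample query from above without a global table, the two functions can be combined into a self-contained variant. This is a sketch with assumptions of my own: the name decodeQuery is hypothetical, the result is returned in a fresh table instead of the global cgi, and the ‘+’ is written as ‘%+’ purely as a precaution. As in the original pattern, pairs with an empty name or value are skipped.

```lua
-- Same logic as unescape above, kept local to this sketch.
local function unescape(s)
  s = string.gsub(s, "%+", " ")
  s = string.gsub(s, "%%(%x%x)", function (h)
    return string.char(tonumber(h, 16))
  end)
  return s
end

-- Hypothetical variant of decode: returns a new table of name/value pairs.
function decodeQuery(s)
  local result = {}
  for name, value in string.gmatch(s, "([^&=]+)=([^&=]+)") do
    result[unescape(name)] = unescape(value)
  end
  return result
end

local t = decodeQuery("name=al&query=a%2Bb+%3D+c&q=yes+or+no")
print(t.name)    --> al
print(t.query)   --> a+b = c
print(t.q)       --> yes or no
```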

The corresponding encoding is also easy to write. First, we write the escape
function; this function encodes all special characters as a ‘%’ followed by the
character code in hexadecimal (the format option “%02X” makes a hexadecimal
number with two digits, using 0 for padding), and then changes spaces to ‘+’:

function escape (s)
s = string.gsub(s, "[&=+%%%c]", function (c)
return string.format("%%%02X", string.byte(c))
end)
s = string.gsub(s, " ", "+")
return s
end

The char-set [&=+%%%c] lists the characters that need escaping: ‘&’, ‘=’, ‘+’, ‘%’ (written ‘%%’), and control characters (‘%c’).

The encode function traverses the table to be encoded, building the resulting
string:

function encode (t)
   local b = {}
   for k,v in pairs(t) do
      b[#b + 1] = (escape(k) .. "=" .. escape(v))
   end
   return table.concat(b, "&")
end
t = {name = "al", query = "a+b = c", q = "yes or no"}
print(encode(t)) --> q=yes+or+no&query=a%2Bb+%3D+c&name=al

After all these years of relying on library APIs, I finally understand how URL encoding and decoding work.

Tab expansion

An empty capture like ‘()’ has a special meaning in Lua. Instead of capturing
nothing (a quite useless task), this pattern captures its position in the subject
string, as a number:

print(string.match("hello", "()ll()")) --> 3 5

(Note that the result of this example is not the same as what you get from
string.find, because the position of the second empty capture is after the
match.)
A nice example of the use of position captures is for expanding tabs in a
string:
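A minimal sketch of such a tab-expansion function might look like the following. It assumes a single-line string and a default tab size of 8; the names expandTabs, tab and corr are choices of this sketch. The position capture gives the index of each tab in the original string, and corr tracks how much longer the expanded string has become so far.

```lua
-- Sketch: replace each tab by the number of spaces needed to reach the
-- next tab stop. Assumes `s` holds a single line of text.
function expandTabs(s, tab)
  tab = tab or 8        -- tab "size" (assumed default of 8)
  local corr = 0        -- difference between expanded and original positions
  s = string.gsub(s, "()\t", function (p)
    -- p - 1 + corr is the zero-based column of this tab in the expanded string
    local sp = tab - (p - 1 + corr) % tab
    corr = corr - 1 + sp   -- one tab removed, sp spaces added
    return string.rep(" ", sp)
  end)
  return s
end

print(expandTabs("ab\tc\td", 4))   --> ab  c   d
```

Without corr, every tab after the first would be measured against its position in the original string, which no longer matches its column once earlier tabs have been expanded.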
