See: http://lua-users.org/wiki/FrontierPattern
Frontier Pattern
The "frontier" expression pattern %f is undocumented in the standard Lua references (for reasons why see LuaList:2006-12/msg00536.html).
I would like to demonstrate its usefulness here, showing how it can be used and why it should be retained.
Let's consider a fairly straightforward task: to find all words in upper-case in a string.
First attempt: %u+
string.gsub ("the QUICK brown fox", "%u+", print)
QUICK
That looks OK, found a word in all caps. But look at this:
string.gsub ("the QUICK BROwn fox", "%u+", print)
QUICK
BRO
We also found a word which was partially capitalised.
Second attempt: %u+%A
string.gsub ("the QUICK BROwn fox", "%u+%A", print)
QUICK
The detection of non-letters correctly excluded the partially capitalised word. But wait! How about this:
string.gsub ("the QUICK brOWN fox", "%u+%A", print)
QUICK
OWN
We also have a second problem:
string.gsub ("the QUICK. brown fox", "%u+%A", print)
QUICK.
The punctuation after the word is now part of the captured string, which is not wanted.
Third attempt: %A%u+%A
string.gsub ("the QUICK brOWN FOx jumps", "%A%u+%A", print)
QUICK
This correctly excludes the two partially capitalised words, but still leaves the punctuation in, like this:
string.gsub ("the (QUICK) brOWN FOx jumps", "%A%u+%A", print)
(QUICK)
Also, there is another problem, apart from capturing the non-letters at the sides. Look at this:
string.gsub ("THE (QUICK) brOWN FOx JUMPS", "%A%u+%A", print)
(QUICK)
The correctly capitalised words at the start and end of the string are not detected.
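The underlying difficulty is that %A consumes a character. Even if the string is padded with a non-letter at each end and a capture is used to drop the delimiters, two all-caps words separated by a single character cannot both match, because the separator is used up by the first match. A minimal sketch of this, assuming a hypothetical helper named padded_upper_words (not part of the original page):

```lua
-- Hypothetical helper: pad the string with spaces, then capture only the letters.
local function padded_upper_words(s)
  local words = {}
  for w in (" " .. s .. " "):gmatch("%A(%u+)%A") do
    words[#words + 1] = w
  end
  return words
end

local found = padded_upper_words("THE QUICK (BROWN) fox")
-- QUICK is missed: the space before it was consumed by the match on " THE "
print(table.concat(found, ","))  -- THE,BROWN
```

What is needed is a test that looks at the neighbouring characters without consuming them.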
The solution: The Frontier pattern: %f
string.gsub ("THE (QUICK) brOWN FOx JUMPS", "%f[%a]%u+%f[%A]", print)
THE
QUICK
JUMPS
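The frontier pattern %f followed by a set matches the empty string at any position where the previous character is not in the set and the current character is in the set, that is, at the transition from "not in set" to "in set". The start and the end of the source string are treated as if they were the character "\0". Since "\0" is not in [%a], the first frontier also matches the word at the very start of the string to be matched.
Since "\0" is in [%A], the second frontier pattern is also matched at the end of the string, so our final word is also captured.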
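As a rough illustration, the same pattern can be wrapped in a small function that returns the words instead of printing them. The names upper_words, caps, replaced and count are hypothetical and chosen only for this sketch. The last lines show another use, whole-word replacement with %f[%w] and %f[%W]:

```lua
-- Hypothetical helper: collect every all-caps word using two frontiers.
local function upper_words(s)
  local words = {}
  -- %f[%a]: previous char is not a letter, current char is a letter
  -- %f[%A]: previous char is a letter, current char is not a letter
  for w in s:gmatch("%f[%a]%u+%f[%A]") do
    words[#words + 1] = w
  end
  return words
end

local caps = upper_words("THE QUICK (BROWN) brOWN FOx JUMPS")
print(table.concat(caps, ","))  -- THE,QUICK,BROWN,JUMPS

-- Whole-word replacement: "cat" inside "concat" is left alone.
local replaced, count = ("cat concat cat"):gsub("%f[%w]cat%f[%W]", "dog")
print(replaced, count)  -- dog concat dog    2
```

Because a frontier matches the empty string, neighbouring words can share a single separator and no delimiter ends up in the result.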
Alternatives without the frontier pattern
Without the frontier pattern, one might resort to things like this:
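s = "THE (QUICK) brOWN FOx JUMPS"
-- mark each non-letter/upper-case and upper-case/non-letter boundary with "\0"
s = "\0" .. s:gsub("(%A)(%u)", "%1\0%2")
             :gsub("(%u)(%A)", "%1\0%2") .. "\0"
-- an all-caps word is now a run of upper-case letters between two "\0" marks
s = s:gsub("%z(%u+)%z", print)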
This page brought to you by NickGammon.
Last edited July 7, 2007 7:17 pm GMT
Take a look at the code in lstrlib.c:
case 'f': {  /* frontier? */
  const char *ep; char previous;
  p += 2;
  if (*p != '[')
    luaL_error(ms->L, "missing " LUA_QL("[") " after "
                       LUA_QL("%%f") " in pattern");
  ep = classend(ms, p);  /* points to what is next */
  previous = (s == ms->src_init) ? '\0' : *(s-1);
  if (matchbracketclass(uchar(previous), p, ep-1) ||
     !matchbracketclass(uchar(*s), p, ep-1)) return NULL;
  p=ep; goto init;  /* else return match(ms, s, ep); */
}
Reading it makes it clear how %f is used.
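The check in the C code can be restated in Lua. This is only a sketch of the logic, not how the library implements it: the function name is_frontier and its parameters are hypothetical, and the set argument is assumed to be a single bracket class such as "[%a]".

```lua
-- Hypothetical helper mirroring the C check: is position i a frontier for `set`?
-- i may range from 1 to #s + 1 (the position just past the last character).
local function is_frontier(s, i, set)
  -- the character before the start of the string counts as "\0"
  local previous = (i == 1) and "\0" or s:sub(i - 1, i - 1)
  -- the character after the end of the string counts as "\0"
  local current = (i > #s) and "\0" or s:sub(i, i)
  -- frontier: previous is NOT in the set, current IS in the set
  return previous:find(set) == nil and current:find(set) ~= nil
end

local s = "THE QUICK"
print(is_frontier(s, 1, "[%a]"))       -- true: start of "THE"
print(is_frontier(s, 2, "[%a]"))       -- false: "T" before "H" is already a letter
print(is_frontier(s, 4, "[%A]"))       -- true: the space after "THE"
print(is_frontier(s, #s + 1, "[%A]"))  -- true: end of string after "QUICK"
```

The C code rejects the match when the previous character is in the set or the current one is not. The sketch returns the negation of that condition.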