Regular Expressions

Regular expressions (“regexps”) match strings. When a match is successful, the return value is the position of the first matching character.

    /abc/ =~ "abc"
    → 0

An if construct will count a successful match as true.

    puts 'match' if /abc/ =~ "abc"
    → match

The matching substring can be anywhere in the string.

    /abc/ =~ "cbaabc"
    → 3

When the string doesn’t match, the result is nil.

    /abc/ =~ "ab!c"
    → nil

There may be more than one match in the string. Matching always returns the index of the first match.

    /abc/ =~ "abc and abc"
    → 0

Case matters.

    /cow/ =~ "Cow"
    → nil

The regular expression doesn’t have to be on the left.

    "foofarah" =~ /foo/
    → 0
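One detail worth spelling out: in Ruby only nil and false count as false, so a match at position 0 still satisfies an if. The following is a minimal sketch of that behavior; the variable names are made up for illustration.

```ruby
# =~ gives the position of the first match, or nil when nothing matches.
hit  = /abc/ =~ "abc"    # 0, which Ruby treats as true
miss = /abc/ =~ "ab!c"   # nil, which Ruby treats as false

hit_label  = hit  ? "match" : "no match"
miss_label = miss ? "match" : "no match"

puts hit_label
puts miss_label
```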

10.1 Special Characters

You can anchor the match to the beginning of the string with ^ (the caret character, sometimes called “hat”).

    /^abc/ =~ "!abc"
    → nil

You can also anchor the match to the end of the string with a dollar sign character, often abbreviated “dollar.”

    /abc$/ =~ "abc!"
    → nil

Strictly speaking, Ruby’s ^ and $ anchor at the start and end of any line within the string; \A and \z anchor at the very start and end of the string itself. Special characters like the caret and dollar are what make regular expressions more powerful than something like "string".include?("ing").

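To make the comparison with include? concrete, here is a small sketch. The word list is hypothetical and chosen only to show the difference between "contains", "starts with", and "ends with".

```ruby
# Hypothetical sample words, chosen only for illustration.
words = ["ing", "singer", "string", "ending.txt"]

anywhere = words.select { |w| w.include?("ing") }  # substring test, no anchoring
at_start = words.select { |w| w =~ /^ing/ }        # anchored at the beginning
at_end   = words.select { |w| w =~ /ing$/ }        # anchored at the end

p anywhere
p at_start
p at_end
```

include? can only answer the first question; the anchors let one pattern say where the text has to appear.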
    \d    Any digit
    \D    Any character except a digit
    \s    “Whitespace”: space, tab, carriage return, newline (line feed), or form feed
    \S    Anything except whitespace
    \w    A “word character”: [A-Za-z0-9_]
    \W    Any character except a word character

Figure 10.1: Character Classes
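As a quick illustration of the abbreviations in Figure 10.1, the sketch below applies each one to a made-up string. Each result is the index of the first character that belongs to the class.

```ruby
# The sample strings are invented for illustration.
digit     = /\d/ =~ "call 555 now"   # first digit
non_digit = /\D/ =~ "42 is it"       # first non-digit (the space)
space     = /\s/ =~ "a b"            # first whitespace character
non_space = /\S/ =~ "  x"            # first non-whitespace character
word      = /\w/ =~ "!!ok"           # first word character
non_word  = /\W/ =~ "ok!"            # first non-word character

p [digit, non_digit, space, non_space, word, non_word]
```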

A period (“dot”) matches any character.

    /a.c/ =~ "does abc match?"
    → 5

The asterisk character (“star”) matches any number of occurrences of the character preceding it.

    /ab*c/ =~ "does abbbbc match?"
    → 5

“Any number” includes zero.

    /ab*c/ =~ "does ac match?"
    → 5

Frequently, you’ll want to match one or more occurrences but not zero. That’s done with the plus character.

    /ab+c/ =~ "does ac match?"
    → nil

The question mark character matches zero or one occurrence but not more than one.

    /ab?c/ =~ "does ac match?"
    → 5
Special characters can be combined. The combination of a dot and star is used to match any number of any kind of character.

    /a.*b/ =~ "a ! b ! i j k b"
    → 0

To match any one character from a set (a “character class”), enclose the set within square brackets.

    /[0123456789]+/ =~ "number 55"
    → 7

Character classes containing alphabetically ordered runs of characters can be abbreviated with the dash.

    /[0-9][a-f]/ =~ "5f"
    → 0

Within brackets, characters like the dot, plus, and star are not special.

    /[.]/ =~ "b"
    → nil
Outside of brackets, special characters can be stripped of their powers by “escaping” them with a backslash.

    /\[a\]\+/ =~ "[a]+"
    → 0

To include open and close brackets inside of brackets, escape them with a backslash. This expression matches any sequence of one or more characters, all of which must be either [, ], or =. (The two anchors ensure that there are no characters before or after the matching characters.)

    /^[\[=\]]+$/ =~ '=]=[='
    → 0

Putting a caret at the beginning of a character class causes the set to contain all characters except the ones listed.

    /[^ab]/ =~ "z"
    → 0

Some character classes are so common they’re given abbreviations. \d is the same character class as [0-9]. Other characters can be added to the abbreviation, in which case brackets are needed. See Figure 10.1 for a complete list of abbreviations.

    /=\d=[x\d]=/ =~ "=5=x="
    → 0
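The pieces from this section combine naturally. The sketch below checks whether a string looks like a clock time; both the helper name time_like? and the assumed format (one or two digits, a colon, two digits) are invented for illustration.

```ruby
# Hypothetical helper: true when the whole string has the assumed
# shape "h:mm" or "hh:mm". It does not check that the time is valid.
def time_like?(str)
  !!(str =~ /^\d\d?:\d\d$/)
end

puts time_like?("9:30")      # anchors, \d, and ? all satisfied
puts time_like?("12:5")      # only one digit after the colon
puts time_like?("at 9:30")   # extra text before the digits
```

Without the two anchors, "at 9:30" would be accepted, because the pattern could match in the middle of the string.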
10.2 Grouping and Alternatives

Parentheses can group sequences of characters so that special characters apply to the whole sequence.

    /(ab)+/ =~ "ababab"
    → 0

Special characters can appear within groups. Here, the group containing one a and any number of b’s is repeated one or more times.

    /(ab*)+/ =~ "aababbabbb"
    → 0

The vertical bar character is used to allow alternatives. Here, either a or b matches.

    /a|b/ =~ "a"
    → 0
A vertical bar divides the regular expression into two smaller regular expressions. A match means that either the entire left regexp matches or the entire right one does.

    /^Fine birds|cows ate\.$/ =~
      "Fine birds ate seeds."
    → 0

This regular expression does not mean “Match either 'Fine birds ate.' or 'Fine cows ate.'” It actually matches either a string beginning with "Fine birds" or one ending in "cows ate."

This regular expression matches only the two alternate sentences, not the infinite number of possibilities the previous example’s regexp does.

    /^Fine (birds|cows) ate\.$/ =~
      "Fine birds ate seeds."
    → nil
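Putting the two patterns side by side may help. This sketch runs the ungrouped and grouped regexps against a few made-up sentences; the variable names are only for illustration.

```ruby
ungrouped = /^Fine birds|cows ate\.$/
grouped   = /^Fine (birds|cows) ate\.$/

# The ungrouped form accepts anything that starts with "Fine birds"
# or ends with "cows ate."
loose_start = ungrouped =~ "Fine birds sing"
loose_end   = ungrouped =~ "My cows ate."

# The grouped form accepts only the two full sentences.
strict_miss = grouped =~ "My cows ate."
strict_hit  = grouped =~ "Fine cows ate."

p [loose_start, loose_end, strict_miss, strict_hit]
```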
10.3 Taking Strings Apart

Like the =~ operator, match returns nil if there’s no match. If there is, it returns a MatchData object. You can pull information out of that object.

    re = /(\w+), (\w+), or (\w+)/
    s = 'Without a Bob, ox, or bin!'
    match = re.match(s)
    → #<MatchData:0x323c44>

A MatchData is indexable. Its zeroth element is the entire match.

    match[0]
    → "Bob, ox, or bin"

Each following element stores the result of what a group matched, counting from left to right.

    match[1]
    → "Bob"

Groups are often used to pull apart strings and construct new ones.

    "#{match[3]} and #{match[1]}"
    → "bin and Bob"

pre_match returns any portion of the string before the part that matched.

    match.pre_match
    → "Without a "

post_match returns any portion of the string after the part that matched. match.pre_match, match[0], and match.post_match can be added together to reconstruct the original string.

    match.post_match
    → "!"
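The reconstruction claim is easy to check. The sketch below reuses the regexp and string from this section; rebuilt is a name introduced here for illustration.

```ruby
re = /(\w+), (\w+), or (\w+)/
s = 'Without a Bob, ox, or bin!'
match = re.match(s)

# Text before the match + the match itself + text after the match.
rebuilt = match.pre_match + match[0] + match.post_match

puts rebuilt
puts rebuilt == s
```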
The plus and star special characters are greedy: they match as many characters as they can. Expect that to catch you by surprise sometimes.

    str = "a bee in my bonnet"
    /a.*b/.match(str)[0]
    → "a bee in my b"

You can make plus and star match as few characters as they can by suffixing them with a question mark.

    /a.*?b/.match(str)[0]
    → "a b"

You can use a regular expression to slice a string. The result is the first substring that matches the regular expression.

    "has 5 and 3"[/\d+/]
    → "5"
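Greediness tends to bite when pulling out delimited pieces of text. Here is a small sketch that combines slicing with greedy and non-greedy matching; the markup string is invented for illustration.

```ruby
# Hypothetical input, used only to show the difference.
html = "<b>bold</b> and <i>italic</i>"

greedy  = html[/<.*>/]    # runs from the first "<" to the last ">"
lazy    = html[/<.*?>/]   # stops at the first ">"
missing = html[/\d+/]     # slicing gives nil when nothing matches

p greedy
p lazy
p missing
```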
10.4 Variables Behind the Scenes

Both =~ and match set some variables. All begin with $. Each parenthesized group gets its own number, from $1 up through $9. You might expect $0 to name the entire string that matched, but it’s already used for something else: the name of the program being executed.

    re = /(\w+), (\w+), or (\w+)/
    s = 'Without a Bob, ox, or bin!'
    re =~ s
    [$1, $2, $3]
    → ["Bob", "ox", "bin"]
$& is the equivalent of match[0].

    $&
    → "Bob, ox, or bin"

These two variables are used to store the string before the match and the string after the match. (The first is a backward quote / backtick; the second a normal quote.)

    $` + $'
    → "Without a !"

These variables are probably most often used to immediately do something with a string that’s “equal enough” to some pattern. Like this:

    if name =~ /(.+), (.+)/
      name = "#{$2} #{$1}"
    end
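Wrapped in a method, the same idea might look like the sketch below. The name flip_name and the sample inputs are hypothetical; the pattern is the one used above.

```ruby
# Hypothetical helper: turns "Last, First" into "First Last".
def flip_name(name)
  if name =~ /(.+), (.+)/
    "#{$2} #{$1}"   # $1 and $2 were set by the successful match
  else
    name            # leave names without ", " untouched
  end
end

puts flip_name("Lovelace, Ada")
puts flip_name("Ada")
```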
10.5 Regular Expression Options

Normally, the period in a regular expression does not match the end-of-line character. Therefore, .* or .+ matches won’t span lines.

    /a.*b/ =~ "az\nzb"
    → nil

Adding the m (multiline) option makes a period match end-of-line characters, so the regular expression match can span lines.

    /a.*b/m =~ "az\nzb"
    → 0

This is a far too annoying way to do a case-insensitive match.

    /[cC][aA][tT]/ =~ "Cat"
    → 0

The i (insensitive) option is a better way.

    /cat/i =~ "Cat"
    → 0
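Options can also be combined by writing both letters after the closing slash. The sketch below uses a made-up three-line string that needs both options before it matches.

```ruby
# Hypothetical text: mixed case and spread over several lines.
text = "Start\nmiddle\nEND"

plain  = /start.*end/ =~ text     # case differs and the dot stops at "\n"
only_i = /start.*end/i =~ text    # case ignored, but the dot still stops at "\n"
only_m = /start.*end/m =~ text    # dot crosses lines, but case still matters
both   = /start.*end/mi =~ text   # both obstacles removed

p [plain, only_i, only_m, both]
```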