Regular Expressions
\d [0-9] Digit character
\D [^0-9] Any character except a digit
\s [ \t\r\n\f] Whitespace character
\S [^ \t\r\n\f] Any character except whitespace
\w [A-Za-z0-9_] Word character
\W [^A-Za-z0-9_] Any character except a word character
[:alnum:] Alphanumeric
[:alpha:] Uppercase or lowercase letter
[:blank:] Space and tab
[:cntrl:] Control characters (at least 0x00–0x1f, 0x7f)
[:digit:] Digit
[:graph:] Printable character excluding space
[:lower:] Lowercase letter
[:print:] Any printable character (including space)
[:punct:] Printable character excluding space and alphanumeric
[:space:] Whitespace (same as \s)
[:upper:] Uppercase letter
[:xdigit:] Hex digit (0–9, a–f, A–F)
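To make the table above concrete, here is a minimal sketch that applies a few of these classes with String#scan. The sample strings are made up for illustration. Note that the POSIX names only work inside a bracket expression, which is why they are written as [[:alpha:]] rather than [:alpha:].

```ruby
sample = "Room 42, Floor 7!"   # hypothetical sample text

digits    = sample.scan(/\d+/)            # => ["42", "7"]
words     = sample.scan(/\w+/)            # => ["Room", "42", "Floor", "7"]
non_space = sample.scan(/\S+/)            # => ["Room", "42,", "Floor", "7!"]

# POSIX classes must sit inside an enclosing [...] bracket expression.
letters = sample.scan(/[[:alpha:]]+/)     # => ["Room", "Floor"]
punct   = sample.scan(/[[:punct:]]/)      # => [",", "!"]
hex     = "0xBEEF".scan(/[[:xdigit:]]+/)  # => ["0", "BEEF"] ("x" is not a hex digit)
```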
r* matches zero or more occurrences of r.
r+ matches one or more occurrences of r.
r? matches zero or one occurrence of r.
r{m,n} matches at least “m” and at most “n” occurrences of r.
r{m,} matches at least “m” occurrences of r.
r{m} matches exactly “m” occurrences of r.
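The repetition operators above can be compared side by side in a short sketch. The input strings are invented for illustration, and \b (word boundary) is added only so that each run of digits is tested as a whole.

```ruby
letters = "a abbb"           # hypothetical input
numbers = "1 12 123 1234"    # hypothetical input

star  = letters.scan(/ab*/)                   # => ["a", "abbb"]  (zero or more b)
plus  = letters.scan(/ab+/)                   # => ["abbb"]       (one or more b)
maybe = "color colour".scan(/colou?r/)        # => ["color", "colour"]

two_to_three  = numbers.scan(/\b\d{2,3}\b/)   # => ["12", "123"]
three_or_more = numbers.scan(/\b\d{3,}\b/)    # => ["123", "1234"]
exactly_two   = numbers.scan(/\b\d{2}\b/)     # => ["12"]
```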
Backslash Sequences in the Substitution
Earlier we noted that the sequences \1, \2, and so on, are available in the pattern,
standing for the nth group matched so far. The same sequences are available in the
second argument of sub and gsub.
"fred:smith".sub(/(\w+):(\w+)/, '\2, \1') → "smith, fred"
"nercpyitno".gsub(/(.)(.)/, '\2\1') → "encryption"
Additional backslash sequences work in substitution strings: \& (last match), \+ (last
matched group), \` (string prior to match), \' (string after match), and \\ (a literal
backslash).
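As a rough illustration, the sketch below exercises four of these sequences (\&, \`, \', and \\) on a made-up string. The brackets in the replacement text are there only to make the inserted part easy to spot.

```ruby
text = "hello world"   # hypothetical input; the pattern matches "o w"

whole   = text.sub(/o w/, '[\&]')    # => "hell[o w]orld"   (the match itself)
before  = text.sub(/o w/, '[\`]')    # => "hell[hell]orld"  (text before the match)
after   = text.sub(/o w/, "[\\']")   # => "hell[orld]orld"  (text after the match)

# '\\\\' is two backslash characters, which the substitution reads as one literal backslash.
literal = text.sub(/o w/, '\\\\')

puts literal   # prints hell\orld
```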
It gets confusing if you want to include a literal backslash in a substitution. The obvious
thing is to write
str.gsub(/\\/, '\\\\')
Clearly, this code is trying to replace each backslash in str with two. The programmer
doubled up the backslashes in the replacement text, knowing that they’d be converted
to \\ in syntax analysis. However, when the substitution occurs, the regular expression
engine performs another pass through the string, converting \\ to \, so the net effect
is to replace each single backslash with another single backslash. You need to write
str.gsub(/\\/, '\\\\\\\\') instead:
str = 'a\b\c' → "a\b\c"
str.gsub(/\\/, '\\\\\\\\') → "a\\b\\c"
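The two passes described above can be checked directly, and there are at least two ways to avoid counting eight backslashes. The sketch below is one possible comparison, not the only approach: it reuses \& (the matched text) twice, and it uses the block form of gsub, whose result is inserted without a second scan for backslash sequences.

```ruby
str = 'a\b\c'   # five characters: a, \, b, \, c

naive     = str.gsub(/\\/, '\\\\')        # replacement is \\  -> one backslash each
quadruple = str.gsub(/\\/, '\\\\\\\\')    # replacement is \\\\ -> two backslashes each
via_match = str.gsub(/\\/, '\&\&')        # \& is the matched backslash, used twice
via_block = str.gsub(/\\/) { '\\\\' }     # block result is inserted as-is

puts naive       # prints a\b\c
puts quadruple   # prints a\\b\\c
```

Both alternatives give the same string as the eight-backslash version, so which one to use is a readability choice.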