Metacharacters are the building blocks of regular expressions. Characters in RegEx are understood to be either a metacharacter with a special meaning or a regular character with a literal meaning.
The following table lists some common RegEx metacharacters, with examples of what they match (=) and do not match (≠).
Metacharacter | Description | Examples |
---|---|---|
\d | Any single digit, 0 through 9. | \d\d\d = 327; \d\d = 81; \d = 4. Non-match: \d\d\d ≠ 24631, because 24631 contains 5 digits and \d\d\d matches only a 3-digit string. |
\w | Any single alphanumeric character (in most RegEx engines, also the underscore). | \w\w\w = dog; \w\w\w\w = mule; \w\w = to; \w\w\w = 467; \w\w\w\w = 4673. Non-matches: \w\w\w ≠ boat, because boat contains 4 characters; \w ≠ !, because the exclamation point is a non-alphanumeric character. |
\W | Any single non-alphanumeric character, such as a symbol. | \W = %; \W = #; \W\W\W = @#%. Non-match: \W\W\W\W ≠ dog8, because d, o, g, and 8 are alphanumeric characters. |
[a-z] [0-9] | Character set. Matches exactly one character from the set unless a quantifier specifies otherwise. The order of the characters inside the brackets does not matter. | pand[ora] = panda; pand[ora] = pando. Non-match: pand[ora] ≠ pandora, because pand[ora] allows only 1 character from [ora] to match. (Quantifiers that allow pand[ora] to match pandora are discussed below.) |
(abc) (123) | Character group. Matches the characters abc or 123 in that exact order. | pand(ora) = pandora; pand(123) = pand123. Non-match: pand(oar) ≠ pandora, because pand(oar) looks for the exact string pandoar. |
\| (pipe) | Alternation. Allows for alternate matches and operates like the Boolean OR. | pand(abc\|123) = pandabc OR pand123 |
? | Question mark. Matches when the character preceding ? occurs 0 or 1 time, making that character optional. | colou?r = colour (u is found 1 time); colou?r = color (u is found 0 times). |
* | Asterisk. Matches when the character preceding * occurs 0 or more times. Note: * in RegEx is different from * in dtSearch. RegEx * finds where the character (or grouping) preceding * occurs ZERO or more times. dtSearch * finds where the string of characters preceding or following * is found 1 or more times. | tre* = tree (e is found 2 times); tre* = tre (e is found 1 time); tre* = tr (e is found 0 times). Non-match: tre* ≠ trees, because although e is found 2 times, it is followed by s, which is not accounted for in the RegEx. |
+ | Plus sign. Matches when the character preceding + occurs 1 or more times, making that character mandatory. | tre+ = tree (e is found 2 times); tre+ = tre (e is found 1 time). Non-match: tre+ ≠ tr, because e is found 0 times in tr. |
. (period) | The period matches any single character, whether alphanumeric or a symbol. | ton. = tone; ton. = ton#; ton. = ton4. Non-match: ton. ≠ tones, because . by itself matches only a single character, here in the 4th position of the term. In tones, s is the 5th character and is not accounted for in the RegEx. |
.* | Combines the metacharacters . and *, in that order, to match any character 0 or more times. Note: .* in RegEx is equivalent to the dtSearch wildcard * operator. | tr.* = tr; tr.* = tre; tr.* = tree; tr.* = trees; tr.* = trough; tr.* = treadmill. |
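
The match and non-match examples in the table above can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module rather than the RegEx engine this document describes, so edge cases may differ (for instance, Python's \w also matches the underscore). The helper name matches_term is hypothetical; it uses re.fullmatch so that the pattern must account for the whole term, mirroring the = and ≠ notation in the table.

```python
import re

def matches_term(pattern: str, term: str) -> bool:
    """Return True when the pattern accounts for the entire term."""
    # re.fullmatch anchors the pattern at both ends of the term.
    return re.fullmatch(pattern, term) is not None

# (pattern, term, expected result) triples taken from the table above.
examples = [
    (r"\d\d\d", "327", True),
    (r"\d\d\d", "24631", False),
    (r"\w\w\w", "dog", True),
    (r"\w\w\w", "boat", False),
    (r"\w", "!", False),
    (r"\W\W\W", "@#%", True),
    (r"\W\W\W\W", "dog8", False),
    (r"pand[ora]", "panda", True),
    (r"pand[ora]", "pandora", False),
    (r"pand(ora)", "pandora", True),
    (r"pand(oar)", "pandora", False),
    (r"pand(abc|123)", "pand123", True),
    (r"colou?r", "color", True),
    (r"colou?r", "colour", True),
    (r"tre*", "tr", True),
    (r"tre*", "trees", False),
    (r"tre+", "tree", True),
    (r"tre+", "tr", False),
    (r"ton.", "ton#", True),
    (r"ton.", "tones", False),
    (r"tr.*", "treadmill", True),
]

for pattern, term, expected in examples:
    result = matches_term(pattern, term)
    print(f"{pattern!r:18} {term!r:12} match={result}")
    # Fail loudly if the engine disagrees with the table.
    assert result == expected
```

Using re.search instead of re.fullmatch would report a match whenever the pattern is found anywhere inside the term, so tre* would then "match" trees; which behavior applies depends on how the search tool anchors its patterns.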
RegEx quantifiers
RegEx uses quantifiers to specify how many times a character, character set, or character group must occur. You can use multiple quantifiers in a search string. The following table gives examples of the quantifiers you can use in your RegEx:
Quantifier | Description | Examples |
---|---|---|
{n} | Matches when the preceding character, or character group, occurs exactly n times. | \d{3} = 836; \d{3} = 139; \d{3} = 532; pand[ora]{2} = pandar; pand[ora]{2} = pandoo; pand(ora){2} = pandoraora. Non-match: pand[ora]{2} ≠ pandora, because the quantifier {2} allows exactly 2 letters from the character set [ora]. |
{n,m} | Matches when the preceding character, or character group, occurs at least n times and at most m times. | \d{2,5} = 97430; \d{2,5} = 9743; \d{2,5} = 97. Non-match: \d{2,5} ≠ 9, because 9 is only 1 digit, which is below the minimum of 2. |
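
The quantifier rows above can be exercised the same way. The sketch below again assumes Python's re module as a stand-in for the engine described here, and the helper matches_term is a hypothetical name. It also shows one quantifier, {3}, that lets the character set [ora] account for all three trailing letters of pandora.

```python
import re

def matches_term(pattern: str, term: str) -> bool:
    """Return True when the pattern accounts for the entire term."""
    return re.fullmatch(pattern, term) is not None

# {n}: the preceding character, set, or group occurs exactly n times.
print(matches_term(r"\d{3}", "836"))                # True
print(matches_term(r"pand[ora]{2}", "pandar"))      # True: a, r are both in [ora]
print(matches_term(r"pand(ora){2}", "pandoraora"))  # True: the group ora twice
print(matches_term(r"pand[ora]{2}", "pandora"))     # False: 3 letters follow pand
print(matches_term(r"pand[ora]{3}", "pandora"))     # True: exactly 3 letters from [ora]

# {n,m}: at least n and at most m occurrences.
print(matches_term(r"\d{2,5}", "97430"))            # True: 5 digits
print(matches_term(r"\d{2,5}", "97"))               # True: 2 digits
print(matches_term(r"\d{2,5}", "9"))                # False: only 1 digit
```

Note that a quantifier after a character set repeats the set, not the character it matched first, which is why pand[ora]{2} accepts pandar as well as pandoo.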
Escaping RegEx Metacharacters
When using RegEx to search for a character that is a reserved metacharacter, use the backslash \ to escape the character so it is treated as a literal. The following table gives an example of how to escape a reserved metacharacter when searching.
Search For | RegEx | Match Results |
---|---|---|
UK phone number | \+[0-9]{11} | +14528280001; +38119930978. If the + sign is not escaped with a backslash, RegEx treats + as a quantifier instead of the literal plus sign character. |
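
To illustrate the difference escaping makes, here is a minimal sketch that assumes Python's re module; the engine this document describes may report errors differently. The helper matches_term and the 1+1 patterns are hypothetical illustrations, not taken from the table.

```python
import re

def matches_term(pattern: str, term: str) -> bool:
    """Return True when the pattern accounts for the entire term."""
    return re.fullmatch(pattern, term) is not None

# Pattern from the table: a literal plus sign followed by exactly 11 digits.
escaped_pattern = r"\+[0-9]{11}"
print(matches_term(escaped_pattern, "+14528280001"))  # True
print(matches_term(escaped_pattern, "14528280001"))   # False: no leading +

# Unescaped, + is a quantifier meaning "one or more of the preceding character".
print(matches_term(r"1+1", "1+1"))   # False: the literal + is not accounted for
print(matches_term(r"1+1", "111"))   # True: one or more 1s, then a final 1
print(matches_term(r"1\+1", "1+1"))  # True: \+ is a literal plus sign

# In Python, a quantifier with nothing before it is rejected as invalid.
try:
    re.compile(r"+[0-9]{11}")
    unescaped_is_valid = True
except re.error:
    unescaped_is_valid = False
print(unescaped_is_valid)            # False

# re.escape adds the backslash for reserved characters automatically.
print(re.escape("+"))                # \+
```

When the search text comes from user input, building the pattern with re.escape (or the equivalent in your tool) avoids having to remember which characters are reserved.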