Regex Matching Modes
Most regular expression engines discussed in this tutorial support the following four matching modes:
- /i makes the regex match case insensitive.
- /s enables "single-line mode". In this mode, the dot matches newlines.
- /m enables "multi-line mode". In this mode, the caret and dollar match before and after newlines in the subject string.
- /x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment.
Two languages that don't support all four of the above are JavaScript and Ruby. Some regex flavors also have additional modes or options that have single-letter equivalents. These are very implementation-dependent.
Most tools that support regular expressions have checkboxes or similar controls that you can use to turn these modes on or off. Most programming languages allow you to pass option flags when constructing the regex object. E.g. in Perl, m/regex/i turns on case insensitivity, while Pattern.compile("regex", Pattern.CASE_INSENSITIVE) does the same in Java.
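As a further illustration of passing option flags when constructing the regex object, here is a minimal sketch using Python's re module, which is not one of the examples named above. The patterns and variable names are made up for the demonstration; the flag constants are the ones the re module provides.

```python
import re

# Case-insensitive matching (/i).
case_insensitive = re.compile(r"regex", re.IGNORECASE)

# Single-line mode (/s): the dot also matches newlines.
dot_all = re.compile(r"a.b", re.DOTALL)

# Multi-line mode (/m): ^ and $ also match at line breaks.
multi_line = re.compile(r"^second$", re.MULTILINE)

# Free-spacing mode (/x): whitespace in the pattern is ignored
# and an unescaped # starts a comment.
free_spacing = re.compile(r"""
    \d{4}   # year
    -
    \d{2}   # month
""", re.VERBOSE)

# Several flags can be combined with the bitwise OR operator.
combined = re.compile(r"^a.b$", re.IGNORECASE | re.DOTALL | re.MULTILINE)
```

With these objects, case_insensitive.search("REGEX") finds a match, and dot_all.search("a\nb") finds a match that the same pattern without the flag would not.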
Specifying Modes Inside The Regular Expression
Sometimes, the tool or language does not provide the ability to specify matching options. E.g. the handy String.matches() method in Java does not take a parameter for matching options like Pattern.compile() does.
In that situation, you can add a mode modifier to the start of the regex. E.g. (?i) turns on case insensitivity, while (?ism) turns on case insensitivity, single-line mode and multi-line mode.
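The effect of a leading mode modifier can be sketched in Python. The matches() helper below is hypothetical: it deliberately takes no flags parameter, standing in for an API such as Java's String.matches() that offers no way to pass matching options.

```python
import re

def matches(pattern, text):
    # Hypothetical helper with no flags parameter: the only way to
    # change the matching mode is from inside the pattern itself.
    return re.fullmatch(pattern, text) is not None

print(matches("regex", "REGEX"))         # False: case-sensitive by default
print(matches("(?i)regex", "REGEX"))     # True: (?i) turns on case insensitivity
print(matches("(?ism)^a.b$", "A\nB"))    # True: the dot matches the newline
```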
Turning Modes On and Off for Only Part of The Regular Expression
Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex, the modifier only applies to the part of the regex to the right of the modifier. You can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.
Not all regex flavors support this. JavaScript and Python apply all mode modifiers to the entire regular expression. They don't support the (?-ismx) syntax, since turning off an option is pointless when mode modifiers apply to the whole regular expression. All options are off by default.
You can quickly test how the regex flavor you're using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.
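The test described above can be automated. The sketch below runs it against Python's re module; the function name and the three result labels are invented for this example. Python is expected to reject the pattern outright, because its re module does not accept a standalone (?-i).

```python
import re

def probe_inline_modifiers():
    """Classify how the regex flavor handles (?i)te(?-i)st."""
    try:
        regex = re.compile("(?i)te(?-i)st")
    except re.error:
        # The flavor does not accept a modifier that turns a mode off.
        return "rejected"

    subjects = ["test", "TEst", "teST", "TEST"]
    results = [regex.fullmatch(s) is not None for s in subjects]

    if results == [True, True, False, False]:
        # The modifiers applied only to the part to their right.
        return "partial"
    # The pattern compiled, but the modifiers did not take effect in place.
    return "whole-regex"

print(probe_inline_modifiers())
```

A result of "partial" would indicate a flavor that applies modifiers only to the part of the regex to their right. Python falls into the "rejected" case.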
Modifier Spans
Instead of using two modifiers, one to turn an option on and one to turn it off, you can use a modifier span. (?i)ignorecase(?-i)casesensitive(?i)ignorecase is equivalent to (?i)ignorecase(?-i:casesensitive)ignorecase. You have probably noticed the resemblance between the modifier span and the non-capturing group (?:group). Technically, the non-capturing group is a modifier span that does not change any modifiers. Like the non-capturing group, the modifier span does not create a backreference.
Modifier spans are supported by all regex flavors that allow you to use mode modifiers in the middle of the regular expression. These include the JGsoft engine, .NET, Java, Perl and PCRE. Python 3.6 and later also accepts modifier spans, even though it does not allow standalone modifiers in the middle of the regex.
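A modifier span can be tried out with a short Python sketch. It assumes Python 3.6 or later, whose re module accepts modifier spans even though it rejects a standalone (?-i). The subject strings are made up for the demonstration.

```python
import re

# (?i) at the start turns on case insensitivity for the whole regex;
# the span (?-i:...) turns it off again for the middle part only.
with_span = re.compile(r"(?i)ignorecase(?-i:casesensitive)ignorecase")

print(with_span.fullmatch("IGNORECASEcasesensitiveIgnoreCase") is not None)  # True
print(with_span.fullmatch("ignorecaseCASESENSITIVEignorecase") is not None)  # False

# The span does not create a capturing group, so there is nothing
# for a backreference to refer to.
print(with_span.groups)  # 0
```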