Regular Expressions: The Dot

The Dot Matches (Almost) Any Character

In regular expressions, the dot or period is one of the most commonly used metacharacters. Unfortunately, it is also the most commonly misused metacharacter.

The dot matches a single character, without caring what that character is. The only exceptions are newline characters. In all regex flavors discussed in this tutorial, the dot will not match a newline character by default. So by default, the dot is short for the negated character class [^\n] (UNIX regex flavors) or [^\r\n] (Windows regex flavors).
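As a small illustration of the default behavior, here is a sketch using Python's re module. Python is my choice for the examples and is not one of the tools named in this tutorial. In this flavor the dot excludes only \n, so it behaves like [^\n]; the variable names are made up for the example.

```python
import re

# Without flags, Python's dot matches any single character except "\n".
matches_letter = re.fullmatch(r"a.c", "abc") is not None
matches_newline = re.fullmatch(r"a.c", "a\nc") is not None
print(matches_letter)   # True
print(matches_newline)  # False

# In this flavor the dot and the negated class [^\n] find the same runs.
dot_runs = re.findall(r".+", "first\nsecond")
class_runs = re.findall(r"[^\n]+", "first\nsecond")
print(dot_runs)    # ['first', 'second']
print(class_runs)  # ['first', 'second']
```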

This exception exists mostly for historical reasons. The first tools that used regular expressions were line-based. They would read a file line by line, and apply the regular expression separately to each line. The effect is that with these tools, the string could never contain newlines, so the dot could never match them.

Modern tools and languages can apply regular expressions to very large strings or even entire files. All regex flavors discussed here have an option to make the dot match all characters, including newlines. In RegexBuddy, EditPad Pro or PowerGREP, you simply tick the checkbox labeled "dot matches newline".

In Perl, the mode where the dot also matches newlines is called "single-line mode". This is a bit unfortunate, because it is easy to mix up this term with "multi-line mode". Multi-line mode only affects anchors, and single-line mode only affects the dot. You can activate single-line mode by adding an s after the regex code, like this: m/^regex$/s;.

Other languages and regex libraries have adopted Perl's terminology. When using the regex classes of the .NET framework, you activate this mode by specifying RegexOptions.Singleline, such as in Regex.Match("string", "regex", RegexOptions.Singleline).

In all programming languages and regex libraries I know, activating single-line mode has no effect other than making the dot match newlines. So if you expose this option to your users, please give it a clearer label, as was done in RegexBuddy, EditPad Pro and PowerGREP.
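To make the difference between the two modes concrete, here is a sketch in Python, where single-line mode is spelled re.DOTALL (or re.S) and multi-line mode is re.MULTILINE. The sample text and variable names are assumptions for the example.

```python
import re

text = "one\ntwo"

# Default: the dot stops at the newline, so the pattern cannot span both lines.
default_match = re.search(r"one.two", text)

# "Single-line mode" (re.DOTALL): the dot now matches "\n" as well.
dotall_match = re.search(r"one.two", text, re.DOTALL)

# "Multi-line mode" (re.MULTILINE) changes only the anchors ^ and $.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Multi-line mode on its own does nothing for the dot.
multiline_dot = re.search(r"one.two", text, re.MULTILINE)

print(default_match)         # None
print(dotall_match.group())  # the whole text, newline included
print(default_starts)        # ['one']
print(multiline_starts)      # ['one', 'two']
print(multiline_dot)         # None
```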

VBScript, and JavaScript before ECMAScript 2018 (which added the s flag), do not have an option to make the dot match line break characters. In those languages, you can use a character class such as [\s\S] to match any character. This character class matches a character that is either a whitespace character (including line break characters), or a character that is not a whitespace character. Since all characters are either whitespace or non-whitespace, this character class matches any character.
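The same trick works in any flavor that supports these shorthand classes. The sketch below uses Python only for illustration, with no flags set, so the dot behaves as it would in a language without a "dot matches newline" option. The sample text is made up.

```python
import re

text = "line one\nline two"

# The dot stops at the line break.
dot_match = re.match(r".+", text).group()

# [\s\S] is "whitespace or non-whitespace", so it matches every character.
any_match = re.match(r"[\s\S]+", text).group()

print(repr(dot_match))  # 'line one'
print(repr(any_match))  # 'line one\nline two'
```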

Use The Dot Sparingly

The dot is a very powerful regex metacharacter. It allows you to be lazy. Put in a dot, and everything will match just fine when you test the regex on valid data. The problem is that the regex will also match in cases where it should not match. If you are new to regular expressions, some of these cases may not be so obvious at first.

I will illustrate this with a simple example. Let's say we want to match a date in mm/dd/yy format, but we want to leave the user the choice of date separators. The quick solution is \d\d.\d\d.\d\d. Seems fine at first. It will match a date like 02/12/03 just fine. Trouble is: 02512703 is also considered a valid date by this regular expression. In this match, the first dot matched 5, and the second matched 7. Obviously not what we intended.
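You can check this for yourself with a few lines of Python (used here only as a convenient test bench). I use fullmatch so that the whole string has to fit the pattern; the variable names are made up for the example.

```python
import re

# The "quick solution": any character is accepted as a separator.
loose_date = re.compile(r"\d\d.\d\d.\d\d")

good = loose_date.fullmatch("02/12/03")
bad = loose_date.fullmatch("02512703")

print(good is not None)  # True: the intended case
print(bad is not None)   # True: the dots matched the digits 5 and 7
```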

\d\d[- /.]\d\d[- /.]\d\d is a better solution. This regex allows a dash, space, dot and forward slash as date separators. Remember that the dot is not a metacharacter inside a character class, so we do not need to escape it with a backslash.

This regex is still far from perfect. It matches 99/99/99 as a valid date. [0-1]\d[- /.][0-3]\d[- /.]\d\d is a step ahead, though it will still match 19/39/99. How perfect you want your regex to be depends on what you want to do with it. If you are validating user input, it has to be perfect. If you are parsing data files from a known source that generates its files in the same way every time, our last attempt is probably more than sufficient to parse the data without errors. You can find a better regex to match dates in the example section.
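The following sketch compares the two improved regexes on a handful of inputs. It is written in Python purely for illustration, again with fullmatch so the entire string is tested; the sample list and names are my own.

```python
import re

# Separators restricted to dash, space, slash and dot.
separators = re.compile(r"\d\d[- /.]\d\d[- /.]\d\d")
# Additionally restricts the first digit of the month and of the day.
stricter = re.compile(r"[0-1]\d[- /.][0-3]\d[- /.]\d\d")

samples = ["02/12/03", "02-12-03", "02.12.03", "02512703", "99/99/99", "19/39/99"]

# Map each sample to (accepted by separators, accepted by stricter).
results = {
    s: (separators.fullmatch(s) is not None, stricter.fullmatch(s) is not None)
    for s in samples
}

for sample, (by_separators, by_stricter) in results.items():
    print(f"{sample}: separators={by_separators} stricter={by_stricter}")
```

Both regexes reject 02512703. Only the stricter one rejects 99/99/99, and both still accept 19/39/99, which is the limitation pointed out above.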

Use Negated Character Sets Instead of the Dot

I will explain this in depth when I present the repetition operators star and plus, but the warning is important enough to mention here as well. I will illustrate with an example.

Suppose you want to match a double-quoted string. Sounds easy. We can have any number of any character between the double quotes, so ".*" seems to do the trick just fine. The dot matches any character, and the star allows the dot to be repeated any number of times, including zero. If you test this regex on Put a "string" between double quotes, it will match "string" just fine. Now go ahead and test it on Houston, we have a problem with "string one" and "string two". Please respond.

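Ouch. The regex matches "string one" and "string two" as one single match, running from the first double quote to the last. Definitely not what we intended. The reason for this is that the star is greedy: it repeats the dot as many times as it can, and the dot is perfectly happy to match the double quotes in the middle.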
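Here is the same experiment as a short Python sketch (Python is only my test bench here). It shows that the regex produces one long match instead of two short ones.

```python
import re

text = 'Houston, we have a problem with "string one" and "string two". Please respond.'

# Greedy star: the dot runs to the end of the line, then backtracks
# only as far as the LAST double quote.
greedy_matches = re.findall(r'".*"', text)

print(len(greedy_matches))  # 1
print(greedy_matches[0])    # "string one" and "string two"
```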

In the date-matching example, we improved our regex by replacing the dot with a character class. Here, we will do the same. Our original definition of a double-quoted string was faulty. We do not want any number of any character between the quotes. We want any number of characters that are not double quotes or newlines between the quotes. So the proper regex is "[^"\r\n]*".
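A quick check of the negated character class, sketched in Python for illustration only. The second sample, with a line break inside an unterminated quote, is an assumption of mine to show why \r and \n are excluded too.

```python
import re

quoted = re.compile(r'"[^"\r\n]*"')

text = 'Houston, we have a problem with "string one" and "string two". Please respond.'
strings = quoted.findall(text)
print(strings)  # ['"string one"', '"string two"']

# A quote that is opened on one line and closed on the next is not matched,
# because the class refuses to cross the line break.
broken = quoted.findall('say "hello\nworld" now')
print(broken)  # []
```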
