Eloquent JavaScript Notes 9: Regular Expressions (Part 2)

wershest

Article link: https://blog.csdn.net/wershest/article/details/72820110

12. The replace method

replace() is a method of String objects.

console.log("papa".replace("p", "m"));
// → mapa

The first argument can also be a RegExp object.

console.log("Borobudur".replace(/[ou]/, "a"));
// → Barobudur
console.log("Borobudur".replace(/[ou]/g, "a"));
// → Barabadar

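The g at the end of the RegExp stands for global: every match is replaced, not just the first one. When these notes were written, String had no replaceAll method, so this was the only way to replace all occurrences; ES2021 has since added String.prototype.replaceAll.

Replacing with groups

Here is an example that swaps the order of last name and first name: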

console.log(
  "Hopper, Grace\nMcCarthy, John\nRitchie, Dennis"
    .replace(/([\w ]+), ([\w ]+)/g, "$2 $1"));
// → Grace Hopper
//   John McCarthy
//   Dennis Ritchie

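$1 stands for the text matched by the first parenthesized group, $2 for the second, and so on up to $9. $& stands for the whole string matched by the RegExp.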
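
The swap example above uses $1 and $2 but never $&, so here is a minimal sketch of how the three placeholders behave. The variable names wrapped and annotated are only for illustration.

```javascript
// $& inserts the whole match: every o or u gets wrapped in angle brackets.
const wrapped = "Borobudur".replace(/[ou]/g, "<$&>");
console.log(wrapped);
// → B<o>r<o>b<u>d<u>r

// $1 and $2 refer to the two groups, $& to the full "Hopper, Grace" match.
const annotated = "Hopper, Grace".replace(/(\w+), (\w+)/, "$2 $1 (was: $&)");
console.log(annotated);
// → Grace Hopper (was: Hopper, Grace)
```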

Replacing with a function

The second argument of replace() can be a function:

var s = "the cia and fbi";
console.log(s.replace(/\b(fbi|cia)\b/g, function(str) {
  return str.toUpperCase();
}));
// → the CIA and FBI

The function receives the matched text as its argument, and its return value becomes the replacement string.

Now a more complex example:

var stock = "1 lemon, 2 cabbages, and 101 eggs";
function minusOne(match, amount, unit) {
  amount = Number(amount) - 1;
  if (amount == 1) // only one left, remove the 's'
    unit = unit.slice(0, unit.length - 1);
  else if (amount == 0)
    amount = "no";
  return amount + " " + unit;
}
console.log(stock.replace(/(\d+) (\w+)/g, minusOne));
// → no lemon, 1 cabbage, and 100 eggs

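Note the parameters of minusOne. Remember the match() method? It returns an array of strings: element 0 is the whole match, and the following elements are the strings matched by each () group, in order.

The parameters of minusOne are exactly the elements of the array that match() returns.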
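
A small sketch that puts the two side by side. It collects the callback arguments with a rest parameter (args is just an illustrative name). Besides the match and the groups, the callback also receives the match offset and the whole input string as trailing arguments.

```javascript
const pattern = /(\d+) (\w+)/;

// What match() returns for a non-global pattern.
const viaMatch = "2 cabbages".match(pattern);

// What the replace() callback receives for the same pattern.
let callbackArgs = [];
"2 cabbages".replace(pattern, function (...args) {
  callbackArgs = args;
  return args[0]; // leave the text unchanged
});

console.log(viaMatch.slice(0, 3));
// → ["2 cabbages", "2", "cabbages"]
console.log(callbackArgs.slice(0, 3));
// → ["2 cabbages", "2", "cabbages"]
console.log(callbackArgs[3], callbackArgs[4]);
// → 0 2 cabbages
```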

13. Greed

Start with an example:

function stripComments(code) {
  return code.replace(/\/\/.*|\/\*[^]*\*\//g, "");
}
console.log(stripComments("1 + /* 2 */3"));
// → 1 + 3
console.log(stripComments("x = 10;// ten!"));
// → x = 10;
console.log(stripComments("1 /* a */+/* b */ 1"));
// → 1  1

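Stripping comments like this is probably something compilers commonly do.

There is a trick here: [^]. Like . it matches any character, but unlike . it also matches newlines.

The result for the third string is wrong. That line contains two comments, but they were treated as one, so the plus sign between them disappeared as well.

This is called greed: the operator matches as much of the string as it can.
The repetition operators (+, *, ?, {n,m}) are all greedy. Appending a ? to them (+?, *?, ??, {n,m}?) makes them non-greedy, so they match as little as possible.

In the RegExp above, [^]* is greedy. Changing it to [^]*? makes it non-greedy, and the third string is then replaced correctly.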
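
As a sketch of that fix, here is the same function with the non-greedy quantifier. The name stripCommentsLazy is made up for this example so it does not clash with the original function.

```javascript
// Same pattern as stripComments, but [^]*? stops at the first */ it finds.
function stripCommentsLazy(code) {
  return code.replace(/\/\/.*|\/\*[^]*?\*\//g, "");
}

console.log(stripCommentsLazy("1 /* a */+/* b */ 1"));
// → 1 + 1
console.log(stripCommentsLazy("1 + /* 2 */3"));
// → 1 + 3
```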

14. Dynamically creating RegExp objects

An example:

var names = ["harry", "mary"];
var text = "Harry is a suspicious character, but Mary is not.";
names.forEach(function(name) {
  var regexp = new RegExp("\\b(" + name + ")\\b", "gi");
  // Strings are immutable, so assign the result back to text
  text = text.replace(regexp, "_$1_");
});
console.log(text);
// → _Harry_ is a suspicious character, but _Mary_ is not.

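In this case we cannot write the RegExp between two slashes, because the name is only known at run time; we have to build it from an ordinary string.

Note the \b: inside a string literal it needs two backslashes.

What if the name contains special characters? Suppose someone is called dea+hl[]rd: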

var name = "dea+hl[]rd";
var text = "This dea+hl[]rd guy is super annoying.";
var escaped = name.replace(/[^\w\s]/g, "\\$&");
var regexp = new RegExp("\\b(" + escaped + ")\\b", "gi");
console.log(text.replace(regexp, "_$1_"));
// → This _dea+hl[]rd_ guy is super annoying.

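The third line puts a backslash in front of every character that is neither a word character nor whitespace. This ensures that none of the characters keeps a special meaning.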
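
The two examples in this section can be combined into one helper. This is only a sketch: escapeForRegExp and highlightNames are hypothetical names, and the escaping rule is the same one used above.

```javascript
// Put a backslash before every character that is neither a word
// character nor whitespace, so it loses any special meaning.
function escapeForRegExp(str) {
  return str.replace(/[^\w\s]/g, "\\$&");
}

// Wrap every occurrence of each name in underscores, ignoring case.
function highlightNames(text, names) {
  return names.reduce(function (result, name) {
    const regexp = new RegExp("\\b(" + escapeForRegExp(name) + ")\\b", "gi");
    return result.replace(regexp, "_$1_");
  }, text);
}

console.log(highlightNames("This dea+hl[]rd guy is super annoying.", ["dea+hl[]rd"]));
// → This _dea+hl[]rd_ guy is super annoying.
console.log(highlightNames("Harry is a suspicious character, but Mary is not.", ["harry", "mary"]));
// → _Harry_ is a suspicious character, but _Mary_ is not.
```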

15. The search method

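search() works like indexOf but takes a regular expression: it returns the index of the first match, or -1 when nothing matches.

console.log("  word".search(/\S/));
// → 2
console.log("    ".search(/\S/));
// → -1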

16. The lastIndex property

var digit = /\d/g;
console.log(digit.exec("here it is: 1"));
// → ["1"]
console.log(digit.exec("and now: 1"));
// → null

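Strange. Why does the second exec return null?

A RegExp object has a lastIndex property. It starts at 0, and for a regular expression with the g flag, each successful match sets it to the index just past the end of the matched text. In the example above, digit.lastIndex equals 13 after the first exec.

The next call to exec starts searching at position lastIndex. The second string is shorter than 13 characters, so nothing can match.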
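
A short sketch that makes the bookkeeping visible. The extra third call is not in the original example; it shows that a failed match resets lastIndex to 0, so the same string matches on the next try.

```javascript
const digit = /\d/g;

const first = digit.exec("here it is: 1");
const afterFirst = digit.lastIndex;   // just past the "1" at index 12
console.log(first[0], afterFirst);
// → 1 13

const second = digit.exec("and now: 1"); // starts at 13, past the end
const afterSecond = digit.lastIndex;     // a failed match resets it
console.log(second, afterSecond);
// → null 0

const third = digit.exec("and now: 1");  // starts from 0 again
console.log(third[0], third.index);
// → 1 9
```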

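Return value of a global match

console.log("Banana".match(/an/g));
// → ["an", "an"]

The returned array contains all the matched strings. Even if the RegExp contains () groups, the text matched by the groups is not returned.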
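
To see the difference, the sketch below runs the same pattern with and without the g flag. The group (a) is added here only to show what happens to it.

```javascript
// With g: one entry per match, the group (a) is not reported.
const globalResult = "Banana".match(/(a)n/g);
console.log(globalResult);
// → ["an", "an"]

// Without g: only the first match, followed by its groups.
const singleResult = "Banana".match(/(a)n/);
console.log(singleResult[0], singleResult[1], singleResult.index);
// → an a 1
```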

17. Looping over matches

We can use the lastIndex property to loop over all the matches in a string:

var input = "A string with 3 numbers in it... 42 and 88.";
var number = /\b(\d+)\b/g;
var match;
while (match = number.exec(input))
  console.log("Found", match[1], "at", match.index);
// → Found 3 at 14
//   Found 42 at 33
//   Found 88 at 40

I find the design of lastIndex rather odd and very unintuitive. I have seen similar code elsewhere before, and it was impossible to guess what it meant.
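
For comparison, here is a sketch of the same loop written with String.prototype.matchAll, which is not covered in the book chapter and assumes an ES2020 or newer runtime. It yields one match object per match, so the loop no longer depends on lastIndex.

```javascript
const input = "A string with 3 numbers in it... 42 and 88.";
const found = [];

// matchAll requires the g flag and returns an iterator of match objects.
for (const match of input.matchAll(/\b(\d+)\b/g)) {
  found.push([match[1], match.index]);
  console.log("Found", match[1], "at", match.index);
}
// → Found 3 at 14
//   Found 42 at 33
//   Found 88 at 40
```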

18. Parsing an ini file

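Now a more complex example: parsing an ini file into an array of objects. Each section becomes an object with a name and a fields array, and each setting becomes a {name, value} entry in that array. Settings that appear before the first section header go into a top-level section whose name is null.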

searchengine=http://www.google.com/search?q=$1
spitefulness=9.7

; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451

[gargamel]
fullname=Gargamel
type=evil sorcerer
outputdir=/home/marijn/enemies/gargamel

function parseINI(string) {
  // Start with an object to hold the top-level fields
  var currentSection = {name: null, fields: []};
  var categories = [currentSection];

  string.split(/\r?\n/).forEach(function(line) {
    var match;
    if (/^\s*(;.*)?$/.test(line)) {
      return;
    } else if (match = line.match(/^\[(.*)\]$/)) {
      currentSection = {name: match[1], fields: []};
      categories.push(currentSection);
    } else if (match = line.match(/^(\w+)=(.*)$/)) {
      currentSection.fields.push({name: match[1],
                                  value: match[2]});
    } else {
      throw new Error("Line '" + line + "' is invalid.");
    }
  });

  return categories;
}

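A few points to note:

1. Line endings from different platforms are handled with /\r?\n/

2. The first if skips blank lines and comments. A comment starts with ; and may be preceded by whitespace, and the ? makes the comment part optional so that empty lines match too: /^\s*(;.*)?$/

3. The second if handles section headers. The section name is wrapped in [ ]; square brackets are special characters in a RegExp, so they need a backslash. A () group extracts the section name.

4. The third if uses () groups to extract the name and value of a setting.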
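
To try the three patterns in isolation, the sketch below applies them to single lines. classifyLine is a hypothetical helper written for this note; it uses the same regular expressions as parseINI but returns a short description instead of building objects.

```javascript
function classifyLine(line) {
  let match;
  if (/^\s*(;.*)?$/.test(line)) {
    return "skip";                                  // blank line or comment
  } else if ((match = line.match(/^\[(.*)\]$/))) {
    return "section:" + match[1];                   // [name]
  } else if ((match = line.match(/^(\w+)=(.*)$/))) {
    return "field:" + match[1] + "=" + match[2];    // name=value
  }
  return "invalid";
}

console.log(classifyLine(""));                   // → skip
console.log(classifyLine("  ; a comment"));      // → skip
console.log(classifyLine("[larry]"));            // → section:larry
console.log(classifyLine("type=evil sorcerer")); // → field:type=evil sorcerer
console.log(classifyLine("oops"));               // → invalid
```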

19. International Characters

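\w only matches the 26 Latin letters (in both cases), the digits, and the underscore. Characters such as é or 陈 are not matched.

\s, on the other hand, does match other Unicode whitespace characters, for example the non-breaking space.

Matching Chinese characters:

// U+4E00 to U+9FFF is the basic CJK Unified Ideographs block
/[\u4e00-\u9fff]/.test("陈");
// → true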
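
A quick check of these claims. The last line goes beyond the book chapter: it uses a Unicode property escape with the u flag, which assumes an ES2018 or newer runtime.

```javascript
// \w is limited to ASCII letters, digits, and underscore.
const wordChecks = [/\w/.test("a"), /\w/.test("é"), /\w/.test("陈")];
console.log(wordChecks);
// → [true, false, false]

// \s also matches non-ASCII whitespace such as the non-breaking space.
const nbspIsSpace = /\s/.test("\u00a0");
console.log(nbspIsSpace);
// → true

// With the u flag, \p{Script=Han} matches Chinese characters by script.
const hanCheck = /\p{Script=Han}/u.test("陈");
console.log(hanCheck);
// → true
```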

20. Summary

Make sure you are fluent with the following RegExp patterns:

/abc/

/[abc]/

/[^abc]/

/[0-9]/

/x+/

/x+?/

/x*/

/x?/

/x{2,4}/

/(abc)/

/a|b|c/

/\d/

/\w/

/\s/

/./

/\b/

/^/

/$/

RegExp methods:

test()

exec()

String methods:

match()

replace()

RegExp options used in these notes: g (global) and i (case-insensitive).

21. Exercise: Regexp golf

1. car and cat

/ca(r|t)/ Don't forget the parentheses.

/ca[rt]/ This version is shorter.

2. pop and prop

/pr?op/

3. ferret, ferry, and ferrari

/ferr(et|y|ari)/

4. Any word ending in ious

/ious\b/

5. A whitespace character followed by a dot, comma, colon, or semicolon

/\s(\.|,|:|;)/

/\s[.,:;]/ The | operator is the first thing that comes to mind for an "or" relationship, but [ ] is a better fit for single characters.

6. A word longer than six letters

/\w{7,}/

7. A word without the letter e

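/\b[^e\s]+\b/ (Don't forget the \s; otherwise a single space would also match. The + is needed so that words longer than one character can match.) Even so, many other special characters that are not part of a word still get through, and an uppercase E is not excluded.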

/\b[a-df-z]+\b/i
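
The answers can be checked mechanically. The sketch below is a small harness in the spirit of the book's exercise; the function name failures and the sample strings are chosen for this note, not copied from a fixed test suite.

```javascript
// Return the sample strings on which the pattern gives the wrong answer.
function failures(regexp, shouldMatch, shouldNotMatch) {
  const missed = shouldMatch.filter(function (s) { return !regexp.test(s); });
  const extra = shouldNotMatch.filter(function (s) { return regexp.test(s); });
  return missed.concat(extra);
}

const results = [
  failures(/ca[rt]/, ["my car", "bad cats"], ["camper", "high art"]),
  failures(/pr?op/, ["pop culture", "mad props"], ["plop"]),
  failures(/ferr(et|y|ari)/, ["ferret", "ferry", "ferrari"], ["ferrum", "transfer A"]),
  failures(/ious\b/, ["how delicious", "spacious room"], ["ruinous", "consciousness"]),
  failures(/\s[.,:;]/, ["bad punctuation ."], ["escape the period"]),
  failures(/\w{7,}/, ["hottentottententen"], ["no", "hotten totten tenten"]),
  failures(/\b[a-df-z]+\b/i, ["red platypus", "wobbling nest"], ["earth bed", "learning ape"])
];

// An empty array at every position means the pattern passed its samples.
console.log(results.every(function (r) { return r.length === 0; }));
// → true
```

None of the patterns use the g flag, so calling test() repeatedly is safe here; with g, lastIndex would carry over between calls.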

22. Exercise Quoting style

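Dialogue is wrapped in single quotes. Replace those with double quotes, but keep the apostrophes in contractions such as it's.

This is what I came up with by following the hint in the book: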

var text = "'I'm the cook,' he said, 'it's my job.'";
// Change this call.
console.log(text.replace(/(^')|(\W')|('\W)/g, function(m){
  return m.replace("'", "\"");}));
// → "I'm the cook," he said, "it's my job."

The correct answer:

text.replace(/(^|\W)'|'(\W|$)/g, '$1"$2')

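Note $1 and $2: if a group did not take part in the match, its placeholder is replaced by an empty string. This matters a lot here. It is not replaced by the text null or undefined.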
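
A sketch that runs the answer and isolates the empty-group behavior. The second and third snippets use a made-up pattern, /(y)?x/, whose group never matches in the string "x". In a replacement string the group becomes empty, while a callback receives undefined for it.

```javascript
const text = "'I'm the cook,' he said, 'it's my job.'";
const quoted = text.replace(/(^|\W)'|'(\W|$)/g, '$1"$2');
console.log(quoted);
// → "I'm the cook," he said, "it's my job."

// Replacement string: the unmatched group $1 turns into nothing.
const viaString = "x".replace(/(y)?x/, "[$1]");
console.log(viaString);
// → []

// Callback: the same unmatched group arrives as undefined.
let seenGroup = "not called";
"x".replace(/(y)?x/, function (match, group) {
  seenGroup = group;
  return match;
});
console.log(seenGroup);
// → undefined
```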

23. Exercise Numbers again

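Match the decimal number formats that JavaScript supports, including scientific notation such as 1.3e2.

/^[-+]?(\d+\.?\d*|\d*\.?\d+)(e[-+]?\d+)?$/i

It has three parts:

1. Sign

2. Base (the digits, with an optional decimal point)

3. Exponent

Answer: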

/^(\+|-|)(\d+(\.\d*)?|\.\d+)([eE](\+|-|)\d+)?$/
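
A sketch for checking the answer against sample inputs. The lists valid and invalid are illustrative cases chosen for this note. Note that a lone "." and an exponent without digits ("5e") must be rejected.

```javascript
const number = /^(\+|-|)(\d+(\.\d*)?|\.\d+)([eE](\+|-|)\d+)?$/;

const valid = ["1", "-1", "+15", "1.55", ".5", "5.", "1.3e2", "1E-4", "1e+12"];
const invalid = ["1a", "+-1", "1.2.3", "1+1", "1e4.5", ".5.", "1f5", ".", "5e"];

// Collect every sample the pattern classifies incorrectly.
const wrong = valid.filter(function (s) { return !number.test(s); })
  .concat(invalid.filter(function (s) { return number.test(s); }));

console.log(wrong.length === 0 ? "all cases pass" : wrong);
// → all cases pass
```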