[Translation] High Performance JavaScript (015)

Chapter 5  Strings and Regular Expressions

 

    Practically all JavaScript programs are intimately tied to strings. For example, many applications use Ajax to fetch strings from a server, convert those strings into more easily usable JavaScript objects, and then generate strings of HTML from the data. A typical program deals with numerous tasks like these that require you to merge, split, rearrange, search, iterate over, and otherwise handle strings; and as web applications become more complex, progressively more of this processing is done in the browser.


    In JavaScript, regular expressions are essential for anything more than trivial string processing. A lot of this chapter is therefore dedicated to helping you understand how regular expression engines internally process your strings and teaching you how to write regular expressions that take advantage of this knowledge.


    Also in this chapter, you'll learn about the fastest cross-browser methods for concatenating and trimming strings, discover how to increase regex performance by reducing backtracking, and pick up plenty of other tips and tricks for efficiently processing strings and regular expressions.


String Concatenation

 

    String concatenation can be surprisingly performance intensive. It's a common task to build a string by continually adding to the end of it in a loop (e.g., when building up an HTML table or an XML document), but this sort of processing is notorious for its poor performance in some browsers.


    So how can you optimize these kinds of tasks? For starters, there is more than one way to merge strings (see Table 5-1).


Table 5-1. String concatenation methods

    All of these methods are fast when concatenating a few strings here and there, so for casual use, you should go with whatever is the most practical. As the length and number of strings that must be merged increases, however, some methods start to show their strength.
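
    The body of Table 5-1 is not reproduced here. As a rough sketch, the four approaches this section goes on to discuss (the + operator, the += operator, Array.prototype.join, and String.prototype.concat) can each build the same string. The variable names below are illustrative only.

```javascript
// Four ways to build the string "abc"; variable names are illustrative.
var viaPlus = "a" + "b" + "c";             // the + operator

var viaPlusEquals = "a";                    // the += operator
viaPlusEquals += "b";
viaPlusEquals += "c";

var viaArrayJoin = ["a", "b", "c"].join(""); // Array.prototype.join

var viaConcat = "a".concat("b", "c");       // String.prototype.concat
```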


Plus (+) and Plus-Equals (+=) Operators

 

    These operators provide the simplest method for concatenating strings and, in fact, all modern browsers except IE7 and earlier optimize them well enough that you don't really need to look at other options. However, several techniques maximize the efficiency of these operators.


    First, an example. Here's a common way to assign a concatenated string:


str += "one" + "two";

 

    When evaluating this code, four steps are taken:


1. A temporary string is created in memory.
2. The concatenated value "onetwo" is assigned to the temporary string.
3. The temporary string is concatenated with the current value of str.
4. The result is assigned to str.

 

    This is actually an approximation of how browsers implement this task, but it's close.
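
    The four steps can be written out by hand as a rough model. The variables temp and result are hypothetical stand-ins for the engine's internal values, which real engines do not expose, and the starting value "zero" is chosen only for illustration.

```javascript
// A hand-written model of how `str += "one" + "two"` is evaluated.
// `temp` and `result` are hypothetical stand-ins for engine internals.
var str = "zero";
var temp;                 // step 1: a temporary string is created
temp = "one" + "two";     // step 2: it receives the value "onetwo"
var result = str + temp;  // step 3: the temporary is joined to str
str = result;             // step 4: the result is assigned to str
```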


    The following code avoids the temporary string (steps 1 and 2 in the list) by directly appending to str using two discrete statements. This ends up running about 10%–40% faster in most browsers:


str += "one";
str += "two";

 

    In fact, you can get the same performance improvement using one statement, as follows:


str = str + "one" + "two";
// equivalent to str = ((str + "one") + "two")

 

    This avoids the temporary string because the assignment expression starts with str as the base and appends one string to it at a time, with each intermediary concatenation performed from left to right. If the concatenation were performed in a different order (e.g., str = "one" + str + "two"), you would lose this optimization. This is because of the way that browsers allocate memory when merging strings. Apart from IE, browsers try to expand the memory allocation for the string on the left of an expression and simply copy the second string to the end of it (see Figure 5-1). If, in a loop, the base string is furthest to the left, you avoid repeatedly copying a progressively larger base string.
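
    A minimal sketch of the two groupings inside a loop follows. The function names and the ";" delimiter are made up for illustration. Both functions return the same string; what differs is whether a short intermediary string is built before the base string is extended.

```javascript
// Both functions return the same string; only the grouping differs.
function appendWithTemp(base, items) {
  var str = base, i;
  for (i = 0; i < items.length; i++) {
    str += items[i] + ";"; // builds items[i] + ";" first, then appends it
  }
  return str;
}

function appendLeftToRight(base, items) {
  var str = base, i;
  for (i = 0; i < items.length; i++) {
    str = str + items[i] + ";"; // evaluated as ((str + items[i]) + ";")
  }
  return str;
}
```

    The two forms are interchangeable only when every operand is a string. With numbers, str += 1 + 2 appends "3", whereas str = str + 1 + 2 appends "12".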

Figure 5-1. Example of memory use when concatenating strings: s1 is copied to the end of s2 to create s3; the base string s2 is not copied


    These techniques don't apply to IE. They have little, if any, effect in IE8 and can actually make things slower in IE7 and earlier. That's because of how IE executes concatenation under the hood. In IE8's implementation, concatenating strings merely stores references to the existing string parts that compose the new string. At the last possible moment (when you actually use the concatenated string), the string parts are each copied into a new "real" string, which then replaces the previously stored string references so that this assembly doesn't have to be performed every time the string is used.
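
    The deferred approach described above can be modeled in a few lines. LazyString is a hypothetical name. The sketch only illustrates the idea of storing references and assembling the string on first use; it makes no claim about how IE8 is actually written.

```javascript
// A toy model of deferred concatenation. LazyString is a hypothetical name;
// it illustrates the idea only and is not any engine's real implementation.
function LazyString(initial) {
  this.parts = [initial]; // references to the pieces, not a merged copy
  this.flat = null;       // the "real" string, built on first use
}

LazyString.prototype.append = function (piece) {
  this.parts.push(piece); // cheap: just remember the reference
  this.flat = null;       // any cached result is now stale
  return this;
};

LazyString.prototype.toString = function () {
  if (this.flat === null) {
    this.flat = this.parts.join(""); // copy every piece exactly once
    this.parts = [this.flat];        // replace the references with the result
  }
  return this.flat;
};
```

    Appending is cheap because nothing is copied until toString runs, and repeated reads reuse the cached result.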


    IE7 and earlier use an inferior implementation of concatenation in which each pair of concatenated strings must always be copied to a new memory location. You'll see the potentially dramatic impact of this in the upcoming section "Array Joining". With the pre-IE8 implementation, the advice in this section can make things slower since it's faster to concatenate short strings before merging them with a larger base string (thereby avoiding the need to copy the larger string multiple times). For instance, with largeStr = largeStr + s1 + s2, IE7 and earlier must copy the large string twice, first to merge it with s1, then with s2. Conversely, largeStr += s1 + s2 first merges the two smaller strings and then concatenates the result with the large string. Creating the intermediary string of s1 + s2 is a much lighter performance hit than copying the large string twice.
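
    To make the comparison concrete, here is a small cost model that counts characters copied when every concatenation copies both operands into a new string. The function names and the model itself are assumptions made for illustration; the arguments are string lengths, not strings.

```javascript
// Count characters copied under a simple model in which every a + b copies
// both operands into a new string. Arguments are lengths, not strings.
// The function names and the cost model are illustrative assumptions.
function copyCost(lenA, lenB) {
  return lenA + lenB;
}

// largeStr = largeStr + s1 + s2
function costLeftToRight(large, s1, s2) {
  var first = copyCost(large, s1);       // largeStr + s1
  var second = copyCost(large + s1, s2); // (largeStr + s1) + s2
  return first + second;
}

// largeStr += s1 + s2
function costShortFirst(large, s1, s2) {
  var temp = copyCost(s1, s2);           // s1 + s2
  var merge = copyCost(large, s1 + s2);  // largeStr + (s1 + s2)
  return temp + merge;
}
```

    For a 1,000-character base and two 10-character pieces, costLeftToRight(1000, 10, 10) gives 2030 copied characters while costShortFirst(1000, 10, 10) gives 1040, roughly half.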


Firefox and compile-time folding

 

    When all strings concatenated in an assignment expression are compile-time constants, Firefox automatically merges them at compile time. Here's a way to see this in action:


function foldingDemo() {
  var str = "compile" + "time" + "folding";
  str += "this" + "works" + "too";
  str = str + "but" + "not" + "this";
}
alert(foldingDemo.toString());

// In Firefox, you'll see this:
// function foldingDemo() {
//   var str = "compiletimefolding";
//   str += "thisworkstoo";
//   str = str + "but" + "not" + "this";
// }

 

    When strings are folded together like this, there are no intermediary strings at runtime and the time and memory that would be spent concatenating them is reduced to zero. This is great when it occurs, but it doesn't help very often because it's much more common to build strings from runtime data than from compile-time constants.


Array Joining

 

    The Array.prototype.join method merges all elements of an array into a string and accepts a separator string to insert between each element. By passing in an empty string as the separator, you can perform a simple concatenation of all elements in an array.
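
    A short illustration of the separator argument, using sample data chosen only for this example:

```javascript
var dateParts = ["2010", "03", "15"]; // sample data, chosen for illustration
var dashed = dateParts.join("-");     // "2010-03-15"
var plain = dateParts.join("");       // "20100315"
var defaulted = dateParts.join();     // separator defaults to ",": "2010,03,15"
```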


    Array joining is slower than other methods of concatenation in most browsers, but this is more than compensated for by the fact that it is the only efficient way to concatenate lots of strings in IE7 and earlier.


    The following example code demonstrates the kind of performance problem that array joining solves:


var str = "I'm a thirty-five character string.",
    newStr = "",
    appends = 5000;
while (appends--) {
  newStr += str;
}

    This code concatenates 5,000 35-character strings. Figure 5-2 shows how long it takes to complete this test in IE7, starting with 5,000 concatenations and then gradually increasing that number.

Figure 5-2. Time to concatenate strings using += in IE7

    IE7's naive concatenation algorithm requires that the browser repeatedly copy and allocate memory for larger and larger strings each time through the loop. The result is quadratic running time and memory consumption.
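
    The quadratic growth can be seen with a simple counting model. It assumes each += writes out the whole accumulated string plus the new piece; naiveCopyCost is a hypothetical helper, and it counts characters rather than measuring time.

```javascript
// Characters copied while appending `count` pieces of length `pieceLength`,
// assuming each += copies the whole accumulated string plus the new piece.
// naiveCopyCost is a hypothetical helper used only for this illustration.
function naiveCopyCost(count, pieceLength) {
  var total = 0, currentLength = 0, i;
  for (i = 0; i < count; i++) {
    currentLength += pieceLength; // length after this append
    total += currentLength;       // the whole new string is written out
  }
  return total;
}
```

    Under this model, 5,000 appends of a 35-character string copy 437,587,500 characters and 20,000 appends copy 7,000,350,000: four times the input, roughly sixteen times the copying.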


    The good news is that all other modern browsers (including IE8) perform far better in this test and do not exhibit the quadratic complexity that is the real killer here. However, this demonstrates the impact that seemingly simple string concatenation can have; 226 milliseconds for 5,000 concatenations is already a significant performance hit that would be nice to reduce as much as possible, but locking up a user's browser for more than 32 seconds in order to concatenate 20,000 short strings is unacceptable for nearly any application.


    Now consider the following test, which generates the same string via array joining:


var str = "I'm a thirty-five character string.",
    strs = [],
    newStr,
    appends = 5000;
while (appends--) {
  strs[strs.length] = str;
}
newStr = strs.join("");

 

    Figure 5-3 shows this test's running time in IE7.

Figure 5-3. Time to concatenate strings using array joining in IE7


    This dramatic improvement results from avoiding repeatedly allocating memory for and copying progressively larger and larger strings. When joining an array, the browser allocates enough memory to hold the complete string, and never copies the same part of the final string more than once.
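
    The two tests can be wrapped as functions to confirm that they produce identical output. The function names are made up for this sketch, and it checks only the result, not the timing.

```javascript
// Collect the pieces first, then copy them into the final string once.
function repeatWithJoin(piece, count) {
  var pieces = [], i;
  for (i = 0; i < count; i++) {
    pieces[pieces.length] = piece; // same append idiom as the test above
  }
  return pieces.join("");
}

// The += version, for comparison.
function repeatWithPlusEquals(piece, count) {
  var result = "", i;
  for (i = 0; i < count; i++) {
    result += piece;
  }
  return result;
}
```

    Calling either function with the 35-character test string and a count of 5000 yields the same 175,000-character string.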


String.prototype.concat

 

    The native string concat method accepts any number of arguments and appends each to the string that the method is called on. This is the most flexible way to concatenate strings because you can use it to append just one string, a few strings at a time, or an entire array of strings.


// append one string
str = str.concat(s1);
// append three strings
str = str.concat(s1, s2, s3);
// append every string in an array by using the array
// as the list of arguments
str = String.prototype.concat.apply(str, array);

    Unfortunately, concat is a little slower than simple + and += operators in most cases, and can be substantially slower in IE, Opera, and Chrome. Moreover, although using concat to merge all strings in an array appears similar to the array joining approach discussed previously, it's usually slower (except in Opera), and it suffers from the same potentially catastrophic performance problem as + and += when building large strings in IE7 and earlier.
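
    For comparison, here is the apply form next to the equivalent array join, using sample values chosen for illustration. Note that concat returns a new string and leaves the string it is called on unchanged.

```javascript
var baseStr = "a";
var tail = ["b", "c", "d"]; // sample data, chosen for illustration

var viaApply = String.prototype.concat.apply(baseStr, tail); // "abcd"
var viaJoinedTail = baseStr + tail.join("");                 // "abcd"
// concat returns a new string; baseStr itself is still "a".
```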
