[Translation] High Performance JavaScript (015)

Chapter 5  Strings and Regular Expressions


    Practically all JavaScript programs are intimately tied to strings. For example, many applications use Ajax to fetch strings from a server, convert those strings into more easily usable JavaScript objects, and then generate strings of HTML from the data. A typical program deals with numerous tasks like these that require you to merge, split, rearrange, search, iterate over, and otherwise handle strings; and as web applications become more complex, progressively more of this processing is done in the browser.



    In JavaScript, regular expressions are essential for anything more than trivial string processing. A lot of this chapter is therefore dedicated to helping you understand how regular expression engines internally process your strings and teaching you how to write regular expressions that take advantage of this knowledge.



    Also in this chapter, you'll learn about the fastest cross-browser methods for concatenating and trimming strings, discover how to increase regex performance by reducing backtracking, and pick up plenty of other tips and tricks for efficiently processing strings and regular expressions.



String Concatenation


    String concatenation can be surprisingly performance intensive. It's a common task to build a string by continually adding to the end of it in a loop (e.g., when building up an HTML table or an XML document), but this sort of processing is notorious for its poor performance in some browsers.



    So how can you optimize these kinds of tasks? For starters, there is more than one way to merge strings (see Table 5-1).



Table 5-1. String concatenation methods


    All of these methods are fast when concatenating a few strings here and there, so for casual use, you should go with whatever is the most practical. As the length and number of strings that must be merged increase, however, some methods start to show their strength.
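    As a quick orientation, here is a minimal sketch of the four approaches this section discusses (the + operator, the += operator, array joining, and the string concat method), each building the same three-character string. The variable names are illustrative only and are not taken from the original table.

```javascript
// Illustrative inputs; any strings would do.
var a = "a", b = "b", c = "c";

// 1. The + operator
var viaPlus = a + b + c;

// 2. The += operator
var viaPlusEquals = a;
viaPlusEquals += b;
viaPlusEquals += c;

// 3. Array.prototype.join with an empty separator
var viaJoin = [a, b, c].join("");

// 4. String.prototype.concat
var viaConcat = a.concat(b, c);
```

    All four produce the same result; they differ only in how the browser gets there.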



Plus (+) and Plus-Equals (+=) Operators


    These operators provide the simplest method for concatenating strings and, in fact, all modern browsers except IE7 and earlier optimize them well enough that you don't really need to look at other options. However, several techniques maximize the efficiency of these operators.



    First, an example. Here's a common way to assign a concatenated string:



str += "one" + "two";


    When evaluating this code, four steps are taken:



1. A temporary string is created in memory.


2. The concatenated value "onetwo" is assigned to the temporary string.


3. The temporary string is concatenated with the current value of str.


4. The result is assigned to str.



    This is actually an approximation of how browsers implement this task, but it's close.
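    The four steps can be spelled out by hand as in the sketch below. This is only a conceptual model of the work described above, not what any engine literally executes; the variable temp is a hypothetical stand-in for the engine's internal temporary string.

```javascript
var str = "zero";

// Steps 1 and 2: a temporary string is created and receives "onetwo".
var temp = "one" + "two";

// Steps 3 and 4: the temporary is concatenated with the current value
// of str, and the result is assigned back to str.
str = str + temp;
```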



    The following code avoids the temporary string (steps 1 and 2 in the list) by directly appending to str using two discrete statements. This ends up running about 10%–40% faster in most browsers:



str += "one";
str += "two";


    In fact, you can get the same performance improvement using one statement, as follows:



str = str + "one" + "two";
// equivalent to str = ((str + "one") + "two")


    This avoids the temporary string because the assignment expression starts with str as the base and appends one string to it at a time, with each intermediary concatenation performed from left to right. If the concatenation were performed in a different order (e.g., str = "one" + str + "two"), you would lose this optimization. This is because of the way that browsers allocate memory when merging strings. Apart from IE, browsers try to expand the memory allocation for the string on the left of an expression and simply copy the second string to the end of it (see Figure 5-1). If, in a loop, the base string is furthest to the left, you avoid repeatedly copying a progressively larger base string.


Figure 5-1. Example of memory use when concatenating strings: s1 is copied to the end of s2 to create s3; the base string s2 is not copied
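    To make the ordering concrete, the following sketch contrasts a loop that keeps the base string furthest to the left with one that does not. The data and variable names are made up for illustration; the point is only where the growing string sits in each expression.

```javascript
var items = ["a", "b", "c"];  // illustrative data
var i;

// Base string leftmost: each pass appends to the end of `appended`,
// so the pattern described above applies.
var appended = "";
for (i = 0; i < items.length; i++) {
  appended = appended + "<li>" + items[i] + "</li>";
}

// Base string NOT leftmost: "[" becomes the base of the expression,
// so the progressively larger `wrapped` must be copied on every pass.
var wrapped = "";
for (i = 0; i < items.length; i++) {
  wrapped = "[" + wrapped + items[i] + "]";
}
```

    When the output format allows it, restructuring the expression so that the accumulated string comes first keeps the loop in the first form.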



    These techniques don't apply to IE. They have little, if any, effect in IE8 and can actually make things slower in IE7 and earlier. That's because of how IE executes concatenation under the hood. In IE8's implementation, concatenating strings merely stores references to the existing string parts that compose the new string. At the last possible moment (when you actually use the concatenated string), the string parts are each copied into a new "real" string, which then replaces the previously stored string references so that this assembly doesn't have to be performed every time the string is used.
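    The deferred strategy described for IE8 can be modeled in a few lines of JavaScript. This is a toy sketch under stated assumptions, not IE8's actual implementation: LazyString and its members are hypothetical names invented here, and a real engine does this in native code.

```javascript
// A toy model of deferred concatenation. LazyString is a hypothetical
// name for illustration, not a browser API.
function LazyString(initial) {
  this.parts = [initial];  // references to the pieces; nothing copied yet
  this.flattenCount = 0;   // how many times the pieces were really copied
}

// Appending only records a reference to the new piece.
LazyString.prototype.append = function (s) {
  this.parts.push(s);
  return this;
};

// The first use after an append copies the pieces into one "real" string
// and replaces the stored references with it, so later uses are free.
LazyString.prototype.toString = function () {
  if (this.parts.length > 1) {
    this.parts = [this.parts.join("")];
    this.flattenCount++;
  }
  return this.parts[0];
};

var lazy = new LazyString("a").append("b").append("c");
var first = lazy.toString();   // pieces are assembled here
var second = lazy.toString();  // reuses the assembled string
```

    In this model the order in which pieces are appended makes no difference to the copying work, which fits the observation that the earlier techniques have little effect in IE8.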



    IE7 and earlier use an inferior implementation of concatenation in which each pair of concatenated strings must always be copied to a new memory location. You'll see the potentially dramatic impact of this in the upcoming section "Array Joining". With the pre-IE8 implementation, the advice in this section can make things slower since it's faster to concatenate short strings before merging them with a larger base string (thereby avoiding the need to copy the larger string multiple times). For instance, with largeStr = largeStr + s1 + s2, IE7 and earlier must copy the large string twice, first to merge it with s1, then with s2. Conversely, largeStr += s1 + s2 first merges the two smaller strings and then concatenates the result with the large string. Creating the intermediary string of s1 + s2 is a much lighter performance hit than copying the large string twice.
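    A rough cost model makes the difference visible. The sketch below assumes, as described above, that merging any two strings copies every character of both into a new location; copyCost and the lengths used are hypothetical values chosen for illustration.

```javascript
// Cost of merging two strings when both must be copied to new memory.
// copyCost is a hypothetical helper, not a measurement.
function copyCost(leftLength, rightLength) {
  return leftLength + rightLength;
}

var largeLength = 10000, s1Length = 10, s2Length = 10;  // illustrative

// largeStr = largeStr + s1 + s2  is evaluated as  (largeStr + s1) + s2
var leftToRight = copyCost(largeLength, s1Length) +
                  copyCost(largeLength + s1Length, s2Length);

// largeStr += s1 + s2  is evaluated as  largeStr + (s1 + s2)
var smallFirst = copyCost(s1Length, s2Length) +
                 copyCost(largeLength, s1Length + s2Length);
```

    Under this model the left-to-right form copies 20,030 characters and the small-strings-first form copies 10,040, roughly half.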



Firefox and compile-time folding


    When all strings concatenated in an assignment expression are compile-time constants, Firefox automatically merges them at compile time. Here's a way to see this in action:



function foldingDemo() {
  var str = "compile" + "time" + "folding";
  str += "this" + "works" + "too";
  str = str + "but" + "not" + "this";
}
// Display the function's source as the browser reports it
alert(foldingDemo.toString());

// In Firefox, you'll see this:
// function foldingDemo() {
//   var str = "compiletimefolding";
//   str += "thisworkstoo";
//   str = str + "but" + "not" + "this";
// }


    When strings are folded together like this, there are no intermediary strings at runtime and the time and memory that would be spent concatenating them is reduced to zero. This is great when it occurs, but it doesn't help very often because it's much more common to build strings from runtime data than from compile-time constants.



Array Joining


    The Array.prototype.join method merges all elements of an array into a string and accepts a separator string to insert between each element. By passing in an empty string as the separator, you can perform a simple concatenation of all elements in an array.
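    A short example of both uses, with made-up data:

```javascript
var parts = ["2010", "03", "15"];  // illustrative data

// A separator string is inserted between each pair of elements.
var withSeparator = parts.join("-");

// An empty separator turns join into plain concatenation.
var concatenated = parts.join("");
```

    Here withSeparator is "2010-03-15" and concatenated is "20100315".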



    Array joining is slower than other methods of concatenation in most browsers, but this is more than compensated for by the fact that it is the only efficient way to concatenate lots of strings in IE7 and earlier.



    The following example code demonstrates the kind of performance problem that array joining solves:



var str = "I'm a thirty-five character string.",
    newStr = "",
    appends = 5000;
while (appends--) {
  newStr += str;
}

    This code concatenates 5,000 35-character strings. Figure 5-2 shows how long it takes to complete this test in IE7, starting with 5,000 concatenations and then gradually increasing that number.


Figure 5-2. Time to concatenate strings using += in IE7


    IE7's naive concatenation algorithm requires that the browser repeatedly copy and allocate memory for larger and larger strings each time through the loop. The result is quadratic running time and memory consumption.
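    The quadratic growth can be seen by counting copied characters. The sketch below assumes that every append writes the whole string built so far, plus the new piece, into fresh memory; naiveAppendCost is a hypothetical helper, and it counts copies only, so it does not try to predict the measured timings.

```javascript
// Total characters copied when each append rewrites the entire string.
function naiveAppendCost(appends, pieceLength) {
  var total = 0, currentLength = 0, i;
  for (i = 0; i < appends; i++) {
    currentLength += pieceLength;  // old contents plus the new piece
    total += currentLength;        // all of it goes to a new location
  }
  return total;
}

var cost5k = naiveAppendCost(5000, 35);
var cost20k = naiveAppendCost(20000, 35);
var growth = cost20k / cost5k;
```

    Quadrupling the number of appends from 5,000 to 20,000 multiplies the copied characters by about 16 in this model, which is the signature of quadratic behavior.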



    The good news is that all other modern browsers (including IE8) perform far better in this test and do not exhibit the quadratic complexity that is the real killer here. However, this demonstrates the impact that seemingly simple string concatenation can have; 226 milliseconds for 5,000 concatenations is already a significant performance hit that would be nice to reduce as much as possible, but locking up a user's browser for more than 32 seconds in order to concatenate 20,000 short strings is unacceptable for nearly any application.



    Now consider the following test, which generates the same string via array joining:



var str = "I'm a thirty-five character string.",
    strs = [],
    newStr,
    appends = 5000;
while (appends--) {
  strs[strs.length] = str;
}
newStr = strs.join("");


    Figure 5-3 shows this test's running time in IE7.


Figure 5-3. Time to concatenate strings using array joining in IE7



    This dramatic improvement results from avoiding repeatedly allocating memory for and copying progressively larger and larger strings. When joining an array, the browser allocates enough memory to hold the complete string, and never copies the same part of the final string more than once.
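    Applied to the HTML table scenario mentioned at the start of this section, the same pattern might look like the sketch below. The row data and variable names are invented for illustration.

```javascript
var rows = [["Ann", 31], ["Bob", 27]];  // illustrative data
var html = [], i;

// Collect the fragments in an array instead of growing one big string.
html[html.length] = "<table>";
for (i = 0; i < rows.length; i++) {
  html[html.length] = "<tr><td>" + rows[i][0] + "</td><td>" +
                      rows[i][1] + "</td></tr>";
}
html[html.length] = "</table>";

// The full string is assembled once, at the end.
var table = html.join("");
```

    Each fragment is still built with +, but those are short strings; the large, growing string is never copied inside the loop.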





String.prototype.concat


    The native string concat method accepts any number of arguments and appends each to the string that the method is called on. This is the most flexible way to concatenate strings because you can use it to append just one string, a few strings at a time, or an entire array of strings.



// append one string
str = str.concat(s1);
// append three strings
str = str.concat(s1, s2, s3);
// append every string in an array by using the array
// as the list of arguments
str = String.prototype.concat.apply(str, array);

    Unfortunately, concat is a little slower than simple + and += operators in most cases, and can be substantially slower in IE, Opera, and Chrome. Moreover, although using concat to merge all strings in an array appears similar to the array joining approach discussed previously, it's usually slower (except in Opera), and it suffers from the same potentially catastrophic performance problem as + and += when building large strings in IE7 and earlier.
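    For comparison, the sketch below merges an array into a base string both ways. The values are illustrative; the two results are identical, so the choice between them comes down to the performance characteristics described above.

```javascript
var base = "x";
var pieces = ["a", "b", "c"];  // illustrative data

// concat with the array spread out as the argument list
var viaConcatApply = String.prototype.concat.apply(base, pieces);

// array joining, then a single append to the base string
var viaArrayJoin = base + pieces.join("");
```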

