Compilation and Reuse in Regular Expressions (.NET)

03/30/2017

You can optimize the performance of applications that make extensive use of regular expressions by understanding how the regular expression engine compiles expressions and how it caches them. This topic discusses both compilation and caching.

Compiled Regular Expressions

By default, the regular expression engine compiles a regular expression to a sequence of internal instructions. These are high-level codes that are different from Microsoft intermediate language (MSIL). When the engine executes a regular expression, it interprets these internal codes.

If a Regex object is constructed with the RegexOptions.Compiled option, the engine compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows .NET's just-in-time (JIT) compiler to convert the expression to native machine code for higher performance. The cost of constructing the Regex object may be higher, but the cost of performing matches with it is likely to be much lower.
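As a minimal sketch of the difference between the two construction modes, the following C# class builds the same pattern once with the default options and once with RegexOptions.Compiled. The class name, method names, and the order-ID pattern are hypothetical and chosen only for illustration; the Regex constructor and RegexOptions.Compiled are the APIs described above.

```csharp
using System.Text.RegularExpressions;

public static class CompiledRegexSketch
{
    // Hypothetical pattern for illustration:
    // two uppercase letters, a hyphen, then four digits.
    private const string OrderIdPattern = @"^[A-Z]{2}-\d{4}$";

    // Default construction: the pattern becomes internal instructions
    // that the engine interprets each time a match is performed.
    private static readonly Regex Interpreted = new Regex(OrderIdPattern);

    // RegexOptions.Compiled: the pattern is emitted as MSIL, which the JIT
    // compiler can turn into native code. Construction costs more, so the
    // instance is created once and kept in a static field.
    private static readonly Regex Compiled =
        new Regex(OrderIdPattern, RegexOptions.Compiled);

    public static bool IsOrderIdInterpreted(string input) =>
        Interpreted.IsMatch(input);

    public static bool IsOrderIdCompiled(string input) =>
        Compiled.IsMatch(input);
}
```

Both methods return the same results for the same input; only the construction and matching costs differ. Whether the compiled form pays off depends on how often the expression is used, so it is worth measuring in the target application.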

An alternative is to use precompiled regular expressions. You can compile all of your expressions into a reusable DLL by using the CompileToAssembly method. This avoids the need to compile at run time while still benefiting from the speed of compiled regular expressions.

The Regular Expressions Cache

To improve performance, the regular expression engine maintains an application-wide cache of compiled regular expressions. The cache stores only the regular expression patterns that are used in static method calls. (Regular expression patterns supplied to instance methods are not cached.) This avoids the need to reparse an expression into high-level byte code each time it is used.

The maximum number of cached regular expressions is determined by the value of the static (Shared in Visual Basic) Regex.CacheSize property. By default, the regular expression engine caches up to 15 compiled regular expressions. If the number of compiled regular expressions exceeds the cache size, the least recently used regular expression is discarded and the new regular expression is cached.
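The cache limit can be read and changed through Regex.CacheSize. The sketch below raises the limit only when it is lower than a requested minimum; the class and method names are hypothetical, and any particular size passed in is an assumption to be tuned per application, not a recommended value.

```csharp
using System.Text.RegularExpressions;

public static class RegexCacheSizeSketch
{
    // Raises the static-method cache limit if it is below minimumSize
    // and returns the limit that is in effect afterwards.
    // The limit is never lowered by this helper.
    public static int EnsureCacheSize(int minimumSize)
    {
        if (Regex.CacheSize < minimumSize)
        {
            Regex.CacheSize = minimumSize;
        }
        return Regex.CacheSize;
    }
}
```

An application that cycles through, say, a few dozen distinct patterns in static calls might call RegexCacheSizeSketch.EnsureCacheSize(50) once at startup so that those patterns are less likely to evict one another.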

Your application can reuse regular expressions in one of the following two ways:

By using a static method of the Regex object to define the regular expression. If you're using a regular expression pattern that has already been defined by another static method call, the regular expression engine will try to retrieve it from the cache. If it's not available in the cache, the engine will compile the regular expression and add it to the cache.

By reusing an existing Regex object for as long as its regular expression pattern is needed.
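The two reuse approaches above can be sketched side by side. The class name, method names, and the digit pattern are hypothetical; Regex.IsMatch (static) and Regex.Matches (instance) are existing .NET methods.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexReuseSketch
{
    // Hypothetical pattern for illustration: one or more consecutive digits.
    private const string DigitsPattern = @"\d+";

    // Approach 1: a static method call. The engine looks the pattern up in
    // its cache and only parses it again if it is not found there.
    public static bool ContainsDigitsStatic(string input) =>
        Regex.IsMatch(input, DigitsPattern);

    // Approach 2: a single Regex instance that is kept for as long as the
    // pattern is needed, instead of being constructed on every call.
    private static readonly Regex Digits = new Regex(DigitsPattern);

    // Counts runs of digits across all lines using the shared instance.
    public static int CountDigitRuns(IEnumerable<string> lines)
    {
        int total = 0;
        foreach (string line in lines)
        {
            total += Digits.Matches(line).Count;
        }
        return total;
    }
}
```

The pattern to avoid is constructing a new Regex inside the loop body, which repeats the instantiation and parsing work on every iteration.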

Because of the overhead of object instantiation and regular expression compilation, creating and rapidly destroying numerous Regex objects is very expensive. For applications that use a large number of different regular expressions, you can optimize performance by calling static Regex methods and possibly by increasing the size of the regular expression cache.
