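nginx Regex Matching Optimization (Part 1)
Background
The IPv6 migration scheme relies on a large number of regex matches to rewrite domain names. Profiling with perf shows that pcre_exec is the main hotspot.
How can this be optimized?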
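- Reduce the size of the input.
- Optimize at the library level with pcre_jit, hyperscan, and similar options (a newer version of the library, or a better library altogether).
- Cache computation. Results can be cached by content, and intermediate matching state can be cached by input.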
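The result-caching idea in the last bullet can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the nginx implementation: the pattern, the `example.com` host names, and the `rewrite_body` function are all hypothetical, and `functools.lru_cache` stands in for a cache keyed by the response content.

```python
import re
from functools import lru_cache

# Hypothetical rewrite rule: point hosts under example.com at a "v6" subdomain.
_HOST_RE = re.compile(r"\b([a-z0-9-]+)\.example\.com\b")


@lru_cache(maxsize=1024)
def rewrite_body(body: str) -> str:
    """Rewrite host names in a response body.

    The regex runs only on a cache miss; an identical body seen again
    is answered from the cache without touching the regex engine.
    """
    return _HOST_RE.sub(r"\1.v6.example.com", body)
```

Calling `rewrite_body` twice with the same body runs the substitution once; `rewrite_body.cache_info()` reports the hit and miss counts. Caching by content only helps when identical bodies recur, so it complements a faster regex engine rather than replacing it.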
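For this project we chose PCRE JIT, which requires the fewest changes. When optimizing at the library level, also keep track of new library releases.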
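Test results
After a series of tuning steps (enabling pcre_jit and upgrading PCRE to the latest version), CPU usage improved by roughly a factor of two.
In perf, the pcre_exec hotspot dropped from 13% to about 3%.
Idle CPU on the production servers rose noticeably.
Test record: ran foreach -c 2000 -w 1 "curl -voa http://aaa.aa:82/cnr.html -x 127.1:81", requesting a 152 KB HTML file that is rewritten on the way through.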
- System default pcre-8.32, nginx built without --with-pcre-jit
  pcre_jit on: CPU 65%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  22335 root 20 0 220076 101116 1704 R 65.5 2.8 0:20.62 nginx
  pcre_jit off: CPU 81.7%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  29177 root 20 0 219496 100832 1700 R 81.7 2.8 0:36.09 nginx
- pcre-8.32 built from source, nginx built with --with-pcre-jit
  pcre_jit on: CPU 65%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  22335 root 20 0 220076 101116 1704 R 65.5 2.8 0:20.62 nginx
  pcre_jit off: CPU 76.7%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  29177 root 20 0 219496 100832 1700 R 76.7 2.8 0:36.09 nginx
- pcre-8.42 built from source, nginx built with --with-pcre-jit
  pcre_jit on: CPU 58%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  22335 root 20 0 220076 101116 1704 R 58.5 2.8 0:20.62 nginx
  pcre_jit off: CPU 66%
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  29177 root 20 0 219496 100832 1700 R 66.7 2.8 0:36.09 nginx
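To make the six readings in the test record easier to compare, the short Python sketch below computes each one's reduction relative to the slowest configuration (system pcre-8.32 with pcre_jit off). The %CPU values are copied from the top output in the test record; the dictionary labels and the `relative_reduction` helper are names introduced here for illustration only. On these numbers the best configuration (pcre-8.42 with pcre_jit on, 58.5%) sits about 28% below the 81.7% baseline. These are single-process top readings from one test machine, so they need not match the overall figures quoted in the results summary.

```python
# %CPU of the nginx worker as reported by top in the test record.
# Keys are (PCRE build, pcre_jit setting); the labels are shortened for this sketch.
cpu = {
    ("system pcre-8.32", "off"): 81.7,
    ("system pcre-8.32", "on"): 65.5,
    ("source pcre-8.32", "off"): 76.7,
    ("source pcre-8.32", "on"): 65.5,
    ("source pcre-8.42", "off"): 66.7,
    ("source pcre-8.42", "on"): 58.5,
}


def relative_reduction(before: float, after: float) -> float:
    """Return how far `after` is below `before`, in percent of `before`."""
    return (before - after) / before * 100.0


# Use the slowest configuration as the baseline for every comparison.
baseline = cpu[("system pcre-8.32", "off")]

for (build, jit), value in cpu.items():
    drop = relative_reduction(baseline, value)
    print(f"{build:18s} pcre_jit {jit:3s} {value:5.1f}%  {drop:5.1f}% below baseline")
```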
Root cause analysis
What exactly is JIT, and what is pcre-sljit in the PCRE source tree?
First, a regular expression is compiled into an internal representation by pcre_compile(). The internal representation (usually called MIR - Middle Level Representation) is a sequence of byte-codes. Each byte-code is basically a command, which tells the next step for the interpreter inside pcre_exec(). The JIT compiler translates MIR to machine code when the appropriate flags are passed to pcre_study(). The returned pcre_extra data contains a pointer to a machine executable function if the machine code generation was successful.
JIT compilers totally eliminate the continual reparsing of MIR. Even if the MIR code is much simpler than the original pattern string, the execution engine is full of ifs and switches, and executing them consumes considerable time. The compiled machine code only contains those machine instructions that are absolutely necessary for this particular pattern, and nothing more.
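The difference between interpreting byte-codes and running pattern-specific code can be illustrated with a toy analogy in Python. This is only a sketch of the idea, not how PCRE or sljit work internally: the three-opcode pattern format (`lit`, `digit`, `any`) and the `interpret` and `specialize` functions are invented for this example, and the "compiled" form is generated Python source rather than machine code.

```python
def interpret(ops, text):
    """Anchored match that re-dispatches on every opcode at every call,
    similar in spirit to an interpreter loop full of ifs and switches."""
    if len(text) < len(ops):
        return False
    for i, op in enumerate(ops):
        ch = text[i]
        if op[0] == "lit":
            if ch != op[1]:
                return False
        elif op[0] == "digit":
            if not ch.isdigit():
                return False
        elif op[0] == "any":
            pass  # any single character is accepted
        else:
            raise ValueError(f"unknown opcode: {op[0]}")
    return True


def specialize(ops):
    """Translate the opcode list once into a function that contains only
    the checks this particular pattern needs, with no dispatch left."""
    checks = [f"len(text) >= {len(ops)}"]
    for i, op in enumerate(ops):
        if op[0] == "lit":
            checks.append(f"text[{i}] == {op[1]!r}")
        elif op[0] == "digit":
            checks.append(f"text[{i}].isdigit()")
        elif op[0] == "any":
            continue  # nothing to check, so no code is emitted
        else:
            raise ValueError(f"unknown opcode: {op[0]}")
    source = "def matcher(text):\n    return " + " and ".join(checks) + "\n"
    namespace = {}
    exec(source, namespace)  # build the specialized function once
    return namespace["matcher"]
```

For example, `specialize([("lit", "v"), ("digit",), ("any",), ("lit", "6")])` returns a function whose body is a single chain of comparisons; the `any` opcode contributes no code at all. The translation cost is paid once per pattern, which is why the approach suits patterns that are executed many times, as in the domain-rewriting workload here.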
PCRE Performance Project https://zherczeg.github.io/sljit/pcre.html
An accessible introduction to JIT compilers (IBM developerWorks) https://www.ibm.com/developerworks/cn/java/j-lo-just-in-time/index.html
Why JIT can improve performance so much https://www.zhihu.com/question/19672491 https://segmentfault.com/q/1010000000366720
Hyperscan: Why and How to Replace Perl Compatible Regular Expressions (PCRE) with Hyperscan https://software.intel.com/en-us/articles/why-and-how-to-replace-pcre-with-hyperscan
Why the newer version performs better is left for later study. PCRE version change log:
https://abi-laboratory.pro/?view=changelog&l=pcre&v=8.42
sljit
https://zherczeg.github.io/sljit/
OPEN INFORMATION SECURITY FOUNDATION
https://oisf.net/
Software worth studying
Suricata, eBPF, Hyperscan
https://suricata-ids.org/news/