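The previous post covered the pcre/pcre2 fixes; this one is mostly about the changes for TRE. The fbc compiler itself actually uses this library,
and if the official project relies on it, its performance should be decent.
After testing it, though, I found that this badly undersells the library: it is not just decent, it is fast, almost unbelievably so.
One regex replace over a 455k string took 26 ms, while CRegExp (vbscript) needed 60 ms.
Oddly, VisualFreeBasic ships only the header files for this library, not the library file itself.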
fbc/tre.bi at master · freebasic/fbc · GitHub
The headers live in VisualFreeBasic's Compile\FreeBASIC-1.09.0-winlibs-gcc-9.3.0\inc\tre directory.
The officially recommended way to pull the library in is #include once "regex.bi",
which simply includes VisualFreeBasic's Compile\FreeBASIC-1.09.0-winlibs-gcc-9.3.0\inc\regex.bi.
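From the look of it, the library file has to be obtained separately, and it took me a fair bit of searching to find the link below.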
https://users.freebasic-portal.de/stw/files/prog/fb/libs/libtre-080.zip
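After unpacking, copy the win32 and win64 tre.a files into the matching directories under VisualFreeBasic's Compile\FreeBASIC-1.09.0-winlibs-gcc-9.3.0\lib\.
Create a standard exe project named TestTRE and add the following to the header file (as a rule, add all header includes under this folder to avoid bizarre compile problems):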
#include "regex.bi"
#ifndef regexmatch
#define regexmatch(match,zeile,n) mid(zeile,1+match(n).rm_so, match(n).rm_eo-match(n).rm_so)
#endif
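To see what the regexmatch macro does with the rm_so and rm_eo offsets, here is a minimal console sketch. It assumes tre.a is already installed as described above; the pattern and the sample string are made up for illustration and are not from the project.

```freebasic
#include once "regex.bi"

'' Same helper macro as in the article: returns capture group n as a string.
'' rm_so is the 0-based start offset, rm_eo the offset just past the end.
#ifndef regexmatch
#define regexmatch(match,zeile,n) Mid(zeile,1+match(n).rm_so, match(n).rm_eo-match(n).rm_so)
#endif

Dim subject As String = "key=value"   '' sample input, assumed for this sketch
Dim re As regex_t

'' Two capture groups, so re.re_nsub is 2 after a successful regcomp.
If regcomp(@re, "([a-z]+)=([a-z]+)", REG_EXTENDED) <> 0 Then
    Print "regcomp failed"
    End 1
End If

'' Slot 0 holds the whole match, slots 1..re_nsub hold the groups.
Dim match(re.re_nsub) As regmatch_t

If regexec(@re, StrPtr(subject), re.re_nsub + 1, @match(0), 0) = 0 Then
    Print regexmatch(match, subject, 0)   '' key=value
    Print regexmatch(match, subject, 1)   '' key
    Print regexmatch(match, subject, 2)   '' value
End If

'' Release the compiled pattern.
regfree(@re)
```

Because the offsets are 0-based and Mid is 1-based, the macro adds 1 to rm_so and uses rm_eo - rm_so as the length.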
Add the following to the Form1 code:
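Sub Form1_Shown(hWndForm As hWnd ,UserData As Integer)
    '' Load the test input
    Dim strIn As String = GetFileStr("mbedtls.bi")
    Dim Start As Double = Timer()
    '' Strip every "private function ... end function" and "private sub ... end sub" block
    strIn = regex_replace("private function(.*?)end function\r\n" ,"" ,strIn)
    strIn = regex_replace("private sub(.*?)end sub\r\n" ,"" ,strIn)
    '' Timer() counts seconds, so the difference printed here is in seconds
    Print "Regex time (s):" ,Timer() - Start
    SaveFileStr("mbedtls_1.bi" ,strIn)
End Sub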
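Function regex_replace(ByRef regex As String, ByRef replace_pattern As String, ByRef subject As String) As String
    Dim replaced As String, rest As String
    rest = subject
    Dim re As regex_t
    '' Compile once; a bad pattern returns an empty string
    If regcomp( @re, regex, REG_EXTENDED Or REG_ICASE ) <> 0 Then Return ""
    '' match(0) is the whole match, match(1..re_nsub) are the capture groups
    Dim match(re.re_nsub) As regmatch_t, n As Integer
    While regexec( @re, StrPtr(rest), re.re_nsub+1, @match(0), 0 ) = 0
        '' Copy the text in front of the match
        replaced += Left(rest, match(0).rm_so)
        '' Expand the replacement: "$n" inserts group n unless escaped as "\$n"
        For n = 1 To Len(replace_pattern)
            If Mid(replace_pattern,n,1) = "$" And _
               Mid(replace_pattern,n-1,1) <> "\" And _
               Val(Mid(replace_pattern,n+1,1)) > 0 And _
               Val(Mid(replace_pattern,n+1,1)) <= re.re_nsub _
            Then
                replaced += regexmatch(match,rest,Val(Mid(replace_pattern,n+1,1)))
                n += 1
            Else
                replaced += Mid(replace_pattern,n,1)
            End If
        Next n
        '' The match reached the end of the text: nothing left to scan
        If match(0).rm_eo = Len(rest) Then
            rest = ""
            Exit While
        End If
        '' An empty match would never advance; stop instead of looping forever
        If match(0).rm_eo = 0 Then Exit While
        rest = Mid(rest, match(0).rm_eo+1)
    Wend
    '' Free the compiled pattern on every path that reaches this point
    regfree( @re )
    Return replaced + rest
End Function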
Sub printmatches( ByVal PATTERN As String, ByVal buffer As String )
    Dim re As regex_t
    Dim pm As regmatch_t
    Dim pbuff As ZString Ptr
    Dim res As Integer
    pbuff = StrPtr( buffer )
    '' compile the pattern, giving up if it is invalid
    If regcomp( @re, PATTERN, 0 ) <> 0 Then Exit Sub
    '' first match
    res = regexec( @re, pbuff, 1, @pm, 0 )
    Do While( res = 0 )
        Print "<"; Mid( *pbuff, 1 + pm.rm_so, pm.rm_eo - pm.rm_so ); ">"
        '' an empty match at the cursor would never advance, so stop
        If pm.rm_eo = 0 Then Exit Do
        '' next match: continue right after the previous one
        pbuff += pm.rm_eo
        res = regexec( @re, pbuff, 1, @pm, REG_NOTBOL )
    Loop
    '' free the context
    regfree( @re )
End Sub
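The loop in printmatches walks the buffer by moving a pointer past each match and passing REG_NOTBOL on every later call. Here is a minimal sketch of the same technique that counts matches instead of printing them. count_matches is a hypothetical helper written for this post, not something TRE provides, and the sample input is made up.

```freebasic
#include once "regex.bi"

'' Hypothetical helper for this sketch: counts non-overlapping matches.
'' Returns -1 if the pattern does not compile.
Function count_matches(ByRef pattern As String, ByRef buffer As String) As Integer
    Dim re As regex_t
    Dim pm As regmatch_t
    Dim total As Integer = 0
    Dim eflags As Long = 0

    '' StrPtr of an empty string is a null pointer, so bail out early.
    If Len(buffer) = 0 Then Return 0
    If regcomp(@re, pattern, REG_EXTENDED) <> 0 Then Return -1

    Dim pbuff As ZString Ptr = StrPtr(buffer)
    Do While regexec(@re, pbuff, 1, @pm, eflags) = 0
        total += 1
        '' An empty match at the cursor would never advance: stop here.
        If pm.rm_eo = 0 Then Exit Do
        '' Offsets are relative to pbuff, so step past this match.
        pbuff += pm.rm_eo
        '' From now on pbuff is no longer the start of the line.
        eflags = REG_NOTBOL
    Loop

    regfree(@re)
    Return total
End Function

Print count_matches("[0-9]+", "a1 b22 c333")   '' 3
```

The offsets in pm are always relative to the pointer handed to regexec, which is why the cursor only ever needs to move forward by rm_eo.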
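The two replaces over the 455k string took 59 ms in total. Just as important, the program grew by only 65K overall.
All in all, TRE is the best solution for regular expressions in VisualFreeBasic.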