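Recently I ran into a text-processing problem. My first thought was regular expressions, but regex matching slowed processing down considerably, so I looked around and found hyperscan. Switching to hyperscan did speed up matching significantly. Using hyperscan takes two steps: 1. Compile the regular expressions into a database. 2. Load that database and run the matching against it.
Without further ado, here is the code:
Compiling the regex database with hyperscan: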
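// Assumes the hyperscan-java wrapper (package com.gliwka.hyperscan.wrapper).
import com.gliwka.hyperscan.wrapper.CompileErrorException;
import com.gliwka.hyperscan.wrapper.Database;
import com.gliwka.hyperscan.wrapper.Expression;
import com.gliwka.hyperscan.wrapper.ExpressionFlag;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.EnumSet;
import java.util.LinkedList;

public class CompileDb {
    public static void compile() throws IOException, CompileErrorException {
        LinkedList<Expression> expressions = new LinkedList<Expression>();
        // Add rules
        expressions.add(new Expression("(&mac=).*(&)", EnumSet.of(ExpressionFlag.SOM_LEFTMOST, ExpressionFlag.CASELESS)));
        // Compile the rules and write the serialized database to ./db_mac
        try (Database db = Database.compile(expressions)) {
            try (OutputStream out = new FileOutputStream("./db_mac")) {
                db.save(out);
            }
        }
    }

    public static void main(String[] args) {
        try {
            compile();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (CompileErrorException e) {
            e.printStackTrace();
        }
    }
}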
Matching with hyperscan:
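// Assumes the hyperscan-java wrapper (package com.gliwka.hyperscan.wrapper).
import com.gliwka.hyperscan.wrapper.CompileErrorException;
import com.gliwka.hyperscan.wrapper.Database;
import com.gliwka.hyperscan.wrapper.Match;
import com.gliwka.hyperscan.wrapper.Scanner;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class Mach_hp {
    public static void mach() throws IOException, CompileErrorException {
        // Load the database written by CompileDb; try-with-resources closes
        // the scanner, the database, and the input stream when done
        try (InputStream in = new FileInputStream("./db_mac");
             Database loadedDb = Database.load(in);
             Scanner scanner = new Scanner()) {
            // Rules are loaded
            scanner.allocScratch(loadedDb);
            System.out.println("Start matching");
            // Text to scan. The rule requires a literal "&mac=", so the sample
            // needs the leading '&'; "mac=1283se31&" alone yields no match.
            List<Match> matches = scanner.scan(loadedDb, "&mac=1283se31&");
            System.out.println(matches.size());
            // Guard against an empty result before reading the first match
            if (!matches.isEmpty()) {
                System.out.println("Match result: " + matches.get(0).getMatchedString());
            }
        }
    }

    public static void main(String[] args) {
        try {
            mach();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (CompileErrorException e) {
            e.printStackTrace();
        }
    }
}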
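Before compiling a rule into the hyperscan database, it can help to sanity-check the expression itself with the JDK's built-in regex engine. The sketch below is not part of the original code: the class name `RegexCrossCheck` and the method `findAll` are made up for illustration, and `Pattern.CASE_INSENSITIVE` is only assumed to be a reasonable stand-in for `ExpressionFlag.CASELESS`. Keep in mind that `java.util.regex` is a greedy backtracking engine, whereas hyperscan may report a match at every possible end offset, so this only checks whether the pattern can match a given input, not how many matches hyperscan will return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCrossCheck {
    // Same expression as the hyperscan rule; CASE_INSENSITIVE stands in for CASELESS (assumption).
    private static final Pattern MAC =
            Pattern.compile("(&mac=).*(&)", Pattern.CASE_INSENSITIVE);

    /** Returns every substring of text matched by the pattern, in order of appearance. */
    public static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = MAC.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Input containing the literal "&mac=" prefix: one match is found
        System.out.println(findAll("&mac=1283se31&"));
        // Input without the leading '&': the pattern cannot match
        System.out.println(findAll("mac=1283se31&"));
    }
}
```

The second call returns an empty list because the rule requires a literal `&mac=`; if keys at the very start of the text should also match, the rule itself would need to change (for example by making the leading `&` optional).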
Full code download: https://download.csdn.net/download/qq_38259063/18433138?spm=1001.2014.3001.5501