ANTLR4 Rule Parser Generator (Part 5): Error Handling

luofengmacheng

Published 2024-03-11 15:18:04


Article link: https://blog.csdn.net/luofengmacheng/article/details/136624935



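1 Background

When a program built on ANTLR4 processes user input, that input may not match what is expected. The user then needs to know where the problem is in order to fix it. Programs built on ANTLR4 should therefore take errors into account and, as far as possible, output error messages that users can understand.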

2 ErrorListener

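The parser's class derivation chain is UserParser<-antlr4.Parser<-antlr4.Recognizer<-object, so the user's Parser can also call the methods defined in Recognizer.

When the parser encounters an error while analyzing the input string, it calls getErrorListenerDispatch() in antlr4.Recognizer to dispatch the error information. That function returns a ProxyErrorListener object: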

class ProxyErrorListener(ErrorListener):

    def __init__(self, delegates):
        super().__init__()
        if delegates is None:
            raise ReferenceError("delegates")
        self.delegates = delegates

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        for delegate in self.delegates:
            delegate.syntaxError(recognizer, offendingSymbol, line, column, msg, e)

    def reportAmbiguity(self, recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs):
        for delegate in self.delegates:
            delegate.reportAmbiguity(recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs)

    def reportAttemptingFullContext(self, recognizer, dfa, startIndex, stopIndex, conflictingAlts, configs):
        for delegate in self.delegates:
            delegate.reportAttemptingFullContext(recognizer, dfa, startIndex, stopIndex, conflictingAlts, configs)

    def reportContextSensitivity(self, recognizer, dfa, startIndex, stopIndex, prediction, configs):
        for delegate in self.delegates:
            delegate.reportContextSensitivity(recognizer, dfa, startIndex, stopIndex, prediction, configs)

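ProxyErrorListener has four methods, each corresponding to a different kind of report:

syntaxError: a syntax error
reportAmbiguity: an ambiguity was detected
reportAttemptingFullContext: reported when prediction switches from SLL mode to full LL mode
reportContextSensitivity: related to reportAttemptingFullContext; reported when full-context prediction settles on a single alternative, meaning the decision depended on the full context

In general, the three report functions fire during development because the grammar is not yet polished, whereas syntaxError fires when the input string does not conform to the grammar.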
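Since the three report callbacks mostly point at grammar problems, it can help to record them while developing a grammar. The following is a minimal sketch, not part of the original article: GrammarDiagnostics is a hypothetical name, and the try/except fallback exists only so the snippet can be run without antlr4-python3-runtime installed.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Stand-in so the sketch runs without antlr4-python3-runtime installed.
    class ErrorListener:
        pass


class GrammarDiagnostics(ErrorListener):
    """Hypothetical helper: records which report* callbacks fired."""

    def __init__(self):
        super().__init__()
        self.events = []

    def reportAmbiguity(self, recognizer, dfa, startIndex, stopIndex,
                        exact, ambigAlts, configs):
        # startIndex/stopIndex delimit the input range involved in the decision.
        self.events.append(("ambiguity", startIndex, stopIndex))

    def reportAttemptingFullContext(self, recognizer, dfa, startIndex, stopIndex,
                                    conflictingAlts, configs):
        self.events.append(("attempting-full-context", startIndex, stopIndex))

    def reportContextSensitivity(self, recognizer, dfa, startIndex, stopIndex,
                                 prediction, configs):
        self.events.append(("context-sensitivity", startIndex, stopIndex))
```

Attached to a parser with addErrorListener() during development, the events list gives a quick view of which decisions in the grammar deserve a second look.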

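As shown, ProxyErrorListener.syntaxError() simply iterates over the objects passed to its constructor and calls syntaxError() on each of them. Those objects come from Recognizer._listeners, a list whose only default member is ConsoleErrorListener.INSTANCE (set by ConsoleErrorListener.INSTANCE = ConsoleErrorListener()). By default, therefore, the method that runs is ConsoleErrorListener.syntaxError, which writes one line of error information to standard error: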

class ConsoleErrorListener(ErrorListener):
    INSTANCE = None

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        print("line " + str(line) + ":" + str(column) + " " + msg, file=sys.stderr)

ConsoleErrorListener.INSTANCE = ConsoleErrorListener()

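So by default, when the input string does not conform to the grammar, a message such as line 1:10 missing ')' at '\n' is printed.

Is there a way to customize the error message, or to add more context to it? Yes: replace the default ErrorListener.

First, define our own ErrorListener: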

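from antlr4.Recognizer import Recognizer
from antlr4.Token import CommonToken
from antlr4.error.ErrorListener import ErrorListener
from antlr4.error.Errors import RecognitionException


class exprException(ErrorListener):
    # Called once for every syntax error the recognizer reports.
    def syntaxError(
        self,
        recognizer: Recognizer,
        offendingSymbol: CommonToken,
        line: int,
        column: int,
        msg: str,
        e: RecognitionException,
    ):
        print("Something is wrong at {}(line):{}(column), errmsg={}".format(line, column, msg))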

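This ErrorListener overrides only syntaxError, which receives the error data that antlr4 passes in and lets us assemble the message however we like:

recognizer: the recognizer, i.e. the current parser instance (our Parser object)
offendingSymbol: the symbol that triggered the error, carrying the offending text and its position
line: line number of the error (1-based)
column: column of the error (character position within the line, 0-based)
msg: the error message
e: the exception object (may be None)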
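As an illustration of adding more context to a message, the sketch below prints the offending source line with a caret under the reported column. It is an assumption-laden example rather than the article's code: format_with_caret and CaretErrorListener are hypothetical names, the source text is handed to the listener by the caller, and the try/except fallback exists only so the snippet runs without the antlr4 runtime.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Stand-in so the sketch runs without antlr4-python3-runtime installed.
    class ErrorListener:
        pass


def format_with_caret(source, line, column, msg):
    """Return the message, the offending source line, and a caret under the column."""
    lines = source.splitlines()
    # ANTLR reports 1-based line numbers and 0-based columns.
    text = lines[line - 1] if 1 <= line <= len(lines) else ""
    return "line {}:{} {}\n{}\n{}^".format(line, column, msg, text, " " * column)


class CaretErrorListener(ErrorListener):
    """Hypothetical listener that keeps the input text to show error context."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        self.messages = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.messages.append(format_with_caret(self.source, line, column, msg))
```

Passing the text in through the constructor keeps the listener independent of how the input stream was created. The caret alignment assumes the line contains no tab characters.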

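Of these fields, the three that matter most are line, column and msg: together they tell the user where the error occurred and what went wrong. That is why the exprException listener builds its message from just these three. The method can also raise an exception instead, to be handled at the point where the parser is invoked on the input string.

Once our own ErrorListener exists, it has to be placed into the Recognizer._listeners list: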
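The exception-raising variant mentioned here might look like the following sketch. ExprSyntaxError and RaisingErrorListener are hypothetical names introduced for illustration, and the try/except fallback exists only so the snippet runs without the antlr4 runtime installed.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Stand-in so the sketch runs without antlr4-python3-runtime installed.
    class ErrorListener:
        pass


class ExprSyntaxError(Exception):
    """Hypothetical exception type carrying the error position and message."""

    def __init__(self, line, column, msg):
        super().__init__("line {}:{} {}".format(line, column, msg))
        self.line = line
        self.column = column
        self.msg = msg


class RaisingErrorListener(ErrorListener):
    """Stops at the first syntax error by raising instead of printing."""

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        raise ExprSyntaxError(line, column, msg)


# Usage with a generated parser (the start rule name `prog` is an assumption):
#   parser.removeErrorListeners()
#   parser.addErrorListener(RaisingErrorListener())
#   try:
#       tree = parser.prog()
#   except ExprSyntaxError as err:
#       print(err.line, err.column, err.msg)
```

Raising aborts the parse at the first error, so this design suits callers that want to reject bad input outright rather than collect every problem.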

parser = exprParser(token_stream)
parser.removeErrorListeners()
parser.addErrorListener(exprException())

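Recognizer.removeErrorListeners() is called first to empty the _listeners list, and then our own ErrorListener is added to it.

Note: despite the name ErrorListener, it is registered on the parser itself, so it works whether the parse tree is later processed with a Listener or a Visitor.

3 Summary

When antlr4 finds statements that violate the grammar during parsing, it prints one line of error information by default. By creating a class derived from ErrorListener, the error message can be customized to make it more readable.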
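The effect of these two calls can be pictured with a simplified stand-in model of the listener registry. This is not the runtime's actual code: MiniRecognizer, ConsoleListener, RecordingListener and the notify() helper are hypothetical names used only to mirror the behaviour described in this section.

```python
import sys


class ConsoleListener:
    """Stand-in for ConsoleErrorListener: writes one line to stderr."""

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        print("line {}:{} {}".format(line, column, msg), file=sys.stderr)


class RecordingListener:
    """Custom listener that stores errors instead of printing them."""

    def __init__(self):
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.errors.append((line, column, msg))


class MiniRecognizer:
    """Simplified model of the listener registry, not the runtime's class."""

    def __init__(self):
        self._listeners = [ConsoleListener()]  # default: console only

    def addErrorListener(self, listener):
        self._listeners.append(listener)

    def removeErrorListeners(self):
        self._listeners = []

    def notify(self, line, column, msg):
        # Plays the role of getErrorListenerDispatch().syntaxError(...).
        for listener in self._listeners:
            listener.syntaxError(self, None, line, column, msg, None)


recognizer = MiniRecognizer()
recognizer.removeErrorListeners()      # drop the console listener
custom = RecordingListener()
recognizer.addErrorListener(custom)    # only the custom listener remains
recognizer.notify(1, 10, "missing ')' at '\\n'")
```

If removeErrorListeners() were skipped, the console listener would stay in the list and the same error would also be written to stderr. The generated lexer is a Recognizer as well and keeps its own list, so lexer errors keep going to the console unless the same two calls are made on the lexer.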
