ABNF Parser Based on Predictive Parsing (13): rulelist, rule, rulename, defined-as and elements

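This article continues the ABNF parser based on predictive parsing. It focuses on how rulelist, rule, rulename, defined-as and elements are parsed. A rule list consists of multiple rules, and each rule consists of a rule name, a definition operator and elements. At the end, the article notes that the grammar parser is now essentially complete, and it announces that the parser's use will be demonstrated with the SIP protocol as an example.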

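Let's look at rulelist, the entry point of an ABNF grammar: an ABNF grammar is simply a rule list (rulelist). A rulelist consists of a number of rules, and each rule is made up of a rule name (rulename), a definition operator (defined-as) and elements (elements).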
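To illustrate how a rule list is assembled, here is a minimal sketch. It does not use the series' actual classes: `RuleListSketch`, `SimpleRule` and `addRule` are hypothetical names, and each alternative is a plain string rather than a parsed concatenation. It applies the same policy as `rulelist()`. A second `=` definition of the same name is a redefinition. An `=/` definition, which is an incremental alternative in RFC 5234 Section 3.3, appends its alternatives to the earlier rule. The sketch also lower-cases names because RFC 5234 treats rule names as case-insensitive. Whether the series' `RuleName` does the same is not shown in this part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RuleListSketch {
    // Hypothetical, simplified rule: a name, "=" or "=/", and its alternatives as plain strings
    static final class SimpleRule {
        final String name;
        String definedAs;
        final List<String> alternatives;

        SimpleRule(String name, String definedAs, List<String> alternatives) {
            this.name = name;
            this.definedAs = definedAs;
            this.alternatives = new ArrayList<>(alternatives);
        }
    }

    // Adds a rule, merging it into an earlier rule with the same (case-insensitive) name
    static void addRule(Map<String, SimpleRule> rules, SimpleRule rule) {
        String key = rule.name.toLowerCase(Locale.ROOT);
        SimpleRule defined = rules.get(key);
        if (defined == null) {
            rules.put(key, rule);  // first definition of this name
            return;
        }
        if ("=".equals(rule.definedAs) && "=".equals(defined.definedAs)) {
            throw new IllegalStateException(rule.name + " is redefined.");
        }
        // If the later rule is the basic "=" definition, the merged rule becomes "="
        if ("=".equals(rule.definedAs)) {
            defined.definedAs = "=";
        }
        defined.alternatives.addAll(rule.alternatives);
    }

    public static void main(String[] args) {
        Map<String, SimpleRule> rules = new LinkedHashMap<>();
        addRule(rules, new SimpleRule("ruleset", "=", Arrays.asList("alt1", "alt2")));
        addRule(rules, new SimpleRule("RuleSet", "=/", Arrays.asList("alt3")));
        SimpleRule merged = rules.get("ruleset");
        // prints: ruleset = [alt1, alt2, alt3]
        System.out.println(merged.name + " " + merged.definedAs + " " + merged.alternatives);
        try {
            addRule(rules, new SimpleRule("ruleset", "=", Arrays.asList("alt4")));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // prints: ruleset is redefined.
        }
    }
}
```

The `LinkedHashMap` keeps rules in definition order. In this sketch it stands in for both the `ruleMap` and the separate `ruleList` of the original code.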

Here is the parsing code:

/*
    This file is one of the components of a Context-free Grammar Parser Generator,
    which accepts a piece of text as the input and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

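import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AbnfParser {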
//	   rulelist       =  1*( rule / (*c-wsp c-nl) )
	protected List<Rule> rulelist() throws IOException, MatchException, CollisionException {
        Map<RuleName, Rule> ruleMap = new HashMap<RuleName, Rule>();
        List<Rule> ruleList = new ArrayList<Rule>();
//      If the lookahead character is a letter, space, tab, semicolon or carriage return,
//      the input continues with a rule, a c-wsp or a c-nl
        while (match(is.peek(), 0x41, 0x5A) || match(is.peek(), 0x61, 0x7A) || match(is.peek(), 0x20) || match(is.peek(), 0x09) || match(is.peek(), ';') || match(is.peek(), 0x0D)) {
//          A leading letter starts a rule; anything else starts a c-wsp or a c-nl
            if (match(is.peek(), 0x41, 0x5A) || match(is.peek(), 0x61, 0x7A)) {
//              Parse one rule
                Rule rule = rule();
//              Check whether a rule with this name has already been defined
                if (null == ruleMap.get(rule.getRuleName())) {
//                  Not defined yet: add it to the rule list
                    ruleMap.put(rule.getRuleName(), rule);
                    ruleList.add(rule);
                } else {
//                  Already defined: check whether this is an incremental definition
                    Rule defined = ruleMap.get(rule.getRuleName());
                    if ("=".equals(rule.getDefinedAs()) && "=".equals(defined.getDefinedAs())) {
//                      Neither definition is incremental: report a redefinition
                        throw new CollisionException(rule.getRuleName().toString() + " is redefined.", is.getPos(), is.getLine());
                    }
//                  Incremental definition: merge the two rules.
//                  If the later rule is the basic "=" definition, the merged rule becomes "="
                    if ("=".equals(rule.getDefinedAs())) defined.setDefinedAs("=");
                    defined.getElements().getAlternation().getConcatenations().addAll(rule.getElements().getAlternation().getConcatenations());
                }
            } else {
//              Space, tab, semicolon or carriage return: c-wsp, followed by c-nl
                while (match(is.peek(), 0x20) || match(is.peek(), 0x09) || match(is.peek(), ';') || match(is.peek(), 0x0D)) {
                    c_wsp();
                }
                c_nl();
            }
        }
        return ruleList;
	}


//		        rulename       =  ALPHA *(ALPHA / DIGIT / "-")
	protected RuleName rulename() throws IOException, MatchException {
//		 ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
//	     DIGIT          =  %x30-39
//      The first character of a rule name must be a letter
        if (!(match(is.peek(), 0x41, 0x5A) || match(is.peek(), 0x61, 0x7A))) {
            throw new MatchException("'A'-'Z'/'a'-'z'", is.peek(), is.getPos(), is.getLine());
        }
        StringBuilder rulename = new StringBuilder();
        rulename.append((char) is.read());
//      Subsequent characters may be letters, digits or hyphens
        while (match(is.peek(), 0x41, 0x5A) || match(is.peek(), 0x61, 0x7A) || match(is.peek(), 0x30, 0x39) || match(is.peek(), '-')) {
            rulename.append((char) is.read());
        }
        return new RuleName(prefix, rulename.toString());
	}

//		        defined-as     =  *c-wsp ("=" / "=/") *c-wsp
	protected String defined_as() throws IOException, MatchException {
//      Whitespace and comments before the equals sign
        while (match(is.peek(), 0x20) || match(is.peek(), 0x09) || match(is.peek(), ';') || match(is.peek(), (char)0x0D)) {
            c_wsp();
        }
//      The equals sign
        assertMatch(is.peek(), '=');
        String value = String.valueOf((char) is.read());
//      A following '/' marks an incremental definition ("=/")
        if (match(is.peek(), '/')) {
            value += (char) is.read();
        }
//      Whitespace and comments after the equals sign
        while (match(is.peek(), 0x20) || match(is.peek(), 0x09) || match(is.peek(), ';') || match(is.peek(), (char)0x0D)) {
            c_wsp();
        }
        return value;
	}

//		        elements       =  alternation *c-wsp
	protected Elements elements() throws IOException, MatchException {
//     
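The methods above depend on fields and helpers that are not defined in this part of the series, such as `is`, `prefix`, `match`, `assertMatch` and `c_wsp`. As a result, they cannot run on their own. The following self-contained sketch reproduces only the rulename and defined-as steps over a `String`, so you can try the one-character lookahead in isolation. `RuleHeadScanner` and its methods are hypothetical names. As a simplifying assumption, only SP and HTAB count as c-wsp. Comments and folded lines (c-nl WSP) are not handled.

```java
public class RuleHeadScanner {
    private final String input;
    private int pos = 0;

    RuleHeadScanner(String input) {
        this.input = input;
    }

    // Lookahead: the next character, or -1 at end of input (nothing is consumed)
    private int peek() {
        return pos < input.length() ? input.charAt(pos) : -1;
    }

    private static boolean isAlpha(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');  // %x41-5A / %x61-7A
    }

    private static boolean isDigit(int c) {
        return c >= '0' && c <= '9';  // %x30-39
    }

    // rulename = ALPHA *(ALPHA / DIGIT / "-")
    String rulename() {
        if (!isAlpha(peek())) {
            throw new IllegalArgumentException("expected 'A'-'Z'/'a'-'z' at position " + pos);
        }
        StringBuilder name = new StringBuilder();
        name.append(input.charAt(pos++));
        while (isAlpha(peek()) || isDigit(peek()) || peek() == '-') {
            name.append(input.charAt(pos++));
        }
        return name.toString();
    }

    // defined-as = *c-wsp ("=" / "=/") *c-wsp, with c-wsp simplified to SP / HTAB
    String definedAs() {
        skipWsp();
        if (peek() != '=') {
            throw new IllegalArgumentException("expected '=' at position " + pos);
        }
        pos++;
        String value = "=";
        if (peek() == '/') {  // "=/" marks an incremental definition
            pos++;
            value = "=/";
        }
        skipWsp();
        return value;
    }

    private void skipWsp() {
        while (peek() == ' ' || peek() == '\t') {
            pos++;
        }
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        String line = "rule-1 =/ alt3";
        RuleHeadScanner scanner = new RuleHeadScanner(line);
        System.out.println(scanner.rulename());                  // rule-1
        System.out.println(scanner.definedAs());                 // =/
        System.out.println(line.substring(scanner.position()));  // alt3
    }
}
```

Each branch is decided by a single lookahead character, so the scanner never has to back up. This is the same property the predictive parser in this series relies on.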