对于LR分析而言,规约状态可以从这个角度分为两类,一类是“纯”的规约状态,也就是分析预测表中某个状态Action表一行全部是规约行为,如Jerry语言的状态WhileLoop_5,这个状态无论遇到什么符号都会规约一个while循环到一个语句;另外一种就是有移进-规约冲突的状态。从实现的角度来看,这两种状态的实现可以完全不同:对于前一种情况,一旦步入该状态,就规约,然后根据规约后得到的非终结符Goto;而后一种情况则要等到下一符号输入之后再确认应规约(注:这里的实现并没有符号类型判断这一过程,而是根据符号类型查找分析预测表获取Action函数入口地址,因此在规约函数内部会直接进行规约)。
然而,有必要弄得不同吗?
其实前一种LR(0)式的规约行为可以看作一种特殊情况,它仍需要等到下一终结符输入,但是无论下一终结符为什么,它都执行规约,然后Goto,最后再来处理刚刚读入的终结符。将两种情况合并为一种情况,这样可以简化思路,使实现更加容易。
规约函数内的步骤是这样的:
1 - 从符号栈中弹出必要的符号,并根据这些符号产生规约后得到的非终结符
2 - 从状态栈中弹出一定数目的状态
3 - 根据此时状态栈栈顶的状态及刚刚规约得到的非终结符类型确定Goto的状态,并将该状态压状态栈
4 - 将终结符传给consumeToken方法继续分析
比如刚才提到的状态WhileLoop_5,它的规约行为可以这样实现:
void reduce_func(struct LRAnalyser* self, struct Token* t) { struct AbstractSyntaxNode* loop = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractValueNode* condition = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newWhileNode(condition, loop)); analyser->stateStack->pops(analyser->stateStack, 5, NULL); struct LRState* target = ((struct LRState*)self->stateStack-> peek(self->stateStack))->gotoes[Sentence]; self->stateStack->push(self->stateStack, target); self->consumeToken(self, t); }
然后你可以仿着这个函数,为每个产生式实现一个规约行为。
不过,很快一个郁闷的事情就摆在面前,对于每一个这样的函数,除了第一部分取符号和产生新符号在每个函数中不同以外,接下来的两个部分仅仅参数不同,而最后调用consumeToken则完全一样,于是“复制与粘贴”设计模式开始大行其道。这显然是一件不愉快的事情:其实跟shift的实现一样,这里可以用宏来生成规约函数。当然,因为步骤1的差异实在是太大了,所以可以考虑把这一部分单独拿出来用函数来实现,如
void linkName(Sentence, WhileLoop)(struct LRAnalyser* analyser) { struct AbstractSyntaxNode* loop = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractValueNode* condition = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newWhileNode(condition, loop)); }
注:这里使用了linkName宏来产生函数名。不过,一个规约函数单单只是把符号规约了,这看起来并不完整,LR分析算法中的规约行为应该是同时弹出状态栈和符号栈对应的元素,规约得到新的非终结符和新状态再同时压回栈内。从目前计算机体系结构上来说“同时”是不太可能的,不过如果把弹出状态栈的工作也应该放在产生式函数内会更让它显得完整。因为弹符号栈也是很单调的工作,因此可以定义一个宏来简化之:
#define popStateStack(stack, nr) (stack)->pops(stack, nr, NULL) // 现在的产生式函数 void linkName(Sentence, WhileLoop)(struct LRAnalyser* analyser) { struct AbstractSyntaxNode* loop = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractValueNode* condition = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newWhileNode(condition, loop)); popStateStack(analyser->stateStack, 5); }
将产生式这个部分单独提出来以后,下面就可以给规约函数建立一个宏模板了:
#define reduce(current, encounter, leftPart, rightPart) \ static void linkName(current, encounter)(struct LRAnalyser* self, \ struct Token* t) \ { \ linkName(leftPart, rightPart)(self); \ struct LRState* target = ((struct LRState*)self->stateStack-> \ peek(self->stateStack))->gotoes[leftPart]; \ self->stateStack->push(self->stateStack, target); \ self->consumeToken(self, t); \ } /**************************************************************************/
这个宏用到4个参数,其中跟shift一样,前两个参数并不在函数内出现,而仅用来构造函数名。参数leftPart和rightPart首先会在调用产生式函数时用来拼函数名,这就是之前产生式函数要用宏来拼函数名的原因。接下来,leftPart又在获取Goto状态处作为下标用到了,所以leftPart一定要是某个LRNonTerminal枚举(同样地,构造产生式函数时,使用linkName宏的第一参数也必须是),而rightPart则可以随意一些,当然要能认出来。
对于那些遇到任意符号都规约的状态,可以设计一个宏去囊括所有情况:
#define reduceAny(current, leftPart, rightPart) \ reduce(current, END, leftPart, rightPart) \ reduce(current, IDENT, leftPart, rightPart) \ reduce(current, ELSE, leftPart, rightPart) \ reduce(current, IF, leftPart, rightPart) \ reduce(current, WHILE, leftPart, rightPart) \ reduce(current, READ, leftPart, rightPart) \ reduce(current, WRITE, leftPart, rightPart) \ reduce(current, BREAK, leftPart, rightPart) \ reduce(current, INTEGER_TYPE, leftPart, rightPart) \ reduce(current, REAL_TYPE, leftPart, rightPart) \ reduce(current, INTEGER, leftPart, rightPart) \ reduce(current, REAL, leftPart, rightPart) \ reduce(current, PLUS, leftPart, rightPart) \ reduce(current, MINUS, leftPart, rightPart) \ reduce(current, MULTIPLY, leftPart, rightPart) \ reduce(current, DIVIDE, leftPart, rightPart) \ reduce(current, ASSIGN, leftPart, rightPart) \ reduce(current, LT, leftPart, rightPart) \ reduce(current, LE, leftPart, rightPart) \ reduce(current, EQ, leftPart, rightPart) \ reduce(current, GT, leftPart, rightPart) \ reduce(current, GE, leftPart, rightPart) \ reduce(current, NE, leftPart, rightPart) \ reduce(current, AND, leftPart, rightPart) \ reduce(current, OR, leftPart, rightPart) \ reduce(current, NOT, leftPart, rightPart) \ reduce(current, COMMA, leftPart, rightPart) \ reduce(current, EOS, leftPart, rightPart) \ reduce(current, LPARENT, leftPart, rightPart) \ reduce(current, RPARENT, leftPart, rightPart) \ reduce(current, LBRACKET, leftPart, rightPart) \ reduce(current, RBRACKET, leftPart, rightPart) \ reduce(current, LBRACE, leftPart, rightPart) \ reduce(current, RBRACE, leftPart, rightPart)
至于将这些函数与表中对应的Action绑定,则跟shift一样使用setAction宏就可以了。
特别情况
还记得在分析LR状态集时谈到的一个很纠结的状态DeclarationVariableRegister吗?在shift到状态Declaration_1时,策略不是简单的改变状态,而是将符号的类型作为参数构造一个DeclarationNode压入符号栈中。每次进入状态DeclarationVariableRegister时表明一个VariableRegister已经准备就绪,所以在这个状态下的规约应该是这样的:
static void linkName(VariableRegister, VariableInitialization) (struct LRAnalyser* analyser) { void* initialValue = analyser->symbolStack->pop(analyser->symbolStack); void* variable = analyser->symbolStack->pop(analyser->symbolStack); struct DeclarationNode* decl = (struct DeclarationNode*) (analyser->symbolStack-> peek(analyser->symbolStack)); decl->vars->enqueue(decl->vars, variable); decl->initVals->enqueue(decl->initVals, initialValue); popStateStack(analyser->stateStack, 2); }
它并不建立新的非终结符塞入符号栈中(类似的还有static void linkName(Sentence, IOAssignmentEoS)(struct LRAnalyser* analyser),请看最后给出的代码)。
接下来状态DeclarationVariableRegister遇到<EOS>和<COMMA>都会移进,遇到<EOS>移进到状态Declaration_3的代码上一节中给出了(一个短短的宏),而遇<COMMA>的移进并没有给出。其实,这个实现很特别,它看起来并不是一个移进
static void linkName(DeclarationVariableRegister, COMMA) (struct LRAnalyser* self, struct Token* t) { popStateStack(self->stateStack, 1); }
从栈中弹出最近的状态,就这样结束了。弹出了以后,栈顶状态就回复到状态Declaration_1,于是又可以接受VariableRegister了。
实际上所有左递归式的产生时候都可以这样解决,而不必拘泥于去把每个LR状态都列举出来逐个分析。
附:规约相关的代码
产生式函数中linkName第二参数为Null的表示这个产生式左部推出空串。
// 产生式函数代码 #define popStateStack(stack, nr) (stack)->pops(stack, nr, NULL) static void linkName(BasicBlock, Null)(struct LRAnalyser* analyser) { analyser->symbolStack->push(analyser->symbolStack, NULL); } static void linkName(BasicBlock, SentenceBasicBlock) (struct LRAnalyser* analyser) { struct AbstractSyntaxNode* block = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractSyntaxNode* sentence = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); if(NULL != sentence) { sentence->nextNode = block; block = sentence; } analyser->symbolStack->push(analyser->symbolStack, block); popStateStack(analyser->stateStack, 2); } static void linkName(Sentence, EoS)(struct LRAnalyser* analyser) { analyser->symbolStack->push(analyser->symbolStack, NULL); popStateStack(analyser->stateStack, 1); } static void linkName(Sentence, IfElseBranch)(struct LRAnalyser* analyser) { struct AbstractSyntaxNode* invalidSuit = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractSyntaxNode* validSuit = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractValueNode* condition = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newIfElseNode(condition, validSuit, invalidSuit)); popStateStack(analyser->stateStack, 6); } static void linkName(Sentence, WhileLoop)(struct LRAnalyser* analyser) { struct AbstractSyntaxNode* loop = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct AbstractValueNode* condition = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newWhileNode(condition, loop)); popStateStack(analyser->stateStack, 5); } static void linkName(Sentence, TypeVariableRegisterEoS) (struct LRAnalyser* analyser) { popStateStack(analyser->stateStack, 3); } // 这条产生式也是将栈顶符号取出然后送往次栈顶的IONode,而不产生新符号 static void linkName(Sentence, IOAssignmentEoS)(struct LRAnalyser* analyser) { struct AbstractValueNode* expression = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); struct IONode* node = (struct IONode*)(analyser->symbolStack-> peek(analyser->symbolStack)); node->expression = expression; popStateStack(analyser->stateStack, 3); } static void linkName(Sentence, AssignmentEoS)(struct LRAnalyser* analyser) { struct AbstractValueNode* arith = (struct AbstractValueNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newArithmaticNode(arith)); popStateStack(analyser->stateStack, 2); } static void linkName(Sentence, BreakEoS)(struct LRAnalyser* analyser) { analyser->symbolStack->push(analyser->symbolStack, newBreakNode()); popStateStack(analyser->stateStack, 2); } static void linkName(ElseBlock, Null)(struct LRAnalyser* analyser) { analyser->symbolStack->push(analyser->symbolStack, NULL); } static void linkName(ElseBlock, ElseBasicBlock)(struct LRAnalyser* analyser) { popStateStack(analyser->stateStack, 2); } static void linkName(Initialization, Null)(struct LRAnalyser* analyser) { analyser->symbolStack->push(analyser->symbolStack, NULL); } static void linkName(Sentence, LBraceBasicBlockRBrace) (struct LRAnalyser* analyser) { struct AbstractSyntaxNode* innerBlock = (struct AbstractSyntaxNode*) (analyser->symbolStack->pop(analyser->symbolStack)); analyser->symbolStack->push(analyser->symbolStack, newBasicBlockNode(innerBlock)); popStateStack(analyser->stateStack, 3); } static void linkName(Initialization, AssignAssignment) (struct LRAnalyser* analyser) { popStateStack(analyser->stateStack, 2); } static void linkName(VariableRegister, VariableInitialization) (struct LRAnalyser* analyser) { void* initialValue = analyser->symbolStack->pop(analyser->symbolStack); void* variable = analyser->symbolStack->pop(analyser->symbolStack); struct DeclarationNode* decl = (struct DeclarationNode*) (analyser->symbolStack-> peek(analyser->symbolStack)); decl->vars->enqueue(decl->vars, variable); decl->initVals->enqueue(decl->initVals, initialValue); popStateStack(analyser->stateStack, 2); } // 规约Action #define reduce(current, encounter, leftPart, rightPart) \ static void linkName(current, encounter)(struct LRAnalyser* self, \ struct Token* t) \ { \ printf( #current " reduce " #rightPart " to " #leftPart "\n"); \ linkName(leftPart, rightPart)(self); \ struct LRState* target = ((struct LRState*)self->stateStack-> \ peek(self->stateStack))->gotoes[leftPart]; \ printf(" Goto: %d / %p\n", target - jerryLRStates, target); \ self->stateStack->push(self->stateStack, target); \ self->consumeToken(self, t); \ } /************************************************************************/ reduce(LRState0, END, BasicBlock, Null) reduce(LRState0, RBRACE, BasicBlock, Null) reduce(LRState0, ELSE, BasicBlock, Null) reduce(BasicBlock_SentenceBasicBlock_1, END, BasicBlock, Null) reduce(BasicBlock_SentenceBasicBlock_1, RBRACE, BasicBlock, Null) reduce(BasicBlock_SentenceBasicBlock_1, ELSE, BasicBlock, Null) reduce(VariableRegister_VariableInitialization_1, EOS, Initialization, Null) reduce(VariableRegister_VariableInitialization_1, COMMA, Initialization, Null) reduce(IfElseBranch_5, RBRACE, ElseBlock, Null) reduce(IfElseBranch_5, END, ElseBlock, Null) #define reduceFirstSentence(current, leftPart, rightPart) \ reduce(current, IF, leftPart, rightPart) \ reduce(current, WHILE, leftPart, rightPart) \ reduce(current, READ, leftPart, rightPart) \ reduce(current, WRITE, leftPart, rightPart) \ reduce(current, BREAK, leftPart, rightPart) \ reduce(current, INTEGER_TYPE, leftPart, rightPart) \ reduce(current, REAL_TYPE, leftPart, rightPart) \ reduce(current, INTEGER, leftPart, rightPart) \ reduce(current, REAL, leftPart, rightPart) \ reduce(current, NOT, leftPart, rightPart) \ reduce(current, PLUS, leftPart, rightPart) \ reduce(current, MINUS, leftPart, rightPart) \ reduce(current, IDENT, leftPart, rightPart) \ reduce(current, EOS, leftPart, rightPart) \ reduce(current, LBRACE, leftPart, rightPart) \ reduce(current, LPARENT, leftPart, rightPart) /**/ reduceFirstSentence(IfElseBranch_5, ElseBlock, Null) #define reduceAny(current, leftPart, rightPart) \ reduce(current, END, leftPart, rightPart) \ reduce(current, IDENT, leftPart, rightPart) \ reduce(current, ELSE, leftPart, rightPart) \ reduce(current, IF, leftPart, rightPart) \ reduce(current, WHILE, leftPart, rightPart) \ reduce(current, READ, leftPart, rightPart) \ reduce(current, WRITE, leftPart, rightPart) \ reduce(current, BREAK, leftPart, rightPart) \ reduce(current, INTEGER_TYPE, leftPart, rightPart) \ reduce(current, REAL_TYPE, leftPart, rightPart) \ reduce(current, INTEGER, leftPart, rightPart) \ reduce(current, REAL, leftPart, rightPart) \ reduce(current, PLUS, leftPart, rightPart) \ reduce(current, MINUS, leftPart, rightPart) \ reduce(current, MULTIPLY, leftPart, rightPart) \ reduce(current, DIVIDE, leftPart, rightPart) \ reduce(current, ASSIGN, leftPart, rightPart) \ reduce(current, LT, leftPart, rightPart) \ reduce(current, LE, leftPart, rightPart) \ reduce(current, EQ, leftPart, rightPart) \ reduce(current, GT, leftPart, rightPart) \ reduce(current, GE, leftPart, rightPart) \ reduce(current, NE, leftPart, rightPart) \ reduce(current, AND, leftPart, rightPart) \ reduce(current, OR, leftPart, rightPart) \ reduce(current, NOT, leftPart, rightPart) \ reduce(current, COMMA, leftPart, rightPart) \ reduce(current, EOS, leftPart, rightPart) \ reduce(current, LPARENT, leftPart, rightPart) \ reduce(current, RPARENT, leftPart, rightPart) \ reduce(current, LBRACKET, leftPart, rightPart) \ reduce(current, RBRACKET, leftPart, rightPart) \ reduce(current, LBRACE, leftPart, rightPart) \ reduce(current, RBRACE, leftPart, rightPart) reduceAny(Sentence_EoS_1, Sentence, EoS) reduceAny(BasicBlock_SentenceBasicBlock_2, BasicBlock, SentenceBasicBlock) reduceAny(Sentence_IOAssignmentEoS_3, Sentence, IOAssignmentEoS) reduceAny(IfElseBranch_6, Sentence, IfElseBranch) reduceAny(ElseBlock_2, ElseBlock, ElseBasicBlock) reduceAny(WhileLoop_5, Sentence, WhileLoop) reduceAny(Sentence_AssignmentEoS_2, Sentence, AssignmentEoS) reduceAny(Sentence_BreakEoS_2, Sentence, BreakEoS) reduceAny(Declaration_3, Sentence, TypeVariableRegisterEoS) reduceAny(Sentence_LBraceBasicBlockRBrace_3, Sentence, LBraceBasicBlockRBrace) reduceAny(Initialization_AssignAssignment_2, Initialization, AssignAssignment) reduceAny(VariableRegister_VariableInitialization_2, VariableRegister, VariableInitialization) #undef reduceAny #undef reduce