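Semantic Analysis
Source code that passes syntax analysis is only known to be syntactically correct, but a language also imposes many semantic requirements and rules. Semantic analysis is the analysis that takes the context of the whole source program into account. The semantic rules that "Compiler Construction: Principles and Practice" gives for the TINY language are very simple. TINY has no declarations and no scope rules, and expressions have only two types, integer and boolean, so semantic analysis for TINY comes down to type checking of expressions. That check is itself very simple, with only four rules:
The expression after if must be boolean
The expression after until must be boolean
The expression in write must be integer
The expression in an assignment statement must be integer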
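I. Where the semantic analyzer sits in the compiler
Compilers differ in how they implement the semantic analyzer. It can be merged with the parser so that compilation takes a single pass, or it can run as a separate step over the syntax tree that the parser produces, as the figure shows:
II. Syntax-directed definitions and attributes
At its core, semantic analysis checks whether the attributes of the language's elements are valid. These attributes encode the semantics the language prescribes, for example that in TINY the expression after if must be boolean. A program can satisfy the grammar and still violate the semantics, as in this TINY fragment: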
if x + 2 then
…
end
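Here x + 2 is syntactically valid, but its type is integer, which violates the semantic rule given earlier. Semantic analysis exists to catch cases like this.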
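As a rough illustration, the four rules can be written as one small C predicate. The names `ExpType`, `Context` and `context_accepts` are made up for this sketch and do not come from the TINY sources.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum { TYPE_INTEGER, TYPE_BOOLEAN } ExpType;
typedef enum { CTX_IF, CTX_UNTIL, CTX_WRITE, CTX_ASSIGN } Context;

/* Returns true when an expression of type t is allowed in context c. */
bool context_accepts(Context c, ExpType t)
{
    switch (c) {
    case CTX_IF:
    case CTX_UNTIL:
        return t == TYPE_BOOLEAN;   /* if and until need a boolean */
    case CTX_WRITE:
    case CTX_ASSIGN:
        return t == TYPE_INTEGER;   /* write and assignment need an integer */
    }
    return false;                   /* unknown context: reject */
}
```

Since x + 2 has type integer, `context_accepts(CTX_IF, TYPE_INTEGER)` is false, which is exactly the violation in the fragment above.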
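Semantics is described with attributes, and the elements of the language are described by the grammar, so the semantic rules are tied to the grammar and are written against its productions. The attribute definitions that express the if rule for TINY are: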
9)IF-STMT-> if EXP then STMT-SEQUENCE end
{
    if EXP.type <> bool or STMT-SEQUENCE.type = error then
        IF-STMT.type = error
    else
        IF-STMT.type = void
    end
}
10)IF-STMT-> if EXP then STMT-SEQUENCE else STMT-SEQUENCE end
{
    if EXP.type <> bool or STMT-SEQUENCE1.type = error or STMT-SEQUENCE2.type = error then
        IF-STMT.type = error
    else
        IF-STMT.type = void
    end
}
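The part inside {} is the semantic description (written in pseudocode, using constructs such as if and <>). TINY's grammar symbols are given a single attribute here, named type, whose value is one of integer, void, bool, or error.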
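Rules 9 and 10 translate almost word for word into C. The sketch below assumes an enum `AttrType` for the four attribute values; that name and the two function names are invented for illustration.

```c
/* Hypothetical names; the four values mirror the type attribute. */
typedef enum { T_INTEGER, T_BOOL, T_VOID, T_ERROR } AttrType;

/* Rule 9: IF-STMT -> if EXP then STMT-SEQUENCE end */
AttrType if_stmt_type(AttrType exp, AttrType seq)
{
    if (exp != T_BOOL || seq == T_ERROR)
        return T_ERROR;
    return T_VOID;
}

/* Rule 10: IF-STMT -> if EXP then STMT-SEQUENCE else STMT-SEQUENCE end */
AttrType if_else_stmt_type(AttrType exp, AttrType seq1, AttrType seq2)
{
    if (exp != T_BOOL || seq1 == T_ERROR || seq2 == T_ERROR)
        return T_ERROR;
    return T_VOID;
}
```

Each function takes only the attributes of the symbols on the right-hand side and returns the attribute of the left-hand side.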
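Both rules say that an if statement's type is error when its EXP is not of type bool or when one of its statement sequences has type error; otherwise the if statement's type is void. In these rules the attribute of the if statement depends only on the attributes of its children EXP and STMT-SEQUENCE; such an attribute is called a synthesized attribute. The other kind is the inherited attribute: an attribute of a symbol on the right-hand side of a production that depends on attributes of the production's left-hand side or of the symbols that precede it. For example, suppose
STMT-SEQUENCE has an attribute x that depends on some attribute a of IF-STMT or on some attribute b of EXP,
for instance: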
9)IF-STMT-> if EXP then STMT-SEQUENCE end
{
    STMT-SEQUENCE.x = attr0(IF-STMT.a)
    or
    STMT-SEQUENCE.x = attr1(EXP.b)
}
Before x can be computed, IF-STMT.a or EXP.b must already be known.
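III. Semantic analysis
Parsing is either top-down or bottom-up, and attributes are either inherited or synthesized. In a top-down parse, inherited attributes are straightforward to compute, while synthesized attributes take more effort; in a bottom-up parse the opposite holds: synthesized attributes come easily and inherited attributes are the difficult ones. The Dragon Book mainly covers ways to do semantic analysis during parsing, including algorithms for computing both inherited and synthesized attributes in top-down parsing and in bottom-up parsing.
Sometimes inherited and synthesized attributes depend on each other. In that case not all attributes can be computed in a single pass; the syntax tree has to be traversed several times, with the evaluation order taken from a topological sort of the attribute dependencies.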
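The topological sort can be sketched as follows. This is a generic illustration, not code from the TINY compiler: attributes are numbered 0..n-1, the dependency matrix and the name `attr_eval_order` are assumptions, and the algorithm is the usual "pick a node with no unmet dependencies" variant.

```c
#include <stdbool.h>

#define MAX_ATTRS 16

/* depends[i][j] != 0 means attribute i needs attribute j first.
   Writes an evaluation order into order[] and returns true, or
   returns false if the dependencies are cyclic or n is out of range. */
bool attr_eval_order(int n, int depends[MAX_ATTRS][MAX_ATTRS], int order[])
{
    int pending[MAX_ATTRS];   /* number of unmet dependencies */
    bool done[MAX_ATTRS];
    int emitted = 0;
    int i, j;

    if (n < 0 || n > MAX_ATTRS)
        return false;
    for (i = 0; i < n; i++) {
        pending[i] = 0;
        done[i] = false;
        for (j = 0; j < n; j++)
            if (depends[i][j])
                pending[i]++;
    }
    while (emitted < n) {
        int pick = -1;
        for (i = 0; i < n; i++)
            if (!done[i] && pending[i] == 0) {
                pick = i;
                break;
            }
        if (pick < 0)
            return false;     /* every remaining attribute waits on another */
        done[pick] = true;
        order[emitted++] = pick;
        for (i = 0; i < n; i++)
            if (!done[i] && depends[i][pick])
                pending[i]--; /* one dependency of i is now satisfied */
    }
    return true;
}
```

A false return signals a circular dependency, in which case no traversal order can evaluate all the attributes.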
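The model in which parsing and semantic analysis run separately is the second diagram in the figure above. The semantic analyzer takes the syntax tree produced by the parser as input and computes the inherited and synthesized attributes by traversing it: a pre-order traversal for inherited attributes and a post-order traversal for synthesized attributes.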
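A minimal sketch of the two traversals on a toy tree follows. The `Node` layout, the node kinds, and the `depth` attribute are all assumptions made for the example (TINY itself has no inherited attribute); only the traversal order is the point.

```c
#include <stddef.h>

#define MAX_CHILDREN 3

typedef enum { A_INTEGER, A_BOOL, A_VOID, A_ERROR } Attr;
typedef enum { N_NUM, N_LESS, N_READ, N_IF } NodeKind;   /* toy subset */

typedef struct Node {
    NodeKind kind;
    struct Node *child[MAX_CHILDREN];
    int depth;   /* inherited: comes from the parent */
    Attr type;   /* synthesized: comes from the children */
} Node;

/* Pre-order: set the node's own value first, then hand it down. */
void assign_depth(Node *n, int depth)
{
    int i;
    if (n == NULL)
        return;
    n->depth = depth;
    for (i = 0; i < MAX_CHILDREN; i++)
        assign_depth(n->child[i], depth + 1);
}

/* Post-order: evaluate the children first, then combine their types. */
Attr assign_type(Node *n)
{
    Attr c[MAX_CHILDREN];
    int i;
    if (n == NULL)
        return A_VOID;
    for (i = 0; i < MAX_CHILDREN; i++)
        c[i] = assign_type(n->child[i]);
    switch (n->kind) {
    case N_NUM:
        n->type = A_INTEGER;
        break;
    case N_LESS:   /* error only if an operand is an error */
        n->type = (c[0] == A_ERROR || c[1] == A_ERROR) ? A_ERROR : A_BOOL;
        break;
    case N_READ:
        n->type = A_VOID;
        break;
    case N_IF:     /* child[0] is the condition, child[1] the body */
        n->type = (c[0] != A_BOOL || c[1] == A_ERROR) ? A_ERROR : A_VOID;
        break;
    }
    return n->type;
}
```

Calling `assign_depth(root, 0)` and then `assign_type(root)` fills in both attributes; the two passes are independent here because neither attribute depends on the other.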
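IV. Code
TINY's semantics is extremely simple:
The expression after if must be boolean
The expression after until must be boolean
The expression in write must be integer
The expression in an assignment statement must be integer
The attributes are defined according to these rules: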
0)PROGRAM'-> PROGRAM
{PROGRAM'.type = PROGRAM.type}
1)PROGRAM-> STMT-SEQUENCE
{PROGRAM.type = STMT-SEQUENCE.type}
2)STMT-SEQUENCE-> STMT-SEQUENCE ; STATEMENT
{
    if STMT-SEQUENCE1.type = error or STATEMENT.type = error then
        STMT-SEQUENCE0.type = error
    else
        STMT-SEQUENCE0.type = void
    end
}
3)STMT-SEQUENCE-> STATEMENT
{STMT-SEQUENCE.type = STATEMENT.type}
4)STATEMENT-> IF-STMT
{STATEMENT.type = IF-STMT.type}
5)STATEMENT-> REPEAT-STMT
{STATEMENT.type = REPEAT-STMT.type}
6)STATEMENT-> ASSIGN-STMT
{STATEMENT.type = ASSIGN-STMT.type}
7)STATEMENT-> READ-STMT
{STATEMENT.type = READ-STMT.type}
8)STATEMENT-> WRITE-STMT
{STATEMENT.type = WRITE-STMT.type}
9)IF-STMT-> if EXP then STMT-SEQUENCE end
{
    if EXP.type <> bool or STMT-SEQUENCE.type = error then
        IF-STMT.type = error
    else
        IF-STMT.type = void
    end
}
10)IF-STMT-> if EXP then STMT-SEQUENCE else STMT-SEQUENCE end
{
    if EXP.type <> bool or STMT-SEQUENCE1.type = error or STMT-SEQUENCE2.type = error then
        IF-STMT.type = error
    else
        IF-STMT.type = void
    end
}
11)REPEAT-STMT-> repeat STMT-SEQUENCE until EXP
{
    if STMT-SEQUENCE.type = error or EXP.type <> bool then
        REPEAT-STMT.type = error
    else
        REPEAT-STMT.type = void
    end
}
12)ASSIGN-STMT-> identifier := EXP
{
    if EXP.type <> integer then
        ASSIGN-STMT.type = error
    else
        ASSIGN-STMT.type = void
    end
}
13)READ-STMT-> read identifier
{READ-STMT.type = void}
14)WRITE-STMT-> write EXP
{
    if EXP.type <> integer then
        WRITE-STMT.type = error
    else
        WRITE-STMT.type = void
    end
}
15)EXP-> SIMPLE-EXP COMPARISON-OP SIMPLE-EXP
{
    if SIMPLE-EXP1.type = error or COMPARISON-OP.type = error or SIMPLE-EXP2.type = error then
        EXP.type = error
    else
        EXP.type = bool
    end
}
16)EXP-> SIMPLE-EXP
{EXP.type = SIMPLE-EXP.type}
17)COMPARISON-OP-> <
{COMPARISON-OP.type = bool}
18)COMPARISON-OP-> =
{COMPARISON-OP.type = bool}
19)SIMPLE-EXP-> SIMPLE-EXP ADDOP TERM
{
    if SIMPLE-EXP1.type = error or ADDOP.type = error or TERM.type = error then
        SIMPLE-EXP0.type = error
    else
        SIMPLE-EXP0.type = integer
    end
}
20)SIMPLE-EXP-> TERM
{SIMPLE-EXP.type = TERM.type}
21)ADDOP-> +
{ADDOP.type = integer}
22)ADDOP-> -
{ADDOP.type = integer}
23)TERM-> TERM MULOP FACTOR
{
    if TERM1.type = error or MULOP.type = error or FACTOR.type = error then
        TERM0.type = error
    else
        TERM0.type = integer
    end
}
24)TERM-> FACTOR
{TERM.type = FACTOR.type}
25)MULOP-> *
{MULOP.type = integer}
26)MULOP-> /
{MULOP.type = integer}
27)FACTOR-> (EXP)
{FACTOR.type = EXP.type}
28)FACTOR-> number
{FACTOR.type = integer}
29)FACTOR-> identifier
{FACTOR.type = integer}
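TINY's semantic attributes consist of the single attribute type, which is synthesized, so semantic analysis can be added to the earlier SLR parser and done in the same pass: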
TreeNode* parse_slr(void)
{
    int count = 0;
    int i = 0;
    TreeNode* tree = NULL;
    TreeNode* tree_temp = NULL;
    TokenType bnf;
    TokenType status_bnf;
    TokenType token;
    TokenType oper;
    TokenType nes;
    init_slr_stack();
    init_tree_node_stack();
    token = get_token();
    oper = action(stack_top(), token, &status_bnf);
    while(oper != ACC)
    {
#ifdef TRACE_SLR_STACK
        trace_stack();
#endif
        switch(oper)
        {
        case MOVE:
            /* shift: push the token and the new state */
            push(token);
            push(status_bnf);
            /* create id, num, comparison-op, mulop, addop node */
            switch(token)
            {
            case ID:
                tree = new_exp_node(KIND_ID);
                tree->attr.name = copy_string(token_string);
                tree->type = KIND_INT;
                child_stack_push(tree);
                break;
            case NUM:
                tree = new_exp_node(KIND_CONST);
                tree->attr.val = atoi(token_string);
                tree->type = KIND_INT;
                child_stack_push(tree);
                break;
            case LT:
            case EQ:
                /* a comparison yields a boolean */
                tree = new_exp_node(KIND_OP);
                tree->attr.op = token;
                tree->type = KIND_BOOL;
                child_stack_push(tree);
                break;
            case PLUS:
            case MINUS:
            case MULT:
            case DIV:
                /* an arithmetic operator yields an integer */
                tree = new_exp_node(KIND_OP);
                tree->attr.op = token;
                tree->type = KIND_INT;
                child_stack_push(tree);
                break;
            default:
                break;
            }
            token = get_token();
            break;
        case MERGER:
            /* reduce: pop the right-hand side (two stack slots per symbol),
               then push the left-hand side and the goto state */
            bnf = status_bnf;
            nes = merger(status_bnf, &count);
            for(i = 0; i < 2*count; i++)
                pop();
            status_bnf = go_to(stack_top(), nes);
            push(nes);
            push(status_bnf);
            /* create stmt tree node */
            switch(bnf)
            {
            /* TERM-> TERM MULOP FACTOR */
            case BNF23:
            /* SIMPLE-EXP-> SIMPLE-EXP ADDOP TERM */
            case BNF19:
            /* EXP-> SIMPLE-EXP COMPARISON-OP SIMPLE-EXP */
            case BNF15:
                /* the operator node becomes the parent of both operands;
                   it keeps the type set at shift time unless an operand is an error */
                tree_temp = child_stack_pop();
                tree = child_stack_pop();
                tree->child[1] = tree_temp;
                tree->child[0] = child_stack_pop();
                if((tree->child[0]->type == KIND_ERROR) ||
                   (tree->child[1]->type == KIND_ERROR))
                    tree->type = KIND_ERROR;
                child_stack_push(tree);
                break;
            /* WRITE-STMT-> write EXP */
            case BNF14:
                tree = new_stmt_node(KIND_WRITE);
                tree->child[0] = child_stack_pop();
                if(tree->child[0]->type == KIND_INT)
                    tree->type = KIND_VOID;
                else
                {
                    tree->type = KIND_ERROR;
                    slr_error(E5003, token);
                }
                child_stack_push(tree);
                break;
            /* READ-STMT-> read identifier */
            case BNF13:
                tree = new_stmt_node(KIND_READ);
                tree_temp = child_stack_pop();
                tree->attr.name = copy_string(tree_temp->attr.name);
                tree->type = KIND_VOID;
                destroy_tree_node(tree_temp);
                child_stack_push(tree);
                break;
            /* ASSIGN-STMT-> identifier := EXP */
            case BNF12:
                tree = new_stmt_node(KIND_ASSIGN);
                tree->child[0] = child_stack_pop();
                tree_temp = child_stack_pop();
                tree->attr.name = copy_string(tree_temp->attr.name);
                if(tree->child[0]->type == KIND_INT)
                    tree->type = KIND_VOID;
                else
                {
                    tree->type = KIND_ERROR;
                    slr_error(E5004, token);
                }
                destroy_tree_node(tree_temp);
                child_stack_push(tree);
                break;
            /* REPEAT-STMT-> repeat STMT-SEQUENCE until EXP */
            case BNF11:
                tree = new_stmt_node(KIND_REPEAT);
                tree->child[1] = child_stack_pop();
                tree->child[0] = child_stack_pop();
                if(tree->child[1]->type != KIND_BOOL)
                {
                    tree->type = KIND_ERROR;
                    slr_error(E5002, token);
                }
                else
                    tree->type = KIND_VOID;
                child_stack_push(tree);
                break;
            /* IF-STMT-> if EXP then STMT-SEQUENCE else STMT-SEQUENCE end */
            case BNF10:
                tree = new_stmt_node(KIND_IF);
                tree->child[2] = child_stack_pop();
                tree->child[1] = child_stack_pop();
                tree->child[0] = child_stack_pop();
                if(tree->child[0]->type != KIND_BOOL)
                {
                    tree->type = KIND_ERROR;
                    slr_error(E5001, token);
                }
                else
                    tree->type = KIND_VOID;
                child_stack_push(tree);
                break;
            /* IF-STMT-> if EXP then STMT-SEQUENCE end */
            case BNF9:
                tree = new_stmt_node(KIND_IF);
                tree->child[1] = child_stack_pop();
                tree->child[0] = child_stack_pop();
                if(tree->child[0]->type != KIND_BOOL)
                {
                    tree->type = KIND_ERROR;
                    slr_error(E5001, token);
                }
                else
                    tree->type = KIND_VOID;
                child_stack_push(tree);
                break;
            /* STMT-SEQUENCE-> STMT-SEQUENCE ; STATEMENT */
            case BNF2:
                /* append the new statement to the end of the sibling list */
                tree_temp = child_stack_pop();
                tree = child_stack_top();
                while(tree->sibling != NULL)
                    tree = tree->sibling;
                tree->sibling = tree_temp;
                /* note: this resets the type of the previous last statement,
                   even if it was KIND_ERROR */
                tree->type = KIND_VOID;
                break;
            default:
                break;
            }
            break;
        default:
            token = slr_error(oper, token);
            break;
        }
        oper = action(stack_top(), token, &status_bnf);
    }
    tree = child_stack_pop();
    return tree;
}
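The reduce steps in parse_slr all follow one pattern: pop the children from the node stack, compute the parent's type from theirs, and push the parent. The sketch below isolates that pattern on a stack of bare type values instead of tree nodes. `Kind`, `KindStack` and the function names are hypothetical and only mirror the logic of the BNF23/BNF19/BNF15 and BNF14 cases.

```c
/* Hypothetical stand-ins for KIND_INT, KIND_BOOL, KIND_VOID, KIND_ERROR. */
typedef enum { K_INT, K_BOOL, K_VOID, K_ERROR } Kind;

#define STACK_MAX 64

typedef struct {
    Kind item[STACK_MAX];
    int top;               /* number of items currently on the stack */
} KindStack;

void ks_init(KindStack *s) { s->top = 0; }

/* Returns 1 on success, 0 if the stack is full. */
int ks_push(KindStack *s, Kind k)
{
    if (s->top == STACK_MAX)
        return 0;
    s->item[s->top++] = k;
    return 1;
}

/* Popping an empty stack yields K_ERROR. */
Kind ks_pop(KindStack *s)
{
    return s->top > 0 ? s->item[--s->top] : K_ERROR;
}

/* Reduce "left op right": the operator's own kind is the result,
   unless one of the operands is already an error. */
void reduce_binary(KindStack *s)
{
    Kind right = ks_pop(s);
    Kind op = ks_pop(s);
    Kind left = ks_pop(s);
    ks_push(s, (left == K_ERROR || right == K_ERROR) ? K_ERROR : op);
}

/* Reduce "write EXP": the statement is void only for an integer EXP. */
void reduce_write(KindStack *s)
{
    Kind exp = ks_pop(s);
    ks_push(s, exp == K_INT ? K_VOID : K_ERROR);
}
```

For `write 1 + 2` the shifts push K_INT, K_INT (the kind of +) and K_INT; `reduce_binary` leaves K_INT and `reduce_write` leaves K_VOID. For `write 1 < 2` the operator's kind is K_BOOL, so `reduce_write` leaves K_ERROR.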
In parse_slr, every assignment to the type field is an attribute computation. A few new error codes and their handling code were added as well:
case E5001:
    fprintf(stderr, "semantic error(E5001) at line %d: "
            "expression type after keyword \"if\" must be boolean.\n",
            line_no);
    break;
case E5002:
    fprintf(stderr, "semantic error(E5002) at line %d: "
            "expression type after keyword \"until\" must be boolean.\n",
            line_no);
    break;
case E5003:
    fprintf(stderr, "semantic error(E5003) at line %d: "
            "expression type after keyword \"write\" must be integer.\n",
            line_no);
    break;
case E5004:
    fprintf(stderr, "semantic error(E5004) at line %d: "
            "expression type in assign statement must be integer.\n",
            line_no);
    break;
This is the code newly added to slr_error. It only prints the error message and performs no error recovery.