Abstract
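There are many programming languages in the software industry, and most programmers know one or two of them. But have you ever thought about building a language of your own? This article walks you through creating a new language, WawaSharp, on .NET. We will go through four stages: defining the grammar, lexical analysis, building the syntax tree, and code generation.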
Introduction
Create a Language Compiler for the .NET Framework
http://msdn.microsoft.com/zh-cn/magazine/cc136756.aspx
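This article is based on the one above; it simply extends it to support more statements and expressions. I recommend reading that article first, since I will not repeat in detail what it already covers. When we want to make a program more flexible, we often move some variables into configuration, so that when requirements change, editing the configuration is enough. Sometimes, however, configuration cannot keep up with changing requirements, so some more powerful programs ship an SDK together with a custom language that users can program against. The language we invent today could serve as that kind of custom language. The other goal is to explore together what happens behind a language: how the source text we write becomes an executable program.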
Grammar
<stmt> ::= var <ident> = <expr>
    | <ident> = <expr>
    | for <ident> = <expr> to <expr> do <stmt> end
    | foreach <ident> in <expr> do <stmt> end
    | if <expr> then <stmt> end
    | read_int <ident>
    | print <expr>
    | <stmt> ; <stmt>
    | append <expr> <expr>
<expr> ::= <string>
    | <int>
    | <bin_expr>
    | <ident>
    | match <expr> <expr>
    | newsb
    | len <expr>
<bin_expr> ::= <expr> <bin_op> <expr>
<bin_op> ::= + | - | * | / | == | eq
<ident> ::= <char> <ident_rest>*
<ident_rest> ::= <char> | <digit>
<char> ::= <any letter> | _
<int> ::= <digit>+
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<string> ::= " <string_elem>* "
<string_elem> ::= <any char other than ">
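The last few rules (<ident>, <int>, <string>) describe the shape of a single token rather than how tokens combine, and each of them is a regular language. As a sketch, assuming <char> means an ASCII letter or underscore (the scanner pseudocode below uses char.IsLetter, which also accepts non-ASCII letters), they can be written as .NET regular expressions. The class name LexicalRules is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the token-level rules of the grammar expressed as regexes.
// \z anchors at the very end of the input.
public static class LexicalRules
{
    // <ident> ::= <char> <ident_rest>*  (letter or '_' first, then letters, digits or '_')
    public static readonly Regex Ident = new Regex(@"^[A-Za-z_][A-Za-z0-9_]*\z");

    // <int> ::= <digit>+
    public static readonly Regex Int = new Regex(@"^[0-9]+\z");

    // <string> ::= " <string_elem>* "  (any characters except '"' between two quotes)
    public static readonly Regex Str = new Regex("^\"[^\"]*\"\\z");
}
```

Usage: LexicalRules.Ident.IsMatch("arr_") is true, while LexicalRules.Ident.IsMatch("2abc") is false because an identifier cannot start with a digit.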
With the grammar in place, here is roughly what a program in our language looks like:
var input = "11|222|33|44|55";
var arr = match input "\d+|";
var sb = newsb;
foreach item in arr do
    print item;
    var l = len item;
    if l eq 2 then
        var arr_ = match item "\d";
        foreach item_ in arr_ do
            append sb "\r\n";
            append sb item_;
        end;
    end;
end;
print sb;
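To make the meaning of the sample concrete, here is a hand-written C# equivalent. It is a sketch under assumptions the article does not spell out: match is taken to behave like Regex.Matches and yield each matched substring, len to be string length, eq to be integer equality, and newsb to create a StringBuilder. The class SampleEquivalent and the print callback are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical C# rendering of the WawaSharp sample, statement by statement.
public static class SampleEquivalent
{
    public static string Run(string input, Action<string> print)
    {
        var sb = new StringBuilder();                           // var sb = newsb;
        foreach (Match item in Regex.Matches(input, @"\d+|"))   // foreach item in arr do
        {
            print(item.Value);                                  // print item;
            int l = item.Value.Length;                          // var l = len item;
            if (l == 2)                                         // if l eq 2 then
            {
                foreach (Match item_ in Regex.Matches(item.Value, @"\d"))
                {
                    sb.Append("\r\n");                          // append sb "\r\n";
                    sb.Append(item_.Value);                     // append sb item_;
                }
            }
        }
        print(sb.ToString());                                   // print sb;
        return sb.ToString();
    }
}
```

Because the pattern \d+| has an empty alternative, Regex.Matches also returns empty matches between the numbers; the length check skips them, so only 11, 33, 44 and 55 contribute digits to sb.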
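Putting the grammar and the example together, the language has the data types string, bool, int and enumerable; statements such as var, assignment, foreach, if, print and append; and expressions such as match, len, newsb, equality comparison (eq), integer literals and string literals.
Lexical analysis
With the requirements clear, the scanner should be simple: a while loop reads the input one character at a time, and each time a run of characters satisfies a token rule, it is added to a list. In pseudocode: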
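while (input.Peek() != -1)
{
    char ch = (char)input.Peek();
    if (char.IsWhiteSpace(ch))
    {
        // skip whitespace
        input.Read();
    }
    else if (char.IsLetter(ch) || ch == '_')
    {
        // read an identifier or keyword: starts with a letter or underscore,
        // continues with letters, digits or underscores
    }
    else if (ch == '"')
    {
        // read a string literal
    }
    else if (char.IsDigit(ch))
    {
        // read an integer literal
    }
    else switch (ch)
    {
        // read a single-character operator such as +, -, *, /
    }
}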
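At this point we have an IList<object> holding, in order, the tokens we scanned. As the parser code below assumes, string literals are stored as StringBuilder objects, integer literals as int, and keywords and identifiers as string, so the runtime type of each token tells them apart.
Syntax analysis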
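Filling in the pseudocode gives something like the following minimal scanner. The token representation is an assumption chosen to match the parser code below: keywords and identifiers as string, string literals as StringBuilder, integers as int, and operators as singleton marker objects such as Scanner.Semi. Escape sequences inside string literals, such as \r\n, are kept verbatim rather than translated.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Minimal scanner sketch; operator tokens are marker objects compared by reference.
public static class Scanner
{
    public static readonly object Add = new object(), Sub = new object(),
        Mul = new object(), Div = new object(), Equal = new object(),
        Assign = new object(), Semi = new object();

    public static IList<object> Scan(TextReader input)
    {
        var tokens = new List<object>();
        while (input.Peek() != -1)
        {
            char ch = (char)input.Peek();
            if (char.IsWhiteSpace(ch))
            {
                input.Read(); // skip whitespace
            }
            else if (char.IsLetter(ch) || ch == '_')
            {
                // identifier or keyword: letters, digits and '_'
                var sb = new StringBuilder();
                while (input.Peek() != -1 &&
                       (char.IsLetterOrDigit((char)input.Peek()) || input.Peek() == '_'))
                {
                    sb.Append((char)input.Read());
                }
                tokens.Add(sb.ToString());
            }
            else if (ch == '"')
            {
                // string literal: everything up to the closing quote, kept verbatim
                input.Read(); // consume the opening quote
                var sb = new StringBuilder();
                while (input.Peek() != -1 && input.Peek() != '"')
                {
                    sb.Append((char)input.Read());
                }
                if (input.Read() == -1) throw new Exception("Unterminated string literal");
                tokens.Add(sb);
            }
            else if (char.IsDigit(ch))
            {
                // integer literal
                var sb = new StringBuilder();
                while (input.Peek() != -1 && char.IsDigit((char)input.Peek()))
                {
                    sb.Append((char)input.Read());
                }
                tokens.Add(int.Parse(sb.ToString()));
            }
            else
            {
                input.Read();
                switch (ch)
                {
                    case '+': tokens.Add(Add); break;
                    case '-': tokens.Add(Sub); break;
                    case '*': tokens.Add(Mul); break;
                    case '/': tokens.Add(Div); break;
                    case ';': tokens.Add(Semi); break;
                    case '=':
                        // "==" is equality; a single '=' is assignment
                        if (input.Peek() == '=') { input.Read(); tokens.Add(Equal); }
                        else { tokens.Add(Assign); }
                        break;
                    default:
                        throw new Exception("Unexpected character: " + ch);
                }
            }
        }
        return tokens;
    }
}
```

Note that eq is scanned as an ordinary word token ("eq"); it is up to the parser to treat it as an operator.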
public abstract class Stmt {}
public abstract class Expr {}
public class DeclareVar : Stmt
{
    public Expr Expr;
    public string Ident;
}
public class Assign : Stmt
{
    public Expr Expr;
    public string Ident;
}
public class Sequence : Stmt
{
    public Stmt First;
    public Stmt Second;
}
public class Foreach : Stmt
{
    public Stmt Body;
    public string Ident;
    public Expr IEnumerable;
}
public class If : Stmt
{
    public Stmt Body;
    public Expr Condition;
}
public class StrLen : Expr
{
    public Expr Input;
}
public class Match : Expr
{
    public Expr Input;
    public Expr Pattern;
}
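The goal is to turn the token sequence into a tree built from the objects above. It is roughly a binary tree: a Sequence node, for example, has exactly two children, First and Second. This step is called syntax analysis (parsing). Parsing is said to be able to get as complicated as you like, but ours is fairly simple: a routine that walks the token list and calls itself recursively for nested statements. For example, when we reach an if, the grammar rule if <expr> then <stmt> end tells us what must follow: an expression, which we parse and store as the if statement's Condition property; then the keyword then; then a statement, which we parse completely and store as the Body property; and finally the keyword end. The pseudocode looks roughly like this: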
private Stmt ParseStmt()
{
    Stmt result;
    if (tokens[index].Equals("print"))
    {
        // print <expr>
        index++;
        var print = new Print();
        print.Expr = ParseExpr();
        result = print;
    }
    else if (tokens[index].Equals("append"))
    {
        // append <expr> <expr>
        index++;
        var append = new Append();
        append.Buider = ParseExpr();   // the StringBuilder being appended to
        append.ToAppend = ParseExpr(); // the value to append
        result = append;
    }
    else if (tokens[index].Equals("var")) { /* build a DeclareVar and assign it to result */ }
    else if (tokens[index].Equals("foreach")) { /* build a Foreach and assign it to result */ }
    else if (tokens[index].Equals("if")) { /* build an If and assign it to result */ }
    else
    {
        throw new Exception("Parse error at token " + index);
    }

    // <stmt> ; <stmt>: a ';' not followed by "end" chains the next statement into a Sequence
    if (index < tokens.Count && tokens[index] == Scanner.Semi)
    {
        index++;
        if (index < tokens.Count && !tokens[index].Equals("end"))
        {
            var sequence = new Sequence();
            sequence.First = result;
            sequence.Second = ParseStmt();
            result = sequence;
        }
    }
    return result;
}

private Expr ParseExpr()
{
    if (tokens[index] is StringBuilder) { /* StringLiteral */ }
    else if (tokens[index] is int) { /* IntLiteral */ }
    else if (tokens[index] is string)
    {
        string value = (string)tokens[index];
        if (value.Equals("match")) { /* Match */ }
        else if (value.Equals("newsb")) { /* Builder */ }
        else if (value.Equals("len")) { /* StrLen */ }
        else { /* Variable */ }
    }
    // (binary operators and the return statement are elided here)
}
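For a runnable version of the same recursive-descent idea, the sketch below handles a subset of the grammar: var, print, if and ; sequencing for statements, and string/int literals, variables, len and the eq operator for expressions. The compact node classes, the Parser.Assign/Parser.Semi marker objects and the error messages are assumptions made for this sketch, not the article's code; foreach, match, append and the other operators would be added in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Compact node classes: a subset of the article's AST, with field shapes assumed.
public abstract class Stmt { }
public abstract class Expr { }
public class DeclareVar : Stmt { public Expr Expr; public string Ident; }
public class Print : Stmt { public Expr Expr; }
public class Sequence : Stmt { public Stmt First; public Stmt Second; }
public class If : Stmt { public Stmt Body; public Expr Condition; }
public class StringLiteral : Expr { public string Value; }
public class IntLiteral : Expr { public int Value; }
public class Variable : Expr { public string Ident; }
public class StrLen : Expr { public Expr Input; }
public class BinExpr : Expr { public Expr Left; public string Op; public Expr Right; }

public class Parser
{
    // Hypothetical marker tokens for '=' and ';' (compared by reference).
    public static readonly object Assign = new object(), Semi = new object();
    private readonly IList<object> tokens;
    private int index;

    public Parser(IList<object> tokens) { this.tokens = tokens; }

    public Stmt Parse()
    {
        Stmt result = ParseStmt();
        if (index != tokens.Count) throw new Exception("Unexpected token at " + index);
        return result;
    }

    // Current token, or null at the end of input.
    private object Peek() => index < tokens.Count ? tokens[index] : null;

    private void Expect(object token)
    {
        if (Peek() == null || !Peek().Equals(token))
            throw new Exception("Expected " + token + " at token " + index);
        index++;
    }

    private Stmt ParseStmt()
    {
        Stmt result;
        object t = Peek();
        if ("print".Equals(t)) { index++; result = new Print { Expr = ParseExpr() }; }
        else if ("var".Equals(t))
        {
            // var <ident> = <expr>
            index++;
            var decl = new DeclareVar { Ident = Peek() as string };
            index++;
            Expect(Assign);
            decl.Expr = ParseExpr();
            result = decl;
        }
        else if ("if".Equals(t))
        {
            // if <expr> then <stmt> end
            index++;
            var ifStmt = new If { Condition = ParseExpr() };
            Expect("then");
            ifStmt.Body = ParseStmt();
            Expect("end");
            result = ifStmt;
        }
        else throw new Exception("Unknown statement at token " + index);

        // <stmt> ; <stmt>: a ';' not followed by "end" or end of input starts a Sequence
        if (Peek() == Semi)
        {
            index++;
            if (Peek() != null && !"end".Equals(Peek()))
                result = new Sequence { First = result, Second = ParseStmt() };
        }
        return result;
    }

    // <expr> [eq <expr>]: only 'eq' is handled; + - * / would follow the same pattern.
    private Expr ParseExpr()
    {
        Expr left = ParsePrimary();
        if (!"eq".Equals(Peek())) return left;
        index++;
        return new BinExpr { Left = left, Op = "eq", Right = ParsePrimary() };
    }

    private Expr ParsePrimary()
    {
        object t = Peek();
        index++;
        if (t is StringBuilder sb) return new StringLiteral { Value = sb.ToString() };
        if (t is int n) return new IntLiteral { Value = n };
        if ("len".Equals(t)) return new StrLen { Input = ParsePrimary() };
        if (t is string s) return new Variable { Ident = s };
        throw new Exception("Unexpected token in expression at " + (index - 1));
    }
}
```

Usage: parsing the tokens of var l = len item; if l eq 2 then print l; end yields a Sequence whose First is a DeclareVar and whose Second is an If with a BinExpr condition, matching the shape of the corresponding part of the tree shown below. The operand of len is parsed with ParsePrimary so that len item eq 2 would mean (len item) eq 2.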
The end result is a syntax tree. For our sample program, the parser produces the following:
[Sequence]
| First:
| | [DeclareVar]
| | | Ident:input
| | | Expr:
| | | |< StringLiteral > : "11|222|33|44|55"
| Second:
| | [Sequence]
| | | First:
| | | | [DeclareVar]
| | | | | Ident:arr
| | | | | Expr:
| | | | | |< Match > :
| | | | | | | Input:
| | | | | | | |< Variable > :input
| | | | | | | Pattern:
| | | | | | | |< StringLiteral > : "\d+|"
| | | Second:
| | | | [Sequence]
| | | | | First:
| | | | | | [DeclareVar]
| | | | | | | Ident:sb
| | | | | | | Expr:
| | | | | | | | [Builder]
| | | | | Second:
| | | | | | [Sequence]
| | | | | | | First:
| | | | | | | |< Foreach > :item
| | | | | | | | | IEnumerable:
| | | | | | | | | |< Variable > :arr
| | | | | | | | | Body:
| | | | | | | | | | [Sequence]
| | | | | | | | | | | First:
| | | | | | | | | | | | [Print]
| | | | | | | | | | | | | Expr:
| | | | | | | | | | | | | |< Variable > :item
| | | | | | | | | | | Second:
| | | | | | | | | | | | [Sequence]
| | | | | | | | | | | | | First:
| | | | | | | | | | | | | | [DeclareVar]
| | | | | | | | | | | | | | | Ident:l
| | | | | | | | | | | | | | | Expr:
| | | | | | | | | | | | | | | |< StrLen > :
| | | | | | | | | | | | | | | | | Input:
| | | | | | | | | | | | | | | | | |< Variable > :item
| | | | | | | | | | | | | Second:
| | | | | | | | | | | | | |< If > :
| | | | | | | | | | | | | | | Condition:
| | | | | | | | | | | | | | | |< BinExpr > :Eq
| | | | | | | | | | | | | | | | | left:
| | | | | | | | | | | | | | | | | |< Variable > :l
| | | | | | | | | | | | | | | | | Right:
| | | | | | | | | | | | | | | | | |< IntLiteral > : 2
| | | | | | | | | | | | | | | Body:
| | | | | | | | | | | | | | | | [Sequence]
| | | | | | | | | | | | | | | | | First:
| | | | | | | | | | | | | | | | | | [DeclareVar]
| | | | | | | | | | | | | | | | | | | Ident:arr_
| | | | | | | | | | | | | | | | | | | Expr:
| | | | | | | | | | | | | | | | | | | |< Match > :
| | | | | | | | | | | | | | | | | | | | | Input:
| | | | | | | | | | | | | | | | | | | | | |< Variable > :item
| | | | | | | | | | | | | | | | | | | | | Pattern:
| | | | | | | | | | | | | | | | | | | | | |< StringLiteral > : "\d"
| | | | | | | | | | | | | | | | | Second:
| | | | | | | | | | | | | | | | | |< Foreach > :item_
| | | | | | | | | | | | | | | | | | | IEnumerable:
| | | | | | | | | | | | | | | | | | | |< Variable > :arr_
| | | | | | | | | | | | | | | | | | | Body:
| | | | | | | | | | | | | | | | | | | | [Sequence]
| | | | | | | | | | | | | | | | | | | | | First:
| | | | | | | | | | | | | | | | | | | | | | [Append]:
| | | | | | | | | | | | | | | | | | | | | | | Buider:
| | | | | | | | | | | | | | | | | | | | | | | |< Variable > :sb
| | | | | | | | | | | | | | | | | | | | | | | ToAppend:
| | | | | | | | | | | | | | | | | | | | | | | |< StringLiteral > : "\r\n"
| | | | | | | | | | | | | | | | | | | | | Second:
| | | | | | | | | | | | | | | | | | | | | | [Append]:
| | | | | | | | | | | | | | | | | | | | | | | Buider:
| | | | | | | | | | | | | | | | | | | | | | | |< Variable > :sb
| | | | | | | | | | | | | | | | | | | | | | | ToAppend:
| | | | | | | | | | | | | | | | | | | | | | | |< Variable > :item_
| | | | | | | Second:
| | | | | | | | [Print]
| | | | | | | | | Expr:
| | | | | | | | | |< Variable > :sb
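I gave Stmt and Expr each a ToString method that renders the tree as text, which produces output like the above. It is not pretty, but it gets the point across: our parse result really is a tree. CodeDom has a tree, expression trees are trees, and WawaSharp needs a tree too.
Time for a short break. So far we have seen nothing new: just a few while loops and some string-splitting logic. The next post covers code generation, that is, generating executable IL from this tree. The fun part is that once the IL has been generated, you can use Reflector to decompile it back into C# and VB code.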
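The article gives each node class its own ToString method. As an alternative sketch (a swapped-in technique, not the author's code), a single reflection-based dumper can produce a similar indented layout for any node by walking its public fields. It is simplified: every node is printed as [TypeName], whereas the output above uses <Name> for some expression nodes. The TreeDumper class and the minimal node classes are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text;

// Minimal node classes, just enough to exercise the dumper.
public abstract class Stmt { }
public abstract class Expr { }
public class Sequence : Stmt { public Stmt First; public Stmt Second; }
public class Print : Stmt { public Expr Expr; }
public class Variable : Expr { public string Ident; }

public static class TreeDumper
{
    public static string Dump(object node)
    {
        var sb = new StringBuilder();
        Write(node, "", sb);
        return sb.ToString();
    }

    // Layout as in the article: "[Node]" at the current indent, each field one
    // "| " deeper, and child nodes two levels ("| | ") deeper.
    private static void Write(object node, string indent, StringBuilder sb)
    {
        sb.Append(indent).Append('[').Append(node.GetType().Name).Append("]\n");
        // GetFields does not guarantee an order, so sort by metadata token,
        // which in practice follows declaration order.
        var fields = node.GetType()
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(f => f.MetadataToken);
        foreach (FieldInfo field in fields)
        {
            object value = field.GetValue(node);
            sb.Append(indent).Append("| ").Append(field.Name).Append(':');
            if (value == null || value is string || value is Enum || value.GetType().IsPrimitive)
            {
                sb.Append(value).Append('\n');      // leaf value printed inline
            }
            else
            {
                sb.Append('\n');
                Write(value, indent + "| | ", sb); // nested node
            }
        }
    }
}
```

The trade-off: one generic dumper means node classes need no printing code at all, at the cost of relying on reflection and on public fields being the tree's only structure.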