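JavaCC Getting Started Tutorial, Part 1: Matching Braces
1. Configure the JavaCC environment variable
Add the JavaCC bin directory to the system Path variable, for example D:\java源码包\javacc\javacc-5.0\bin
Test the javacc command
Open a cmd window and type javacc to confirm that the command is found.
The contents of Simple1.jj are as follows: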
options {
LOOKAHEAD = 1;
CHOICE_AMBIGUITY_CHECK = 2;
OTHER_AMBIGUITY_CHECK = 1;
STATIC = true;
DEBUG_PARSER = false;
DEBUG_LOOKAHEAD = false;
DEBUG_TOKEN_MANAGER = false;
ERROR_REPORTING = true;
JAVA_UNICODE_ESCAPE = false;
UNICODE_INPUT = false;
IGNORE_CASE = false;
USER_TOKEN_MANAGER = false;
USER_CHAR_STREAM = false;
BUILD_PARSER = true;
BUILD_TOKEN_MANAGER = true;
SANITY_CHECK = true;
FORCE_LA_CHECK = false;
}
PARSER_BEGIN(Simple1)
/** Simple brace matcher. */
public class Simple1 {
/** Main entry point. */
public static void main(String args[]) throws ParseException {
Simple1 parser = new Simple1(System.in);
parser.Input();
}
}
PARSER_END(Simple1)
/** Root production. */
void Input() :
{}
{
MatchedBraces() ("\n"|"\r")* <EOF>
}
/** Brace matching production. */
void MatchedBraces() :
{}
{
"{" [ MatchedBraces() ] "}"
}
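Test steps
1. Run the javacc command to generate a set of Java files that implement the token manager (lexical analysis) and the parser.
javacc Simple1.jj
2. Compile the generated Java files.
javac *.java
3. Run the generated parser.
java Simple1
Test cases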
% java Simple1
{{}}<return>
<control-d>
%
% java Simple1
{x<return>
Lexical error at line 1, column 2. Encountered: "x"
TokenMgrError: Lexical error at line 1, column 2. Encountered: "x" (120), after : ""
at Simple1TokenManager.getNextToken(Simple1TokenManager.java:146)
at Simple1.getToken(Simple1.java:140)
at Simple1.MatchedBraces(Simple1.java:51)
at Simple1.Input(Simple1.java:10)
at Simple1.main(Simple1.java:6)
%
% java Simple1
{}}<return>
ParseException: Encountered "}" at line 1, column 3.
Was expecting one of:
<EOF>
"\n" ...
"\r" ...
at Simple1.generateParseException(Simple1.java:184)
at Simple1.jj_consume_token(Simple1.java:126)
at Simple1.Input(Simple1.java:32)
at Simple1.main(Simple1.java:6)
%
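What the grammar does
This is a simple JavaCC grammar that recognizes a run of left braces followed by the same number of right braces, then zero or more line terminators, and finally end of file.
Examples of legal input:
"{}", "{{{{{}}}}}"
Examples of illegal input:
"{{{{", "{}{}", "{}}", "{{}{}}", and so on.
The square brackets [...]
in a JavaCC input file indicate that the ... is optional.
[...] can also be written as (...)?; the two forms are equivalent.
Other constructs that may appear in expansions are:
e1 | e2 | e3 | ... : a choice of e1, e2, e3, and so on
( e )+ : one or more occurrences of e
( e )* : zero or more occurrences of e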
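To make the optional construct concrete, here is a minimal hand-written Java sketch of the rule "{" [ MatchedBraces() ] "}". It is not the code that javacc generates. The class name BraceMatcherSketch and its methods are made up for illustration, and unlike Simple1.jj the sketch does not accept trailing line terminators. The point it tries to show is that [...] becomes a one-character lookahead check in front of the recursive call.

```java
/** Illustrative sketch only: MatchedBraces ::= "{" [ MatchedBraces ] "}" */
class BraceMatcherSketch {
    private final String input;
    private int pos = 0;

    private BraceMatcherSketch(String input) {
        this.input = input;
    }

    /** Returns true when the whole input is a single well-nested run of braces. */
    static boolean accepts(String input) {
        BraceMatcherSketch p = new BraceMatcherSketch(input);
        return p.matchedBraces() && p.pos == input.length();
    }

    private boolean matchedBraces() {
        if (!consume('{')) {
            return false;
        }
        // [ MatchedBraces() ]: descend only if the next character can start it.
        if (peek() == '{' && !matchedBraces()) {
            return false;
        }
        return consume('}');
    }

    /** Returns the next character, or -1 at end of input. */
    private int peek() {
        return pos < input.length() ? (int) input.charAt(pos) : -1;
    }

    private boolean consume(char expected) {
        if (peek() == expected) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"{}", "{{{{{}}}}}", "{{{{", "{}{}", "{}}", "{{}{}}"};
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

With the legal and illegal examples listed above, accepts returns true for the first two strings and false for the remaining four.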
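Example 2 - Simple2.jj
Simple2.jj is a minor modification of Simple1.jj that allows white space characters to be interspersed among the braces. Input such as:
"{{ }\n}\n\n"
is now legal.
Another difference between this file and Simple1.jj is that it contains a lexical specification: the region that starts with "SKIP". This region holds four regular expressions: space, tab, newline, and return. Matches of these regular expressions are ignored and are not considered for parsing, so whenever any of these four characters is encountered it is simply thrown away.
In addition to SKIP, JavaCC has three other lexical specification regions:
TOKEN: specifies lexical tokens (see the next example).
SPECIAL_TOKEN: specifies lexical tokens that are ignored during parsing. In this sense SPECIAL_TOKEN is the same as SKIP. However, these tokens can be recovered within parser actions and handled appropriately.
MORE: specifies a partial token. A complete token is made up of a sequence of MOREs followed by a TOKEN or SPECIAL_TOKEN.
You can build Simple2 and invoke the generated parser with keyboard input as standard input. The commands below generate it first with parser debugging and then with token manager debugging turned on: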
javacc -debug_parser Simple2.jj
javac Simple2*.java
java Simple2
javacc -debug_token_manager Simple2.jj
javac Simple2*.java
java Simple2
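Note that token manager debugging (debug_token_manager) produces a lot of diagnostic information. It is typically used to look at debug traces one token at a time.
The contents of Simple2.jj are as follows: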
/* Copyright (c) 2006, Sun Microsystems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the Sun Microsystems, Inc. nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
PARSER_BEGIN(Simple2)
/** Simple brace matcher. */
public class Simple2 {
/** Main entry point. */
public static void main(String args[]) throws ParseException {
Simple2 parser = new Simple2(System.in);
parser.Input();
}
}
PARSER_END(Simple2)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
/** Root production. */
void Input() :
{}
{
MatchedBraces() <EOF>
}
/** Brace matching production. */
void MatchedBraces() :
{}
{
"{" [ MatchedBraces() ] "}"
}
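Example 3 - Simple3.jj
Simple3.jj is the third and final version of our brace matching detector. This example illustrates the use of the TOKEN region to specify lexical tokens. Here "{" and "}" are defined as tokens and given the names LBRACE and RBRACE respectively. These labels can then be used within angle brackets (as in the example) to refer to the tokens. Such token specifications are typically used for complex tokens such as identifiers and literals. Tokens that are simple strings can be left as is (as in the previous examples).
This example also illustrates the use of actions in grammar productions. The actions inserted here count the matching braces, which gives the levels of nesting. Note the use of the declaration region to declare the variables "count" and "nested_count". Also note how the non-terminal "MatchedBraces" returns its value as a function return value.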
/* Copyright (c) 2006, Sun Microsystems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the Sun Microsystems, Inc. nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
PARSER_BEGIN(Simple3)
/** Simple brace matcher. */
public class Simple3 {
/** Main entry point. */
public static void main(String args[]) throws ParseException {
Simple3 parser = new Simple3(System.in);
parser.Input();
}
}
PARSER_END(Simple3)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN :
{
<LBRACE: "{">
| <RBRACE: "}">
}
/** Root production. */
void Input() :
{ int count; }
{
count=MatchedBraces() <EOF>
{ System.out.println("The levels of nesting is " + count); }
}
/** Brace counting production. */
int MatchedBraces() :
{ int nested_count=0; }
{
<LBRACE> [ nested_count=MatchedBraces() ] <RBRACE>
{ return ++nested_count; }
}
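The way an action passes a value back up through a production may be easier to see in ordinary Java. The following is a hand-written sketch, not javacc output. The names NestingCounterSketch and countNesting are hypothetical, white space is skipped before each token to imitate the SKIP region, and errors are reported with IllegalArgumentException rather than the generated ParseException.

```java
/** Illustrative sketch only: counts nesting the way Simple3's actions do. */
class NestingCounterSketch {
    private final String input;
    private int pos = 0;

    private NestingCounterSketch(String input) {
        this.input = input;
    }

    /** Returns the nesting depth, or throws IllegalArgumentException on bad input. */
    static int countNesting(String input) {
        NestingCounterSketch p = new NestingCounterSketch(input);
        int count = p.matchedBraces();
        if (p.peek() != -1) { // Input() requires <EOF> after the braces
            throw new IllegalArgumentException("Unexpected input at offset " + p.pos);
        }
        return count;
    }

    /** Mirrors: <LBRACE> [ nested_count=MatchedBraces() ] <RBRACE> { return ++nested_count; } */
    private int matchedBraces() {
        int nestedCount = 0;
        expect('{');
        if (peek() == '{') {
            nestedCount = matchedBraces();
        }
        expect('}');
        return ++nestedCount;
    }

    /** Skips space, tab, newline and return, then returns the next character or -1. */
    private int peek() {
        while (pos < input.length() && " \t\n\r".indexOf(input.charAt(pos)) >= 0) {
            pos++;
        }
        return pos < input.length() ? (int) input.charAt(pos) : -1;
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println("The levels of nesting is " + countNesting("{ { } }\n"));
    }
}
```

For example, countNesting("{{}}") returns 2: the inner call returns 1 and the outer call increments it before returning.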
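Example 4 - IdList.jj
This example illustrates an important property of the SKIP specification. The main point to note is that the regular expressions in the SKIP specification are ignored only between tokens, not within tokens. This grammar accepts any sequence of identifiers with white space in between.
A legal input for this grammar is:
"abc xyz123 A B C \t\n aaa"
This is because any number of SKIP regular expressions are allowed between consecutive <Id>s. However, the following is not a legal input:
"xyz 123"
This is because the space character after "xyz" is in the SKIP category and therefore causes one token to end and another to begin. This requires "123" to be a separate token, and hence the input does not match the grammar.
If spaces were acceptable within <Id>s, all one has to do is replace the definition of Id with:
TOKEN :
{
< Id: ["a"-"z","A"-"Z"] ( (" ")* ["a"-"z","A"-"Z","0"-"9"] )* >
}
Note that having a space character within a TOKEN specification does not mean that the space character cannot also be used in the SKIP specification. It only means that any space character that appears in a context where it can be placed within an identifier takes part in the match for <Id>, whereas all other space characters are ignored. The details of the matching algorithm are described in the JavaCC documentation on the web pages.
As a corollary, anything within which characters such as white space must not appear has to be defined as a token. In the example above, if <Id> were defined as a grammar production rather than a lexical token, as shown below, then "xyz 123" would (wrongly) be recognized as a legitimate <Id>.
void Id() :
{}
{
<["a"-"z","A"-"Z"]> ( <["a"-"z","A"-"Z","0"-"9"]> )*
}
Note that in the above definition of the non-terminal Id, it is made up of a sequence of single-character tokens (note the location of the <...>s), and hence white space is allowed between these characters.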
/* Copyright (c) 2006, Sun Microsystems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the Sun Microsystems, Inc. nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
PARSER_BEGIN(IdList)
/** ID lister. */
public class IdList {
/** Main entry point. */
public static void main(String args[]) throws ParseException {
IdList parser = new IdList(System.in);
parser.Input();
}
}
PARSER_END(IdList)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN :
{
< Id: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
}
/** Top level production. */
void Input() :
{}
{
( <Id> )+ <EOF>
}
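The "between tokens, not within tokens" behaviour can be imitated with java.util.regex. This is only a rough sketch of the lexical side under simplifying assumptions: IdListSketch and tokenize are hypothetical names, the real token manager is generated by javacc and works differently inside, and the sketch does not enforce the ( <Id> )+ requirement of at least one identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: a tiny tokenizer with one SKIP set and one Id token. */
class IdListSketch {
    // Same shape as the Id token: a letter followed by letters or digits.
    private static final Pattern ID = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
    // Same characters as the SKIP region.
    private static final Pattern SKIP = Pattern.compile("[ \\t\\n\\r]+");

    /** Returns the Id tokens in order, or throws if a character starts no token. */
    static List<String> tokenize(String input) {
        List<String> ids = new ArrayList<>();
        Matcher skip = SKIP.matcher(input);
        Matcher id = ID.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            if (skip.region(pos, input.length()).lookingAt()) {
                pos = skip.end(); // skipped text only separates tokens
            } else if (id.region(pos, input.length()).lookingAt()) {
                ids.add(id.group());
                pos = id.end();
            } else {
                throw new IllegalArgumentException(
                    "Lexical error at offset " + pos + ": '" + input.charAt(pos) + "'");
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc xyz123 A B C \t\n aaa"));
    }
}
```

With the legal input above the sketch yields the six identifiers abc, xyz123, A, B, C and aaa. With "xyz 123" it produces xyz, skips the space, and then fails on "1", because the space ended the first token and no token can start with a digit.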
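Example 5 - NL_Xlator.jj
This example goes into the details of writing regular expressions in JavaCC grammar files. It also illustrates a slightly more complex set of actions that translate the expressions described by the grammar into English.
The new concept in this example is the use of more complex regular expressions. The regular expression:
< ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
creates a new regular expression named ID. It can be referred to anywhere else in the grammar simply as <ID>. What follows in square brackets is a set of allowable characters, in this case any lower or upper case letter or the underscore. This is followed by zero or more occurrences of any lower or upper case letter, digit, or underscore.
Other constructs that may appear in regular expressions are:
( ... )+ : one or more occurrences of ...
( ... )? : an optional occurrence of ... (note that in the case of lexical tokens, (...)? and [...] are not equivalent)
( r1 | r2 | ... ) : any one of r1, r2, ...
A construct of the form [...] is a pattern that is matched by the characters specified in ... . These characters can be individual characters or character ranges. A "~" before this construct is a pattern that matches any character not specified in ... . Therefore:
["a"-"z"] matches all lower case letters
~[] matches any character
~["\n","\r"] matches any character except the new line characters
When a regular expression is used in an expansion, it takes a value of type "Token". This class is generated into the generated parser directory as "Token.java". In this example, we define a variable of type "Token" and assign the value of the regular expression to it.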
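For readers who know java.util.regex better than JavaCC, the character-list forms above have rough equivalents there. The sketch below is an analogy only: the two notations belong to different engines, and the class name CharClassSketch and its constants are invented for this illustration.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only: java.util.regex look-alikes for JavaCC character lists. */
class CharClassSketch {
    // ["a"-"z"]: any lower case letter
    static final Pattern LOWER = Pattern.compile("[a-z]");
    // ~[]: any character at all, including line terminators
    static final Pattern ANY = Pattern.compile(".", Pattern.DOTALL);
    // ~["\n","\r"]: any character except the new line characters
    static final Pattern NOT_NEWLINE = Pattern.compile("[^\\n\\r]");
    // < ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");
    // < NUM: ( ["0"-"9"] )+ >
    static final Pattern NUM = Pattern.compile("[0-9]+");

    /** Returns true when the pattern matches the entire string. */
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(ID, "_tmp1"));
        System.out.println(matches(ID, "1abc"));
        System.out.println(matches(NOT_NEWLINE, "\n"));
    }
}
```

Here "_tmp1" matches ID while "1abc" does not, since an ID may not start with a digit, and a lone "\n" is rejected by NOT_NEWLINE but accepted by ANY.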
/* Copyright (c) 2006, Sun Microsystems, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the Sun Microsystems, Inc. nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
PARSER_BEGIN(NL_Xlator)
/** New line translator. */
public class NL_Xlator {
/** Main entry point. */
public static void main(String args[]) throws ParseException {
NL_Xlator parser = new NL_Xlator(System.in);
parser.ExpressionList();
}
}
PARSER_END(NL_Xlator)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN :
{
< ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
|
< NUM: ( ["0"-"9"] )+ >
}
/** Top level production. */
void ExpressionList() :
{
String s;
}
{
{
System.out.println("Please type in an expression followed by a \";\" or ^D to quit:");
System.out.println("");
}
( s=Expression() ";"
{
System.out.println(s);
System.out.println("");
System.out.println("Please type in another expression followed by a \";\" or ^D to quit:");
System.out.println("");
}
)*
<EOF>
}
/** An Expression. */
String Expression() :
{
java.util.Vector termimage = new java.util.Vector();
String s;
}
{
s=Term()
{
termimage.addElement(s);
}
( "+" s=Term()
{
termimage.addElement(s);
}
)*
{
if (termimage.size() == 1) {
return (String)termimage.elementAt(0);
} else {
s = "the sum of " + (String)termimage.elementAt(0);
for (int i = 1; i < termimage.size()-1; i++) {
s += ", " + (String)termimage.elementAt(i);
}
if (termimage.size() > 2) {
s += ",";
}
s += " and " + (String)termimage.elementAt(termimage.size()-1);
return s;
}
}
}
/** A Term. */
String Term() :
{
java.util.Vector factorimage = new java.util.Vector();
String s;
}
{
s=Factor()
{
factorimage.addElement(s);
}
( "*" s=Factor()
{
factorimage.addElement(s);
}
)*
{
if (factorimage.size() == 1) {
return (String)factorimage.elementAt(0);
} else {
s = "the product of " + (String)factorimage.elementAt(0);
for (int i = 1; i < factorimage.size()-1; i++) {
s += ", " + (String)factorimage.elementAt(i);
}
if (factorimage.size() > 2) {
s += ",";
}
s += " and " + (String)factorimage.elementAt(factorimage.size()-1);
return s;
}
}
}
/** A Factor. */
String Factor() :
{
Token t;
String s;
}
{
t=<ID>
{
return t.image;
}
|
t=<NUM>
{
return t.image;
}
|
"(" s=Expression() ")"
{
return s;
}
}
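The actions in Expression() and Term() share one piece of logic: turning a list of operand descriptions into an English phrase. Pulled out of the grammar, that logic might look like the sketch below. It uses List<String> and StringBuilder in place of the raw Vector in the grammar, the names PhraseJoinerSketch and describe are hypothetical, and it assumes the list holds at least one item, which the grammar guarantees because each production matches one operand before the loop.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: the phrase-building logic from Expression() and Term(). */
class PhraseJoinerSketch {
    /**
     * Joins items as "the <label> of a, b, and c".
     * A single item is returned unchanged. The list must not be empty.
     */
    static String describe(String label, List<String> items) {
        if (items.size() == 1) {
            return items.get(0);
        }
        StringBuilder s = new StringBuilder("the " + label + " of " + items.get(0));
        for (int i = 1; i < items.size() - 1; i++) {
            s.append(", ").append(items.get(i));
        }
        if (items.size() > 2) {
            s.append(","); // comma before "and" only when there are three or more items
        }
        s.append(" and ").append(items.get(items.size() - 1));
        return s.toString();
    }

    public static void main(String[] args) {
        // Corresponds to the input "a + b*c;"
        String product = describe("product", Arrays.asList("b", "c"));
        System.out.println(describe("sum", Arrays.asList("a", product)));
    }
}
```

The main method corresponds to the input "a + b*c;" and prints "the sum of a and the product of b and c". Three operands, as in "a + b + c;", give "the sum of a, b, and c".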