上下文无关输入与输出java_正则表达式与上下文无关文法

贺泳

于 2021-02-27 16:31:01 发布

阅读量316

点赞数

文章标签：上下文无关输入与输出java

本文链接：https://blog.csdn.net/weixin_42434656/article/details/114823600

版权

正则表达式

正则表达式在日常开发中时不时都会遇到，我们先来看看正则表达式( Regular Expression)的定义(参考龙书英文第2版121页)：

ε是一个正则表达式，它生成的语言L(ε)等价于{ε}，即L(ε)={ε}，就是一个空字符串

如果a属于符号集Σ，那么a也是一个正则表达式，且其生成的语言L(a)={a}，就是一个a字符

如果r和s都是正则表达式，那么r | s也是正则表达式，L(r | s) = L(r) ∪ L(s)

如果r和s都是正则表达式，那么r s也是正则表达式，L(r s) = L(r) L(s)

如果r是正则表达式，则r*也是正则表达式，L(r*)=(L(r))*

如果r是正则表达式，则(r)也是正则表达式，L((r))=L(r)

来看看正则表达式的一些简单例子，假设字符表Σ是全部的小写字母(a～z)以及阿拉伯数字(0～9)，则：

abc

a|b

ab|bc

(ab)*

(a|b)(0|1)*

都是正则表达式。

正则定义

有时一个正则表达式很复杂，我们希望能够把这个正则表达式用记号记下来，以便以后使用，于是就有了正则定义(Regular Definitions)(参考龙书英文第2版123页)：

设Σ是符号表，如果有一系列的定义：

d1 → r1

d2→ r2

...

dn→ rn

并且：

任一个di都是一个新的符号，它既不属于Σ，也不属于{d1,d2,d3,...,d(i-1)}

任一个ri都是一个正则表达式，能够由符号表Σ以及{d1,d2,d3,...,d(i-1)}所表示

这样就构成了一组正则表达式定义。

上下文无关文法

上下文无关文法(Context-Free Grammar)的定义参考龙书英文第2版197页：

终结符号(Terminals)

非终结符号(Nonterminals)

一个非终结符号作为开始符号

一组产生式

稍详细的内容可见我的另一篇博文到底什么是上下文无关文法？

正则定义与上下文无关文法的区别

正则定义与上下文无关文法的重要区别在于，在正则定义中是不允许递归定义的，例如A → aA|b不是一个正则定义，为其左边的A必须是一个新的符号，也就是说不能在其他地方定义过，胆识其右边要求每一个符号都是定义过的，因此这个定义无法满足。而上下文无关文法则没有这个约束，因此A → aA|b是一个上下文无关文法的产生式，但不是正则定义的定义式。

看过一段话，意思是正则表达式在编译器构建中一般用来进行词法分析，通过NFA、DFA就可以识别，而更复杂的文法就需要以来其他算法了。

Regular expressions and BNF are two grammatical formalisms for describing sets of strings. Regular expressions are a concise and convenient notation for describing syntax when "nesting" is not an issue. BNF is a more powerful notation that allows for the description of nested language constructs using nonterminal symbols in arbitrary recursive combinations. Thus regular expressions are appropriate for token-level syntax of programming languages, while BNF is required for the higher-level recursive syntax of expressions, statements and so on.

原文请看 http://www.cs.sfu.ca/~cameron/Teaching/D-Lib/RegExp.html

运用正则定义的要求，我们可以从上下文无关文法中，筛选出符合正则定义的产生式，这些符合正则定义的产生式，就可以用来生成有限状态自动机。

下面的类实现了从上下文无关文法产生式中筛选处正则定义式：

This file is one of the component a Context-free Grammar Parser Generator,

which accept a piece of text as the input, and generates a parser

for the inputted context-free grammar.

This program is free software: you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation, either version 3 of the License, or

any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

GNU General Public License for more details.

You should have received a copy of the GNU General Public License

along with this program. If not, see .

package analyzer;

import java.io.IOException;

import java.util.ArrayList;

import java.util.HashSet;

import java.util.List;

import java.util.Set;

import abnf.Rule;

import abnf.RuleName;

public class RegularAnalyzer {

private List nonRegularRules = new ArrayList();

public List getNonRegularRules() { return nonRegularRules; }

private List regularRules = new ArrayList();

public List getRegularRules() { return regularRules; }

private List undefinedRules = new ArrayList();

public List getUndefinedRules() { return undefinedRules; }

public RegularAnalyzer(List rules) {

Set definedRuleNames = new HashSet();

List observedRules = new ArrayList();

observedRules.addAll(rules);

boolean foundRegular;

do {

foundRegular = false;

for(int index = observedRules.size() - 1; index >= 0; index --) {

Set dependent = observedRules.get(index).getElements().getDependentRuleNames();

if (definedRuleNames.containsAll(dependent)) {

definedRuleNames.add(observedRules.get(index).getRuleName());

regularRules.add(observedRules.get(index));

observedRules.remove(index);

foundRegular = true;

continue;

}

if (!dependent.contains(observedRules.get(index).getRuleName())) {

continue;

}

dependent.remove(observedRules.get(index).getRuleName());

if (definedRuleNames.containsAll(dependent)) {

definedRuleNames.add(observedRules.get(index).getRuleName());

nonRegularRules.add(observedRules.get(index));

observedRules.remove(index);

foundRegular = true;

}

} while (foundRegular);

undefinedRules.addAll(observedRules);

observedRules.clear();

}

贺泳

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
上下文无关输入与输出java_正则表达式与上下文无关文法

正则表达式正则表达式在日常开发中时不时都会遇到，我们先来看看正则表达式( Regular Expression)的定义(参考龙书英文第2版121页)：ε是一个正则表达式，它生成的语言L(ε)等价于{ε}，即L(ε)={ε}，就是一个空字符串如果a属于符号集Σ，那么a也是一个正则表达式，且其生成的语言L(a)={a}，就是一个a字符如果r和s都是正则表达式，那么r | s也是正则表达式，L(r | ...
复制链接

扫一扫