我的输入很简单:$input = '( ( "M" AND ( "(" OR "AND" ) ) OR "T" )';
其中(在树上开始一个新节点并)结束它。and和or单词是为布尔运算保留的,因此在它们不在“”标记内之前,它们有一个特殊的含义。在我的dsl中,and和or子句按节点级别变化,因此在级别上只能有and或or子句。如果和在或之后,它将启动一个新的子节点。“”中的所有字符都应视为它们本身。最后像往常一样“可以用”逃走。
在php中,有什么好的方法可以使翻译句子看起来像这样:
$output = array(array(array("M" , array("(", "AND")) , "T"), FALSE);
请注意,false是一个指示器,表示根级别具有或关键字。如果输入是:
( ( "M" AND ( "(" OR "AND" ) ) AND "T" )
那么输出将是:
$output = array(array(array("M", array("(", "AND")), "T"), TRUE);
使用replace('('、'array(');和eval代码是很有诱惑力的,但是转义字符和包装文字将成为一个问题。
目前我还没有在dsl上实现not boolean运算符。
谢谢你的帮助。javasript代码也可以。
Python示例:
在使用php和javascript之前,我用python做了一些测试。我所做的是:
使用正则表达式查找字符串文本
用生成的键替换文本
将文字存储到关联列表
用括号将输入拆分为单级列表
查找根级布尔运算符
去掉布尔运算符和空白
用存储的值替换文本键
它可能有用,但我相信一定有更复杂的方法来做到这一点。
http://codepad.org/PdgQLviI
最佳答案
这是我的辅助项目的库修改。它应该处理这些字符串-执行一些压力测试,让我知道它是否在某处断裂。
需要使用tokenizer类型类来提取和标记变量,这样它们就不会干扰语法分析和tokenize parethesis,这样它们就可以直接匹配(lazy-计算内容不会捕获嵌套级别,greedy将覆盖同一级别上的所有上下文)。它也有一些关键字语法(比需要的稍微多一些,因为它只会被解析为根级别)。当试图用错误的键访问变量注册表时抛出InvalidArgumentException,当括号不匹配时抛出RuntimeException。class TokenizedInput
{
const VAR_REGEXP = '\"(?P.*?)\"';
const BLOCK_OPEN_REGEXP = '\(';
const BLOCK_CLOSE_REGEXP = '\)';
const KEYWORD_REGEXP = '(?OR|AND)';
// Token: $id
const TOKEN_DELIM_LEFT = '
const TOKEN_DELIM_RIGHT = '>';
const VAR_TOKEN = 'VAR';
const KEYWORD_TOKEN = 'KEYWORD';
const BLOCK_OPEN_TOKEN = 'BLOCK';
const BLOCK_CLOSE_TOKEN = 'ENDBLOCK';
const ID_DELIM = ':';
const ID_REGEXP = '[0-9]+';
private $original;
private $tokenized;
private $data = [];
private $blockLevel = 0;
private $varTokenId = 0;
protected $procedure = [
'varTokens' => self::VAR_REGEXP,
'keywordToken' => self::KEYWORD_REGEXP,
'blockTokens' => '(?P' . self::BLOCK_OPEN_REGEXP . ')|(?P' . self::BLOCK_CLOSE_REGEXP . ')'
];
private $tokenMatch;
public function __construct($input) {
$this->original = (string) $input;
}
public function string() {
isset($this->tokenized) or $this->tokenize();
return $this->tokenized;
}
public function variable($key) {
isset($this->tokenized) or $this->tokenize();
if (!isset($this->data[$key])) {
throw new InvalidArgumentException("Variable id:($key) does not exist.");
}
return $this->data[$key];
}
public function tokenSearchRegexp() {
if (!isset($this->tokenMatch)) {
$strings = $this->stringSearchRegexp();
$blocks = $this->blockSearchRegexp();
$this->tokenMatch = '#(?:' . $strings . '|' . $blocks . ')#';
}
return $this->tokenMatch;
}
public function stringSearchRegexp($id = null) {
$id = $id ?: self::ID_REGEXP;
return preg_quote(self::TOKEN_DELIM_LEFT . self::VAR_TOKEN . self::ID_DELIM)
. '(?P' . $id . ')'
. preg_quote(self::TOKEN_DELIM_RIGHT);
}
public function blockSearchRegexp($level = null) {
$level = $level ?: self::ID_REGEXP;
$block_open = preg_quote(self::TOKEN_DELIM_LEFT . self::BLOCK_OPEN_TOKEN . self::ID_DELIM)
. '(?P' . $level . ')'
. preg_quote(self::TOKEN_DELIM_RIGHT);
$block_close = preg_quote(self::TOKEN_DELIM_LEFT . self::BLOCK_CLOSE_TOKEN . self::ID_DELIM)
. '\k'
. preg_quote(self::TOKEN_DELIM_RIGHT);
return $block_open . '(?P.*)' . $block_close;
}
public function keywordSearchRegexp($keyword = null) {
$keyword = $keyword ? '(?P' . $keyword . ')' : self::KEYWORD_REGEXP;
return preg_quote(self::TOKEN_DELIM_LEFT . self::KEYWORD_TOKEN . self::ID_DELIM)
. $keyword
. preg_quote(self::TOKEN_DELIM_RIGHT);
}
private function tokenize() {
$current = $this->original;
foreach ($this->procedure as $method => $pattern) {
$current = preg_replace_callback('#(?:' . $pattern . ')#', [$this, $method], $current);
}
if ($this->blockLevel) {
throw new RuntimeException("Syntax error. Parenthesis mismatch." . $this->blockLevel);
}
$this->tokenized = $current;
}
protected function blockTokens($match) {
if (isset($match['close'])) {
$token = self::BLOCK_CLOSE_TOKEN . self::ID_DELIM . --$this->blockLevel;
} else {
$token = self::BLOCK_OPEN_TOKEN . self::ID_DELIM . $this->blockLevel++;
}
return $this->addDelimiters($token);
}
protected function varTokens($match) {
$this->data[$this->varTokenId] = $match[1];
return $this->addDelimiters(self::VAR_TOKEN . self::ID_DELIM . $this->varTokenId++);
}
protected function keywordToken($match) {
return $this->addDelimiters(self::KEYWORD_TOKEN . self::ID_DELIM . $match[1]);
}
private function addDelimiters($token) {
return self::TOKEN_DELIM_LEFT . $token . self::TOKEN_DELIM_RIGHT;
}
}
parser type类对标记化字符串执行匹配-提取已注册的变量,并通过clonig本身递归进入嵌套上下文。
运算符类型处理是不寻常的,这使得它更像是一个派生类,但无论如何,在解析器的世界中很难实现令人满意的抽象。
class ParsedInput
{
private $input;
private $result;
private $context;
public function __construct(TokenizedInput $input) {
$this->input = $input;
}
public function result() {
if (isset($this->result)) { return $this->result; }
$this->parse($this->input->string());
$this->addOperator();
return $this->result;
}
private function parse($string, $context = 'root') {
$this->context = $context;
preg_replace_callback(
$this->input->tokenSearchRegexp(),
[$this, 'buildStructure'],
$string
);
return $this->result;
}
protected function buildStructure($match) {
if (isset($match['contents'])) { $this->parseBlock($match['contents'], $match['level']); }
elseif (isset($match['id'])) { $this->parseVar($match['id']); }
}
protected function parseVar($id) {
$this->result[] = $this->input->variable((int) $id);
}
protected function parseBlock($contents, $level) {
$nested = clone $this;
$this->result[] = $nested->parse($contents, (int) $level);
}
protected function addOperator() {
$subBlocks = '#' . $this->input->blockSearchRegexp(1) . '#';
$rootLevel = preg_replace($subBlocks, '', $this->input->string());
$rootKeyword = '#' . $this->input->keywordSearchRegexp('AND') . '#';
return $this->result[] = (preg_match($rootKeyword, $rootLevel) === 1);
}
public function __clone() {
$this->result = [];
}
}
示例用法:
$input = '( ( "M" AND ( "(" OR "AND" ) ) AND "T" )';
$tokenized = new TokenizedInput($input);
$parsed = new ParsedInput($tokenized);
$result = $parsed->result();
我删除了名称空间/imports/intrefaces,所以您可以根据需要调整它们。也不想翻阅(现在可能无效)评论,所以也删除了它们。