ReactJS in PHP: Writing Compilers Is Easy and Fun!

Latest recommended article published on 2024-04-07 20:36:58

culh2177

Original article: https://www.sitepoint.com/reactjs-php-writing-compilers-easy-fun/

I used to use an extension called XHP. It enables HTML-in-PHP syntax for generating front-end markup. I reached for it recently, and was surprised to find that it was no longer officially supported for modern PHP versions.

So, I decided to implement a user-land version of it, using a basic state-machine compiler. It seemed like it would be a fun project to do with you!

The code for this tutorial can be found on GitHub.

Creating Compilers

Many developers avoid writing their own compilers or interpreters, thinking that the topic is too complex or difficult to explore properly. I used to feel like that too. Compilers can be difficult to make well, and the topic can be incredibly complex and difficult. But, that doesn’t mean you can’t make a compiler.

Making a compiler is like making a sandwich. Anyone can get the ingredients and put them together. You can make a sandwich. You can also go to chef school and learn how to make the best damn sandwich the world has ever seen. You can study the art of sandwich making for years, and people can talk about your sandwiches in other lands. You’re not going to let the breadth and complexity of sandwich-making prevent you from making your first sandwich, are you?

Compilers (and interpreters) begin with humble string manipulation and temporary variables. When they’re sufficiently popular (or sufficiently slow), the experts can step in to replace the string manipulation and temporary variables with unicorn tears and cynicism.

At a fundamental level, compilers take a string of code and run it through a couple of steps:

The code is split into tokens – meaningful characters and sub-strings – which the compiler will use to derive meaning. The statement if (isEmergency) alert("there is an emergency") could be considered to contain tokens like if, isEmergency, alert, and "there is an emergency"; and these all mean something to the compiler.

The first step is to split the entire source code up into these meaningful bits, so that the compiler can start to organize them in a logical hierarchy, so it knows what to do with the code.

The tokens are arranged into the logical hierarchy (sometimes called an Abstract Syntax Tree) which represents what needs to be done in the program. The previous statement could be understood as “Work out if the condition (isEmergency) evaluates to true. If it does, run the function (alert) with the parameter ("there is an emergency")”.

Using this hierarchy, the code can be immediately executed (in the case of an interpreter or virtual machine) or translated into other languages (in the case of languages like CoffeeScript and TypeScript, which are both compile-to-JavaScript languages).
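
To make the first step concrete, here’s a minimal sketch of a tokenizer for that one statement. The function name sketchTokens and the token pattern are my own assumptions for illustration; a real tokenizer would recognise many more kinds of token.

```php
<?php

// Hypothetical helper: splits a tiny statement into tokens with one regex.
// It only knows about double-quoted strings, identifiers and a few brackets.
function sketchTokens(string $code): array
{
    preg_match_all('/"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|[(){};]/', $code, $matches);

    return $matches[0];
}

print_r(sketchTokens('if (isEmergency) alert("there is an emergency")'));
```

The brackets come back as tokens alongside the names, because the next step needs them to work out how the pieces nest.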

In our case, we want to maintain most of the PHP syntax, but we also want to add our own little bit of syntax on top. We could create a whole new interpreter…or we could preprocess the new syntax, compiling it to syntactically valid PHP code.

I’ve written about preprocessing PHP before, and it’s my favorite approach to adding new syntax. In this case, we need to write a more complex script, so we’re going to deviate from how we’ve previously added new syntax.

Generating Tokens

Let’s create a function to split code into tokens. It begins like this:

function tokens($code) {
    $tokens = [];

    $length = strlen($code);
    $cursor = 0;

    while ($cursor < $length) {
        if ($code[$cursor] === "{") {
            print "ATTRIBUTE STARTED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === "}") {
            print "ATTRIBUTE ENDED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === "<") {
            print "ELEMENT STARTED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === ">") {
            print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
        }

        $cursor++;
    }
}

$code = '
    <?php

    $classNames = "foo bar";
    $message = "hello world";

    $thing = (
        <div
            className={() => { return "outer-div"; }}
            nested={<span className={"nested-span"}>with text</span>}
        >
            a bit of text before
            <span>
                {$message} with a bit of extra text
            </span>
            a bit of text after
        </div>
    );
';

tokens($code);

// ELEMENT STARTED (5)
// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ELEMENT ENDED (127)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ATTRIBUTE STARTED (190)
// ATTRIBUTE ENDED (204)
// ELEMENT ENDED (205)
// ELEMENT STARTED (215)
// ELEMENT ENDED (221)
// ATTRIBUTE ENDED (222)
// ELEMENT ENDED (232)
// ELEMENT STARTED (279)
// ELEMENT ENDED (284)
// ATTRIBUTE STARTED (302)
// ATTRIBUTE ENDED (311)
// ELEMENT STARTED (350)
// ELEMENT ENDED (356)
// ELEMENT STARTED (398)
// ELEMENT ENDED (403)

This is from tokens-1.php

We’re off to a good start. By stepping through the code, we can check what each character is (and identify the ones that matter to us). We’re seeing, for instance, that the first element is reported as opening when we encounter a < character at index 5, and that the first closing > is reported at index 127.

Unfortunately, that first opening is being incorrectly matched to <?php. That’s not an element in our new syntax, so we have to stop the code from picking it out:

preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);

if (count($matchesStart)) {
    print "ELEMENT STARTED ({$cursor})" . PHP_EOL;
}

// ...

// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ELEMENT ENDED (127)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ...

This is from tokens-2.php

Instead of checking only the current character, our new code checks three characters to see whether they match a pattern like <div or </div. Sequences like <?php or $num1 < $num2 no longer match.
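
Here’s a small sketch that wraps the same check in a function, so the four cases can be tried side by side. The name sketchStartsElement is hypothetical; the pattern is the one from the listing above.

```php
<?php

// Hypothetical helper wrapping the check above: does an element tag
// (opening or closing) start at $cursor?
function sketchStartsElement(string $code, int $cursor): bool
{
    return preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3)) === 1;
}

var_dump(sketchStartsElement('<div>', 0));         // bool(true)
var_dump(sketchStartsElement('</div>', 0));        // bool(true)
var_dump(sketchStartsElement('<?php', 0));         // bool(false)
var_dump(sketchStartsElement('$num1 < $num2', 6)); // bool(false)
```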

There’s another problem: our example uses arrow function syntax, so => is being matched as an element closing sequence. Let’s refine how we match element closing sequences:

preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);

if ($code[$cursor] === ">" && !$matchesEqualBefore && !$matchesEqualAfter) {
    print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
}

// ...

// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ...

This is from tokens-3.php
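
The closing check can be sketched as a small predicate too. This is my own variation, not part of tokens-3.php: the name sketchEndsElement is hypothetical, and it uses plain character comparisons instead of preg_match.

```php
<?php

// Hypothetical helper: is the ">" at $cursor the end of a tag, rather than
// part of "=>" (arrow function) or ">=" (comparison)?
function sketchEndsElement(string $code, int $cursor): bool
{
    if ($code[$cursor] !== ">") {
        return false;
    }

    $arrow = $cursor > 0 && $code[$cursor - 1] === "=";
    $comparison = substr($code, $cursor + 1, 1) === "=";

    return !$arrow && !$comparison;
}

var_dump(sketchEndsElement('<span>', 5));   // bool(true)
var_dump(sketchEndsElement('() => {}', 4)); // bool(false)
var_dump(sketchEndsElement('$a >= $b', 3)); // bool(false)
```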

As with JSX, it would be good for attributes to allow dynamic values (even if those values are nested JSX elements). There are a few ways we could do this, but the one I prefer is to treat all attributes as text, and tokenize them recursively. To do this, we need to have a kind of state machine which tracks how many levels deep we are in an element and attribute. If we’re inside an element tag, we should trap the top level {…} as a string attribute value, and ignore subsequent braces. Similarly, if we’re inside an attribute, we should ignore nested element opening and closing sequences:

function tokens($code) {
    $tokens = [];

    $length = strlen($code);
    $cursor = 0;

    $elementLevel = 0;
    $elementStarted = null;
    $elementEnded = null;

    $attributes = [];
    $attributeLevel = 0;
    $attributeStarted = null;
    $attributeEnded = null;

    while ($cursor < $length) {
        $extract = trim(substr($code, $cursor, 5)) . "...";

        if ($code[$cursor] === "{" && $elementStarted !== null) {
            if ($attributeLevel === 0) {
                print "ATTRIBUTE STARTED ({$cursor}, {$extract})" . PHP_EOL;
                $attributeStarted = $cursor;
            }

            $attributeLevel++;
        }

        if ($code[$cursor] === "}" && $elementStarted !== null) {
            $attributeLevel--;

            if ($attributeLevel === 0) {
                print "ATTRIBUTE ENDED ({$cursor})" . PHP_EOL;
                $attributeEnded = $cursor;
            }
        }

        preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);

        if (count($matchesStart) && $attributeLevel < 1) {
            print "ELEMENT STARTED ({$cursor}, {$extract})" . PHP_EOL;

            $elementLevel++;
            $elementStarted = $cursor;
        }

        preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
        preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);

        if (
            $code[$cursor] === ">"
            && !$matchesEqualBefore && !$matchesEqualAfter
            && $attributeLevel < 1
        ) {
            print "ELEMENT ENDED ({$cursor})" . PHP_EOL;

            $elementLevel--;
            $elementEnded = $cursor;
        }

        if ($elementStarted !== null && $elementEnded !== null) {
            // TODO

            $elementStarted = null;
            $elementEnded = null;
        }

        $cursor++;
    }
}

// ...

// ELEMENT STARTED (95, <div...)
// ATTRIBUTE STARTED (122, {() =...)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173, {<spa...)
// ATTRIBUTE ENDED (222)
// ELEMENT ENDED (232)
// ELEMENT STARTED (279, <span...)
// ELEMENT ENDED (284)
// ELEMENT STARTED (350, </spa...)
// ELEMENT ENDED (356)
// ELEMENT STARTED (398, </div...)
// ELEMENT ENDED (403)

This is from tokens-4.php

We’ve added new $attributeLevel, $attributeStarted, and $attributeEnded variables to track how deep we are in the nesting of attributes, and where the top level starts and ends. Specifically, if we’re at the top level when an attribute’s value starts or ends, we capture the current cursor position. Later, we’ll use this to extract the string attribute value and replace it with a placeholder.
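
The depth counting is easier to see in isolation. Here’s a minimal sketch, assuming we only care about braces (the real function also checks that we’re inside an element tag); sketchTopLevelBraces is a hypothetical name.

```php
<?php

// Hypothetical helper: returns [start, end] index pairs for every top-level
// {...} span in $code, ignoring braces nested inside them.
function sketchTopLevelBraces(string $code): array
{
    $spans = [];
    $level = 0;
    $started = null;

    for ($cursor = 0; $cursor < strlen($code); $cursor++) {
        if ($code[$cursor] === "{") {
            if ($level === 0) {
                $started = $cursor; // only remember the outermost brace
            }

            $level++;
        }

        if ($code[$cursor] === "}" && $level > 0) {
            $level--;

            if ($level === 0) {
                $spans[] = [$started, $cursor];
            }
        }
    }

    return $spans;
}

$tag = '<div a={() => { return 1; }} b={<span c={"d"}>x</span>}>';

foreach (sketchTopLevelBraces($tag) as [$start, $end]) {
    print substr($tag, $start, $end - $start + 1) . PHP_EOL;
}
```

Only two spans come back for this tag, one per attribute, even though it contains four pairs of braces.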

We’re also starting to capture $elementStarted and $elementEnded (with $elementLevel fulfilling a similar role to $attributeLevel) so that we can capture a full element opening or closing tag. In this case, $elementEnded doesn’t refer to the closing tag but rather the closing sequence of characters of the opening tag. Closing tags are treated as entirely separate tokens…

By printing a small extract of the code from the current cursor position, we can see elements and attributes starting and ending exactly where we expect. The nested control structures and elements are captured as strings, leaving only the top-level elements, non-attribute nested elements, and attribute values.

Let’s package these tokens up, associating attributes with the tags in which they are defined:

function tokens($code) {
    $tokens = [];

    $length = strlen($code);
    $cursor = 0;

    $elementLevel = 0;
    $elementStarted = null;
    $elementEnded = null;

    $attributes = [];
    $attributeLevel = 0;
    $attributeStarted = null;
    $attributeEnded = null;

    $carry = 0;

    while ($cursor < $length) {
        if ($code[$cursor] === "{" && $elementStarted !== null) {
            if ($attributeLevel === 0) {
                $attributeStarted = $cursor;
            }

            $attributeLevel++;
        }

        if ($code[$cursor] === "}" && $elementStarted !== null) {
            $attributeLevel--;

            if ($attributeLevel === 0) {
                $attributeEnded = $cursor;
            }
        }

        if ($attributeStarted && $attributeEnded) {
            $position = (string) count($attributes);
            $positionLength = strlen($position);

            $attribute = substr(
                $code, $attributeStarted + 1, $attributeEnded - $attributeStarted - 1
            );

            $attributes[$position] = $attribute;

            $before = substr($code, 0, $attributeStarted + 1);
            $after = substr($code, $attributeEnded);

            $code = $before . $position . $after;

            $cursor = $attributeStarted + $positionLength + 2 /* curlies */;
            $length = strlen($code);

            $attributeStarted = null;
            $attributeEnded = null;

            continue;
        }

        preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);

        if (count($matchesStart) && $attributeLevel < 1) {
            $elementLevel++;
            $elementStarted = $cursor;
        }

        preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
        preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);

        if (
            $code[$cursor] === ">"
            && !$matchesEqualBefore && !$matchesEqualAfter
            && $attributeLevel < 1
        ) {
            $elementLevel--;
            $elementEnded = $cursor;
        }

        if ($elementStarted !== null && $elementEnded !== null) {
            $distance = $elementEnded - $elementStarted;

            $carry += $cursor;

            $before = trim(substr($code, 0, $elementStarted));
            $tag = trim(substr($code, $elementStarted, $distance + 1));
            $after = trim(substr($code, $elementEnded + 1));

            $token = ["tag" => $tag, "started" => $carry];

            if (count($attributes)) {
                $token["attributes"] = $attributes;
            }

            $tokens[] = $before;
            $tokens[] = $token;

            $attributes = [];

            $code = $after;
            $length = strlen($code);
            $cursor = 0;

            $elementStarted = null;
            $elementEnded = null;

            continue;
        }

        $cursor++;
    }

    return $tokens;
}

$code = '
    <?php

    $classNames = "foo bar";
    $message = "hello world";

    $thing = (
        <div
            className={() => { return "outer-div"; }}
            nested={<span className={"nested-span"}>with text</span>}
        >
            a bit of text before
            <span>
                {$message} with a bit of extra text
            </span>
            a bit of text after
        </div>
    );
';

print_r(tokens($code));

// Array
// (
//     [0] => <?php
//
//     $classNames = "foo bar";
//     $message = "hello world";
//
//     $thing = (
//     [1] => Array
//         (
//             [tag] => <div className={0} nested={1}>
//             [started] => 157
//             [attributes] => Array
//                 (
//                     [0] => () => { return "outer-div"; }
//                     [1] => <span className={"nested-span"}>with text</span>
//                 )
//
//         )
//
//     [2] => a bit of text before
//     [3] => Array
//         (
//             [tag] => <span>
//             [started] => 195
//         )
//
//     [4] => {$message} with a bit of extra text
//     [5] => Array
//         (
//             [tag] => </span>
//             [started] => 249
//         )
//
//     [6] => a bit of text after
//     [7] => Array
//         (
//             [tag] => </div>
//             [started] => 282
//         )
//
// )

This is from tokens-5.php

There’s a lot going on here, but it’s all just a natural progression from the previous version. We use the captured attribute start and end positions to extract the entire attribute value as one big string. We then replace each captured attribute with a numeric placeholder and reset the code string and cursor positions.
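
The placeholder step can be tried on a single tag. This is a simplified sketch, not the tokens-5.php logic: instead of editing $code in place and moving the cursor, it builds a new string as it goes. The name sketchPlaceholders is hypothetical.

```php
<?php

// Hypothetical helper: replaces every top-level {...} value in $tag with a
// numeric placeholder and returns the rewritten tag plus captured values.
function sketchPlaceholders(string $tag): array
{
    $attributes = [];
    $result = "";
    $level = 0;
    $value = "";

    for ($cursor = 0; $cursor < strlen($tag); $cursor++) {
        $char = $tag[$cursor];

        if ($char === "}" && $level > 0) {
            $level--;

            if ($level === 0) {
                // End of a top-level value: store it and emit its index.
                $result .= count($attributes) . "}";
                $attributes[] = $value;
                $value = "";

                continue;
            }
        }

        if ($level > 0) {
            $value .= $char;  // inside an attribute value
        } else {
            $result .= $char; // ordinary tag text
        }

        if ($char === "{") {
            $level++;
        }
    }

    return [$result, $attributes];
}

print_r(sketchPlaceholders(
    '<div className={() => { return "outer-div"; }} nested={<span className={"nested-span"}>with text</span>}>'
));
```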

As each element closes, we associate all the attributes since the element was opened, and create a separate array token from the tag (with its placeholders), attributes and starting position. The result may be a little harder to read, but it is spot on in terms of capturing the intent of the code.

So, what do we do about those nested element attributes?

function tokens($code) {
    // ...

    while ($cursor < $length) {
        // ...

        if ($elementStarted !== null && $elementEnded !== null) {
            // ...

            foreach ($attributes as $key => $value) {
                $attributes[$key] = tokens($value);
            }

            if (count($attributes)) {
                $token["attributes"] = $attributes;
            }

            // ...
        }

        $cursor++;
    }

    $tokens[] = trim($code);

    return $tokens;
}

// ...

// Array
// (
//     [0] => <?php
//
//     $classNames = "foo bar";
//     $message = "hello world";
//
//     $thing = (
//     [1] => Array
//         (
//             [tag] => <div className={0} nested={1}>
//             [started] => 157
//             [attributes] => Array
//                 (
//                     [0] => Array
//                         (
//                             [0] => () => { return "outer-div"; }
//                         )
//
//                     [1] => Array
//                         (
//                             [1] => Array
//                                 (
//                                     [tag] => <span className={0}>
//                                     [started] => 19
//                                     [attributes] => Array
//                                         (
//                                             [0] => Array
//                                                 (
//                                                     [0] => "nested-span"
//                                                 )
//
//                                         )
//
//                                 )
//
//                            [2] => with text
//                            [3] => Array
//                                (
//                                    [tag] => </span>
//                                    [started] => 34
//                                )
//                         )
//
//                 )
//
//         )
//
// ...

This is from tokens-5.php (modified)

Before we associate the attributes, we loop through them and tokenize their values with a recursive function call. We also need to append any remaining text (not inside an attribute or element tag) to the tokens array or it will be ignored.

The result is a list of tokens which can have nested lists of tokens. It’s almost an AST already.

Organizing Tokens

Let’s transform this list of tokens into something more like an AST. The first step is to exclude closing tags that match opening tags. We need to identify which tokens are tags:

function nodes($tokens) {
    $cursor = 0;
    $length = count($tokens);

    while ($cursor < $length) {
        $token = $tokens[$cursor];

        if (is_array($token)) {
            print $token["tag"] . PHP_EOL;
        }

        $cursor++;
    }
}

$tokens = [
    0 => '<?php

    $classNames = "foo bar";
    $message = "hello world";

    $thing = (',
    1 => [
        'tag' => '<div className={0} nested={1}>',
        'started' => 157,
        'attributes' => [
            0 => [
                0 => '() => { return "outer-div"; }',
            ],
            1 => [
                1 => [
                    'tag' => '<span className={0}>',
                    'started' => 19,
                    'attributes' => [
                        0 => [
                            0 => '"nested-span"',
                        ],
                    ],
                ],
                2 => 'with text</span>',
            ],
        ],
    ],
    2 => 'a bit of text before',
    3 => [
        'tag' => '<span>',
        'started' => 195,
    ],
    4 => '{$message} with a bit of extra text',
    5 => [
        'tag' => '</span>',
        'started' => 249,
    ],
    6 => 'a bit of text after',
    7 => [
        'tag' => '</div>',
        'started' => 282,
    ],
    8 => ');',
];

nodes($tokens);

// <div className={0} nested={1}>
// <span>
// </span>
// </div>

This is from nodes-1.php

I’ve extracted a list of tokens from the last token script, so that I don’t have to run and debug that function anymore. Inside a loop, similar to the one we used during tokenization, we print just the non-attribute element tags. Let’s figure out if they’re opening or closing tags, and also whether the closing tags match the opening ones:

function nodes($tokens) {
    $cursor = 0;
    $length = count($tokens);

    while ($cursor < $length) {
        $token = $tokens[$cursor];

        if (is_array($token) && $token["tag"][1] !== "/") {
            preg_match("#^<([a-zA-Z]+)#", $token["tag"], $matches);

            print "OPENING {$matches[1]}" . PHP_EOL;
        }

        if (is_array($token) && $token["tag"][1] === "/") {
            preg_match("#^</([a-zA-Z]+)#", $token["tag"], $matches);

            print "CLOSING {$matches[1]}" . PHP_EOL;
        }

        $cursor++;
    }

    return $tokens;
}

// ...

// OPENING div
// OPENING span
// CLOSING span
// CLOSING div

This is from nodes-1.php (modified)
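
The listing above prints the tag names but doesn’t yet compare them. Here’s a minimal sketch of that comparison using a stack of open tag names. It’s my own addition rather than part of nodes-1.php, the name sketchTagsBalanced is hypothetical, and it assumes there are no self-closing tags.

```php
<?php

// Hypothetical helper: returns true when every closing tag matches the most
// recently opened tag, using a stack of open tag names.
function sketchTagsBalanced(array $tokens): bool
{
    $open = [];

    foreach ($tokens as $token) {
        if (!is_array($token)) {
            continue; // plain text between tags
        }

        if (!preg_match("#^<(/?)([a-zA-Z]+)#", $token["tag"], $matches)) {
            return false; // not something this sketch recognises as a tag
        }

        if ($matches[1] === "") {
            $open[] = $matches[2]; // opening tag: remember its name
        } elseif (array_pop($open) !== $matches[2]) {
            return false; // closing tag doesn't match the last opening tag
        }
    }

    return count($open) === 0;
}

var_dump(sketchTagsBalanced([
    ["tag" => "<div className={0}>"],
    "a bit of text",
    ["tag" => "<span>"],
    ["tag" => "</span>"],
    ["tag" => "</div>"],
])); // bool(true)
```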

Now that we know which tags are opening tags and which are closing tags, we can use reference variables to construct a tree. The code for this step is in nodes-2.php.

Take some time to study what’s going on here. We create a $nodes array, in which to store the new, organized node structures. We also have a $current variable, to which we assign each opening tag node by reference. This way, we can step down into each element (opening tag, closing tag, and the tokens in between), and step back up when we encounter a closing tag.

The references are the trickiest part of this, but they’re essential to keeping the code relatively simple. I mean, it’s not that simple; but it is much simpler than a non-reference version.

We don’t have the cleanest function in terms of how it works recursively. So, when we pass the attributes through the nodes function, we sometimes get empty “token” attributes alongside nested tag attributes. Because of this, we need to filter the attributes to first try and return a nested tag before returning a non-empty token attribute value. This could be cleaned up quite a bit…
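
For readers who want to experiment with the idea, here’s a minimal sketch of a tree builder. It is not the nodes-2.php code: it swaps the $current reference for a recursive helper that takes the cursor by reference, it leaves attributes untouched, and the names sketchNodes and sketchChildren are hypothetical. The node shape (token, tag, name, children) is assumed from the node list used in the next section.

```php
<?php

// Hypothetical helper: consumes tokens until the closing tag of the element
// we are inside (or the end of the list) and returns the nodes it found.
function sketchChildren(array $tokens, int &$cursor): array
{
    $nodes = [];

    while ($cursor < count($tokens)) {
        $token = $tokens[$cursor++];

        if (is_string($token)) {
            if (trim($token) !== "") {
                $nodes[] = ["token" => $token];
            }

            continue;
        }

        if ($token["tag"][1] === "/") {
            // Closing tag: hand control back to the parent element.
            return $nodes;
        }

        // Opening tag: everything up to its closing tag becomes a child.
        preg_match("#^<([a-zA-Z]+)#", $token["tag"], $matches);

        $token["name"] = $matches[1];
        $token["children"] = sketchChildren($tokens, $cursor);

        $nodes[] = $token;
    }

    return $nodes;
}

function sketchNodes(array $tokens): array
{
    $cursor = 0;

    return sketchChildren(array_values($tokens), $cursor);
}

print_r(sketchNodes([
    '$thing = (',
    ["tag" => "<div>", "started" => 1],
    "before",
    ["tag" => "<span>", "started" => 2],
    "inner",
    ["tag" => "</span>", "started" => 3],
    "after",
    ["tag" => "</div>", "started" => 4],
    ");",
]));
```

Because the cursor is shared by reference, each recursive call carries on from where the last one stopped, which plays the same role as stepping down into an element and back up again.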

Rewriting Code

Now that the code is neatly arranged in a hierarchy or AST, we can rewrite it into valid PHP code. Let’s begin by writing just the string tokens (which aren’t nested inside elements), and formatting the resulting code:

function parse($nodes) {
    $code = "";

    foreach ($nodes as $node) {
        if (isset($node["token"])) {
            $code .= $node["token"] . PHP_EOL;
        }
    }

    return $code;
}

$nodes = [
    0 => [
        'token' => '<?php

        $classNames = "foo bar";
        $message = "hello world";

        $thing = (',
    ],
    1 => [
        'tag' => '<div className={0} nested={1}>',
        'started' => 157,
        'attributes' => [
            0 => [
                'token' => '() => { return "outer-div"; }',
            ],
            1 => [
                'tag' => '<span className={0}>',
                'started' => 19,
                'attributes' => [
                    0 => [
                        'token' => '"nested-span"',
                    ],
                ],
                'name' => 'span',
                'children' => [
                    0 => [
                        'token' => 'with text',
                    ],
                ],
            ],
        ],
        'name' => 'div',
        'children' => [
            0 => [
                'token' => 'a bit of text before',
            ],
            1 => [
                'tag' => '<span>',
                'started' => 195,
                'name' => 'span',
                'children' => [
                    0 => [
                        'token' => '{$message} with a bit of extra text',
                    ],
                ],
            ],
            2 => [
                'token' => 'a bit of text after',
            ],
        ],
    ],
    2 => [
        'token' => ');',
    ],
];

print parse($nodes);

// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// );

This is from parser-1.php

I’ve copied the nodes extracted from the previous script, so we don’t have to debug or reuse that function again. Let’s deal with the elements as well:

require __DIR__ . "/vendor/autoload.php";

function parse($nodes) {
    $code = "";

    foreach ($nodes as $node) {
        if (isset($node["token"])) {
            $code .= $node["token"] . PHP_EOL;
        }

        if (isset($node["tag"])) {
            $props = [];
            $attributes = [];
            $elements = [];

            if (isset($node["attributes"])) {
                foreach ($node["attributes"] as $key => $value) {
                    if (isset($value["token"])) {
                        $attributes["attr_{$key}"] = $value["token"];
                    }

                    if (isset($value["tag"])) {
                        $elements[$key] = true;
                        $attributes["attr_{$key}"] = parse([$value]);
                    }
                }
            }

            preg_match_all("#([a-zA-Z]+)={([^}]+)}#", $node["tag"], $dynamic);
            preg_match_all("#([a-zA-Z]+)=[']([^']+)[']#", $node["tag"], $static);

            if (count($dynamic[0])) {
                foreach($dynamic[1] as $key => $value) {
                    $props["{$value}"] = $attributes["attr_{$key}"];
                }
            }

            if (count($static[1])) {
                foreach($static[1] as $key => $value) {
                    $props["{$value}"] = $static[2][$key];
                }
            }

            $code .= "pre_" . $node["name"] . "([" . PHP_EOL;

            foreach ($props as $key => $value) {
                $code .= "'{$key}' => {$value}," . PHP_EOL;
            }

            $code .= "])" . PHP_EOL;
        }
    }

    $code = Pre\Plugin\expand($code);
    $code = Pre\Plugin\formatCode($code);

    return $code;
}

// ...

// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
//     pre_div([
//         'className' => function () {
//             return "outer-div";
//         },
//         'nested' => pre_span([
//             'className' => "nested-span",
//         ]),
//     ])
// );

This is from parser-2.php

When we find a tag node, we loop through the attributes and build a new attributes array that is either just text from token nodes or parsed tags from tag nodes. This bit of recursion deals with the possibility of attributes that are nested elements. Our regular expression only handles attributes quoted with single quotes (for the sake of simplicity). Feel free to make a more comprehensive expression, to handle more complex attribute syntax and values.
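
The attribute matching can be tried on its own. This sketch makes a few assumptions of its own: placeholders are always integers, dynamic values are looked up by placeholder number rather than by match position, and single-quoted static values are re-quoted so they end up as string literals in the generated code. The name sketchProps is hypothetical.

```php
<?php

// Hypothetical helper: maps attribute names in $tag to PHP source snippets.
// $attributes holds the already-rewritten value for each {n} placeholder.
function sketchProps(string $tag, array $attributes): array
{
    $props = [];

    // name={0}: look the value up by its placeholder number.
    preg_match_all('#([a-zA-Z]+)=\{(\d+)\}#', $tag, $dynamic, PREG_SET_ORDER);

    foreach ($dynamic as [, $name, $index]) {
        $props[$name] = $attributes[(int) $index];
    }

    // name='text': re-quote the text so it is emitted as a string literal.
    preg_match_all("#([a-zA-Z]+)='([^']*)'#", $tag, $static, PREG_SET_ORDER);

    foreach ($static as [, $name, $text]) {
        $props[$name] = var_export($text, true);
    }

    return $props;
}

print_r(sketchProps(
    "<div className={0} title='hello' nested={1}>",
    ['function () { return "outer-div"; }', 'pre_span([])']
));
```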

I went ahead and installed pre/short-closures, so that the arrow function would be expanded to a regular function:

composer require pre/short-closures

There’s also a handy PSR-2 formatting function in there, so our code is formatted according to the standard.

Finally, we need to deal with children:

require __DIR__ . "/vendor/autoload.php";

function parse($nodes) {
    $code = "";

    foreach ($nodes as $node) {
        if (isset($node["token"])) {
            $code .= $node["token"] . PHP_EOL;
        }

        if (isset($node["tag"])) {
            // ...

            $children = [];

            foreach ($node["children"] as $child) {
                if (isset($child["tag"])) {
                    $children[] = parse([$child]);
                }

                else {
                    $children[] = "\"" . addslashes($child["token"]) . "\"";
                }
            }

            $props["children"] = $children;

            $code .= "pre_" . $node["name"] . "([" . PHP_EOL;

            foreach ($props as $key => $value) {
                if ($key === "children") {
                    $code .= "\"children\" => [" . PHP_EOL;

                    foreach ($children as $child) {
                        $code .= "{$child}," . PHP_EOL;
                    }

                    $code .= "]," . PHP_EOL;
                }

                else {
                    $code .= "\"{$key}\" => {$value}," . PHP_EOL;
                }
            }

            $code .= "])" . PHP_EOL;
        }
    }

    $code = Pre\Plugin\expand($code);
    $code = Pre\Plugin\formatCode($code);

    return $code;
}

// ...

// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
//     pre_div([
//         "className" => function () {
//             return "outer-div";
//         },
//         "nested" => pre_span([
//             "className" => "nested-span",
//             "children" => [
//                 "with text",
//             ],
//         ]),
//         "children" => [
//             "a bit of text before",
//             pre_span([
//                 "children" => [
//                     "{$message} with a bit of extra text",
//                 ],
//             ]),
//             "a bit of text after",
//         ],
//     ])
// );

This is from parser-3.php

We parse each tag child, and directly quote each token child (adding slashes to account for nested quotes). Then, when we’re building the parameter array, we loop over the children and add each to the string of code our parse function ultimately returns.
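
To make the quoting step concrete, here is a minimal sketch in plain PHP. The helper name quoteChild is hypothetical (it is not part of the tutorial's code); it only isolates the addslashes expression used inside parse. One caveat worth knowing: addslashes also escapes single quotes, which leaves a stray backslash inside a double-quoted literal, so a real implementation may want a narrower escape.

```php
<?php

// Hypothetical helper: wraps a text child in double quotes,
// the same way parse() does for token children.
function quoteChild(string $token): string
{
    // addslashes escapes the nested double quotes, so the
    // generated literal stays syntactically valid PHP.
    return "\"" . addslashes($token) . "\"";
}

$literal = quoteChild('she said "hi"');

print $literal . PHP_EOL; // "she said \"hi\""
```

Because the generated literal is double-quoted, a child like {$message} is interpolated when the compiled code eventually runs.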

Each tag is converted to an equivalent pre_div or pre_span function. This is a placeholder mechanism for a larger, underlying primitive element system. We can demonstrate this by stubbing those functions:

require __DIR__ . "/vendor/autoload.php";

function pre_div($props) {
    $code = "<div";

    if (isset($props["className"])) {
        if (is_callable($props["className"])) {
            $class = $props["className"]();
        }

        else {
            $class = $props["className"];
        }

        $code .= " class='{$class}'";
    }

    $code .= ">";

    foreach ($props["children"] as $child) {
        $code .= $child;
    }

    $code .= "</div>";

    return trim($code);
}

function pre_span($props) {
    $code = pre_div($props);
    $code = preg_replace("#^<div#", "<span", $code);
    $code = preg_replace("#div>$#", "span>", $code);

    return $code;
}

function parse($nodes) {
    // ...
}

$nodes = [
    0 => [
        'token' => '<?php

        $classNames = "foo bar";
        $message = "hello world";

        $thing = (',
    ],
    1 => [
        'tag' => '<div className={0} nested={1}>',
        'started' => 157,
        'attributes' => [
            0 => [
                'token' => '() => { return $classNames; }',
            ],
            1 => [
                'tag' => '<span className={0}>',
                'started' => 19,
                'attributes' => [
                    0 => [
                        'token' => '"nested-span"',
                    ],
                ],
                'name' => 'span',
                'children' => [
                    0 => [
                        'token' => 'with text',
                    ],
                ],
            ],
        ],
        'name' => 'div',
        'children' => [
            0 => [
                'token' => 'a bit of text before',
            ],
            1 => [
                'tag' => '<span>',
                'started' => 195,
                'name' => 'span',
                'children' => [
                    0 => [
                        'token' => '{$message} with a bit of extra text',
                    ],
                ],
            ],
            2 => [
                'token' => 'a bit of text after',
            ],
        ],
    ],
    2 => [
        'token' => ');',
    ],
    3 => [
        'token' => 'print $thing;',
    ],
];

eval(substr(parse($nodes), 5));

// <div class='foo bar'>
//     a bit of text before
//     <span>
//         hello world with a bit of extra text
//     </span>
//     a bit of text after
// </div>

This is from parser-4.php

I’ve modified the input nodes, so that $thing will be printed. If we implement a naive version of pre_div and pre_span then this code executes successfully. It’s hard to believe, given how little code we’ve written…

Integrating with Pre

The question is: what do we do with this?

It’s an interesting experiment, but it’s not very usable. What would be better is to have a way to drop this into an existing project, and experiment with component-based design in the real world. To this end, I extended Pre to allow for custom compilers (along with the custom macro definitions it already allows).

Then, I packaged the tokens, nodes, and parse functions into a re-usable library. It took quite a while to do this and, between the time I first created the functions and the time I built an example application using them, I improved them quite a bit. Some improvements were small (like creating a set of HTML component primitives), and some were big (like refactoring expressions and allowing custom component classes).

I’m not going to go over all these changes, but I’d like to show you what that example application looks like. It begins with a server script:

use Silex\Application;
use Silex\Provider\SessionServiceProvider;
use Symfony\Component\HttpFoundation\Request;

use App\Component\AddTask;
use App\Component\Page;
use App\Component\TaskList;

$app = new Application();
$app->register(new SessionServiceProvider());

$app->get("/", (Request $request) => {
    $session = $request->getSession();

    $tasks = $session->get("tasks", []);

    return (
        <Page>
            <TaskList>{$tasks}</TaskList>
            <AddTask></AddTask>
        </Page>
    );
});

$app->post("/add", (Request $request) => {
    $session = $request->getSession();

    $id = $session->get("id", 0);
    $tasks = $session->get("tasks", []);

    $tasks[] = [
        "id" => $id++,
        "text" => $request->get("text"),
    ];

    $session->set("id", $id);
    $session->set("tasks", $tasks);

    return $app->redirect("/");
});

$app->get("/remove/{id}", (Request $request, $id) => {
    $session = $request->getSession();

    $tasks = $session->get("tasks", []);

    $tasks = array_filter($tasks, ($task) => {
        return $task["id"] !== (int) $id;
    });

    $session->set("tasks", $tasks);

    return $app->redirect("/");
});

$app->run();

This is from server.pre

The application is built on top of Silex, which is a neat micro-framework. In order to load this server script, I have an index file:

require __DIR__ . "/../vendor/autoload.php";

Pre\Plugin\process(__DIR__ . "/../server.pre");

This is from public/index.php

…And I serve this with:

php -S localhost:8080 -t public public/index.php

I haven’t yet tried running this through a web server, like Apache or Nginx. I believe it would run in much the same way.

The server script begins with me setting up the Silex server. I define a few routes, the first of which fetches an array of tasks from the current session. If that array hasn’t been defined, I default it to an empty array.

I pass these directly, as children of the TaskList component. I’ve wrapped this, and the AddTask component, inside a Page component. The Page component looks like this:

namespace App\Component;

use InvalidArgumentException;

class Page
{
    public function render($props)
    {
        assert($this->hasValid($props));

        { $children } = $props;

        return (
            "<!doctype html>".
            <html lang="en">
                <body>
                    {$children}
                </body>
            </html>
        );
    }

    private function hasValid($props)
    {
        if (empty($props["children"])) {
            throw new InvalidArgumentException("page needs content (children)");
        }

        return true;
    }
}

This is from app/Component/Page.pre

This component isn’t strictly necessary, but I want to declare the doctype and make space for future header things (like stylesheets and meta tags). I destructure the $props associative array (using some pre/collections syntax) and pass the children into the <body> element.
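
For readers unfamiliar with the destructuring shorthand, the closest plain-PHP equivalent is keyed array destructuring (available since PHP 7.1). This is only a sketch of the idea; whether Pre expands { $children } = $props into exactly this form is an assumption, and the sample $props array is made up for illustration.

```php
<?php

// Made-up props, shaped like what a component's render() receives.
$props = [
    "children" => ["<p>hello</p>"],
    "id" => 1,
];

// Keyed destructuring: pull named keys out of an associative array.
["children" => $children, "id" => $id] = $props;

print implode("", $children) . PHP_EOL; // <p>hello</p>
print $id . PHP_EOL;                    // 1
```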

Then there’s the TaskList component:

namespace App\Component;

class TaskList
{
    public function render($props)
    {
        { $children } = $props;

        return (
            <ul className={"task-list"}>
                {$this->children($children)}
            </ul>
        );
    }

    private function children($children)
    {
        if (count($children)) {
            return {$children}->map(($task) => {
                return (
                    <Task id={$task["id"]}>{$task["text"]}</Task>
                );
            });
        }

        return (
            <span>No tasks</span>
        );
    }
}

This is from app/Component/TaskList.pre

Elements can have dynamic attributes. In fact, this library doesn’t support literal (quoted) attribute values at all, because they’re complicated to support alongside dynamic attribute values. Here, I’m defining the className attribute, which supports a few different formats:

A literal value expression, like "task-list"
An array (or key-less pre/collection object), like ["first", "second"]
An associative array (or keyed pre/collection object), like ["first" => true, "second" => false]

This is similar to the className attribute in ReactJS. The keyed or object form uses the truthiness of values to determine whether the keys are appended to the element’s class attribute.
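
The three formats could be normalized along these lines. This is a minimal sketch, not the library's actual implementation: the function name resolveClassName is hypothetical, and it handles plain arrays only (not pre/collection objects).

```php
<?php

// Hypothetical helper: turns any supported className format
// into the string that ends up in the class attribute.
function resolveClassName($value): string
{
    // Lazy values: call closures to get the real value.
    if ($value instanceof Closure) {
        $value = $value();
    }

    if (is_string($value)) {
        return $value;
    }

    $classes = [];

    foreach ($value as $key => $item) {
        if (is_int($key)) {
            // List form: every item is a class name.
            $classes[] = $item;
        } elseif ($item) {
            // Keyed form: keep the key only if its value is truthy.
            $classes[] = $key;
        }
    }

    return implode(" ", $classes);
}

print resolveClassName("task-list") . PHP_EOL;                           // task-list
print resolveClassName(["first", "second"]) . PHP_EOL;                   // first second
print resolveClassName(["first" => true, "second" => false]) . PHP_EOL; // first
```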

All the default elements support non-deprecated and non-experimental attributes defined in the Mozilla Developer Network documentation. All elements support an associative array for their style attribute, which uses the kebab-case form of CSS style keys.
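
As a rough illustration of the style attribute, an associative array with kebab-case keys could be flattened into an inline style string like this. The function name renderStyle and the exact separator format are assumptions for the sketch, not the library's API.

```php
<?php

// Hypothetical helper: flattens ["property" => "value"] pairs
// into the contents of an inline style attribute.
function renderStyle(array $styles): string
{
    $pairs = [];

    foreach ($styles as $property => $value) {
        $pairs[] = "{$property}: {$value}";
    }

    return implode("; ", $pairs);
}

print renderStyle([
    "background-color" => "red",
    "font-size" => "12px",
]) . PHP_EOL; // background-color: red; font-size: 12px
```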

Finally, all elements support data- and aria- attributes, and all attribute values may be functions which return their true values (as a form of lazy loading).
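
The lazy-value behaviour can be sketched as follows: any attribute whose value is a closure is called just before rendering. The helper name renderAttributes is hypothetical, and the escaping via htmlspecialchars is my own addition rather than something the article describes.

```php
<?php

// Hypothetical helper: renders props as HTML attributes,
// resolving closures to their return values first.
function renderAttributes(array $props): string
{
    $code = "";

    foreach ($props as $key => $value) {
        if ($value instanceof Closure) {
            $value = $value();
        }

        // Escape the value so quotes cannot break out of the attribute.
        $code .= " " . $key . "=\"" . htmlspecialchars((string) $value) . "\"";
    }

    return $code;
}

print "<a" . renderAttributes([
    "data-id" => function () {
        return 42;
    },
    "aria-label" => "remove",
]) . ">" . PHP_EOL; // <a data-id="42" aria-label="remove">
```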

Let’s look at the Task component:

namespace App\Component;

use InvalidArgumentException;

class Task
{
    public function render($props)
    {
        assert($this->hasValid($props));

        { $children, $id } = $props;

        return (
            <li className={"task"}>
                {$children}
                <a href={"/remove/{$id}"}>remove</a>
            </li>
        );
    }

    private function hasValid($props)
    {
        if (!isset($props["id"])) {
            throw new InvalidArgumentException("task needs id (attribute)");
        }

        if (empty($props["children"])) {
            throw new InvalidArgumentException("task needs text (children)");
        }

        return true;
    }
}

This is from app/Component/Task.pre

Each task expects an id (which server.pre assigns when the task is added), and some children. The children are used for the textual representation of a task, and are defined where the tasks are created, in the TaskList component.

Finally, let’s look at the AddTask component:

namespace App\Component;

class AddTask
{
    public function render($props)
    {
        return (
            <form method={"post"} action={"/add"} className={"add-task"}>
                <input name={"text"} type={"text"} />
                <button type={"submit"}>add</button>
            </form>
        );
    }
}

This is from app/Component/AddTask.pre

This component demonstrates a self-closing input component, and little else. Of course, the add and remove functionality needs to be defined (in the server script):

$app->post("/add", (Request $request) => {
    $session = $request->getSession();

    $id = $session->get("id", 1);
    $tasks = $session->get("tasks", []);

    $tasks[] = [
        "id" => $id++,
        "text" => $request->get("text"),
    ];

    $session->set("id", $id);
    $session->set("tasks", $tasks);

    return $app->redirect("/");
});

$app->get("/remove/{id}", (Request $request, $id) => {
    $session = $request->getSession();

    $tasks = $session->get("tasks", []);

    $tasks = array_filter($tasks, ($task) => {
        return $task["id"] !== (int) $id;
    });

    $session->set("tasks", $tasks);

    return $app->redirect("/");
});

This is from server.pre
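
Stripped of Silex and the short-closure syntax, the add and remove logic boils down to a couple of array operations. The sketch below uses plain PHP; the helper names addTask and removeTask are hypothetical, and session handling is left out. Note that array_filter preserves keys, so this version re-indexes the result with array_values, which the routes above don't do.

```php
<?php

// Hypothetical helper: appends a task to the list.
function addTask(array $tasks, int $id, string $text): array
{
    $tasks[] = ["id" => $id, "text" => $text];

    return $tasks;
}

// Hypothetical helper: drops the task with the given id.
function removeTask(array $tasks, $id): array
{
    // Route parameters arrive as strings, so cast before
    // the strict comparison against the stored integer id.
    $remaining = array_filter($tasks, function ($task) use ($id) {
        return $task["id"] !== (int) $id;
    });

    // array_filter keeps the original keys; re-index the list.
    return array_values($remaining);
}

$tasks = addTask([], 1, "write a compiler");
$tasks = addTask($tasks, 2, "make a sandwich");
$tasks = removeTask($tasks, "1");

print count($tasks) . PHP_EOL;      // 1
print $tasks[0]["text"] . PHP_EOL;  // make a sandwich
```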

We’re not storing anything in a database, but we could. These components and scripts are all that there is to the example application. It’s not a huge example, but it does demonstrate various important things, like component nesting and iterative component rendering.

It’s also a good example of how some of the different Pre macros work well together, particularly short closures, collections, and in certain cases async/await.

Here’s a gif of it in action.

Phack

While I was working on this project, I rediscovered a project called Phack, by Sara Golemon. It’s a similar sort of project to Pre, which seeks to transpile a PHP superset language (in this case, Hack) into regular PHP.

The readme lists the Hack features that Phack aims to support, and their status. One of those features is XHP. If you’ve always wanted to write Hack code, but still use standard PHP tools, I recommend checking it out. I’m a huge fan of Sara and her work, so I’ll definitely be keeping an eye on Phack.

Summary

This has been a whirlwind tour of simple compiler creation. We learned how to build a basic state-machine compiler, and how to get it to support HTML-like syntax inside regular PHP syntax. We also looked at how that might work in an example application.

I’d like to encourage you to try this out. Perhaps you’d like to add your own syntax to PHP – which you could do with Pre. Perhaps you’d like to change PHP radically. I hope this tutorial has demonstrated one way to do that, well enough that you feel up to the challenge. Remember: creating compilers doesn’t take a huge amount of knowledge or training. Just simple string manipulation, and some trial and error.

Let us know what you come up with!
