I wrote a programming language. Here’s how you can, too.

by William W Wold

Over the past 6 months, I’ve been working on a programming language called Pinecone. I wouldn’t call it mature yet, but it already has enough features working to be usable, such as:

variables
functions
user-defined structures

If you’re interested in it, check out Pinecone’s landing page or its GitHub repo.

I’m not an expert. When I started this project, I had no clue what I was doing, and I still don’t. I’ve taken zero classes on language creation, read only a bit about it online, and haven’t followed much of the advice I’ve been given.

And yet, I still made a completely new language. And it works. So I must be doing something right.

In this post, I’ll dive under the hood and show you the pipeline Pinecone (and other programming languages) use to turn source code into magic.

I’ll also touch on some of the tradeoffs I’ve had to make, and why I made the decisions I did.

This is by no means a complete tutorial on writing a programming language, but it’s a good starting point if you’re curious about language development.

Getting Started

“I have absolutely no idea where I would even start” is something I hear a lot when I tell other developers I’m writing a language. In case that’s your reaction, I’ll now go through some of the initial decisions and steps involved in starting any new language.

Compiled vs Interpreted

There are two major types of languages, compiled and interpreted:

A compiler figures out everything a program will do, turns it into “machine code” (a format the computer can run really fast), then saves that to be executed later.
An interpreter steps through the source code line by line, figuring out what it’s doing as it goes.

Technically any language could be compiled or interpreted, but one or the other usually makes more sense for a specific language. Generally, interpreting tends to be more flexible, while compiling tends to have higher performance. But this is only scratching the surface of a very complex topic.

I highly value performance, and I saw a lack of programming languages that are both high performance and simplicity-oriented, so I went with compiled for Pinecone.

This was an important decision to make early on, because a lot of language design decisions are affected by it (for example, static typing is a big benefit to compiled languages, but not so much for interpreted ones).

Although Pinecone was designed with compiling in mind, it does have a fully functional interpreter, which for a while was the only way to run it. There are a number of reasons for this, which I will explain later on.

Choosing a Language

I know it’s a bit meta, but a programming language is itself a program, and thus you need to write it in a language. I chose C++ because of its performance and large feature set. Also, I actually do enjoy working in C++.

If you are writing an interpreted language, it makes a lot of sense to write it in a compiled one (like C, C++ or Swift). Otherwise, the performance lost in your interpreter compounds with the performance lost in the interpreter that is interpreting your interpreter.

If you plan to compile, a slower language (like Python or JavaScript) is more acceptable. Compile time may be bad, but in my opinion that isn’t nearly as big a deal as bad run time.

High Level Design

A programming language is generally structured as a pipeline. That is, it has several stages. Each stage has data formatted in a specific, well-defined way. It also has functions to transform data from each stage to the next.

The first stage is a string containing the entire input source file. The final stage is something that can be run. This will all become clear as we go through the Pinecone pipeline step by step.
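
To make the idea of a pipeline concrete, here is a minimal Python sketch (Pinecone itself is written in C++, so this is not its code). The toy stages and the name run_pipeline are made up for illustration: the “language” is just whitespace-separated integers whose meaning is their sum.

```python
from functools import reduce
from typing import Any, Callable, List

Stage = Callable[[Any], Any]

def run_pipeline(source: str, stages: List[Stage]) -> Any:
    """Feed the source string through each stage in order."""
    return reduce(lambda data, stage: stage(data), stages, source)

# Hypothetical toy stages, not Pinecone's real ones.
def toy_lex(source: str) -> List[str]:
    return source.split()                 # string -> list of token strings

def toy_parse(tokens: List[str]) -> List[int]:
    return [int(tok) for tok in tokens]   # tokens -> structured data

def toy_run(numbers: List[int]) -> int:
    return sum(numbers)                   # structured data -> result

print(run_pipeline("1 2 39", [toy_lex, toy_parse, toy_run]))  # 42
```

Each stage only sees the output of the one before it, which is the property the rest of this post relies on.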

Lexing

The first step in most programming languages is lexing, or tokenizing. ‘Lex’ is short for lexical analysis, a very fancy term for splitting a bunch of text into tokens. The word ‘tokenizer’ makes a lot more sense, but ‘lexer’ is so much fun to say that I use it anyway.

Tokens

A token is a small unit of a language. A token might be a variable or function name (AKA an identifier), an operator or a number.

Task of the Lexer

The lexer is supposed to take in a string containing an entire file’s worth of source code and spit out a list containing every token.

Later stages of the pipeline will not refer back to the original source code, so the lexer must produce all the information they need. The reason for this relatively strict pipeline format is that the lexer may do tasks such as removing comments or detecting if something is a number or identifier. You want to keep that logic locked inside the lexer, both so you don’t have to think about these rules when writing the rest of the language, and so you can change this type of syntax all in one place.
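
As a rough illustration of that job, here is a minimal regex-based lexer in Python. It is a sketch, not Pinecone’s lexer: the token kinds, the “#” comment syntax and the operator set are all assumptions chosen for a toy language. Note that it drops comments and tags numbers versus identifiers itself, and records a line number on each token so later stages never need the source text.

```python
import re
from typing import List, NamedTuple

class Token(NamedTuple):
    kind: str   # "NUMBER", "IDENT" or "OP"
    text: str
    line: int   # kept so later stages can report errors without the source

# Hypothetical token rules for a toy language; "#" starts a comment here.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>\#[^\n]*)
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<OP>[-+*/=()])
  | (?P<NEWLINE>\n)
  | (?P<SPACE>[ \t\r]+)
  | (?P<ERROR>.)
""", re.VERBOSE)

def lex(source: str) -> List[Token]:
    tokens = []
    line = 1
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "NEWLINE":
            line += 1
        elif kind in ("COMMENT", "SPACE"):
            continue                      # comments and spaces never leave the lexer
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {match.group()!r}")
        else:
            tokens.append(Token(kind, match.group(), line))
    return tokens

for token in lex("x = 3 + 4.5 # note\ny"):
    print(token)
```

Adding an operator in this sketch means editing one character class, which is the kind of single-place change the paragraph above argues for.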

Flex

The day I started the language, the first thing I wrote was a simple lexer. Soon after, I started learning about tools that would supposedly make lexing simpler and less buggy.

The predominant such tool is Flex, a program that generates lexers. You give it a file which has a special syntax to describe the language’s grammar. From that it generates a C program which lexes a string and produces the desired output.

My Decision

I opted to keep the lexer I wrote for the time being. In the end, I didn’t see significant benefits to using Flex, at least not enough to justify adding a dependency and complicating the build process.

My lexer is only a few hundred lines long, and rarely gives me any trouble. Rolling my own lexer also gives me more flexibility, such as the ability to add an operator to the language without editing multiple files.

Parsing

The second stage of the pipeline is the parser. The parser turns a list of tokens into a tree of nodes. A tree used for storing this type of data is known as an Abstract Syntax Tree, or AST. At least in Pinecone, the AST does not have any info about types or which identifiers are which. It is simply structured tokens.

Parser Duties

The parser adds structure to the ordered list of tokens the lexer produces. To avoid ambiguity, the parser must take into account parentheses and the order of operations. Simply parsing operators isn’t terribly difficult, but as more language constructs get added, parsing can become very complex.
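
One common way to handle parentheses and order of operations by hand is precedence climbing. The Python sketch below shows that technique on a toy expression grammar. It is an assumption-laden example rather than Pinecone’s parser: tokens are plain strings, the PRECEDENCE table is invented, and the AST is just nested (operator, left, right) tuples.

```python
# Higher number binds tighter. Hypothetical table for a toy grammar.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_atom(self):
        tok = self.advance()
        if tok == "(":
            node = self.parse_expr(1)     # restart precedence inside parentheses
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in PRECEDENCE or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                        # a number or identifier leaf

    def parse_expr(self, min_prec):
        left = self.parse_atom()
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.advance()
            # "+ 1" makes operators of equal precedence left-associative.
            right = self.parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.parse_expr(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse(["1", "+", "2", "*", "3"]))            # ('+', '1', ('*', '2', '3'))
print(parse(["(", "1", "+", "2", ")", "*", "3"]))  # ('*', ('+', '1', '2'), '3')
```

The resulting tree carries structure only. Nothing in it says what type anything is or which variable a name refers to.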

Bison

Again, there was a decision to make involving a third-party library. The predominant parsing library is Bison. Bison works a lot like Flex. You write a file in a custom format that stores the grammar information, then Bison uses that to generate a C program that will do your parsing. I chose not to use Bison.

Why Custom Is Better

With the lexer, the decision to use my own code was fairly obvious. A lexer is such a trivial program that not writing my own felt almost as silly as not writing my own ‘left-pad’.

With the parser, it’s a different matter. My Pinecone parser is currently 750 lines long, and I’ve written three of them because the first two were trash.

I originally made my decision for a number of reasons, and while it hasn’t gone completely smoothly, most of them hold true. The major ones are as follows:

Minimize context switching in workflow: context switching between C++ and Pinecone is bad enough without throwing in Bison’s grammar grammar.
Keep build simple: every time the grammar changes, Bison has to be run before the build. This can be automated, but it becomes a pain when switching between build systems.
I like building cool shit: I didn’t make Pinecone because I thought it would be easy, so why would I delegate a central role when I could do it myself? A custom parser may not be trivial, but it is completely doable.

In the beginning I wasn’t completely sure if I was going down a viable path, but I was given confidence by what Walter Bright (a developer on an early version of C++, and the creator of the D language) had to say on the topic:

“Somewhat more controversial, I wouldn’t bother wasting time with lexer or parser generators and other so-called “compiler compilers.” They’re a waste of time. Writing a lexer and parser is a tiny percentage of the job of writing a compiler. Using a generator will take up about as much time as writing one by hand, and it will marry you to the generator (which matters when porting the compiler to a new platform). And generators also have the unfortunate reputation of emitting lousy error messages.”

Action Tree

We have now left the area of common, universal terms, or at least I don’t know what the terms are anymore. From my understanding, what I call the ‘action tree’ is most akin to LLVM’s IR (intermediate representation).

There is a subtle but very significant difference between the action tree and the abstract syntax tree. It took me quite a while to figure out that there even should be a difference between them (which contributed to the need for rewrites of the parser).

Action Tree vs AST

Put simply, the action tree is the AST with context. That context is info such as what type a function returns, or the fact that two places in which a variable is used are in fact using the same variable. Because it needs to figure out and remember all this context, the code that generates the action tree needs lots of namespace lookup tables and other thingamabobs.
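
Here is a small Python sketch of what “AST with context” can mean. Everything in it is hypothetical (the node classes, the single “Int” type, the flat scope dictionary standing in for a namespace lookup table), and the input AST is nested (operator, left, right) tuples with string leaves. The point is only that name lookup and type checking happen once, while building the tree.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Variable:
    name: str
    type: str

@dataclass
class Literal:
    value: int
    type: str = "Int"

@dataclass
class VarRef:
    variable: Variable   # every use of "x" points at the same Variable object

    @property
    def type(self) -> str:
        return self.variable.type

@dataclass
class TypedOp:
    op: str
    left: "TypedNode"
    right: "TypedNode"
    type: str

TypedNode = Union[Literal, VarRef, TypedOp]

def resolve(ast, scope: Dict[str, Variable]) -> TypedNode:
    """Turn a context-free AST into a tree annotated with types and variables."""
    if isinstance(ast, tuple):
        op, left, right = ast
        lhs, rhs = resolve(left, scope), resolve(right, scope)
        if lhs.type != rhs.type:
            raise TypeError(f"cannot apply {op!r} to {lhs.type} and {rhs.type}")
        return TypedOp(op, lhs, rhs, lhs.type)
    if ast.isdigit():
        return Literal(int(ast))
    if ast not in scope:
        raise NameError(f"unknown variable {ast!r}")
    return VarRef(scope[ast])

scope = {"x": Variable("x", "Int")}
tree = resolve(("+", "x", ("*", "x", "2")), scope)
print(tree.type)                                        # Int
print(tree.left.variable is tree.right.left.variable)   # True
```

A real language would need nested scopes, function signatures and many more types, which is where the lookup tables multiply.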

Running the Action Tree

Once we have the action tree, running the code is easy. Each action node has an ‘execute’ function which takes some input, does whatever the action should (including possibly calling sub-actions) and returns the action’s output. This is the interpreter in action.
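
The shape of such a tree-walking interpreter can be sketched in a few lines of Python. This is an illustration under assumptions, not Pinecone’s design: the node classes are invented, and passing a dictionary env for variable storage is one guess at what “some input” could be.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List

class Action:
    def execute(self, env: dict):
        raise NotImplementedError

@dataclass
class Constant(Action):
    value: float
    def execute(self, env):
        return self.value

@dataclass
class GetVar(Action):
    name: str
    def execute(self, env):
        return env[self.name]

@dataclass
class SetVar(Action):
    name: str
    value: Action
    def execute(self, env):
        env[self.name] = self.value.execute(env)   # run the sub-action first
        return env[self.name]

@dataclass
class BinaryOp(Action):
    op: Callable
    left: Action
    right: Action
    def execute(self, env):
        return self.op(self.left.execute(env), self.right.execute(env))

@dataclass
class Block(Action):
    actions: List[Action]
    def execute(self, env):
        result = None
        for action in self.actions:
            result = action.execute(env)
        return result                               # value of the last action

# Equivalent of:  x = 2 + 3;  x * 4
program = Block([
    SetVar("x", BinaryOp(operator.add, Constant(2), Constant(3))),
    BinaryOp(operator.mul, GetVar("x"), Constant(4)),
])
env = {}
print(program.execute(env))  # 20
```

Running the whole program is a single call to execute on the root node. Every node does its own small job and delegates the rest.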

Compiling Options

“But wait!” I hear you say, “isn’t Pinecone supposed to be compiled?” Yes, it is. But compiling is harder than interpreting. There are a few possible approaches.

Build My Own Compiler

This sounded like a good idea to me at first. I do love making things myself, and I’ve been itching for an excuse to get good at assembly.

Unfortunately, writing a portable compiler is not as easy as writing some machine code for each language element. Because of the number of architectures and operating systems, it is impractical for any individual to write a cross platform compiler backend.

Even the teams behind Swift, Rust and Clang don’t want to bother with it all on their own, so instead they all use…

LLVM

LLVM is a collection of compiler tools. It’s basically a library that will turn your language into a compiled executable binary. It seemed like the perfect choice, so I jumped right in. Sadly, I didn’t check how deep the water was, and I immediately drowned.

LLVM, while not assembly-language hard, is gigantic-complex-library hard. It’s not impossible to use, and they have good tutorials, but I realized I would have to get some practice before I was ready to fully implement a Pinecone compiler with it.

Transpiling

I wanted some sort of compiled Pinecone and I wanted it fast, so I turned to one method I knew I could make work: transpiling.

I wrote a Pinecone-to-C++ transpiler, and added the ability to automatically compile the output source with GCC. This currently works for almost all Pinecone programs (though there are a few edge cases that break it). It is not a particularly portable or scalable solution, but it works for the time being.
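
To show the general idea of transpiling, here is a minimal Python sketch that walks a tiny tree and emits C++ source text. The statement forms (“let” and “print”), the tuple-based tree and the int-only typing are all assumptions for this example. Pinecone’s real transpiler is part of its C++ codebase and handles far more.

```python
def emit_expr(node) -> str:
    """Emit a C++ expression; parenthesize everything to sidestep precedence."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({emit_expr(left)} {op} {emit_expr(right)})"
    return str(node)   # a literal or a variable name

def emit_statement(stmt) -> str:
    if stmt[0] == "let":
        _, name, expr = stmt
        return f"int {name} = {emit_expr(expr)};"
    if stmt[0] == "print":
        _, expr = stmt
        return f'std::cout << {emit_expr(expr)} << "\\n";'
    raise ValueError(f"unknown statement kind {stmt[0]!r}")

def transpile(program) -> str:
    body = "\n".join("    " + emit_statement(stmt) for stmt in program)
    return "#include <iostream>\n\nint main() {\n" + body + "\n    return 0;\n}\n"

# Equivalent of:  x = 2 + 3;  print x * 4
print(transpile([
    ("let", "x", ("+", 2, 3)),
    ("print", ("*", "x", 4)),
]))
```

Saving the returned string to a file and handing it to a C++ compiler (for example g++ out.cpp -o out) is the step that can then be automated.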

Future

Assuming I continue to develop Pinecone, it will get LLVM compiling support sooner or later. I suspect that no matter how much I work on it, the transpiler will never be completely stable, and the benefits of LLVM are numerous. It’s just a matter of when I have time to make some sample projects in LLVM and get the hang of it.

Until then, the interpreter is great for trivial programs and C++ transpiling works for most things that need more performance.

Conclusion

I hope I’ve made programming languages a little less mysterious for you. If you do want to make one yourself, I highly recommend it. There are a ton of implementation details to figure out, but the outline here should be enough to get you going.

Here is my high level advice for getting started (remember, I don’t really know what I’m doing, so take it with a grain of salt):

If in doubt, go interpreted. Interpreted languages are generally easier to design, build and learn. I’m not discouraging you from writing a compiled one if you know that’s what you want to do, but if you’re on the fence, I would go interpreted.
When it comes to lexers and parsers, do whatever you want. There are valid arguments for and against writing your own. In the end, if you think out your design and implement everything in a sensible way, it doesn’t really matter.
Learn from the pipeline I ended up with. A lot of trial and error went into designing the pipeline I have now. I have attempted eliminating ASTs, ASTs that turn into action trees in place, and other terrible ideas. This pipeline works, so don’t change it unless you have a really good idea.
If you don’t have the time or motivation to implement a complex general purpose language, try implementing an esoteric language such as Brainfuck. These interpreters can be as short as a few hundred lines.
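
For a sense of scale on that last point, here is a compact Brainfuck interpreter sketched in Python. It makes some simplifying assumptions that vary between implementations: a fixed 30,000-cell tape with no bounds checking, 8-bit wrapping cells, and end of input reading as 0. The function name run_brainfuck is made up for this example.

```python
def run_brainfuck(code: str, input_data: str = "") -> str:
    # Pre-match brackets so loops can jump in one step.
    jumps = {}
    stack = []
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise SyntaxError("unmatched ']'")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise SyntaxError("unmatched '['")

    tape = [0] * 30000       # assumption: fixed-size tape, no bounds checks
    ptr = 0                  # data pointer
    pc = 0                   # position in the program
    inp = iter(input_data)
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256   # end of input reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # jump back to the loop start
        pc += 1              # any other character is a comment
    return "".join(out)

print(run_brainfuck("++++++++[>++++++++<-]>+."))  # A
```

Even at this size it has the same outline as the pipeline above: a pass over the source that builds a lookup structure, followed by a loop that executes it.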

I have very few regrets when it comes to Pinecone development. I made a number of bad choices along the way, but I have rewritten most of the code affected by such mistakes.

Right now, Pinecone is in a good enough state that it functions well and can be easily improved. Writing Pinecone has been a hugely educational and enjoyable experience for me, and it’s just getting started.
