How to Climb a Tree

cullen2012

Original article: https://habr.com/en/company/pvs-studio/blog/502516/

Or rather, how to get down from it. But first things first. This article departs a bit from the usual format of PVS-Studio articles. We often write about checking other projects, but almost never lift the veil on our own inner workings. It's time to rectify this omission and talk about how the analyzer is built from the inside. More precisely, about its most important part: the syntax tree. The article focuses on the part of PVS-Studio that deals with the C and C++ languages.

First things first

The syntax tree is the central part of any compiler. One way or another, the code needs to be represented in a form that is convenient for a program to process, and it just so happens that a tree structure is best suited for this. I will not delve into the theory here. Suffice it to say that the tree reflects the hierarchy of expressions and blocks in the code very well, while containing only the data needed for the work.

What does the compiler have to do with the static analyzer? The fact is that these two tools have a lot in common. At the initial stage of parsing the code, they do the same job. First, the code is divided into a stream of tokens, which is fed to the parser. Then, in the process of syntactic and semantic analysis, the tokens are organized into a tree, which is sent further along the pipeline. At this stage, compilers can perform intermediate optimizations before generating binary code, whereas static analyzers begin traversing the nodes and launching various checks.
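
To make the first stage concrete, here is a minimal sketch of splitting source text into a token stream. It is not how PVS-Studio's lexer works: the names `TokenKind`, `Token`, and `tokenize` are hypothetical, keywords are not distinguished from identifiers, and every punctuation character becomes its own token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; a real C++ lexer distinguishes many more.
enum class TokenKind { Identifier, Number, Punctuator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits source text into a flat token stream, skipping whitespace.
// Any character that is not a letter, digit, underscore, or whitespace
// becomes a one-character punctuator, so multi-character operators
// such as "==" are deliberately not handled in this sketch.
std::vector<Token> tokenize(const std::string &src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Punctuator, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("int f(int a, int b)")` yields nine tokens, including the parentheses and the comma. A parser would then consume this flat sequence and arrange it into a tree.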

Once the tree is built, several things happen in the PVS-Studio analyzer:

For each declaration, types are determined. A declaration can be a variable, function, class, type alias definition via using or typedef, and so on. In brief, any declaration. All of this is entered into the table for the current scope;
Expressions are processed and variable values are calculated. Information that the analyzer uses for symbolic calculations and data flow analysis is stored;
Overloads of the called functions are selected and predefined annotations are applied to them. If annotations are absent, they are deduced automatically whenever possible;
The data flow is analyzed. To do this, the analyzer stores the value of each variable (if it can be calculated at compile time). In addition to the values, known data about their state is attached to the variables. For example, suppose a function starts by checking a pointer against nullptr and exits if the pointer is null. In that case, the pointer is considered valid in the rest of the function. This data is also used in interprocedural analysis;
Diagnostic rules are run. Depending on their logic, they can perform an additional traversal of the tree. Each type of expression has its own set of diagnostics, and these sets sometimes overlap.
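
The pointer example from the data flow item can be illustrated with a small function. The function `countChars` is made up for this illustration and is not taken from the analyzer or its tests. It only shows the code pattern being described: an early exit on null, after which the pointer can be treated as valid.

```cpp
#include <cstddef>

// Hypothetical helper used only to illustrate the early-exit pattern.
std::size_t countChars(const char *text) {
    if (text == nullptr) {
        return 0;  // Early exit: the null case never reaches the loop below.
    }
    // From this point on, data flow analysis can attach the fact
    // "text is not null" to the variable, so the dereference in the
    // loop is considered valid.
    std::size_t n = 0;
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}
```

If the check were removed, nothing would be known about `text` at the point of the dereference, which is exactly the kind of state a diagnostic rule can query.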

If you are interested in the details of how the analysis works, I recommend reading the article "Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities". Some points from the list are covered there in detail.

We will look in more detail at what happens to the tree inside the analyzer, and at what it looks like in general. The brief introduction is over, so it's time to get to the crux of the matter.

How it works

Historically, PVS-Studio has used a binary tree to represent code. This classic data structure is familiar to everyone: a node generally refers to two child nodes. I will call nodes that are not supposed to have descendants terminals, and all others nonterminals. A nonterminal may in some cases have no child nodes, but its key difference from a terminal is that descendants are fundamentally allowed for it. Terminal nodes (or leaves) cannot refer to anything other than the parent.

The structure used in PVS-Studio is slightly different from the classical binary tree, which is necessary for convenience. Terminal nodes usually correspond to keywords, variable names, literals, and so on. Nonterminals correspond to various types of expressions, blocks of code, lists, and similar constituent elements of a tree.
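
The terminal/nonterminal distinction can be sketched as a small class hierarchy. This is an assumption-laden illustration, not the analyzer's real code: the names `Node`, `Terminal`, `NonTerminal`, and `countTerminals` are hypothetical, and the parent link mentioned above is omitted for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names; the article does not show the analyzer's real classes.
struct Node {
    virtual ~Node() = default;
    virtual bool isTerminal() const = 0;
};

// Terminal (leaf): holds token text and can never have children.
struct Terminal : Node {
    std::string text;
    explicit Terminal(std::string t) : text(std::move(t)) {}
    bool isTerminal() const override { return true; }
};

// Nonterminal: may refer to up to two children; either may be null,
// but children are fundamentally allowed.
struct NonTerminal : Node {
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    NonTerminal(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : left(std::move(l)), right(std::move(r)) {}
    bool isTerminal() const override { return false; }
};

// Counts the terminals reachable from `node` (null counts as zero).
std::size_t countTerminals(const Node *node) {
    if (node == nullptr) {
        return 0;
    }
    if (node->isTerminal()) {
        return 1;
    }
    const auto *inner = static_cast<const NonTerminal *>(node);
    return countTerminals(inner->left.get()) + countTerminals(inner->right.get());
}
```

With this shape, an expression such as `a + b` can be stored as three terminals hanging off nested nonterminals, and a nonterminal with two null children is still a valid, merely empty, node.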

With regard to compiler design, everything here is pretty standard. I encourage everyone interested to check out the iconic "Dragon Book".

As for us, we move on. Let's look at a simple code example and how the analyzer perceives it. From here on there will be many pictures from our internal tree visualization utility.

So here is the example:

int f(int a, int b)
{
  return a + b;
}

After being handled by the parser, this simple function looks like this (non-terminal nodes are highlighted in yellow):

Such a representation has its pros and cons, and in my opinion the cons outnumber the pros. Anyway, let's look at the tree itself. I hasten to say that it is rather redundant: for example, it contains punctuation and parentheses. A compiler considers these superfluous garbage, but the analyzer might need this information for some diagnostic rules. In other words, the analyzer works not with an abstract syntax tree (AST) but with a derivation tree (DT).

The tree grows from left to right and from top to bottom. Left child nodes always contain something meaningful, such as declarators. If we look at the right part of the tree, we'll see intermediate nonterminals marked by the word NonLeaf. They are only needed for the tree to retain its structure. Such nodes carry no information that the analysis needs.
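
One way to read this layout is as a right-leaning chain: each intermediate node holds one meaningful element on the left and the rest of the sequence on the right. The sketch below assumes that shape. The simplified `Node` type and the helpers `leaf`, `chain`, and `flatten` are hypothetical and only model the idea, not the analyzer's actual NonLeaf class.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified node: a leaf has text and no children;
// a spine node has empty text and up to two children.
struct Node {
    std::string text;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> leaf(std::string text) {
    auto node = std::make_unique<Node>();
    node->text = std::move(text);
    return node;
}

// Builds a right-leaning chain: each spine node keeps one element on
// the left and the remainder of the chain on the right.
std::unique_ptr<Node> chain(const std::vector<std::string> &items) {
    std::unique_ptr<Node> rest;  // A null pointer terminates the chain.
    for (auto it = items.rbegin(); it != items.rend(); ++it) {
        auto spine = std::make_unique<Node>();
        spine->left = leaf(*it);
        spine->right = std::move(rest);
        rest = std::move(spine);
    }
    return rest;
}

// Walks the right spine and collects the meaningful left children in
// source order; the spine nodes themselves contribute nothing.
std::vector<std::string> flatten(const Node *spine) {
    std::vector<std::string> out;
    for (; spine != nullptr; spine = spine->right.get()) {
        if (spine->left) {
            out.push_back(spine->left->text);
        }
    }
    return out;
}
```

Building a chain from the tokens `return`, `a`, `+`, `b`, `;` and flattening it returns the same five strings in order, which shows that the spine nodes exist purely to hold the sequence together.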

At this point, we're interested in the left part of the tree. Here it is in a closer view:

This is a function declaration. The PtreeDeclarator parent node is an object through which you can access the nodes holding the function's name and its parameters. It also stores the encoded signature for the type system. It seems to me that this picture is fairly self-explanatory, and it's easy to match the elements of the tree with the code.

Looks simple, right?

For more clarity, let's take a simpler example. Imagine that we have code that calls our f function:
