(Hardcore deep dive) Exploring the internals of a type system: implementing our own TS

Latest recommended article published 2024-11-05 17:23:52

夹谷景曜


Article link: https://blog.csdn.net/2301_77118221/article/details/136765887


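This article walks through building a compiler, focusing on the type-checking mechanism in TypeScript: how the parser handles mismatched and undefined types, and how the checker traverses the AST and runs the corresponding type checks.

(Summary generated automatically by CSDN.)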

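2. Checking

Now that type inference is complete and types have been assigned, the engine can run its type checks. These checks examine the semantics of the given code. There are many kinds of check, from type mismatches to types that do not exist.

For TypeScript this is the Checker (the second semantic pass), and it is more than 20,000 lines long.

I think that gives a very strong idea of just how complicated and difficult it is to check so many different types across so many different scenarios.

The type checker does not depend on which code is actually called, that is, on whether any code in a file is executed at runtime. It processes every line of a given file and runs the appropriate checks.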

Advanced type checker features

Because of their complexity, we will not dig into the following concepts today:

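Lazy compilation

A common feature of modern compilers is lazy loading. They do not recalculate or recompile a file or AST branch unless it is absolutely necessary.

The TypeScript pre-processor can use AST code from the previous run that has been cached in memory. This gives a large performance boost, because it only needs to focus on the small part of the program or node tree that has changed.

TypeScript uses immutable, read-only data structures stored in what it calls look aside tables. This makes it easy to know what has changed and what has not.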
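The caching idea can be illustrated with a small sketch. This is not TypeScript's actual mechanism: the `astCache` map, the `parseWithCache` helper and the stand-in `toyParser` are all assumptions made for illustration. It reuses a previously built AST when a file's source text has not changed.

```javascript
// Hypothetical cache: file name -> { source, ast } from the previous run.
const astCache = new Map();

// parseFn stands in for a real parser: any function from source text to an AST.
function parseWithCache(fileName, source, parseFn) {
  const cached = astCache.get(fileName);
  if (cached && cached.source === source) {
    return { ast: cached.ast, reused: true }; // unchanged file: skip re-parsing
  }
  // Shallowly freeze the root so later passes cannot reassign its fields.
  const ast = Object.freeze(parseFn(source));
  astCache.set(fileName, { source, ast });
  return { ast, reused: false };
}

// Usage with a toy parser that counts how often it is called.
let parseCalls = 0;
const toyParser = (source) => {
  parseCalls += 1;
  return { type: "File", length: source.length };
};

const first = parseWithCache("a.ts", "fn(1);", toyParser);
const second = parseWithCache("a.ts", "fn(1);", toyParser);
const third = parseWithCache("a.ts", "fn(2); fn(3);", toyParser);
console.log(first.reused, second.reused, third.reused, parseCalls);
// false true false 2
```

Comparing the full source text is the simplest possible change test; a real implementation would more likely compare versions or hashes.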

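Soundness

There are some operations that a compiler cannot prove safe at compile time and that must wait until runtime. Every compiler has to make difficult choices about what it will and will not include. TypeScript has certain areas that are said to be unsound (that is, they require runtime type checks).

We will not address the features above in our compiler, as they add complexity that is not worth it for our small POC.

Now for the exciting part: building a compiler ourselves.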

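Part B: Building our own type system compiler

===================

We will build a compiler that can run type checks for three different scenarios and throw a specific message for each one.

We limit it to three scenarios so that we can focus on the specific mechanics of each one, and hopefully by the end have a better idea of how to introduce more complex type checks.

We will work with a function declaration and an expression (a call to that function) in our compiler.

The scenarios are: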

1. Type mismatch between a string and a number

fn("craig-string"); // throw with string vs number
function fn(a: number) {}

2. Using an unknown type that has not been defined

fn("craig-string"); // throw with string vs ?
function fn(a: made_up_type) {} // throw with bad type

3. Using a property name that is not defined in the code

interface Person {
  name: string;
}
fn({ nam: "craig" }); // throw with "nam" vs "name"
function fn(a: Person) {}

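To build our compiler we need two parts: a parser and a checker.

Parser

As mentioned earlier, we will not focus on the parser today. We will follow Hegel's parsing approach and assume that a typeAnnotation object has been attached to every annotated AST node. I have hard-coded the AST objects.

Scenario 1 uses the following parser:

Type mismatch between a string and a number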

function parser(code) {
  // fn("craig-string");
  const expressionAst = {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "Identifier",
        name: "fn"
      },
      arguments: [
        {
          type: "StringLiteral", // Parser "Inference" for type.
          value: "craig-string"
        }
      ]
    }
  };

  // function fn(a: number) {}
  const declarationAst = {
    type: "FunctionDeclaration",
    id: {
      type: "Identifier",
      name: "fn"
    },
    params: [
      {
        type: "Identifier",
        name: "a", // parameter identifier
        typeAnnotation: {
          // our only type annotation
          type: "TypeAnnotation",
          typeAnnotation: {
            // number type
            type: "NumberTypeAnnotation"
          }
        }
      }
    ],
    body: {
      type: "BlockStatement",
      body: [] // "body" === block/line of code. Ours is empty
    }
  };

  const programAst = {
    type: "File",
    program: {
      type: "Program",
      body: [expressionAst, declarationAst]
    }
  };

  // normal AST except with typeAnnotations on
  return programAst;
}

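In scenario 1, the AST for the first statement, fn("craig-string"), is expressionAst, and the AST for the function declaration on the second line is declarationAst. The parser finally returns programAst, a program containing both AST blocks.

Inside the AST you can see the typeAnnotation on the parameter identifier a, matching where it sits in the code.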
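To make the position of the annotation concrete, here is a small sketch that reads it back out of a declaration node. The node shape follows the hard-coded declarationAst, with the `name` fields assumed; `getParamAnnotations` is a hypothetical helper and not part of the original compiler.

```javascript
// function fn(a: number) {}  (same shape as declarationAst; name fields are assumed)
const declarationAst = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "fn" },
  params: [
    {
      type: "Identifier",
      name: "a",
      typeAnnotation: {
        type: "TypeAnnotation",
        typeAnnotation: { type: "NumberTypeAnnotation" }
      }
    }
  ],
  body: { type: "BlockStatement", body: [] }
};

// Hypothetical helper: list each parameter with the type of its annotation.
function getParamAnnotations(declaration) {
  return declaration.params.map((param) => ({
    name: param.name,
    // A parameter without an annotation has no typeAnnotation object at all.
    annotation: param.typeAnnotation ? param.typeAnnotation.typeAnnotation.type : null
  }));
}

console.log(getParamAnnotations(declarationAst));
// [ { name: 'a', annotation: 'NumberTypeAnnotation' } ]
```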

Scenario 2 uses the following parser:

Using an unknown type that has not been defined

function parser(code) {
  // fn("craig-string");
  const expressionAst = {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "Identifier",
        name: "fn"
      },
      arguments: [
        {
          type: "StringLiteral", // Parser "Inference" for type.
          value: "craig-string"
        }
      ]
    }
  };

  // function fn(a: made_up_type) {}
  const declarationAst = {
    type: "FunctionDeclaration",
    id: {
      type: "Identifier",
      name: "fn"
    },
    params: [
      {
        type: "Identifier",
        name: "a",
        typeAnnotation: {
          // our only type annotation
          type: "TypeAnnotation",
          typeAnnotation: {
            // the parameter type differs from scenario 1
            type: "made_up_type" // BREAKS
          }
        }
      }
    ],
    body: {
      type: "BlockStatement",
      body: [] // "body" === block/line of code. Ours is empty
    }
  };

  const programAst = {
    type: "File",
    program: {
      type: "Program",
      body: [expressionAst, declarationAst]
    }
  };

  // normal AST except with typeAnnotations on
  return programAst;
}

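The expression, declaration and program AST blocks of the scenario 2 parser are very similar to scenario 1. The difference is that the typeAnnotation inside params is made_up_type rather than the NumberTypeAnnotation used in scenario 1.

typeAnnotation: {
  type: "made_up_type" // BREAKS
}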
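One way to detect such a made-up type is to look the annotation up in a table of known annotation types. The sketch below assumes a lookup table shaped like the ANNOTATED_TYPES object that appears in the checker code; the `isKnownAnnotation` helper and the error wording are illustrative assumptions.

```javascript
// Annotation types this toy checker understands (assumed table).
const ANNOTATED_TYPES = {
  NumberTypeAnnotation: "number",
  GenericTypeAnnotation: true
};

// Hypothetical helper: is this annotation type one we know about?
function isKnownAnnotation(annotationType) {
  // hasOwnProperty avoids false positives from inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(ANNOTATED_TYPES, annotationType);
}

const errors = [];
for (const annotationType of ["NumberTypeAnnotation", "made_up_type"]) {
  if (!isKnownAnnotation(annotationType)) {
    errors.push(`Type "${annotationType}" for argument "a" does not exist`);
  }
}

console.log(errors);
// [ 'Type "made_up_type" for argument "a" does not exist' ]
```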

Scenario 3 uses the following parser:

Using a property name that is not defined in the code

function parser(code) {
  // interface Person {
  //   name: string;
  // }
  const interfaceAst = {
    type: "InterfaceDeclaration",
    id: {
      type: "Identifier",
      name: "Person"
    },
    body: {
      type: "ObjectTypeAnnotation",
      properties: [
        {
          type: "ObjectTypeProperty",
          key: {
            type: "Identifier",
            name: "name"
          },
          kind: "init",
          method: false,
          value: {
            type: "StringTypeAnnotation"
          }
        }
      ]
    }
  };

  // fn({nam: "craig"});
  const expressionAst = {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "Identifier",
        name: "fn"
      },
      arguments: [
        {
          type: "ObjectExpression",
          properties: [
            {
              type: "ObjectProperty",
              method: false,
              key: {
                type: "Identifier",
                name: "nam"
              },
              value: {
                type: "StringLiteral",
                value: "craig"
              }
            }
          ]
        }
      ]
    }
  };

  // function fn(a: Person) {}
  const declarationAst = {
    type: "FunctionDeclaration",
    id: {
      type: "Identifier",
      name: "fn"
    },
    params: [
      {
        type: "Identifier",
        name: "a",
        typeAnnotation: {
          type: "TypeAnnotation",
          typeAnnotation: {
            type: "GenericTypeAnnotation",
            id: {
              type: "Identifier",
              name: "Person"
            }
          }
        }
      }
    ],
    body: {
      type: "BlockStatement",
      body: [] // Empty function
    }
  };

  const programAst = {
    type: "File",
    program: {
      type: "Program",
      body: [interfaceAst, expressionAst, declarationAst]
    }
  };

  // normal AST except with typeAnnotations on
  return programAst;
}

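In addition to the expression, declaration and program AST blocks, there is now an interfaceAst block, which holds the InterfaceDeclaration AST.

The typeAnnotation node in the declarationAst block now holds a GenericTypeAnnotation, because it refers to an object identifier, namely Person. In this scenario the body of programAst is an array of all three AST blocks.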
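Because the annotation only names the interface, a checker has to resolve Person back to its declaration. The following is a rough sketch of that lookup, using trimmed-down nodes with assumed `name` fields; `findInterface` is a hypothetical helper.

```javascript
// Trimmed-down program body for scenario 3 (name fields are assumed).
const programBody = [
  {
    type: "InterfaceDeclaration",
    id: { type: "Identifier", name: "Person" },
    body: {
      type: "ObjectTypeAnnotation",
      properties: [
        {
          type: "ObjectTypeProperty",
          key: { type: "Identifier", name: "name" },
          value: { type: "StringTypeAnnotation" }
        }
      ]
    }
  },
  {
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "fn" },
    params: [
      {
        type: "Identifier",
        name: "a",
        typeAnnotation: {
          type: "TypeAnnotation",
          typeAnnotation: {
            type: "GenericTypeAnnotation",
            id: { type: "Identifier", name: "Person" }
          }
        }
      }
    ],
    body: { type: "BlockStatement", body: [] }
  }
];

// Hypothetical helper: resolve an interface name to its declaration node.
function findInterface(body, interfaceName) {
  return body.find(
    (node) => node.type === "InterfaceDeclaration" && node.id.name === interfaceName
  );
}

const annotation = programBody[1].params[0].typeAnnotation.typeAnnotation;
const personInterface = findInterface(programBody, annotation.id.name);
const propertyNames = personInterface.body.properties.map((prop) => prop.key.name);
console.log(propertyNames);
// [ 'name' ]
```

Matching on the interface name, rather than taking the first InterfaceDeclaration found, keeps the lookup correct if a program declares more than one interface.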

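Similarities between the parsers

As the listings show, the three scenarios have something in common: in all three, the main place where the type annotations are held is the declaration.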

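Checker

Now for the type-checking part of our compiler.

It needs to iterate over every AST object in the program body and run the appropriate type checks depending on the node type. We will add every error to an array and return it to the caller for printing.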
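This traversal can be sketched as a loop that dispatches on `node.type` and collects messages. The `handlers` map, the `checkProgram` name and the message wording are placeholders chosen for illustration, not the article's final checker, and the sketch assumes every parameter carries an annotation.

```javascript
// Annotation types the sketch treats as valid (assumed list).
const KNOWN_ANNOTATIONS = new Set(["NumberTypeAnnotation", "GenericTypeAnnotation"]);

// Hypothetical handlers, one per node type; each returns an array of error messages.
const handlers = new Map([
  [
    "FunctionDeclaration",
    (node) =>
      node.params
        .map((param) => ({ name: param.name, type: param.typeAnnotation.typeAnnotation.type }))
        .filter(({ type }) => !KNOWN_ANNOTATIONS.has(type))
        .map(({ name, type }) => `Type "${type}" for argument "${name}" does not exist`)
  ],
  ["ExpressionStatement", () => []] // call checks would go here
]);

function checkProgram(ast) {
  const errors = [];
  for (const node of ast.program.body) {
    const handler = handlers.get(node.type);
    if (handler) {
      errors.push(...handler(node));
    }
  }
  return errors; // returned to the caller for printing
}

// Scenario 2 in miniature: function fn(a: made_up_type) {}
const ast = {
  type: "File",
  program: {
    type: "Program",
    body: [
      { type: "ExpressionStatement", expression: { type: "CallExpression" } },
      {
        type: "FunctionDeclaration",
        id: { type: "Identifier", name: "fn" },
        params: [
          {
            type: "Identifier",
            name: "a",
            typeAnnotation: {
              type: "TypeAnnotation",
              typeAnnotation: { type: "made_up_type" }
            }
          }
        ],
        body: { type: "BlockStatement", body: [] }
      }
    ]
  }
};

console.log(checkProgram(ast));
// [ 'Type "made_up_type" for argument "a" does not exist' ]
```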

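Before going further, here is the basic logic we will use for each node type:

Function declaration: check that the parameter types are valid, then check every statement in the function body.
Expression: find the declaration of the function being called, get the parameter types from the declaration, then get the types of the arguments passed in the call expression, and compare them.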
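As a rough illustration of the expression logic, the sketch below looks up the declaration of the called function and compares each argument's node type with the parameter's annotation. The `LITERAL_FOR_ANNOTATION` table, the `checkCall` helper and the error wording are assumptions made for this example.

```javascript
// Which literal node type satisfies which annotation (assumed; covers numbers only).
const LITERAL_FOR_ANNOTATION = {
  NumberTypeAnnotation: "NumericLiteral"
};

// Hypothetical helper: check one call expression against its declaration.
function checkCall(callExpression, programBody) {
  const calleeName = callExpression.callee.name;
  const declaration = programBody.find(
    (node) => node.type === "FunctionDeclaration" && node.id.name === calleeName
  );
  if (!declaration) {
    return [`Function "${calleeName}" is not declared`];
  }
  const errors = [];
  declaration.params.forEach((param, index) => {
    const annotation = param.typeAnnotation.typeAnnotation.type;
    const expected = LITERAL_FOR_ANNOTATION[annotation];
    const argument = callExpression.arguments[index];
    // Only compare when the annotation is one the table knows and an argument was passed.
    if (expected && argument && argument.type !== expected) {
      errors.push(`Argument "${param.name}" expected ${expected} but got ${argument.type}`);
    }
  });
  return errors;
}

// Scenario 1: fn("craig-string") against function fn(a: number) {}
const fnDeclaration = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "fn" },
  params: [
    {
      type: "Identifier",
      name: "a",
      typeAnnotation: {
        type: "TypeAnnotation",
        typeAnnotation: { type: "NumberTypeAnnotation" }
      }
    }
  ],
  body: { type: "BlockStatement", body: [] }
};
const fnCall = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "fn" },
  arguments: [{ type: "StringLiteral", value: "craig-string" }]
};

console.log(checkCall(fnCall, [fnDeclaration]));
// [ 'Argument "a" expected NumericLiteral but got StringLiteral' ]
```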

The code

The following code contains the typeChecks object (and the errors array), which will be used for the expression check and a basic annotation check.

const errors = [];

// Annotated types
const ANNOTATED_TYPES = {
  NumberTypeAnnotation: "number",
  GenericTypeAnnotation: true
};

// Type-checking logic
const typeChecks = {
  // Compare the declared parameter type with the argument passed by the caller
  expression: (declarationFullType, callerFullArg) => {
    switch (declarationFullType.typeAnnotation.type) {
      // Annotated as number
      case "NumberTypeAnnotation":
        // True if the caller passed a numeric literal
        return callerFullArg.type === "NumericLiteral";
      // Annotated as a generic type
      case "GenericTypeAnnotation": // non-native
        // If the argument is an object, check its properties
        if (callerFullArg.type === "ObjectExpression") {
          // Find the interface node (ast is assumed to be the program AST returned by parser())
          const interfaceNode = ast.program.body.find(
            node => node.type === "InterfaceDeclaration"
          );
          const properties = interfaceNode.body.properties;
          // Check each property passed by the caller
          properties.forEach((prop, index) => {
            const name = prop.key.name;
            const associatedName = callerFullArg.properties[index].key.name;
            // No match: record an error message in errors
            if (name !== associatedName) {
              errors.push(
                `Property "${associatedName}" does not exist on interface "${interfaceNode.id.name}". Did you mean Property "${name}"?`
              );
            }
          });
        }
        // ... (the remainder of the checker is cut off in the source)
    }
  }
};
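The property comparison can also be exercised on its own. The sketch below is a standalone variant of that loop which takes the interface and the object argument as parameters instead of reading a global ast. The `checkProperties` name and the sample nodes are assumptions, and like the original it pairs properties by position.

```javascript
// Hypothetical standalone helper; pairs interface and object properties by index.
function checkProperties(interfaceNode, objectExpression) {
  const errors = [];
  interfaceNode.body.properties.forEach((prop, index) => {
    const name = prop.key.name;
    const callerProp = objectExpression.properties[index];
    if (!callerProp) {
      return; // nothing passed at this position; this sketch does not report it
    }
    const associatedName = callerProp.key.name;
    if (name !== associatedName) {
      errors.push(
        `Property "${associatedName}" does not exist on interface "${interfaceNode.id.name}". Did you mean Property "${name}"?`
      );
    }
  });
  return errors;
}

// interface Person { name: string; }
const personNode = {
  type: "InterfaceDeclaration",
  id: { type: "Identifier", name: "Person" },
  body: {
    type: "ObjectTypeAnnotation",
    properties: [
      { type: "ObjectTypeProperty", key: { type: "Identifier", name: "name" } }
    ]
  }
};

// { nam: "craig" }
const objectArg = {
  type: "ObjectExpression",
  properties: [
    {
      type: "ObjectProperty",
      key: { type: "Identifier", name: "nam" },
      value: { type: "StringLiteral", value: "craig" }
    }
  ]
};

console.log(checkProperties(personNode, objectArg));
// [ 'Property "nam" does not exist on interface "Person". Did you mean Property "name"?' ]
```

Returning the errors instead of pushing to a shared array makes the function easy to test in isolation.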
