Course Homepage
http://web.stanford.edu/class/cs143/
Course Requirements
Open the lid of compilers and see inside
- Understand what they do
- Understand how they work
- Understand how to build them
Correctness over performance
- Correctness is essential in compilers
- They must produce correct code
- CS143 is more like CS103+CS110 than CS107
- Other classes focus on performance (CS149, CS243)
Course Goals
You will write your own compiler in 4 parts.
How Are Languages Implemented?
There are two major strategies:
- Interpreters run your program
- Compilers translate your program
Compilers dominate low-level languages
- C, C++, Go, Rust
Interpreters dominate high-level languages
- Python, Ruby
Some language implementations provide both
- Java, JavaScript, WebAssembly
- Interpreter + Just in Time (JIT) compiler
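To make the two strategies concrete, here is a minimal sketch over a tiny expression tree. The `Expr` type, the helper names, and the `push`/`add`/`mul` target language are all invented for illustration; they are not from the course material.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A tiny expression tree: either a number leaf or a binary operation.
struct Expr {
    char op;                        // '+' or '*'; 0 marks a number leaf
    int value;                      // used only when op == 0
    std::unique_ptr<Expr> lhs, rhs; // used only when op != 0
};

std::unique_ptr<Expr> num(int v) {
    return std::unique_ptr<Expr>(new Expr{0, v, nullptr, nullptr});
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new Expr{op, 0, std::move(l), std::move(r)});
}

// Interpreter strategy: walk the tree and produce the program's result.
int interpret(const Expr& e) {
    if (e.op == 0) return e.value;
    assert(e.lhs && e.rhs);  // binary nodes must have both children
    int a = interpret(*e.lhs);
    int b = interpret(*e.rhs);
    return e.op == '+' ? a + b : a * b;
}

// Compiler strategy: walk the same tree, but produce another program
// (text for a hypothetical stack machine) instead of a result.
std::string compile(const Expr& e) {
    if (e.op == 0) return "push " + std::to_string(e.value) + "\n";
    assert(e.lhs && e.rhs);
    return compile(*e.lhs) + compile(*e.rhs) + (e.op == '+' ? "add\n" : "mul\n");
}
```

For the tree `1 + 2 * 3`, `interpret` returns 7, while `compile` returns the five-line program `push 1`, `push 2`, `push 3`, `mul`, `add`, which still has to be run by something else.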
Compiler Architecture
- Lexical Analysis — identify words
- Parsing — identify sentences
- Semantic Analysis — analyse sentences
- Optimization — editing
- Code Generation — translation
Lexical Analysis
- Identify words: the smallest unit above letters
- Lexical analyzer divides program text into “words” or “tokens”
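As a rough illustration of dividing program text into tokens, the sketch below hand-writes a small scanner. The function name `tokenize` and the token classes it recognizes (names, integers, `==`, single-character punctuation) are assumptions made for this example; a real lexer would also record token kinds and source positions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split program text into tokens: names/keywords, integers,
// the two-character operator "==", and single-character punctuation.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) { ++i; continue; }   // whitespace separates tokens
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Name or keyword: letters, digits, underscores.
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_')) ++i;
        } else if (std::isdigit(c)) {
            // Integer literal: a run of digits.
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
        } else if (c == '=' && i + 1 < text.size() && text[i + 1] == '=') {
            i += 2;                                // "==" is one token, not two
        } else {
            ++i;                                   // any other single character
        }
        tokens.push_back(text.substr(start, i - start));
    }
    return tokens;
}
```

Calling `tokenize("if x == y then z = 1; else z = 2;")` yields 14 tokens, with `==` kept as a single token and each `;` as its own token.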
Parsing
- Once words are understood, the next step is to understand sentence structure
- Parsing = Diagramming Sentences (the diagram is a tree)
Diagramming a Sentence
Parsing program expressions is the same. Consider:
If x == y then z = 1; else z = 2;
Diagrammed:
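The original diagram is not reproduced in these notes. As a stand-in, the sketch below parses the statement (with lowercase keywords, already split into tokens) into a tree and renders it as a nested list. The grammar, the `Node` and `Parser` types, and the `show` helper are assumptions made for this example and cover only this one statement shape.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A parse-tree node: a label plus child subtrees.
struct Node {
    std::string label;
    std::vector<Node> children;
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // stmt := "if" cond "then" assign "else" assign
    Node parseIf() {
        expect("if");
        Node cond = parseCond();
        expect("then");
        Node thenBranch = parseAssign();
        expect("else");
        Node elseBranch = parseAssign();
        return Node{"if-then-else", {cond, thenBranch, elseBranch}};
    }

private:
    // cond := name "==" name
    Node parseCond() {
        Node lhs = leaf();
        expect("==");
        Node rhs = leaf();
        return Node{"==", {lhs, rhs}};
    }
    // assign := name "=" value ";"
    Node parseAssign() {
        Node target = leaf();
        expect("=");
        Node value = leaf();
        expect(";");
        return Node{"=", {target, value}};
    }
    Node leaf() { return Node{next(), {}}; }
    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }
    void expect(const std::string& want) {
        std::string got = next();
        if (got != want) throw std::runtime_error("expected '" + want + "', got '" + got + "'");
    }
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Render the tree as a nested list, e.g. (== x y).
std::string show(const Node& n) {
    if (n.children.empty()) return n.label;
    std::string out = "(" + n.label;
    for (const Node& c : n.children) out += " " + show(c);
    return out + ")";
}
```

For the token sequence of `if x == y then z = 1; else z = 2;`, `show(parser.parseIf())` gives `(if-then-else (== x y) (= z 1) (= z 2))`: one root with three subtrees for the condition and the two branches.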
Semantic Analysis
- Once sentence structure is understood, we can try to understand “meaning”
- Compilers perform limited semantic analysis to catch inconsistencies
Some examples of semantic analysis in English:
- Jack said Jerry left his assignment at home. (What does “his” refer to? Jack or Jerry?)
- Jack said Jack left his assignment at home? (How many Jacks are there? Which one left the assignment?)
Semantic analysis in programming languages:
- Programming languages define strict rules to avoid such ambiguities
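- The following C++ code prints 4, using the inner definition of Jack:
    #include <iostream>

    int main() {
        int Jack = 3;
        {
            int Jack = 4;        // shadows the outer Jack
            std::cout << Jack;   // prints 4
        }
    }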
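One common way for a compiler to resolve such bindings is a stack of scopes searched from the innermost outward. The sketch below assumes that approach; the class name `ScopedSymbols` and its methods are hypothetical, and it stores plain `int` values where a real compiler would store declarations.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// A stack of scopes; each scope maps a name to the value bound in it.
class ScopedSymbols {
public:
    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        assert(!scopes_.empty());  // must match an earlier enterScope()
        scopes_.pop_back();
    }

    // Bind a name in the innermost scope, shadowing any outer binding.
    void declare(const std::string& name, int value) {
        assert(!scopes_.empty());  // declarations need an open scope
        scopes_.back()[name] = value;
    }

    // Search from the innermost scope outward; the first match wins.
    // Returns false when the name is not bound in any open scope.
    bool lookup(const std::string& name, int& out) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) {
                out = found->second;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
};
```

Declaring `Jack` as 3 in an outer scope and as 4 in an inner one makes `lookup("Jack", v)` yield 4 while the inner scope is open and 3 again after `exitScope()`.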
Compilers perform many semantic checks besides variable bindings
Example:
Jack left her homework at home. (Possible type mismatch between “her” and “Jack”, assuming Jack is male.)
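The programming-language analogue is a type check. A minimal sketch, assuming a language with only two types and a table of declared variable types; `Type`, `TypeEnv`, and `checkAssign` are names invented for this example.

```cpp
#include <map>
#include <string>

enum class Type { Int, String };

// Declared type of each variable, as a symbol table might record it.
using TypeEnv = std::map<std::string, Type>;

// Check the assignment `target = <value of type valueType>`.
// Returns an empty string if it is well typed, otherwise a diagnostic.
std::string checkAssign(const TypeEnv& env, const std::string& target, Type valueType) {
    auto it = env.find(target);
    if (it == env.end()) {
        return "error: '" + target + "' is not declared";
    }
    if (it->second != valueType) {
        return "error: type mismatch in assignment to '" + target + "'";
    }
    return "";
}
```

With `count` declared as `Type::Int`, assigning an `Int` value passes, while assigning a `String` value is reported as a mismatch, much like “her” applied to Jack.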
Optimization
Akin to editing:
- Minimize reading time
- Minimize items the reader must keep in short-term memory
Automatically modify programs so that they
- Run faster
- Use less memory
- In general, to use or conserve some resource
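(The CS143 course does not cover optimization.)
An example of optimization:
X = Y * 0 is the same as X = 0 (valid for integers; not for IEEE floating point, where Y may be NaN or infinity and Y * 0 is then NaN)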
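A rewrite like this can be sketched as a small pass over one instruction at a time. The `Instr` layout and the `simplify` function are assumptions for illustration, and the rewrite is applied on the assumption that all operands are integers.

```cpp
#include <string>

// One integer instruction of the form  dst = lhs op rhs.
// Operands are variable names or integer literals, kept as text for brevity.
struct Instr {
    std::string dst, lhs, rhs;
    char op;  // '*' or '+', or '=' for a plain copy of lhs (rhs unused)
};

// Rewrite an integer multiplication by the literal 0 into `dst = 0`.
// Every other instruction is returned unchanged.
Instr simplify(const Instr& in) {
    if (in.op == '*' && (in.lhs == "0" || in.rhs == "0")) {
        return Instr{in.dst, "0", "", '='};
    }
    return in;
}
```

`simplify(Instr{"X", "Y", "0", '*'})` returns the copy instruction `X = 0`, whereas `X = Y + 0` is left alone because this sketch knows only the one rule.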
Code Generation
- Typically produces assembly code
- Generally a translation into another language
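As a rough picture of that translation, the sketch below turns one narrow statement shape, `if lhs == rhs then target = a; else target = b;`, into assembly-like text. The mnemonics (`load`, `cmp`, `jne`, `mov`, `store`, `jmp`), the registers `r0` and `r1`, and the label scheme are made up for this example and do not correspond to any real instruction set.

```cpp
#include <string>

// A deliberately narrow statement shape:
//   if lhs == rhs then target = thenValue; else target = elseValue;
struct IfEq {
    std::string lhs, rhs;  // variables being compared
    std::string target;    // variable assigned in both branches
    int thenValue, elseValue;
};

// Emit text for a made-up assembly-like target language.
// labelId keeps the labels unique when several statements are generated.
std::string generate(const IfEq& s, int labelId) {
    const std::string elseLabel = "L_else" + std::to_string(labelId);
    const std::string endLabel = "L_end" + std::to_string(labelId);
    std::string out;
    out += "  load r0, " + s.lhs + "\n";
    out += "  load r1, " + s.rhs + "\n";
    out += "  cmp r0, r1\n";
    out += "  jne " + elseLabel + "\n";       // skip the then-branch if not equal
    out += "  mov r0, " + std::to_string(s.thenValue) + "\n";
    out += "  store " + s.target + ", r0\n";
    out += "  jmp " + endLabel + "\n";        // skip the else-branch
    out += elseLabel + ":\n";
    out += "  mov r0, " + std::to_string(s.elseValue) + "\n";
    out += "  store " + s.target + ", r0\n";
    out += endLabel + ":\n";
    return out;
}
```

Calling `generate(IfEq{"x", "y", "z", 1, 2}, 0)` produces eleven lines of output in which the structured `if` has become a comparison, a conditional jump, and two labels.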
Intermediate Representations
Many compilers perform translations between successive intermediate languages
All but first and last are intermediate representations (IR) internal to the compiler
IRs are generally ordered in descending level of abstraction
- Highest is source
- Lowest is assembly
IRs are useful because lower levels expose features hidden by higher levels:
- registers
- memory layout
- raw pointers
- etc
But lower levels obscure high-level meaning:
- Classes
- Higher-order functions
- Even loops
- etc
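To see how lowering can obscure a loop, compare the two functions below. Both compute the same sum; the second is a hand-written imitation of what a lower-level IR might look like, expressed with C++ labels and `goto`. The function names are chosen for this example only.

```cpp
// High-level form: the loop is explicit in the syntax.
int sumTo(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Lowered form: the same computation, but the loop now exists only as
// a label, a conditional jump out, and an unconditional jump back.
int sumToLowered(int n) {
    int total = 0;
    int i = 1;
loop_head:
    if (i > n) goto loop_exit;
    total += i;
    ++i;
    goto loop_head;
loop_exit:
    return total;
}
```

An analysis that wants to treat the second version as a loop first has to rediscover it from the jumps, which is the sense in which lower levels hide high-level meaning.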
Issues
- Compiling is almost this simple, but there are many pitfalls
- Example: How to handle erroneous programs?
Language design has a big impact on the compiler:
- Determines what is easy and hard to compile
- Course theme: many trade-offs in language design