1 Background knowledge
1.1 Programming languages: 5 generations
1.2 Programming language processors: 3 types
- Compiler
- Interpreter
- Hybrid Processor
1.3 Whole procedure of compilation
[Task] Role Function Input Output
- Analysis Stage
  - Lexical Analysis
  - Syntax Analysis
  - Semantic Analysis
- Synthesis Stage
  - Intermediate Code Generation (TAC, three-address code)
  - Machine-Independent Code Optimizer
  - Code Generation (assembly code)
  - Machine-Dependent Code Optimizer
  - Load & Link (load and link to form the executable program)
1.4 Concepts
- Language
  - Natural language
  - Artificial language (Formal language)
  - Vocabulary (alphabet)
  - Sentence: a sequence of symbols from the vocabulary, formed according to some rules
  - Language: the set of all sentences
- Grammar
$$G = (V_T, V_N, S, P)$$
$$L(G) = \{\alpha \in V_T^{*} \mid S \Rightarrow^{*} \alpha\}$$
4 types of grammars
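The notes do not list the four types. Assuming they refer to the Chomsky hierarchy, the restriction each type places on the productions in $P$ can be summarized roughly as follows (with $V = V_T \cup V_N$; the usual exception allowing $S \to \varepsilon$ is left out for brevity):

```latex
\begin{array}{lll}
\text{Type 0 (unrestricted)} & \alpha \to \beta & \alpha \in V^{*} V_N V^{*},\ \beta \in V^{*} \\
\text{Type 1 (context-sensitive)} & \alpha \to \beta & |\alpha| \le |\beta| \\
\text{Type 2 (context-free)} & A \to \beta & A \in V_N,\ \beta \in V^{*} \\
\text{Type 3 (regular, right-linear)} & A \to aB \ \text{or}\ A \to a & A, B \in V_N,\ a \in V_T
\end{array}
```

Each type is a restriction of the one above it, so every regular grammar is also context-free, and so on.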
1.5 Design grammar for a given language
how to design a grammar
ε-free grammar
S -> ε is allowed only if S is the start symbol and S does not appear on the right-hand side of any production
how to design an ε-free grammar
Ambiguous grammar
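A grammar is ambiguous when some sentence has more than one parse tree. As a small illustration, the sketch below counts the parse trees of a token sequence under the textbook grammar E -> E + E | E * E | id. The grammar and the function name `count_parse_trees` are assumptions chosen for this example and do not come from the notes.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count distinct parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root production E -> E op E.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

# "id + id * id" can be grouped as (id + id) * id or id + (id * id).
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous. A count of 1 for a few samples proves nothing.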
2 Inside of Compilation
2.1 Lexical Analyzer/Scanner
- Role
- Tasks
  - Scanning/Buffering: buffer pair
  - Lexical Analysis: tokens
    - Recognize tokens
      - Lexical patterns
        - Regular expression
        - Regular grammar
        - Finite automata (FA): RE -> NFA -> DFA -> mDFA
  - Messages: error or warning
- Input/Output
  - Input: Source Program
  - Output: Sequence of tokens/Symbol table
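To make the input/output contract concrete, here is a minimal sketch of a scanner driven by a hand-written DFA transition table. The token classes (`ID`, `NUM`, `OP`), the state names and the operator set are assumptions for illustration. A real scanner would normally generate the table from regular expressions through the RE -> NFA -> DFA -> mDFA pipeline.

```python
def char_class(ch):
    """Map a character to the input class used by the DFA."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# DFA transition table: (state, input class) -> next state.
DFA = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}
ACCEPTING = {"in_id": "ID", "in_num": "NUM"}
OPERATORS = set("+-*/=()")

def scan(source):
    """Turn a source string into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
            continue
        if ch in OPERATORS:
            tokens.append(("OP", ch))
            pos += 1
            continue
        # Run the DFA as far as it can go (longest match). Every state
        # other than "start" is accepting, so no backing up is needed.
        state, end = "start", pos
        while end < len(source) and (state, char_class(source[end])) in DFA:
            state = DFA[(state, char_class(source[end]))]
            end += 1
        if state not in ACCEPTING:
            raise SyntaxError(f"unexpected character {ch!r} at position {pos}")
        tokens.append((ACCEPTING[state], source[pos:end]))
        pos = end
    return tokens

print(scan("x1 = 42 + y"))
```

The error branch corresponds to the "Messages" task: a character that starts no token is reported with its position.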
2.2 Syntax Analyzer/Parser
- Role
- Tasks
  - Syntax Analysis
    - Top-down (derivation), pushdown automata
      - Derivation & Backmatching
      - Basic: recursive descent
        - Left recursion: may never halt
        - Backtracking: time-consuming, inefficient
        - Can’t identify errors
      - LL(1) algorithm
        - Eliminate left recursion
        - Extract maximal common left factors (left factoring)
        - Construct the prediction table
        - FIRST & FOLLOW
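The FIRST and FOLLOW sets needed for the prediction table can be computed by fixed-point iteration. The sketch below does this for the usual expression grammar after left recursion has been eliminated. The grammar, the `$` end marker and all function names are assumptions for this example.

```python
EPS = "ε"
END = "$"

# E -> T E',  E' -> + T E' | ε,  T -> F T',  T' -> * F T' | ε,  F -> ( E ) | id
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        if sym not in GRAMMAR:          # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:       # this nonterminal cannot vanish
            return result
    result.add(EPS)                     # every symbol can derive ε
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                      # repeat until no set grows
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_sequence(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:     # what follows nt also follows sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first()
follow = compute_follow(first)
```

For a production A -> α, the table entry M[A, a] is filled for every terminal a in FIRST(α), and for every a in FOLLOW(A) when α can derive ε.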
    - Bottom-up (reduction)
      - Basic algorithm
        - Identify the handle
        - Backtracking: inefficient
        - Can’t identify errors exactly
      - LR algorithm family
        - SLR
        - LR(1)
        - LALR
  - Messages
    - Identify error position
- Input/Output
  - Input: Sequence of tokens/Symbol table
  - Output: Parse tree
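As an illustration of the token-sequence-in, parse-tree-out contract, the following sketch is a table-driven LL(1) parser with an explicit stack. The prediction table is written out by hand for the expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id. The grammar, the table and the node representation are assumptions for this example.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Prediction table: (nonterminal, lookahead) -> production body ([] means ε).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def parse(tokens):
    """Return the parse tree as nested (symbol, children) pairs."""
    tokens = list(tokens) + ["$"]
    root = ("E", [])
    stack = [("$", []), root]
    pos = 0
    while stack:
        symbol, children = stack.pop()
        lookahead = tokens[pos]
        if symbol in NONTERMINALS:
            body = TABLE.get((symbol, lookahead))
            if body is None:
                raise SyntaxError(
                    f"unexpected {lookahead!r} at token {pos} while expanding {symbol}")
            nodes = [(sym, []) for sym in body]
            # Record an explicit ε leaf when the nonterminal vanishes.
            children.extend(nodes if nodes else [("ε", [])])
            stack.extend(reversed(nodes))   # leftmost symbol ends up on top
        elif symbol == lookahead:
            pos += 1                         # terminal matched, consume it
        else:
            raise SyntaxError(
                f"expected {symbol!r} but found {lookahead!r} at token {pos}")
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at token {pos}")
    return root

def leaves(node):
    """Left-to-right leaves of the parse tree."""
    symbol, children = node
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]

tree = parse(["id", "+", "id", "*", "id"])
print([s for s in leaves(tree) if s != "ε"])
```

Because the table has at most one entry per cell, the parser never backtracks, and a missing entry pinpoints the token where the error is detected.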
2.3 Semantic Analysis
- Role
  Source Program -> Scanner -> Sequence of tokens/Symbol table -> Parser (C, C++, Java) -> Parse tree -> Semantic Analyzer -> Annotated parse tree
- Tasks
  - Evaluate the value of each attribute of the grammar symbols in the parse tree, using semantic rules.
    - Type checking
    - Value
    - Code (TAC generator)
    Syntax-directed semantic analysis
  - Handling errors
- Input/Output
  - Input: Parse tree
  - Output: Annotated parse tree
Designing semantic rules
Productions
Attributes: inherited / synthesized
Synthesized: the value is defined only in terms of attribute values at the node's children and at the node itself. E.g., val
Inherited: the value is defined only in terms of attribute values at the node's parent, its siblings, and the node itself. E.g., type
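The difference between the two kinds of attribute shows up in the direction of the tree walk. In the sketch below, `val` is computed bottom-up (synthesized), while `type` is handed down from a declaration's type to every identifier in its list (inherited). The `Node` class, the simplified tree shapes and the function names are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)

def eval_val(node):
    """Synthesized attribute: val depends only on the children (post-order)."""
    if node.symbol == "num":
        node.attrs["val"] = node.attrs["lexval"]
    elif node.symbol in ("+", "*"):
        left, right = [eval_val(child) for child in node.children]
        node.attrs["val"] = left + right if node.symbol == "+" else left * right
    else:
        raise ValueError(f"unexpected node {node.symbol!r}")
    return node.attrs["val"]

def pass_type_down(node, inherited_type, symbol_table):
    """Inherited attribute: type comes from the parent or a sibling (pre-order)."""
    node.attrs["type"] = inherited_type
    if node.symbol == "id":
        symbol_table[node.attrs["name"]] = inherited_type
    for child in node.children:
        pass_type_down(child, inherited_type, symbol_table)

def num(value):
    return Node("num", attrs={"lexval": value})

# 3 + 4 * 5, already shaped by operator precedence.
expr = Node("+", [num(3), Node("*", [num(4), num(5)])])
print(eval_val(expr))                     # 23

# Declaration "int a, b": the type of the list L is inherited from T.
id_list = Node("L", [Node("id", attrs={"name": "a"}),
                     Node("id", attrs={"name": "b"})])
table = {}
pass_type_down(id_list, "int", table)
print(table)                              # {'a': 'int', 'b': 'int'}
```

After both walks, every node carries its attribute values in `attrs`, which is what the annotated parse tree records.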
2.4 Intermediate Code Generation (TAC generation)
- Role
- Forms of three-address code
- Storage forms of three-address code
- Short-circuit (jumping) code generation method
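In the short-circuit method, a boolean expression is translated into conditional jumps to a "true" target and a "false" target instead of computing a value. The following sketch follows the common textbook scheme without fall-through optimization. The tuple-based expression format and the label names `Ltrue` and `Lfalse` are assumptions for this example.

```python
import itertools

def gen_jumping_code(expr, true_label="Ltrue", false_label="Lfalse"):
    """Translate a boolean expression tree into three-address jumping code."""
    code = []
    counter = itertools.count(1)

    def new_label():
        return f"L{next(counter)}"

    def gen(e, t, f):
        op = e[0]
        if op == "rel":                 # ("rel", left, relop, right)
            _, left, relop, right = e
            code.append(f"if {left} {relop} {right} goto {t}")
            code.append(f"goto {f}")
        elif op == "or":                # right operand runs only if left is false
            mid = new_label()
            gen(e[1], t, mid)
            code.append(f"{mid}:")
            gen(e[2], t, f)
        elif op == "and":               # right operand runs only if left is true
            mid = new_label()
            gen(e[1], mid, f)
            code.append(f"{mid}:")
            gen(e[2], t, f)
        elif op == "not":               # just swap the two targets
            gen(e[1], f, t)
        else:
            raise ValueError(f"unknown operator {op!r}")

    gen(expr, true_label, false_label)
    return code

# a < b or (c < d and e < f)
expr = ("or", ("rel", "a", "<", "b"),
              ("and", ("rel", "c", "<", "d"), ("rel", "e", "<", "f")))
print("\n".join(gen_jumping_code(expr)))
```

For this input the sketch emits `if a < b goto Ltrue`, `goto L1`, `L1:`, `if c < d goto L2`, `goto Lfalse`, `L2:`, `if e < f goto Ltrue`, `goto Lfalse`, so `c < d` is evaluated only when `a < b` is false.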
2.5 Machine-Independent Optimization
- Role
- Tasks
  - Collecting Information
    - Control Flow Analysis
      - Blocks
        (1) The first statement is a leader.
        (2) Any statement that is the target of a conditional or unconditional goto is a leader.
        (3) Any statement that immediately follows a goto or conditional goto statement is a leader.
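The three leader rules translate directly into code. The sketch below partitions a list of three-address instructions into basic blocks. It assumes instructions are plain strings and that jump targets are written as `goto N`, where N is a 0-based instruction index. Both conventions are made up for this example.

```python
import re

# A jump ends in "goto N"; covers both "goto N" and "if ... goto N".
GOTO = re.compile(r"\bgoto\s+(\d+)\s*$")

def find_leaders(code):
    """Return the sorted 0-based indices of all leaders."""
    leaders = set()
    if code:
        leaders.add(0)                          # rule (1)
    for i, instr in enumerate(code):
        match = GOTO.search(instr)
        if match:
            target = int(match.group(1))
            if target < len(code):
                leaders.add(target)             # rule (2)
            if i + 1 < len(code):
                leaders.add(i + 1)              # rule (3)
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to, but not including, the next."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]

code = [
    "i = 0",               # 0
    "if i >= n goto 5",    # 1
    "s = s + i",           # 2
    "i = i + 1",           # 3
    "goto 1",              # 4
    "return s",            # 5
]
print(find_leaders(code))  # [0, 1, 2, 5]
for block in basic_blocks(code):
    print(block)
```

The blocks then become the nodes of the control flow graph. The back edge from `goto 1` to instruction 1 is what identifies the loop.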
      - Loops
    - Data Flow Analysis
      - Dead Position
      - Live assignment (useful assignment)
      - Control Flow Analysis
  - Transformation
    - Block/Local Optimization
    - Global Optimization
    - Loop: loop-invariant code motion (hoisting)
    - Optimization of the loop variable i (induction variable)
    Directed acyclic graph (DAG)
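One common use of the DAG in local optimization is to detect common subexpressions inside a basic block: two instructions that would create the same (operator, left node, right node) node compute the same value. The sketch below uses that idea in a simplified form, handling binary operators only and ignoring commutativity. The tuple instruction format and the `copy` pseudo-operator are assumptions for this example.

```python
import itertools

def local_cse(block):
    """block: list of (dest, op, arg1, arg2). Recomputations become copies."""
    ids = itertools.count()
    var_node = {}    # variable name -> DAG node holding its current value
    expr_node = {}   # (op, left node, right node) -> DAG node of the expression
    holder = {}      # DAG node -> a variable that currently holds its value
    out = []

    def node_for(name):
        # Leaves of the DAG: initial values of variables and constants.
        if name not in var_node:
            var_node[name] = next(ids)
            holder[var_node[name]] = name
        return var_node[name]

    for dest, op, a, b in block:
        key = (op, node_for(a), node_for(b))
        node = expr_node.get(key)
        if node is not None and node in holder:
            # Same DAG node already computed and still available: reuse it.
            out.append((dest, "copy", holder[node], None))
        else:
            if node is None:
                node = expr_node[key] = next(ids)
            out.append((dest, op, a, b))
        # dest is overwritten, so it no longer holds its previous value.
        old = var_node.get(dest)
        if old is not None and holder.get(old) == dest:
            del holder[old]
        var_node[dest] = node
        holder.setdefault(node, dest)
    return out

block = [
    ("a", "+", "b", "c"),
    ("b", "-", "a", "d"),
    ("c", "+", "b", "c"),
    ("d", "-", "a", "d"),
]
for instr in local_cse(block):
    print(instr)
```

In this block the fourth instruction recomputes `a - d`, whose operands have not changed since the second instruction, so it is replaced by a copy from `b`. The third instruction looks like the first but uses a redefined `b`, so it is kept.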
- Input/Output
2.6 Run-time Environment Management
- Role
  Some Program
  Compilation
  .exe
  code area
  data area
  Run-time environment management package
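- Task
  Why use a stack: a stack works well, and a stack is sufficient here
  - Storage space is used contiguously
  - Only the top is used each time (one block of a given size)
  A heap wastes space and time, but it is flexible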
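The two stack properties noted above, contiguous storage and access only at the top, can be seen in a toy model of activation-record allocation. The class `RuntimeStack` and its methods are hypothetical names for this sketch. A real run-time environment also stores return addresses, saved registers and links in each frame.

```python
class RuntimeStack:
    """Toy model: one contiguous memory area, one frame per active call."""

    def __init__(self, size):
        self.memory = [0] * size
        self.top = 0          # first free cell; everything below is in use
        self.frames = []      # (procedure name, base address, frame size)

    def call(self, name, frame_size):
        """Allocate a frame at the top of the stack and return its base."""
        if self.top + frame_size > len(self.memory):
            raise MemoryError("stack overflow")
        base = self.top
        self.frames.append((name, base, frame_size))
        self.top += frame_size
        return base

    def ret(self):
        """Free the topmost frame by moving the top pointer back."""
        name, base, _ = self.frames.pop()
        self.top = base
        return name

stack = RuntimeStack(16)
stack.call("main", 4)         # frame occupies cells 0..3
stack.call("f", 3)            # frame occupies cells 4..6
stack.ret()                   # f returns, cells 4..6 are free again
print(stack.call("g", 5))     # 4: g reuses the space f just released
```

Allocation and release are each a single pointer adjustment, which is why the stack is cheap. The price is that frames must be freed in the reverse order of allocation, and a heap relaxes this at the cost of bookkeeping and fragmentation.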