The two rules the Lex lexical analyzer uses to resolve ambiguity
There are two important disambiguation rules used by Lex and other similar lexical-analyzer generators:
- Longest match: The longest initial substring of the input that can match any regular expression is taken as the next token.
- Rule priority: For a particular longest initial substring, the first regular expression that can match determines its token type. This means that the order of writing down the regular-expression rules has significance.
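The two rules above can be sketched in a few lines of C. This is only an illustration, not how Lex is implemented internally: the token names, the three hand-written matchers, and the function next_token are all assumptions made up for this sketch. Each matcher reports how long a prefix it can accept; the scanner keeps the longest one and, on a tie, the rule listed first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token types for this sketch; not taken from Lex itself. */
enum token { TOK_NONE, TOK_IF, TOK_ID, TOK_NUM };

/* Each matcher returns the length of the longest prefix of s it accepts,
   or 0 if it accepts none. */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s) {            /* [a-z][a-z0-9]* */
    size_t n = 0;
    if (!islower((unsigned char)s[0])) return 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n])) n++;
    return n;
}

static size_t match_num(const char *s) {           /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

struct rule { size_t (*match)(const char *); enum token type; };

/* Rules in priority order: the keyword rule comes before the identifier rule. */
static const struct rule rules[] = {
    { match_if,  TOK_IF  },
    { match_id,  TOK_ID  },
    { match_num, TOK_NUM },
};

/* Returns the token type at the start of s and stores its length in *len. */
enum token next_token(const char *s, size_t *len) {
    enum token best = TOK_NONE;
    assert(s != NULL && len != NULL);
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        /* A longer match always wins (longest match); the strict '>'
           keeps the earlier rule on a tie (rule priority). */
        if (n > *len) { *len = n; best = rules[i].type; }
    }
    return best;
}
```

With these rules, the input "if8" is one identifier of length 3 because the identifier rule matches more characters than the keyword rule, while "if" alone is the keyword because both rules match two characters and the keyword rule is listed first.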
Lex is implemented on top of DFAs
DFA construction is a mechanical task easily performed by computer, so it
makes sense to have an automatic lexical analyzer generator to translate regular expressions into a DFA.
The output of Lex is a C program
Lex is a lexical analyzer generator that produces a C program from a lexical specification. For each token type in the programming language to be lexically analyzed, the specification contains a regular expression and an action. The action communicates the token type (perhaps along with other information) to the next phase of the compiler.
The output of Lex is a program in C: a lexical analyzer that executes the action
fragments on each match. The action fragments are just C statements that
return token values.
Lex past and present
Lex was the first lexical-analyzer generator based on regular expressions
[Lesk 1975]; it is still widely used.
DFA transition tables can be very large and sparse. If represented as a simple two-dimensional matrix (states × symbols) they take far too much memory. In practice, tables are compressed; this reduces the amount of memory
required, but increases the time required to look up the next state [Aho et al.
1986].
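The trade-off between table size and lookup time can be illustrated with one simple compression scheme, chosen here only for clarity; it is not claimed to be the scheme used by Lex or described by Aho et al. The tiny DFA (identifiers and numbers over ASCII), the state names, and the functions next_state, expand, and run are assumptions for this sketch. Each state stores a default target plus a short list of exceptional character ranges, instead of one entry per symbol.

```c
#include <assert.h>
#include <stddef.h>

enum { DEAD = 0, START = 1, IN_ID = 2, IN_NUM = 3,
       NUM_STATES = 4, NUM_SYMBOLS = 128 };

/* An exception to a state's default: symbols lo..hi go to state next.
   Assumes ASCII, where 'a'..'z' and '0'..'9' are contiguous. */
struct edge { unsigned char lo, hi; int next; };

/* One compressed row: a default target plus a few exceptions. */
struct row { int dflt; size_t n_edges; const struct edge *edges; };

static const struct edge start_edges[] = { {'0', '9', IN_NUM}, {'a', 'z', IN_ID} };
static const struct edge id_edges[]    = { {'0', '9', IN_ID},  {'a', 'z', IN_ID} };
static const struct edge num_edges[]   = { {'0', '9', IN_NUM} };

/* Rows are listed in state order: DEAD, START, IN_ID, IN_NUM. */
static const struct row compressed[NUM_STATES] = {
    { DEAD, 0, NULL },         /* DEAD: every symbol stays in DEAD */
    { DEAD, 2, start_edges },  /* START */
    { DEAD, 2, id_edges },     /* IN_ID */
    { DEAD, 1, num_edges },    /* IN_NUM */
};

/* Compressed lookup: scan the state's exceptions, else take the default.
   This costs a short loop where a dense matrix needs a single index. */
int next_state(int state, unsigned char c) {
    assert(state >= 0 && state < NUM_STATES);
    const struct row *r = &compressed[state];
    for (size_t i = 0; i < r->n_edges; i++)
        if (c >= r->edges[i].lo && c <= r->edges[i].hi)
            return r->edges[i].next;
    return r->dflt;
}

/* Expand the compressed rows into the plain states x symbols matrix. */
void expand(int dense[NUM_STATES][NUM_SYMBOLS]) {
    for (int s = 0; s < NUM_STATES; s++)
        for (int c = 0; c < NUM_SYMBOLS; c++)
            dense[s][c] = next_state(s, (unsigned char)c);
}

/* Run the DFA over the whole string and return the final state. */
int run(const char *s) {
    int state = START;
    for (; *s; s++) state = next_state(state, (unsigned char)*s);
    return state;
}
```

For this toy automaton the dense matrix holds 4 × 128 = 512 entries, most of them DEAD, while the compressed form holds four row headers and five edges. The price is the loop in next_state, which replaces a single array index.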
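Flex is faster than Lex, and case statements execute quite efficiently. Flex and Bison have proven more reliable, more powerful, and faster than the original Unix tools Lex and Yacc.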
Automatically generated lexical analyzers are often criticized for being
slow. In principle, the operation of a finite automaton is very simple and
should be efficient, but interpreting from transition tables adds overhead.
Gray [1988] shows that DFAs translated directly into executable code (implementing states as case statements) can run as fast as hand-coded lexers. The
Flex “fast lexical analyzer generator” [Paxson 1995] is significantly faster
than Lex.
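To make "implementing states as case statements" concrete, here is a minimal sketch of a DFA written directly as code rather than interpreted from a table. It is not output from Flex or from Gray's tool; the state names, the token kinds, and the function scan are hypothetical, and the automaton recognizes only lowercase identifiers and decimal numbers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum state { S_START, S_ID, S_NUM };   /* DFA states (hypothetical names) */
enum kind  { K_NONE, K_ID, K_NUM };    /* token kinds (hypothetical names) */

/* Scans the longest identifier or number at the start of s.
   Each DFA state is a case label; a transition is an assignment to cur.
   Stores the number of characters consumed in *len. */
enum kind scan(const char *s, size_t *len) {
    enum state cur = S_START;
    size_t i = 0;
    assert(s != NULL && len != NULL);
    for (;;) {
        unsigned char c = (unsigned char)s[i];
        switch (cur) {
        case S_START:
            if (islower(c))      cur = S_ID;
            else if (isdigit(c)) cur = S_NUM;
            else { *len = 0; return K_NONE; }   /* no token starts here */
            break;
        case S_ID:
            /* Stay in S_ID on letters and digits; otherwise accept. */
            if (!(islower(c) || isdigit(c))) { *len = i; return K_ID; }
            break;
        case S_NUM:
            /* Stay in S_NUM on digits; otherwise accept. */
            if (!isdigit(c)) { *len = i; return K_NUM; }
            break;
        }
        i++;   /* consume the character that caused the transition */
    }
}
```

Here the transition logic lives in the control flow itself, so there is no table lookup per character. The terminating NUL of the string is treated like any other non-matching character, which ends the token.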