Linux flex manual

flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
[--help --version] [filename...]




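`x'     match the character `x'

`.'     any character (byte) except newline (`\n')

`[xyz]'         a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'

`[abj-oZ]'      a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'

`[^A-Z]'        a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.

`[^A-Z\n]'      any character EXCEPT an uppercase letter or a newline

`r*'    zero or more r's, where r is any regular expression

`r+'    one or more r's

`r?'    zero or one r's (that is, "an optional r")

`r{2,5}'        anywhere from two to five r's

`r{2,}'         two or more r's

`r{4}'          exactly 4 r's

`{name}'        the expansion of the "name" definition (see above)

`"[xyz]\"foo"'  the literal string: `[xyz]"foo'

`\X'    if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C interpretation of \X. Otherwise, a literal `X' (used to escape operators such as `*')

`\0'    a NUL character (ASCII code 0)

`\123'          the character with octal value 123

`\x2a'          the character with hexadecimal value 2a

`(r)'           match an r; parentheses are used to override precedence (see below)

`rs'            the regular expression r followed by the regular expression s; called "concatenation"

`r|s'           either an r or an s

`r/s'   an r but only if it is followed by an s. The text matched by s is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed, so the action only sees the text matched by r. This type of pattern is called trailing context.
        (There are some combinations of `r/s' that flex cannot match correctly; see the notes in the Deficiencies / Bugs section below regarding "dangerous trailing context".)

`^r'    an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).

`r$'    an r, but only at the end of a line (i.e., just before a newline). Equivalent to "r/\n". Note that flex's notion of "newline" is exactly whatever the C compiler used to compile flex interprets '\n' as.

`<s>r'          an r, but only in start condition s (see below for discussion of start conditions). `<s1,s2,s3>r' is the same, but in any of start conditions s1, s2, or s3.

`<*>r'          an r in any start condition, even an exclusive one.

`<<EOF>>'       an end-of-file

`<s1,s2><<EOF>>'        an end-of-file when in start condition s1 or s2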







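Character class expressions, which may appear only inside the brackets of a character class (e.g. `[[:alpha:][:digit:]]'). Each designates the same set of characters as the corresponding standard C isXXX function (for example, [:alnum:] corresponds to isalnum()):

[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]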






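    A negated character class such as `[^A-Z]' above matches a newline unless "\n" (or an equivalent escape sequence) is listed explicitly in the class, as in `[^A-Z\n]'. This differs from many other regular expression tools, but the inconsistency is historically entrenched. Matching newlines means that a pattern like [^"]* can match the entire input unless there is another quote in the input.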
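As a rough C analogy (not how flex matches), the length that a negated class followed by `*' matches at the start of a string is what the standard strcspn() returns: the length of the longest prefix containing none of the listed characters. The two helpers below (hypothetical names) show that [^"]* runs straight across newlines, while [^"\n]* stops at the first one.

```c
#include <assert.h>
#include <string.h>

/* Length that [^"]* would match at the start of s: newlines included. */
size_t match_not_quote(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"");
}

/* Length that [^"\n]* would match: stops at a quote or a newline. */
size_t match_not_quote_or_nl(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"\n");
}
```

For "abc\ndef\"x", the first helper returns 7 and the second returns 3.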





        foo      |
        bar$     /* action goes here */


How the input is matched

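   If the scanner finds more than one match, it takes the one matching the most text (for trailing context rules, this includes the length of the trailing part, even though it will then be returned to the input). If it finds two or more matches of the same length, the rule listed first in the flex input file is chosen.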
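A minimal C sketch of this selection policy, under the simplifying assumption that every pattern is a literal string. flex itself compiles all patterns into a single automaton, so this illustrates the policy, not flex's implementation; `pick_rule' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the rule selected at the start of `input`, or -1
 * if none matches. The longest match wins; on a tie the rule listed
 * first wins, because only a strictly longer match replaces the
 * current best. */
int pick_rule(const char *const patterns[], size_t npatterns,
              const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;

    assert(patterns != NULL && input != NULL);
    for (size_t i = 0; i < npatterns; i++) {
        size_t len = strlen(patterns[i]);
        if (len > best_len && strncmp(input, patterns[i], len) == 0) {
            best = (int) i;
            best_len = len;
        }
    }
    if (match_len != NULL)
        *match_len = best_len;
    return best;
}
```

With the rules { "case", "cases", "case" }, the input "cases!" selects rule 1 (five characters), while "case!" selects rule 0, not the identical rule 2 listed after it.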




   The advantage of using `%pointer' is faster scanning and no buffer overflow when matching very large tokens (unless you run out of dynamic memory).

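   The advantage of using `%array' is that you can then modify yytext to your heart's content, and calls to `unput()' do not destroy yytext (see below). Furthermore, existing lex programs sometimes access yytext externally using declarations of the form:
        extern char yytext[];
   This declaration is erroneous when used with `%pointer', but correct for `%array'.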


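   `%array' defines yytext to be an array of YYLMAX characters, which defaults to a fairly large value. You can change the size by simply #define'ing YYLMAX to a different value in the first section of your flex input.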

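 Each pattern in a rule has a corresponding action, which can be any arbitrary C statement. If the action is empty, then when the pattern is matched the input is simply discarded. For example, here is the specification for a program which deletes all occurrences of "zap me" from its input:

%%
"zap me"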



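Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:

%%
[ \t]+        putchar( ' ' );
[ \t]+$       /* ignore this token */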
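To make the longest-match reasoning concrete, here is a hand-written C equivalent of those two rules plus flex's default rule (which echoes everything else). `[ \t]+$' wins at the end of a line because its trailing newline counts toward the match length; at end of input with no newline, `$' cannot match, so the run becomes one blank. `squeeze_blanks' is a hypothetical name, and this sketch does not use flex.

```c
#include <assert.h>
#include <string.h>

/* Copy `in` to `out`, replacing each run of blanks/tabs with a single
 * blank, except that a run immediately followed by '\n' is dropped.
 * `out` must have room for strlen(in) + 1 bytes. */
void squeeze_blanks(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            const char *run = in;
            while (*run == ' ' || *run == '\t')
                run++;
            if (*run != '\n')      /* rule [ \t]+  : putchar(' ') */
                *out++ = ' ';
            /* otherwise rule [ \t]+$ : ignore the run */
            in = run;
        } else {
            *out++ = *in++;        /* default rule: ECHO */
        }
    }
    *out = '\0';
}
```

For example, "a  \tb \t\nc" becomes "a b\nc".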


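 In the case of `%array', actions are free to modify yytext in any way.
`REJECT' directs the scanner to proceed on to the "second best" rule which matched the input (or a prefix of the input). The rule is chosen as described above in "How the Input is Matched", and yytext and yyleng are set up appropriately. For example, the following counts the words in the input and also calls the routine special() whenever "frob" is seen:

        int word_count = 0;
%%

frob        special(); REJECT;
[^ \t\n]+   ++word_count;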

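Multiple REJECTs are allowed, each one finding the next best choice to the currently active rule. For example, when the following scanner scans the token "abcd", it will write "abcdabcaba" to the output:
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
.|\n     /* eat up any unmatched character */
(The first three rules share the fourth's action since they use the special `|' action.)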
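The "abcdabcaba" output can be reproduced in plain C. At each position, every rule matching a prefix of "abcd" fires in order of decreasing match length (each REJECT falls through to the next-best rule), and the catch-all `.|\n' finally consumes one character. This is a sketch specific to these five rules, not a general REJECT implementation; `reject_demo' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Imitates:   a | ab | abc | abcd   ECHO; REJECT;
 *             .|\n                  (discard one character)
 * `out` needs room for up to 10 bytes per input character plus a NUL. */
void reject_demo(const char *in, char *out)
{
    static const char word[] = "abcd";

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* longest match first; each REJECT moves to the next-best rule */
        for (size_t len = strlen(word); len > 0; len--) {
            if (strncmp(in, word, len) == 0) {
                memcpy(out, in, len);   /* ECHO */
                out += len;
            }
        }
        in++;                           /* .|\n eats one character */
    }
    *out = '\0';
}
```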


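`yymore()' tells the scanner that the next time it matches a rule, the corresponding token should be appended onto the current value of yytext rather than replacing it. For example, given the input "mega-kludge" the following will write "mega-mega-kludge" to the output:
%%
mega-    ECHO; yymore();
kludge   ECHO;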

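`yyless(n)' returns all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. For example, on the input "foobar" the following will write out "foobarbar":
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;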

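For example, the following action takes the current token and causes it to be rescanned enclosed in parentheses:
        {
        int i;
        /* Copy yytext because unput() trashes yytext */
        char *yycopy = strdup( yytext );
        unput( ')' );
        for ( i = yyleng - 1; i >= 0; --i )
            unput( yycopy[i] );
        unput( '(' );
        free( yycopy );
        }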
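Why the loop runs from yyleng - 1 down to 0: pushed-back characters behave like a stack, so the last character pushed is the first one read again. The sketch below models that with a hypothetical fixed-size pushback buffer in plain C; it is not flex's internal buffer, and `pb_unput', `pb_input', and `rescan_in_parens' are made-up names.

```c
#include <assert.h>
#include <string.h>

#define PB_MAX 64

/* Hypothetical pushback stack: pb_unput() pushes a character so that
 * it becomes the next one pb_input() returns. */
static char pb_stack[PB_MAX];
static int pb_top = 0;

void pb_unput(char c)
{
    assert(pb_top < PB_MAX);
    pb_stack[pb_top++] = c;
}

int pb_input(void)
{
    return pb_top > 0 ? (unsigned char) pb_stack[--pb_top] : -1; /* -1: empty */
}

/* Same order as the action above: ')' first, then the token from its
 * last character to its first, then '('. */
void rescan_in_parens(const char *tok)
{
    pb_unput(')');
    for (int i = (int) strlen(tok) - 1; i >= 0; --i)
        pb_unput(tok[i]);
    pb_unput('(');
}
```

Reading back with pb_input() after rescan_in_parens("foo") yields "(foo)"; pushing front-to-back would yield "(oof)" instead.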
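Note that since each `unput()' puts the given character back at the beginning of the input stream, pushing back strings must be done back-to-front.
An important potential problem when using `unput()' is that if you are using `%pointer' (the default), a call to `unput()' destroys the contents of yytext, starting with its rightmost character and devouring one character to the left with each call.
If you need the value of yytext preserved after a call to `unput()' (as in the above example), you must either first copy it elsewhere, or build your scanner using `%array' instead (see How The Input Is Matched).

`input()' reads the next character from the input stream. For example, the following is one way to eat up C comments:

%%
"/*"        {
            register int c;

            for ( ; ; )
                {
                while ( (c = input()) != '*' &&
                        c != EOF )
                    ;    /* eat up text of comment */

                if ( c == '*' )
                    {
                    while ( (c = input()) == '*' )
                        ;
                    if ( c == '/' )
                        break;    /* found the end */
                    }

                if ( c == EOF )
                    {
                    error( "EOF in comment" );
                    break;
                    }
                }
            }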
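The loop above can be exercised outside flex by substituting a string-backed reader for input(). In this sketch `next_char' and `eat_comment' are hypothetical names, and the return convention (1 when the comment is closed, 0 at EOF) is an assumption chosen for testing.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

static const char *src;            /* text following the opening slash-star */

static int next_char(void)         /* stand-in for flex's input() */
{
    return *src != '\0' ? (unsigned char) *src++ : EOF;
}

/* Same loop as the rule's action. Returns 1 when the closing
 * star-slash is found, 0 if EOF is reached first. */
int eat_comment(const char *text)
{
    int c;

    assert(text != NULL);
    src = text;
    for ( ; ; ) {
        while ((c = next_char()) != '*' && c != EOF)
            ;                      /* eat up text of comment */
        if (c == '*') {
            while ((c = next_char()) == '*')
                ;
            if (c == '/')
                return 1;          /* found the end */
        }
        if (c == EOF)
            return 0;              /* EOF in comment */
    }
}
```

After a successful call, `src' points just past the comment, which is where the scanner would resume matching.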
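YY_FLUSH_BUFFER flushes the scanner's internal buffer so that the next time the scanner attempts to match a token,
it will first refill the buffer using YY_INPUT (see The Generated Scanner, below).
This action is a special case of the more general `yy_flush_buffer()' function, described below in the section Multiple Input Buffers.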

The generated scanner


        int yylex()
            {
            ... various definitions and the actions in here ...
            }

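(If your environment supports function prototypes, then it will be "int yylex( void )".) This definition may be changed by defining the "YY_DECL" macro. For example, you could use: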

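#define YY_DECL float lexscan( a, b ) float a, b;

to give the scanning routine the name lexscan, returning a float, and taking two floats as arguments. Because this is a K&R-style (non-prototyped) definition, it must be terminated with a semicolon.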


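If the scanner reaches an end-of-file, subsequent calls are undefined unless either yyin is pointed at a new input file (in which case scanning continues from that file), or `yyrestart()' is called.
`yyrestart()' takes one argument, a `FILE *' pointer (which can be nil, if you have set up YY_INPUT to scan from a source other than yyin), and initializes yyin for scanning from that file.
It can also be called with yyin as its argument to throw away the current input buffer, but it is better to use YY_FLUSH_BUFFER (see above).
Note that `yyrestart()' does not reset the start condition to INITIAL (see Start Conditions, below).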




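For example, the following definition, placed in the definitions section, makes the scanner read its input one character at a time from stdin:

%{
#define YY_INPUT(buf,result,max_size) \
    { \
    int c = getchar(); \
    result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }
%}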
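The contract YY_INPUT must honor is: store up to max_size characters in buf and set result to the number stored, or to YY_NULL (0) at end of input. As a sketch of a different source, here is a variant that reads from an in-memory string; `string_input' and `input_text' are hypothetical names, and YY_NULL is defined locally only so the fragment can be compiled and tested without flex (a generated scanner supplies it).

```c
#include <assert.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0                    /* flex defines this as 0 */
#endif

static const char *input_text = "";  /* hypothetical in-memory source */

/* Copy up to max_size bytes of the remaining text into buf and return
 * the count, or YY_NULL once the text is exhausted. */
static int string_input(char *buf, int max_size)
{
    int n = (int) strlen(input_text);

    assert(max_size > 0);
    if (n == 0)
        return YY_NULL;
    if (n > max_size)
        n = max_size;
    memcpy(buf, input_text, (size_t) n);
    input_text += n;
    return n;
}

#define YY_INPUT(buf,result,max_size) \
    { result = string_input( buf, max_size ); }
```

In a real scanner only the helper and the #define would go in the definitions section, and input_text would be set before calling yylex().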


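If you do not supply your own version of `yywrap()', then you must either use `%option noyywrap' (in which case the scanner behaves as though `yywrap()' returned 1), or you must link with `-lfl' to obtain the default version of the routine, which always returns 1.

Three routines are available for scanning from in-memory buffers rather than files: `yy_scan_string()', `yy_scan_bytes()', and `yy_scan_buffer()'. See the discussion of them below in the section Multiple Input Buffers.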


Start conditions


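For example,
<STRING>[^"]*        { /* eat up the string body ... */
            ...
            }
will be active only when the scanner is in the "STRING" start condition, and
<INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
            ...
            }
will be active only when the current start condition is either "INITIAL", "STRING", or "QUOTE".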



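The following illustrates the connection between inclusive and exclusive start conditions:

%s example
%%

<example>foo   do_something();

bar            something_else();

is equivalent to

%x example
%%

<example>foo   do_something();

<INITIAL,example>bar   something_else();

Without the <INITIAL,example> qualifier, the bar pattern in the second example would not be active in start condition example. Because the special specifier <*> matches every start condition, the example could also have been written:

%x example
%%

<example>foo   do_something();

<*>bar    something_else();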

The default rule (to `ECHO' any unmatched character) remains active in start conditions. It is equivalent to:

<*>.|\n     ECHO;


`BEGIN(0)' returns to the original state where only the rules with no start conditions are active.
This state can also be referred to as the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to `BEGIN(0)'.
 (The parentheses around the start condition name are not required but are considered good style.)

BEGIN actions can also be given as indented code at the beginning of the rules section. For example,
the following will cause the scanner to enter the "SPECIAL" start condition whenever `yylex()' is called
and the global variable enter_special is true:
        int enter_special;

%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);

<SPECIAL>blahblahblah
...more rules follow...

 To illustrate the uses of start conditions, here is a scanner which provides two different interpretations of a string like "123.456".
 By default it will treat it as three tokens, the integer "123", a dot ('.'), and the integer "456".
 But if the string is preceded earlier in the line by the string "expect-floats" it will treat it as a single token,
 the floating-point number 123.456:
%{
#include <stdlib.h>     /* atof(), atoi() */
%}
%s expect

%%
expect-floats        BEGIN(expect);

<expect>[0-9]+"."[0-9]+     {
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            }
<expect>\n           {
            /* that's the end of the line, so
             * we need another "expect-floats"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            }

[0-9]+      {
            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            }

"."         printf( "found a dot\n" );

Version 2.5               December 1994

 Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);

 This scanner goes to a bit of trouble to match as much text as possible with each rule.
 In general, when attempting to write a high-speed scanner, try to match as much as possible in each rule,
 as it's a big win.

Note that start condition names are really integer values and can be stored as such. Thus,
the above could be extended in the following fashion:
%x comment foo
%%
        int line_num = 1;
        int comment_caller;

"/*"         {
             comment_caller = INITIAL;
             BEGIN(comment);
             }

<foo>"/*"    {
             comment_caller = foo;
             BEGIN(comment);
             }

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(comment_caller);

 Furthermore, you can access the current start condition using the integer-valued YY_START macro.
 For example, the above assignments to comment_caller could instead be written
comment_caller = YY_START;

 Flex provides YYSTATE as an alias for YY_START (since that is what's used by AT&T lex).

Note that start conditions do not have their own name-space; %s's and %x'sdeclare names in the same fashion as #define's.

Finally, here's an example of how to match C-style quoted strings using exclusive start conditions,
including expanded escape sequences (but not including checking for a string that's too long):
%x str

%%
        char string_buf[MAX_STR_CONST];   /* MAX_STR_CONST: your own limit */
        char *string_buf_ptr;


\"      string_buf_ptr = string_buf; BEGIN(str);

<str>\"        { /* saw closing quote - all done */
        BEGIN(INITIAL);
        *string_buf_ptr = '\0';
        /* return string constant token type and
         * value to parser
         */
        }

<str>\n        {
        /* error - unterminated string constant */
        /* generate error message */
        }

<str>\\[0-7]{1,3} {
        /* octal escape sequence */
        unsigned int result;

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                {
                /* error, constant is out-of-bounds */
                }

        *string_buf_ptr++ = result;
        }

<str>\\[0-9]+ {
        /* generate error - bad escape sequence; something
         * like '\48' or '\0777777'
         */
        }

<str>\\n  *string_buf_ptr++ = '\n';
<str>\\t  *string_buf_ptr++ = '\t';
<str>\\r  *string_buf_ptr++ = '\r';
<str>\\b  *string_buf_ptr++ = '\b';
<str>\\f  *string_buf_ptr++ = '\f';

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

<str>[^\\\n\"]+        {
        char *yptr = yytext;

        while ( *yptr )
                *string_buf_ptr++ = *yptr++;
        }
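The arithmetic of the octal-escape rule can be checked in isolation. The helper below (hypothetical name `decode_octal_escape') takes text shaped like what `\\[0-7]{1,3}' matches, a backslash followed by one to three octal digits, and returns the byte value, or -1 when the value exceeds 0xff (as with `\777').

```c
#include <assert.h>
#include <stdio.h>

/* `text` is what <str>\\[0-7]{1,3} matched, e.g. "\\101".
 * Returns the character value, or -1 if it is out of range. */
int decode_octal_escape(const char *text)
{
    unsigned int result;

    assert(text != NULL && text[0] == '\\');
    if (sscanf(text + 1, "%o", &result) != 1)
        return -1;
    if (result > 0xff)
        return -1;                 /* error, constant is out-of-bounds */
    return (int) result;
}
```

Note that `%o' stores into an unsigned int, which is why `result' is declared unsigned here and in the rule above.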

 Often, such as in some of the examples above, you wind up writing a whole bunch of rules all preceded by the same start condition(s).
 Flex makes this a little easier and cleaner by introducing a notion of start condition scope.
 A start condition scope is begun with:
<SCs>{
 where SCs is a list of one or more start conditions. Inside the start condition scope,
 every rule automatically has the prefix `<SCs>' applied to it,
 until a `}' which matches the initial `{'. So, for example,
<ESC>{
    "\\n"   return '\n';
    "\\r"   return '\r';
    "\\f"   return '\f';
    "\\0"   return '\0';
}

 is equivalent to:

<ESC>"\\n"  return '\n';
<ESC>"\\r"  return '\r';
<ESC>"\\f"  return '\f';
<ESC>"\\0"  return '\0';

 Start condition scopes may be nested.

Three routines are available for manipulating stacks of start conditions:
`void yy_push_state(int new_state)'
pushes the current start condition onto the top of the start condition stack and switches to new_state as though
you had used `BEGIN new_state' (recall that start condition names are also integers).
`void yy_pop_state()'
pops the top of the stack and switches to it via BEGIN.
`int yy_top_state()'
returns the top of the stack without altering the stack's contents.
The start condition stack grows dynamically and so has no built-in size limitation.
If memory is exhausted, program execution aborts.

To use start condition stacks, your scanner must include a `%option stack' directive (see Options below).
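Because start conditions are integers, the push/pop/top semantics can be modelled with a growable array of ints. This is a stand-alone sketch, not flex's code: `sc_push', `sc_pop', `sc_top', and `current_sc' are hypothetical stand-ins for yy_push_state(), yy_pop_state(), yy_top_state(), and YY_START.

```c
#include <assert.h>
#include <stdlib.h>

static int current_sc = 0;         /* like YY_START; 0 is INITIAL */
static int *sc_stack = NULL;
static size_t sc_depth = 0, sc_cap = 0;

/* Like yy_push_state(): save the current condition, then switch. */
void sc_push(int new_state)
{
    if (sc_depth == sc_cap) {      /* grow dynamically; abort if out of memory */
        size_t cap = sc_cap ? 2 * sc_cap : 8;
        int *p = realloc(sc_stack, cap * sizeof *p);
        if (p == NULL)
            abort();
        sc_stack = p;
        sc_cap = cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;        /* as though BEGIN(new_state) */
}

/* Like yy_pop_state(): switch back to the saved condition. */
void sc_pop(void)
{
    assert(sc_depth > 0);
    current_sc = sc_stack[--sc_depth];
}

/* Like yy_top_state(): peek without modifying the stack. */
int sc_top(void)
{
    assert(sc_depth > 0);
    return sc_stack[sc_depth - 1];
}
```

Pushing 1 and then 2 leaves 2 as the current condition with 1 on top of the stack; two pops return to INITIAL.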

Multiple input buffers

 Some scanners (such as those which support "include" files) require reading from several input streams.
 As flex scanners do a large amount of buffering, one cannot control where the next input will be read
 from by simply writing a YY_INPUT which is sensitive to the scanning context.
 YY_INPUT is only called when the scanner reaches the end of its buffer,
 which may be a long time after scanning a statement such as an "include" which requires switching the input source.

To negotiate these sorts of problems, flex provides a mechanism for creating and switching between multiple input buffers.
An input buffer is created by using:

YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
 which takes a FILE pointer and a size and creates a buffer associated with the given file and
 large enough to hold size characters (when in doubt, use YY_BUF_SIZE for the size).
 It returns a YY_BUFFER_STATE handle, which may then be passed to other routines (see below).
 The YY_BUFFER_STATE type is a pointer to an opaque struct yy_buffer_state structure,
 so you may safely initialize YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish,
 and also refer to the opaque structure in order to correctly declare input buffers in source files other than that of your scanner.
 Note that the FILE pointer in the call to yy_create_buffer is only used as the value of yyin seen by YY_INPUT;
 if you redefine YY_INPUT so it no longer uses yyin, then you can safely pass a nil FILE pointer to yy_create_buffer.
 You select a particular buffer to scan from using:

void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
 switches the scanner's input buffer so subsequent tokens will come from new_buffer.
 Note that `yy_switch_to_buffer()' may be used by `yywrap()' to
 set things up for continued scanning, instead of opening a new file and pointing yyin at it.
 Note also that switching input sources via either `yy_switch_to_buffer()' or `yywrap()'
 does not change the start condition.
void yy_delete_buffer( YY_BUFFER_STATE buffer )
is used to reclaim the storage associated with a buffer.
You can also clear the current contents of a buffer using:
void yy_flush_buffer( YY_BUFFER_STATE buffer )

 This function discards the buffer's contents, so the next time the scanner attempts to match a token from the buffer,
 it will first fill the buffer anew using YY_INPUT.

`yy_new_buffer()' is an alias for `yy_create_buffer()', provided for compatibility with the C++ use of new and delete for creating and destroying dynamic objects.

Finally, the YY_CURRENT_BUFFER macro returns a YY_BUFFER_STATE handle to the current buffer.

Here is an example of using these features for writing a scanner which expands include files (the `<<EOF>>' feature is discussed below):
/* the "incl" state is used for picking up the name
 * of an include file
 */
%x incl

%{
#define MAX_INCLUDE_DEPTH 10    /* maximum nesting of include files */
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%}

%%
include             BEGIN(incl);

[a-z]+              ECHO;
[^a-z\n]*\n?        ECHO;

<incl>[ \t]*      /* eat the whitespace */
<incl>[^ \t\n]+   { /* got the include file name */
        if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
            {
            fprintf( stderr, "Includes nested too deeply" );
            exit( 1 );
            }

        include_stack[include_stack_ptr++] =
            YY_CURRENT_BUFFER;

        yyin = fopen( yytext, "r" );

        if ( ! yyin )
            error( ... );

        yy_switch_to_buffer(
            yy_create_buffer( yyin, YY_BUF_SIZE ) );

        BEGIN(INITIAL);
        }

<<EOF>> {
        if ( --include_stack_ptr < 0 )
            {
            yyterminate();
            }

        else
            {
            yy_delete_buffer( YY_CURRENT_BUFFER );
            yy_switch_to_buffer(
                 include_stack[include_stack_ptr] );
            }
        }

 Three routines are available for setting up input buffers for scanning in-memory strings instead of files.
 All of them create a new input buffer for scanning the string, and return a corresponding YY_BUFFER_STATE handle (which you should delete with `yy_delete_buffer()' when done with it).
 They also switch to the new buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will start scanning the string.
`yy_scan_string(const char *str)'
scans a NUL-terminated string.
`yy_scan_bytes(const char *bytes, int len)'
scans len bytes (including possibly NULs) starting at location bytes.

Note that both of these functions create and scan a copy of the string or bytes.
(This may be desirable, since `yylex()' modifies the contents of the buffer it is scanning.)
You can avoid the copy by using:
`yy_scan_buffer(char *base, yy_size_t size)'
which scans in place the buffer starting at base, consisting of size bytes,
the last two bytes of which must be YY_END_OF_BUFFER_CHAR (ASCII NUL).
These last two bytes are not scanned; thus, scanning consists of `base[0]' through `base[size-3]', inclusive.
If you fail to set up base in this manner (i.e., forget the final two YY_END_OF_BUFFER_CHAR bytes),
then `yy_scan_buffer()' returns a nil pointer instead of creating a new input buffer.
The type yy_size_t is an integral type to which you can cast an integer expression
reflecting the size of the buffer.
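A sketch of preparing a buffer in the layout `yy_scan_buffer()' requires: the text followed by two YY_END_OF_BUFFER_CHAR (NUL) bytes, with the size argument covering both. The helper name `make_scan_buffer' is hypothetical, and the flex calls appear only in a comment because they need a generated scanner; the assumption that the caller keeps ownership of base is noted there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate len + 2 bytes: a copy of the text plus the two trailing
 * NUL bytes (YY_END_OF_BUFFER_CHAR) that yy_scan_buffer() expects.
 * *size receives the value to pass as its size argument. */
char *make_scan_buffer(const char *text, size_t len, size_t *size)
{
    char *base;

    assert(text != NULL && size != NULL);
    base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';
    base[len + 1] = '\0';
    *size = len + 2;
    return base;
}

/* In a flex scanner one might then write (not compiled here):
 *     YY_BUFFER_STATE b = yy_scan_buffer(base, size);
 *     yylex();
 *     yy_delete_buffer(b);
 *     free(base);    (assumption: the caller still owns base)
 */
```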
End-of-file rules

 The special rule "<<EOF>>" indicates actions which are to be taken when an end-of-file is encountered
 and yywrap() returns non-zero (i.e., indicates no further files to process).
 The action must finish by doing one of four things:
assigning yyin to a new input file (in previous versions of flex, after doing the assignment you had to call the special action YY_NEW_FILE; this is no longer necessary);
executing a return statement;
executing the special `yyterminate()' action;
or, switching to a new buffer using `yy_switch_to_buffer()' as shown in the example above.
<<EOF>> rules may not be used with other patterns; they may only be qualified with a list of start conditions. If an unqualified <<EOF>> rule is given,
it applies to all start conditions which do not already have <<EOF>> actions.
To specify an <<EOF>> rule for only the initial start condition, use
<INITIAL><<EOF>>

 These rules are useful for catching things like unclosed comments. An example:
%x quote
%%

...other rules for dealing with quotes...

<quote><<EOF>>  {
         error( "unterminated quote" );
         yyterminate();
         }
<<EOF>>  {
         if ( *++filelist )
             yyin = fopen( *filelist, "r" );
         else
             yyterminate();
         }

Miscellaneous macros

 The macro YY_USER_ACTION can be defined to provide an action which is always executed prior to the matched rule's action.
 For example, it could be #define'd to call a routine to convert yytext to lower-case.
 When YY_USER_ACTION is invoked, the variable yy_act gives the number of the matched rule (rules are numbered starting with 1).
 Suppose you want to profile how often each of your rules is matched. The following would do the trick:
#define YY_USER_ACTION ++ctr[yy_act]

 where ctr is an array to hold the counts for the different rules.
 Note that the macro YY_NUM_RULES gives the total number of rules (including the default rule, even if you use `-s'), so a correct declaration for ctr is:
int ctr[YY_NUM_RULES];

 The macro YY_USER_INIT may be defined to provide an action which is always executed before the first scan (and before the scanner's internal initializations are done).
 For example, it could be used to call a routine to read in a data table or open a logging file.

The macro `yy_set_interactive(is_interactive)' can be used to control whether the current buffer is considered interactive. An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed interactive to avoid problems due to waiting to fill buffers (see the discussion of the `-I' flag below).
A non-zero value in the macro invocation marks the buffer as interactive, a zero value as non-interactive.
Note that use of this macro overrides `%option always-interactive' or `%option never-interactive' (see Options below).
`yy_set_interactive()' must be invoked prior to beginning to scan the buffer that is (or is not) to be considered interactive.

The macro `yy_set_bol(at_bol)' can be used to control whether the current buffer's scanning context
for the next token match is done as though at the beginning of a line.
A non-zero macro argument makes rules anchored with `^' active, while a zero argument makes `^' rules inactive.

The macro `YY_AT_BOL()' returns true if the next token scanned from the current buffer will have `^' rules active, false otherwise.

In the generated scanner, the actions are all gathered in one large switch statement and separated using YY_BREAK,
which may be redefined. By default, it is simply a "break", to separate each rule's action from the following rule's. Redefining YY_BREAK allows, for example, C++ users to #define YY_BREAK to do nothing (while being very careful that every rule ends with a "break" or a "return"!)
to avoid suffering from unreachable statement warnings where, because a rule's action ends with "return",
the YY_BREAK is inaccessible.

Values available to the user

 This section summarizes the various values available to the user in the rule actions.

`char *yytext' holds the text of the current token.
 It may be modified but not lengthened (you cannot append characters to the end).
 If the special directive `%array' appears in the first section of the scanner description,
 then yytext is instead declared `char yytext[YYLMAX]',
 where YYLMAX is a macro definition that you can redefine in the first section
 if you don't like the default value (generally 8KB). Using `%array' results in somewhat slower scanners,
 but the value of yytext becomes immune to calls to `input()' and `unput()',
 which potentially destroy its value when yytext is a character pointer.
 The opposite of `%array' is `%pointer', which is the default.
 You cannot use `%array' when generating C++ scanner classes (the `-+' flag).

`int yyleng' holds the length of the current token.

`FILE *yyin' is the file which by default flex reads from.
It may be redefined but doing so only makes sense before scanning begins or after an EOF has been encountered.
Changing it in the midst of scanning will have unexpected results since flex buffers its input;
use `yyrestart()' instead. Once scanning terminates because an end-of-file has been seen,
you can assign yyin to the new input file and then call the scanner again to continue scanning.

`void yyrestart( FILE *new_file )' may be called to point yyin at the new input file.
The switch-over to the new file is immediate (any previously buffered-up input is lost).
Note that calling `yyrestart()' with yyin as an argument thus throws away
the current input buffer and continues scanning the same input file.

`FILE *yyout' is the file to which `ECHO' actions are done. It can bereassigned by the user.

YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle to the current buffer.

YY_START returns an integer value corresponding to the current start condition.
You can subsequently use this value with BEGIN to return to that start condition.

Interfacing with yacc

 One of the main uses of flex is as a companion to the yacc parser-generator.
 yacc parsers expect to call a routine named `yylex()' to find the next input token.
 The routine is supposed to return the type of the next token as well as putting any associated value in the global yylval.
 To use flex with yacc, one specifies the `-d' option to yacc to instruct it to generate the file `y.tab.h' containing definitions of all the `%tokens' appearing in the yacc input.
 This file is then included in the flex scanner. For example, if one of the tokens is "TOK_NUMBER", part of the scanner might look like:
%{
#include "y.tab.h"
%}

%%

[0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;


flex has the following options:

`-b'
Generate backing-up information to `lex.backup'.
This is a list of scanner states which require backing up and the input characters on which they do so.
By adding rules one can remove backing-up states. If all backing-up states are eliminated and `-Cf' or `-CF' is used,
the generated scanner will run faster (see the `-p' flag).
Only users who wish to squeeze every last cycle out of their scanners need worry about this option. (See the section on Performance Considerations below.)
`-c'
is a do-nothing, deprecated option included for POSIX compliance.
`-d'
makes the generated scanner run in debug mode. Whenever a pattern is recognized and
the global yy_flex_debug is non-zero (which is the default), the scanner will write to stderr a line of the form:
--accepting rule at line 53 ("the matched text")
 The line number refers to the location of the rule in the file defining the scanner (i.e., the file that was fed to flex).
 Messages are also generated when the scanner backs up, accepts the default rule,
 reaches the end of its input buffer (or encounters a NUL; at this point,
 the two look the same as far as the scanner's concerned), or reaches an end-of-file.
`-f'
specifies fast scanner. No table compression is done and stdio is bypassed. The result is large but fast.
This option is equivalent to `-Cfr' (see below).
`-h'
generates a "help" summary of flex's options to stdout and then exits. `-?' and `--help' are synonyms for `-h'.
`-i'
instructs flex to generate a case-insensitive scanner.
The case of letters given in the flex input patterns will be ignored,
and tokens in the input will be matched regardless of case.
The matched text given in yytext will have the preserved case (i.e., it will not be folded).
`-l'
turns on maximum compatibility with the original AT&T lex implementation.
Note that this does not mean full compatibility. Use of this option costs a considerable amount of performance,
and it cannot be used with the `-+', `-f', `-F', `-Cf', or `-CF' options. For details on the compatibilities it provides,
see the section "Incompatibilities With Lex And POSIX" below.
This option also results in the name YY_FLEX_LEX_COMPAT being #define'd in the generated scanner.
`-n'
is another do-nothing, deprecated option included only for POSIX compliance.
`-p'
generates a performance report to stderr. The report consists of comments regarding features of the flex input file
which will cause a serious loss of performance in the resulting scanner.
If you give the flag twice, you will also get comments regarding features that lead to minor performance losses.
Note that the use of REJECT, `%option yylineno', and variable trailing context
(see the Deficiencies / Bugs section below) entails a substantial performance penalty;
use of `yymore()', the `^' operator, and the `-I' flag entail minor performance penalties.
`-s'
causes the default rule (that unmatched scanner input is echoed to stdout) to be suppressed.
If the scanner encounters input that does not match any of its rules, it aborts with an error.
This option is useful for finding holes in a scanner's rule set.
`-t'
instructs flex to write the scanner it generates to standard output instead of `lex.yy.c'.
`-v'
specifies that flex should write to stderr a summary of statistics regarding the scanner it generates.
Most of the statistics are meaningless to the casual flex user, but the first line identifies the version of flex
(same as reported by `-V'), and the next line the flags used when generating the scanner,
including those that are on by default.
`-w'
suppresses warning messages.
`-B'
instructs flex to generate a batch scanner, the opposite of interactive scanners generated by `-I' (see below).
In general, you use `-B' when you are certain that your scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If your goal is instead to squeeze out a lot more performance,
you should be using the `-Cf' or `-CF' options (discussed below), which turn on `-B' automatically anyway.
`-F'
specifies that the fast scanner table representation should be used (and stdio bypassed).
This representation is about as fast as the full table representation (`-f'),
and for some sets of patterns will be considerably smaller (and for others, larger).
In general, if the pattern set contains both "keywords" and a catch-all, "identifier" rule, such as in the set:
"case"    return TOK_CASE;
"switch"  return TOK_SWITCH;
"default" return TOK_DEFAULT;
[a-z]+    return TOK_ID;

 then you're better off using the full table representation.
 If only the "identifier" rule is present and you then use a hash table or some such to detect the keywords,
 you're better off using `-F'. This option is equivalent to `-CFr' (see below). It cannot be used with `-+'.

`-I'
instructs flex to generate an interactive scanner.
An interactive scanner is one that only looks ahead to decide what token has been matched if it absolutely must.
It turns out that always looking one extra character ahead,
even if the scanner has already seen enough text to disambiguate the current token,
is a bit faster than only looking ahead when necessary.
But scanners that always look ahead give dreadful interactive performance; for example,
when a user types a newline, it is not recognized as a newline token until they enter another token,
which often means typing in another whole line.
Flex scanners default to interactive unless you use the `-Cf' or `-CF' table-compression options (see below).
That's because if you're looking for high-performance you should be using one of these options, so if you didn't,
flex assumes you'd rather trade off a bit of run-time performance for intuitive interactive behavior.
Note also that you cannot use `-I' in conjunction with `-Cf' or `-CF'. Thus,
this option is not really needed; it is on by default for all those cases in which it is allowed.
You can force a scanner to not be interactive by using `-B' (see above).
`-L'
instructs flex not to generate `#line' directives. Without this option,
flex peppers the generated scanner with #line directives so error messages in the actions will be correctly located with
respect to either the original flex input file (if the errors are due to code in the input file), or `lex.yy.c'
(if the errors are flex's fault -- you should report these sorts of errors to the email address given below).
`-T'
makes flex run in trace mode.
It will generate a lot of messages to stderr concerning the form of the input and
the resultant non-deterministic and deterministic finite automata.
This option is mostly for use in maintaining flex.
`-V'
prints the version number to stdout and exits. `--version' is a synonym for `-V'.
`-7'
instructs flex to generate a 7-bit scanner, i.e., one which can only recognize 7-bit characters in its input.
The advantage of using `-7' is that the scanner's tables can be up to half the size of those generated using the `-8' option (see below).
The disadvantage is that such scanners often hang or crash if their input contains an 8-bit character. Note, however, that unless you generate your scanner using the `-Cf' or `-CF' table compression options,
 use of `-7' will save only a small amount of table space, and make your scanner considerably less portable. Flex's default behavior is to generate an 8-bit scanner unless you use the `-Cf' or `-CF',
 in which case flex defaults to generating 7-bit scanners unless your site was always configured to generate 8-bit scanners (as will often be the case with non-USA sites).
 You can tell whether flex generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the `-v' output
 as described above. Note that if you use `-Cfe' or `-CFe' (those table compression options,
 but also using equivalence classes as discussed below), flex still defaults to generating an 8-bit scanner,
 since usually with these compression options full 8-bit tables are not much more expensive than 7-bit tables.
`-8'
instructs flex to generate an 8-bit scanner, i.e., one which can recognize 8-bit characters.
This flag is only needed for scanners generated using `-Cf' or `-CF',
as otherwise flex defaults to generating an 8-bit scanner anyway.
See the discussion of `-7' above for flex's default behavior and the tradeoffs between 7-bit and 8-bit scanners.
`-+' specifies that you want flex to generate a C++ scanner class.
See the section on Generating C++ Scanners below for details.
`-C[aefFmr]' controls the degree of table compression and, more generally, trade-offs between small scanners and fast scanners.
 `-Ca' ("align") instructs flex to trade off larger tables in the generated scanner for faster performance because
 the elements of the tables are better aligned for memory access and computation. On some RISC architectures,
 fetching and manipulating long-words is more efficient than with smaller-sized units such as shortwords.
 This option can double the size of the tables used by your scanner.
 `-Ce' directs flex to construct equivalence classes, i.e.,
 sets of characters which have identical lexical properties (for example,
 if the only appearance of digits in the flex input is in the character class
 "[0-9]" then the digits '0', '1', ..., '9' will all be put in the same equivalence class).
 Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2-5)
 and are pretty cheap performance-wise (one array look-up per character scanned).
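As a rough picture of what `-Ce' computes, here is a minimal C sketch (not flex's own table-building code). It assumes a hypothetical rule set whose only character classes are `[0-9]', `[a-z]' and `[a-f]', and gives each 7-bit character a class number such that characters belonging to exactly the same classes share a number. A transition table then needs one column per class instead of one per character, and `ec[c]' is the one array look-up per scanned character mentioned above.

```c
/* 7-bit characters only, to keep the sketch small. */
#define NCHARS 128

/* Bit mask of the (hypothetical) rule character classes containing c:
   bit 0 = [0-9], bit 1 = [a-z], bit 2 = [a-f]. */
static unsigned membership(int c)
{
    unsigned m = 0;

    if (c >= '0' && c <= '9') m |= 1u << 0;
    if (c >= 'a' && c <= 'z') m |= 1u << 1;
    if (c >= 'a' && c <= 'f') m |= 1u << 2;
    return m;
}

/* Fill ec[] so that characters with identical membership share a class
   number, and return the number of distinct classes. */
static int build_equivalence_classes(int ec[NCHARS])
{
    unsigned signature[NCHARS];   /* membership mask of each class found so far */
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        unsigned m = membership(c);
        int k = 0;

        while (k < nclasses && signature[k] != m)
            k++;
        if (k == nclasses)        /* first character with this mask */
            signature[nclasses++] = m;
        ec[c] = k;
    }
    return nclasses;
}
```

With these three character classes the 128 characters collapse into 4 equivalence classes: the digits, `a' through `f', `g' through `z', and everything else.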
 `-Cf' specifies that the full scanner tables should be generated - flex should not
 compress the tables by taking advantage of similar transition functions for different states.
 `-CF' specifies that the alternate fast scanner representation (described above under the `-F' flag) should be used.
 This option cannot be used with `-+'.
 `-Cm' directs flex to construct meta-equivalence classes,
 which are sets of equivalence classes (or characters, if equivalence classes are not being used)
 that are commonly used together. Meta-equivalence classes are often a big win when using compressed tables,
 but they have a moderate performance impact (one or two "if" tests and one array look-up per character scanned).
 `-Cr' causes the generated scanner to bypass use of the standard I/O library (stdio) for input.
 Instead of calling `fread()' or `getc()', the scanner will use the `read()' system call,
 resulting in a performance gain which varies from system to system,
 but in general is probably negligible unless you are also using `-Cf' or `-CF'.
 Using `-Cr' can cause strange behavior if, for example,
 you read from yyin using stdio prior to calling the scanner
 (because the scanner will miss whatever text your previous reads left in the stdio input buffer).
 `-Cr' has no effect if you define YY_INPUT (see The Generated Scanner above).
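The stdio pitfall can be reproduced without flex. The C sketch below (POSIX `read()' and `fileno()'; the sample text is made up) reads one character through stdio and then reads the same descriptor directly, the way a `-Cr' scanner would. How much stdio prefetches is implementation-dependent, but for a small file it typically takes everything, so the direct `read()' sees nothing.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#include <stdio.h>
#include <unistd.h>

/* Read one character through stdio, then ask read(2) for more from the same
   descriptor, as a `-Cr' scanner would. Returns the byte count read(2)
   reports, or -1 if the temporary file cannot be created. */
static long bytes_seen_by_read_after_getc(void)
{
    char buf[64];
    FILE *f = tmpfile();
    ssize_t n;

    if (f == NULL)
        return -1;
    fputs("hello world\n", f);   /* 12 bytes of sample input */
    rewind(f);                   /* flush and go back to the start */

    (void) getc(f);              /* stdio typically buffers all 12 bytes here */

    /* read() starts at the descriptor's offset, which is already past
       the text sitting in stdio's buffer. */
    n = read(fileno(f), buf, sizeof buf);
    fclose(f);
    return (long) n;
}
```

On a typical system the function returns 0: the eleven characters after the `h' exist only in stdio's buffer, which is exactly the text a `-Cr' scanner would miss.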
 A lone `-C' specifies that the scanner tables should be compressed
 but neither equivalence classes nor meta-equivalence classes should be used.
 The options `-Cf' or `-CF' and `-Cm' do not make sense together - there is no opportunity for meta-equivalence classes
 if the table is not being compressed. Otherwise the options may be freely mixed, and are cumulative.
 The default setting is `-Cem', which specifies that flex should generate equivalence classes and meta-equivalence classes.
 This setting provides the highest degree of table compression.
 You can trade off faster-executing scanners at the cost of larger tables with the following generally being true:
slowest & smallest
      -Cem
      -Cm
      -Ce
      -C
      -C{f,F}e
      -C{f,F}
      -C{f,F}a
fastest & largest

 Note that scanners with the smallest tables are usually generated and compiled the quickest,
 so during development you will usually want to use the default, maximal compression.
 `-Cfe' is often a good compromise between speed and size for production scanners.
`-ooutput' directs flex to write the scanner to the file `output' instead of `lex.yy.c'.
If you combine `-o' with the `-t' option, then the scanner is written to stdout but its `#line' directives (see the `-L' option above) refer to the file output.
`-Pprefix' changes the default `yy' prefix used by flex for all globally-visible variable and function names to instead be prefix. For example, `-Pfoo' changes the name of yytext to `footext'. It also changes the name of the default output file from `lex.yy.c' to `lex.foo.c'. Here are all of the names affected:
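    yy_create_buffer
    yy_delete_buffer
    yy_flex_debug
    yy_init_buffer
    yy_flush_buffer
    yy_load_buffer_state
    yy_switch_to_buffer
    yyin
    yyleng
    yylex
    yylineno
    yyout
    yyrestart
    yytext
    yywrap
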
(If you are using a C++ scanner, then only yywrap and yyFlexLexer are affected.) Within your scanner itself, you can still refer to the global variables and functions using either version of their name; but externally, they have the modified name. This option lets you easily link together multiple flex programs into the same executable. Note, though, that using this option also renames `yywrap()', so you now must either provide your own (appropriately-named) version of the routine for your scanner, or use `%option noyywrap', as linking with `-lfl' no longer provides one for you by default.
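Both names can work inside the scanner while only the prefixed one is visible to the linker if the generated file maps the `yy' names onto the prefixed ones with `#define's, roughly the way flex does it. The C sketch below imitates that by hand for `-Pfoo'; it illustrates the idea rather than reproducing flex's output, and `call_from_outside()' is a hypothetical caller.

```c
#include <string.h>

/* What the generated scanner sees: the yy names are aliases for the
   prefixed ones (written by hand here for -Pfoo). */
#define yytext footext
#define yylex  foolex

char *yytext;                  /* really defines the global `footext' */

int yylex(void)                /* really defines the function `foolex' */
{
    static char token[] = "example";

    yytext = token;
    return (int) strlen(yytext);
}

#undef yytext
#undef yylex

/* Code outside the scanner can only use the prefixed names, which is what
   lets scanners built with different prefixes be linked together. */
int call_from_outside(void)
{
    int len = foolex();

    return len == (int) strlen(footext) ? len : -1;
}
```

A second scanner built with `-Pbar' would define `bartext' and `barlex' in the same way, so the two never collide.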
`-Sskeleton' overrides the default skeleton file from which flex constructs its scanners. You'll never need this option unless you are doing flex maintenance or development.
flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command-line. This is done by including `%option' directives in the first section of the scanner specification. You can specify multiple options with a single `%option' directive, and multiple directives in the first section of your flex input file. Most options are given simply as names, optionally preceded by the word "no" (with no intervening whitespace) to negate their meaning. A number are equivalent to flex flags or their negation:
7bit            -7 option
8bit            -8 option
align           -Ca option
backup          -b option
batch           -B option
c++             -+ option

caseful or
case-sensitive  opposite of -i (default)

case-insensitive or
caseless        -i option

debug           -d option
default         opposite of -s option
ecs             -Ce option
fast            -F option
full            -f option
interactive     -I option
lex-compat      -l option
meta-ecs        -Cm option
perf-report     -p option
read            -Cr option
stdout          -t option
verbose         -v option
warn            opposite of -w option
                (use "%option nowarn" for -w)

array           equivalent to "%array"
pointer         equivalent to "%pointer" (default)

Some `%option's provide features otherwise not available:
`always-interactive' instructs flex to generate a scanner which always considers its input "interactive". Normally, on each new input file the scanner calls `isatty()' in an attempt to determine whether the scanner's input source is interactive and thus should be read a character at a time. When this option is used, however, then no such call is made.
`main' directs flex to provide a default `main()' program for the scanner, which simply calls `yylex()'. This option implies noyywrap (see below).
`never-interactive' instructs flex to generate a scanner which never considers its input "interactive" (again, no call made to `isatty()'). This is the opposite of `always-interactive'.
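The per-file decision described above can be sketched in a few lines of C. This is not flex's generated code: `choose_read_size()' and the `enum interactivity' values are hypothetical stand-ins for the default check and the two options, and the result only says whether input would be read a character at a time or in bulk.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the three possible settings. */
enum interactivity { DETECT_INTERACTIVE, ALWAYS_INTERACTIVE, NEVER_INTERACTIVE };

/* Number of bytes to request per read from `in': one at a time for
   interactive input, `bulk' otherwise. */
static size_t choose_read_size(FILE *in, enum interactivity mode, size_t bulk)
{
    int interactive;

    switch (mode) {
    case ALWAYS_INTERACTIVE:            /* %option always-interactive */
        interactive = 1;
        break;
    case NEVER_INTERACTIVE:             /* %option never-interactive */
        interactive = 0;
        break;
    default:                            /* the normal check on each new file */
        interactive = isatty(fileno(in));
        break;
    }
    return interactive ? 1 : bulk;
}
```

With either option set, the `isatty()' branch is never reached, which is the saving the two options provide.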
`stack' enables the use of start condition stacks (see Start Conditions above).
`stdinit' if unset (i.e., `%option nostdinit') initializes yyin and yyout to nil FILE pointers, instead of stdin and stdout.
`yylineno' directs flex to generate a scanner that maintains the number of the current line read from its input in the global variable yylineno. This option is implied by `%option lex-compat'.
`yywrap' if unset (i.e., `%option noyywrap'), makes the scanner not call `yywrap()' upon an end-of-file, but simply assume that there are no more files to scan (until the user points yyin at a new file and calls `yylex()' again).
flex scans your rule actions to determine whether you use the REJECT or `yymore()' features. The reject and yymore options are available to override its decision as to whether you use the options, either by setting them (e.g., `%option reject') to indicate the feature is indeed used, or unsetting them to indicate it actually is not used (e.g., `%option noyymore').

Three options take string-delimited values, offset with '=':
%option outfile="ABC"

 is equivalent to `-oABC', and

%option prefix="XYZ"

 is equivalent to `-PXYZ'.


%option yyclass="foo"

 only applies when generating a C++ scanner (`-+' option). It informs flex that you have derived `foo' as a subclass of yyFlexLexer, so flex will place your actions in the member function `foo::yylex()' instead of `yyFlexLexer::yylex()'. It also generates a `yyFlexLexer::yylex()' member function that emits a run-time error (by invoking `yyFlexLexer::LexerError()') if called. See Generating C++ Scanners, below, for additional information.

A number of options are available for lint purists who want to suppress the appearance of unneeded routines in the generated scanner. Each of the following, if unset, results in the corresponding routine not appearing in the generated scanner:
input, unput
yy_push_state, yy_pop_state, yy_top_state
yy_scan_buffer, yy_scan_bytes, yy_scan_string

 (though `yy_push_state()' and friends won't appear anyway unless you use `%option stack').

Performance considerations

 The main design goal of flex is that it generate high-performance scanners. It has been optimized for dealing well with large sets of rules. Aside from the effects on scanner speed of the table compression `-C' options outlined above, there are a number of options/actions which degrade performance. These are, from most expensive to least:
REJECT
%option yylineno
arbitrary trailing context

pattern sets that require backing up
%array
%option interactive
%option always-interactive

'^' beginning-of-line operator
yymore()

 with the first three all being quite expensive and the last two being quite cheap. Note also that `unput()' is implemented as a routine call that potentially does quite a bit of work, while `yyless()' is a quite-cheap macro; so if just putting back some excess text you scanned, use `yyless()'.

REJECT should be avoided at all costs when performance is important. It is a particularly expensive option.

Getting rid of backing up is messy and often may be an enormous amount of work for a complicated scanner. In principle, one begins by using the `-b' flag to generate a `lex.backup' file. For example, on the input
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;

 the file looks like:

State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \001-n  p-\177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \001-`  b-\177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \001-q  s-\177 ]

Compressed tables always back up.

 The first few lines tell us that there's a scanner state in which it can make a
 transition on an 'o' but not on any other character, and that in that state the
 currently scanned text does not match any rule. The state occurs when trying to
 match the rules found at lines 2 and 3 in the input file. If the scanner is in
 that state and then reads something other than an 'o', it will have to back up
 to find a rule which is matched. With a bit of head-scratching one can see that
 this must be the state it's in when it has seen "fo". When this has happened,
 if anything other than another 'o' is seen, the scanner will have to back up to
 simply match the 'f' (by the default rule).
The comment regarding State #8 indicates there's a problem when "foob" has been scanned.
Indeed, on any character other than an 'a', the scanner will have to back up to accept "foo".
Similarly, the comment for State #9 concerns when "fooba" has been scanned and an 'r' does not follow.

The final comment reminds us that there's no point going to all the trouble of
removing backing up from the rules unless we're using `-Cf' or `-CF',
since there's no performance gain doing so with compressed scanners.

The way to remove the backing up is to add "error" rules:

%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

fooba       |
foob        |
fo          {
            /* false alarm, not really a keyword */
            return TOK_ID;
            }

 Eliminating backing up among a list of keywords can also be done using a "catch-all" rule:
%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

[a-z]+      return TOK_ID;

 This is usually the best solution when appropriate.
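What backing up costs can be seen in a small hand-written model of a longest-match scanner. This C sketch is not flex's table-driven code: it tries the two keyword rules above (and, when `catch_all' is set, the `[a-z]+' catch-all), remembers the last position at which some rule matched, and reports how many characters it read past that point and must push back. With the keyword rules alone, "foobaz" is read as far as "fooba" and then backed up two characters to accept "foo"; with the catch-all, every all-letter prefix is already a match, so nothing is pushed back.

```c
#include <stddef.h>
#include <string.h>

/* The two keyword rules from the example above. */
static const char *const keywords[] = { "foo", "foobar" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* True if s[0..len) is non-empty and made only of a-z (the catch-all). */
static int all_lower(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] < 'a' || s[i] > 'z')
            return 0;
    return len > 0;
}

/* Can the scanner still be on its way to some match after s[0..len)? */
static int live(const char *s, size_t len, int catch_all)
{
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strlen(keywords[k]) >= len && strncmp(s, keywords[k], len) == 0)
            return 1;
    return catch_all && all_lower(s, len);
}

/* Does s[0..len) exactly match some rule? */
static int accepting(const char *s, size_t len, int catch_all)
{
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strlen(keywords[k]) == len && strncmp(s, keywords[k], len) == 0)
            return 1;
    return catch_all && all_lower(s, len);
}

/* Longest-match scan from the start of `in'. Returns the matched length
   (0 means no rule matched) and stores in *backup how many characters were
   read past the last accepting position and must be pushed back. */
static size_t longest_match(const char *in, int catch_all, size_t *backup)
{
    size_t pos = 0, last_accept = 0;

    while (in[pos] != '\0' && live(in, pos + 1, catch_all)) {
        pos++;
        if (accepting(in, pos, catch_all))
            last_accept = pos;
    }
    *backup = pos - last_accept;
    return last_accept;
}
```

A result of 0 corresponds to the case discussed for State #6: after "fo" and a non-letter, no rule has matched, so flex would back up and let the default rule match the single `f'.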

Backing-up messages tend to cascade. With a complicated set of rules it's not uncommon to
get hundreds of messages. If one can decipher them, though, it often only takes
a dozen or so rules to eliminate the backing up (though it's easy to make a mistake
and have an error rule accidentally match a valid token;
a possible future flex feature will be to automatically add rules to eliminate backing up).
It's important to keep in mind that you gain the benefits of eliminating backing up
only if you eliminate every instance of backing up. Leaving just one means you gain nothing.

Variable trailing context (where both the leading and trailing parts do not have a fixed length)
entails almost the same performance loss as REJECT (i.e., substantial). So when possible a rule like:
mouse|rat/(cat|dog)   run();

 is better written:

mouse/cat|dog         run();
rat/cat|dog           run();

 or as

mouse|rat/cat         run();
mouse|rat/dog         run();

 Note that here the special '|' action does not provide any savings, and can even
 make things worse (see Deficiencies / Bugs below).

Another area where the user can increase a scanner's performance (and one that's
easier to implement) arises from the fact that the longer the tokens matched, the
faster the scanner will run. This is because with long tokens the processing of most
input characters takes place in the (short) inner scanning loop, and does not often
have to go through the additional work of setting up the scanning environment (e.g., yytext) for the action.
Recall the scanner for C comments:
%x comment
%%
        int line_num = 1;

"/*"        BEGIN(comment);

<comment>[^*\n]*
<comment>"*"+[^*/\n]*
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);

 This could be sped up by writing it as:
%x comment
%%
        int line_num = 1;

"/*"        BEGIN(comment);

<comment>[^*\n]*
<comment>[^*\n]*\n      ++line_num;
<comment>"*"+[^*/\n]*
<comment>"*"+[^*/\n]*\n ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);

 Now instead of each newline requiring the processing of another action, recognizing
 the newlines is "distributed" over the other rules to keep the matched text as long as possible.
 Note that adding rules does not slow down the scanner!
 The speed of the scanner is independent of the number of rules or
 (modulo the considerations given at the beginning of this section)
 how complicated the rules are with regard to operators such as '*' and '|'.

A final example in speeding up a scanner: suppose you want to scan through a file containing
identifiers and keywords, one per line and with no other extraneous characters,
and recognize all the keywords. A natural first approach is:
asm      |
auto     |
break    |
... etc ...
volatile |
while    /* it's a keyword */

.|\n     /* it's not a keyword */

 To eliminate the back-tracking, introduce a catch-all rule:
asm      |
auto     |
break    |
... etc ...
volatile |
while    /* it's a keyword */

[a-z]+   |
.|\n     /* it's not a keyword */

 Now, if it's guaranteed that there's exactly one word per line, then we can reduce the total number
 of matches by a half by merging in the recognition of newlines with that of the other tokens:
asm\n    |
auto\n   |
break\n  |
... etc ...
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
.|\n     /* it's not a keyword */

 One has to be careful here, as we have now reintroduced backing up into the scanner.
 In particular, while we know that there will never be any characters in the input stream
 other than letters or newlines, flex can't figure this out, and it will plan for possibly
 needing to back up when it has scanned a token like "auto" and then the next character is something
 other than a newline or a letter. Previously it would then just match the "auto" rule and be done,
 but now it has no "auto" rule, only an "auto\n" rule. To eliminate the possibility of backing up,
 we could either duplicate all rules but without final newlines, or,
 since we never expect to encounter such an input and therefore don't care how it's classified,
 we can introduce one more catch-all rule, this one which doesn't include a newline:
asm\n    |
auto\n   |
break\n  |
... etc ...
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
[a-z]+   |
.|\n     /* it's not a keyword */

 Compiled with `-Cf', this is about as fast as one can get a flex scanner to go for this particular problem.
A final note: flex is slow when matching NULs, particularly when a token contains multiple NULs.
It's best to write rules which match short amounts of text if it's anticipated that the text will often
include NULs.


Another final note regarding performance: as mentioned above in the section How the Input is Matched,
dynamically resizing yytext to accommodate huge tokens is a slow process because it presently requires
that the (huge) token be rescanned from the beginning. Thus if performance is vital,
you should attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters/token.

Generating C++ scanners

flex provides two different ways to generate scanners for use with C++.

The first way is to simply compile a scanner generated by flex using a C++ compiler instead of a C compiler.
You should not encounter any compilation errors
(please report any you find to the email address given in the Author section below).
You can then use C++ code in your rule actions instead of C code.
Note that the default input source for your scanner remains yyin,
and default echoing is still done to yyout. Both of these remain `FILE *' variables and not C++ streams.

You can also use flex to generate a C++ scanner class, using the `-+' option
(or, equivalently, `%option c++'), which is automatically specified if the name of
the flex executable ends in a `+', such as flex++. When using this option,
flex defaults to generating the scanner to the file `lex.yy.cc' instead of `lex.yy.c'.
The generated scanner includes the header file `FlexLexer.h',
which defines the interface to two C++ classes.

The first class, FlexLexer, provides an abstract base class defining the general scanner class interface.
It provides the following member functions:

`const char* YYText()'
returns the text of the most recently matched token, the equivalent of yytext.

`int YYLeng()'
returns the length of the most recently matched token, the equivalent of yyleng.

`int lineno() const'
returns the current input line number (see `%option yylineno'), or 1 if `%option yylineno' was not used.

`void set_debug( int flag )'
sets the debugging flag for the scanner, equivalent to assigning to yy_flex_debug
(see the Options section above).
Note that you must build the scanner using `%option debug' to include debugging information in it.

`int debug() const'
returns the current setting of the debugging flag.
Also provided are member functions equivalent to `yy_switch_to_buffer()', `yy_create_buffer()'
(though the first argument is an `istream*' object pointer and not a `FILE*'), `yy_flush_buffer()',
`yy_delete_buffer()', and `yyrestart()' (again, the first argument is an `istream*' object pointer).

The second class defined in `FlexLexer.h' is yyFlexLexer, which is derived from FlexLexer.
It defines the following additional member functions:

`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
constructs a yyFlexLexer object using the given streams for input and output.
If not specified, the streams default to cin and cout, respectively.

`virtual int yylex()'
performs the same role as `yylex()' does for ordinary flex scanners: it scans the input stream,
consuming tokens, until a rule's action returns a value. If you derive a subclass S from yyFlexLexer
and want to access the member functions and variables of S inside `yylex()',
then you need to use `%option yyclass="S"' to inform flex that you will be using that subclass
instead of yyFlexLexer. In this case, rather than generating `yyFlexLexer::yylex()',
flex generates `S::yylex()' (and also generates a dummy `yyFlexLexer::yylex()'
that calls `yyFlexLexer::LexerError()' if called).

`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
reassigns yyin to new_in (if non-nil) and yyout to new_out (ditto),
deleting the previous input buffer if yyin is reassigned.

`int yylex( istream* new_in = 0, ostream* new_out = 0 )'
first switches the input streams via `switch_streams( new_in, new_out )' and
then returns the value of `yylex()'.

In addition, yyFlexLexer defines the following protected virtual functions which you can redefine in derived classes to tailor the scanner:

`virtual int LexerInput( char* buf, int max_size )'
reads up to `max_size' characters into buf and returns the number of characters read.
To indicate end-of-input, return 0 characters. Note that "interactive" scanners
(see the `-B' and `-I' flags) define the macro YY_INTERACTIVE.
If you redefine LexerInput() and need to take different actions depending on whether or not
the scanner might be scanning an interactive input source, you can test for the presence
of this name via `#ifdef'.

`virtual void LexerOutput( const char* buf, int size )'
writes out size characters from the buffer buf, which, while NUL-terminated,
may also contain "internal" NULs if the scanner's rules can match text with NULs in them.
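The "internal NULs" remark is why an output routine has to work from `size' rather than treat `buf' as a C string. A minimal C sketch of the difference, using plain functions with hypothetical names to stand in for a LexerOutput() override: `fwrite()' copies every byte, while `fputs()' stops at the first NUL.

```c
#include <stdio.h>

/* Write exactly `size' bytes of token text, embedded NULs included, as a
   LexerOutput()-style routine must. Returns the number of bytes written. */
static size_t write_token(FILE *out, const char *buf, size_t size)
{
    return fwrite(buf, 1, size, out);
}

/* For contrast: fputs() treats buf as a C string and stops at the first
   NUL. Returns how many bytes actually reached `out'. */
static long bytes_written_by_fputs(FILE *out, const char *buf)
{
    long before = ftell(out);

    fputs(buf, out);
    return ftell(out) - before;
}
```

For a three-byte token "a", NUL, "b", `write_token()' writes all three bytes, while `fputs()' writes only the `a'.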

`virtual void LexerError( const char* msg )'
reports a fatal error message. The default version of this function writes the message to the stream cerr and exits.
Note that a yyFlexLexer object contains its entire scanning state. Thus you can use such objects to create reentrant scanners. You can instantiate multiple instances of the same yyFlexLexer class, and you can also combine multiple C++ scanner classes together in the same program using the `-P' option discussed above. Finally, note that the `%array' feature is not available to C++ scanner classes; you must use `%pointer' (the default).
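The reentrancy point carries over to C once all scanning state lives in one object instead of in globals. Here is a minimal sketch, not flex's C interface: the hypothetical `struct word_scanner' holds the input position and the last token (its `text' and `leng' fields play the role of YYText() and YYLeng()), so two scanners can be advanced in turn without disturbing each other.

```c
#include <ctype.h>
#include <stddef.h>

/* Everything one scanner needs; nothing is global. */
struct word_scanner {
    const char *p;       /* current position in the input */
    const char *text;    /* start of the last token, like YYText() */
    size_t leng;         /* length of the last token, like YYLeng() */
};

static void scanner_init(struct word_scanner *s, const char *input)
{
    s->p = input;
    s->text = input;
    s->leng = 0;
}

/* Find the next whitespace-separated word: return 1 and record it in
   s->text/s->leng, or return 0 at end of input. */
static int scanner_next(struct word_scanner *s)
{
    while (*s->p != '\0' && isspace((unsigned char) *s->p))
        s->p++;
    if (*s->p == '\0')
        return 0;
    s->text = s->p;
    while (*s->p != '\0' && !isspace((unsigned char) *s->p))
        s->p++;
    s->leng = (size_t) (s->p - s->text);
    return 1;
}
```

Interleaving calls on two `struct word_scanner' objects over different strings yields each string's words in order, which is the property that global-state scanners lack.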

Here is an example of a simple C++ scanner:

    // An example of using the flex C++ scanner class.

%{
#include <iostream>
using namespace std;
int mylineno = 0;
%}

string  \"[^\n"]+\"

ws      [ \t]+

alpha   [A-Za-z]
dig     [0-9]
name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
number  {num1}|{num2}

%%

{ws}    /* skip blanks and tabs */

"/*"    {
        int c;

        /* stop at the end of the comment or of the input */
        while((c = yyinput()) != 0 && c != EOF)
            {
            if(c == '\n')
                ++mylineno;

            else if(c == '*')
                {
                if((c = yyinput()) == '/')
                    break;
                else
                    unput(c);
                }
            }
        }

{number}  cout << "number " << YYText() << '\n';

\n        mylineno++;

{name}    cout << "name " << YYText() << '\n';

{string}  cout << "string " << YYText() << '\n';

%%

int main( int /* argc */, char** /* argv */ )
    {
    FlexLexer* lexer = new yyFlexLexer;
    while(lexer->yylex() != 0)
        ;
    return 0;
    }

Version 2.5               December 1994                        44

 If you want to create multiple (different) lexer classes,
 you use the `-P' flag (or the `prefix=' option) to rename each yyFlexLexer to some other xxFlexLexer.
 You then can include `<FlexLexer.h>' in your other sources once per lexer class,
 first renaming yyFlexLexer as follows:
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <FlexLexer.h>

 if, for example, you used `%option prefix="xx"' for one of your scanners and `%option prefix="zz"' for the other.

IMPORTANT: the present form of the scanning class is experimental and may change considerably between major releases.

Incompatibilities with lex and POSIX

flex is a rewrite of the AT&T Unix lex tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern to those who wish to write scanners
acceptable to either implementation. Flex is fully compliant with the POSIX lex specification,
except that when using `%pointer' (the default), a call to `unput()' destroys the contents of yytext,
which is counter to the POSIX specification.

In this section we discuss all of the known areas of incompatibility between flex, AT&T lex,
and the POSIX specification.

flex's `-l' option turns on maximum compatibility with the original AT&T lex implementation,
at the cost of a major loss in the generated scanner's performance.
We note below which incompatibilities can be overcome using the `-l' option.

flex is fully compatible with lex with the following exceptions:

The undocumented lex scanner internal variable yylineno is not supported unless `-l' or `%option yylineno' is used.
yylineno should be maintained on a per-buffer basis, rather than a per-scanner (single global variable) basis.
yylineno is not part of the POSIX specification.

The `input()' routine is not redefinable, though it may be called to read characters following whatever
has been matched by a rule. If `input()' encounters an end-of-file the normal `yywrap()' processing is done.
A "real" end-of-file is returned by `input()' as EOF. Input is instead controlled by defining the YY_INPUT macro.
The flex restriction that `input()' cannot be redefined is in accordance with the POSIX specification,
which simply does not specify any way of controlling the scanner's input other than by making an initial
assignment to yyin.
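Since a C scanner takes its input through YY_INPUT rather than a redefinable `input()', here is a minimal sketch of a definition that reads from an in-memory string instead of yyin. `my_input', `my_input_pos' and `string_input()' are hypothetical names; in a real scanner this code would sit in a `%{ ... %}' block of the definitions section, and YY_NULL would come from the generated scanner rather than from the fallback `#define' used here so the snippet compiles on its own.

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* flex's end-of-input value; defined here only so the
                       sketch compiles outside a generated scanner */
#endif

/* Hypothetical in-memory input source. */
static const char *my_input = "";
static size_t my_input_pos = 0;

/* Copy up to max_size bytes of the remaining input into buf and return how
   many were copied, or YY_NULL once the input is exhausted. */
static int string_input(char *buf, int max_size)
{
    size_t left = strlen(my_input) - my_input_pos;
    size_t n = left < (size_t) max_size ? left : (size_t) max_size;

    memcpy(buf, my_input + my_input_pos, n);
    my_input_pos += n;
    return n == 0 ? YY_NULL : (int) n;
}

/* The hook itself: `result' receives the number of bytes placed in buf. */
#define YY_INPUT(buf, result, max_size) \
    { (result) = string_input((buf), (max_size)); }
```

With `my_input' set to "hello", successive calls with a `max_size' of 3 deliver "hel", then "lo", then YY_NULL, which the scanner treats as end-of-file.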

The `unput()' routine is not redefinable. This restriction is in accordance with POSIX.
flex scanners are not as reentrant as lex scanners. In particular, if you have an interactive scanner
and an interrupt handler which long-jumps out of the scanner, and the scanner is subsequently called again,
you may get the following message:
fatal flex scanner internal error--end of buffer missed

 To reenter the scanner, first use
yyrestart( yyin );

 Note that this call will throw away any buffered input; usually this isn't a problem with an interactive scanner.
 Also note that flex C++ scanner classes are reentrant, so if using C++ is an option for you,
 you should use them instead. See "Generating C++ Scanners" above for details.
`output()' is not supported. Output from the `ECHO' macro is done to the file-pointer yyout (default stdout).
`output()' is not part of the POSIX specification.
lex does not support exclusive start conditions (%x), though they are in the POSIX specification.
When definitions are expanded, flex encloses them in parentheses. With lex, the following:
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf( "Found it\n" );
%%

 will not match the string "foo" because when the macro is expanded the rule is
 equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is associated with "[A-Z0-9]*".
 With flex, the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
 Note that if the definition begins with `^' or ends with `$' then it is not expanded with parentheses,
 to allow these operators to appear in definitions without losing their special meanings. But the `<s>', `/',
 and `<<EOF>>' operators cannot be used in a flex definition. Using `-l' results in the lex behavior of
 no parentheses around the definition. The POSIX specification is that the definition be enclosed in parentheses.
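The effect of the parentheses can be checked with POSIX extended regular expressions from `<regex.h>', used here only as a stand-in for the two expansions (anchors are added so the whole string must match). Because "foo[A-Z][A-Z0-9]*?" is not a well-defined ERE, the sketch spells the lex reading, where `?' applies only to `[A-Z0-9]*', as "foo[A-Z]([A-Z0-9]*)?". `full_match()' is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression `pattern' matches
   `s', 0 if it does not, and -1 if the pattern does not compile. */
static int full_match(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* foo{NAME}? as flex expands it: the whole definition is optional. */
static const char flex_expansion[] = "^foo([A-Z][A-Z0-9]*)?$";

/* The lex reading: only [A-Z0-9]* is optional, so a capital letter
   after "foo" is still required. */
static const char lex_reading[] = "^foo[A-Z]([A-Z0-9]*)?$";
```

Both patterns accept "fooBAR2", but only the flex expansion accepts plain "foo", matching the behavior described above.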
Some implementations of lex allow a rule's action to begin on a separate line, if the rule's pattern
has trailing whitespace:
foo|bar<space here>
  { foobar_action(); }

flex does not support this feature.
The lex `%r' (generate a Ratfor scanner) option is not supported. It is not part of the POSIX specification.
After a call to `unput()', yytext is undefined until the next token is matched, unless the scanner was built
using `%array'. This is not the case with lex or the POSIX specification. The `-l' option does away with this incompatibility.
The precedence of the `{}' (numeric range) operator is different. lex interprets "abc{1,3}" as "match one,
two, or three occurrences of 'abc'", whereas flex interprets it as "match 'ab' followed by one, two,
or three occurrences of 'c'". The latter is in agreement with the POSIX specification.
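POSIX extended regular expressions (`<regex.h>') bind `{}' the same way flex does, so they make a convenient check of the two readings. This is a sketch using the C library, not flex itself; the parenthesized pattern spells out what lex means, and `full_match()' is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression `pattern' matches
   `s', 0 if it does not, and -1 if the pattern does not compile. */
static int full_match(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* flex/POSIX reading of abc{1,3}: "ab" followed by one to three 'c's. */
static const char flex_reading[] = "^abc{1,3}$";

/* lex reading, written out explicitly: one to three copies of "abc". */
static const char lex_reading[] = "^(abc){1,3}$";
```

Under the flex reading "abccc" matches and "abcabc" does not; the explicit lex reading accepts "abcabc".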
The precedence of the `^' operator is different. lex interprets "^foo|bar" as "match either 'foo'
at the beginning of a line, or 'bar' anywhere", whereas flex interprets it as "match either 'foo' or 'bar'
if they come at the beginning of a line". The latter is in agreement with the POSIX specification.
The special table-size declarations such as `%a' supported by lex are not required by flex scanners;
flex ignores them.
The name FLEX_SCANNER is #define'd so scanners may be written for use with either flex or lex.
Scanners also include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION indicating which version of flex
generated the scanner (for example, for the 2.5 release, these defines would be 2 and 5 respectively).
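A minimal sketch of user code that adapts to whichever tool generated the scanner. `flex_version()' is a hypothetical helper meant for the user-code section of a scanner, where these macros are visible; compiled on its own, as here, none of them are defined and it returns -1.

```c
/* Returns the flex version as major * 100 + minor (205 for flex 2.5),
   or -1 when this code is not part of a flex-generated scanner. */
static int flex_version(void)
{
#if defined(FLEX_SCANNER) && defined(YY_FLEX_MAJOR_VERSION) && defined(YY_FLEX_MINOR_VERSION)
    return YY_FLEX_MAJOR_VERSION * 100 + YY_FLEX_MINOR_VERSION;
#else
    return -1;   /* another lex, or plain C compilation */
#endif
}
```

Code that must also build under AT&T lex can test `#ifdef FLEX_SCANNER' in the same way before using flex-only features.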





The following flex features are not included in lex or the POSIX specification:
C++ scanners
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
#line directives
%{}'s around actions
multiple actions on a line

 plus almost all of the flex flags. The last feature in the list refers to the fact
 that with flex you can put multiple actions on the same line, separated with semicolons,
 while with lex, the following
foo    handle_foo(); ++num_foos_seen;

 is (rather surprisingly) truncated to
foo    handle_foo();

flex does not truncate the action. Actions that are not enclosed in braces are simply terminated at the end of the line.


`warning, rule cannot be matched'
indicates that the given rule cannot be matched because it follows other rules that will always match the same text as it.
For example, in the following "foo" cannot be matched because it comes after an identifier "catch-all" rule:
[a-z]+    got_identifier();
foo       got_foo();

Using REJECT in a scanner suppresses this warning.
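
The warning follows from how flex chooses among rules: the longest match wins, and when several rules match the same amount of text, the one listed first wins. The sketch below models only that selection policy using POSIX regular expressions (flex itself compiles all rules into a single DFA); `choose_rule' is a hypothetical name, and each pattern is anchored with ^ so it can only match at the current input position.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the index of the rule flex's policy would pick for the text at
   the start of `input`, or -1 if no rule matches. Patterns are POSIX EREs
   anchored with ^, so rm_so is 0 and rm_eo is the match length. */
static int choose_rule(const char *const patterns[], size_t n, const char *input)
{
    int best = -1;
    regoff_t best_len = 0;      /* zero-length matches are never chosen */

    for (size_t i = 0; i < n; i++) {
        regex_t re;
        regmatch_t m;
        if (regcomp(&re, patterns[i], REG_EXTENDED) != 0)
            continue;           /* skip invalid patterns in this sketch */
        /* Strictly longer only: on a tie the earlier rule keeps its place. */
        if (regexec(&re, input, 1, &m, 0) == 0 && m.rm_eo > best_len) {
            best = (int)i;
            best_len = m.rm_eo;
        }
        regfree(&re);
    }
    return best;
}
```

With "^[a-z]+" listed first, "foo" always goes to rule 0, which is why flex reports the "foo" rule as unmatchable. Listing "^foo" first lets it win the tie on "foo", while "food" still goes to the identifier rule because that match is longer.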
`warning, -s option given but default rule can be matched'
means that it is possible (perhaps only in a particular start condition) that
the default rule (match any single character) is the only one that will match a particular input.
Since `-s' was given, presumably this is not intended.

`reject_used_but_not_detected undefined'
`yymore_used_but_not_detected undefined'
These errors can occur at compile time. They indicate that the scanner uses REJECT or `yymore()' but
that flex failed to notice the fact, meaning that flex scanned the first two sections looking for
occurrences of these actions and failed to find any, but somehow you snuck some in (via a #include file,
for example). Use `%option reject' or `%option yymore' to indicate to flex that you really do use these features.

`flex scanner jammed'
a scanner compiled with `-s' has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.

`token too large, exceeds YYLMAX'
your scanner uses `%array' and one of its rules matched a string longer than the `YYLMAX' constant
(8K bytes by default). You can increase the value by #define'ing YYLMAX in the definitions section of
your flex input.
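
The effect of overriding YYLMAX can be sketched in plain C. This assumes, as the advice to #define YYLMAX implies, that the generated scanner supplies its 8K default only when YYLMAX is not already defined; `token_text' and `store_token' are hypothetical stand-ins for a `%array' yytext and the scanner's copy step, and 16 is a deliberately tiny value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In a flex input this line would sit in the definitions section; 16 is a
   deliberately tiny value so the limit is easy to hit. */
#define YYLMAX 16

/* Assumed default, used only when YYLMAX has not been defined earlier. */
#ifndef YYLMAX
#define YYLMAX 8192
#endif

static char token_text[YYLMAX];   /* hypothetical stand-in for a %array yytext */

/* Copy a matched token; -1 corresponds to "token too large, exceeds YYLMAX". */
static int store_token(const char *tok, size_t len)
{
    if (len >= YYLMAX)            /* one byte is reserved for the NUL */
        return -1;
    memcpy(token_text, tok, len);
    token_text[len] = '\0';
    return 0;
}
```

Because the user's definition comes first, the #ifndef guard leaves it alone and the array is 16 bytes; a 15-character token still fits, a 16-character one does not.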

`scanner requires -8 flag to use the character 'x''
Your scanner specification includes recognizing the 8-bit character x and you did not specify the -8 flag,
and your scanner defaulted to 7-bit because you used the `-Cf' or `-CF' table compression options.
See the discussion of the `-7' flag for details.

`flex scanner push-back overflow'
you used `unput()' to push back so much text that the scanner's buffer could not hold both the pushed-back
text and the current token in yytext. Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.
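
Why push-back can overflow is easiest to see with a fixed-size buffer. The model below is hypothetical and much simpler than flex's real buffer management: unread input lives at the end of the array, each `sketch_unput()' call stores one character just before the read position, and once that position reaches the start of the array nothing more can be pushed back.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 8

/* Hypothetical scanner buffer: buf[pos .. BUF_SIZE-1] holds unread input. */
struct scan_buf {
    char buf[BUF_SIZE];
    size_t pos;                /* index of the next character to read */
};

/* Push one character back so it becomes the next character read.
   Returns -1 when there is no room left, the analogue of
   "flex scanner push-back overflow". */
static int sketch_unput(struct scan_buf *b, char c)
{
    if (b->pos == 0)
        return -1;
    b->buf[--b->pos] = c;
    return 0;
}
```

With two free slots before the read position, two characters can be pushed back and the third fails; they come back out in reverse order of pushing ('b' before 'a'), the same order successive `unput()' calls produce.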

`input buffer overflow, can't enlarge buffer because scanner uses REJECT'
the scanner was working on matching an extremely large token and needed to expand the input buffer.
This doesn't work with scanners that use REJECT.

`fatal flex scanner internal error--end of buffer missed'
This can occur in a scanner which is reentered after a long-jump has jumped out (or over) the scanner's
activation frame. Before reentering the scanner, use:
yyrestart( yyin );

or, as noted above, switch to using the C++ scanner class.

`too many start conditions in <> construct!'
you listed more start conditions in a <> construct than exist (so you must have listed at least
one of them twice).


library with which scanners must be linked.

generated scanner (called `lexyy.c' on some systems).

generated C++ scanner class, when using `-+'.

header file defining the C++ scanner base class, FlexLexer, and its derived class, yyFlexLexer.

skeleton scanner. This file is only used when building flex, not when flex executes.

backing-up information for `-b' flag (called `lex.bck' on some systems).

Deficiencies / Bugs

Some trailing context patterns cannot be properly matched and generate warning
messages ("dangerous trailing context"). These are patterns where the ending of the first
part of the rule matches the beginning of the second part, such as "zx*/xy*",
where the 'x*' matches the 'x' at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns is undefined.)

For some trailing context rules, parts which are actually fixed-length are not recognized as such,
leading to the above-mentioned performance loss. In particular, parts using '|' or {n} (such as "foo{3}")
are always considered variable-length.

Combining trailing context with the special '|' action can result in fixed trailing context being
turned into the more expensive variable trailing context. For example, in the following:

abc      |

Use of `unput()' invalidates yytext and yyleng, unless the `%array' directive
or the `-l' option has been used.
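
The difference between `%pointer' and `%array' that makes `unput()' dangerous can be shown without flex. In this hypothetical sketch, `tok_ptr' plays the role of a `%pointer' yytext (it points into the input buffer) and `tok_copy' the role of a `%array' yytext (a private copy); a memcpy() into the buffer stands in for `unput()' rearranging it.

```c
#include <assert.h>
#include <string.h>

struct views {
    char via_pointer[8];   /* what a %pointer-style yytext shows afterwards */
    char via_array[8];     /* what a %array-style yytext shows afterwards */
};

static struct views after_buffer_rewrite(void)
{
    char input_buffer[] = "foo bar";          /* the scanner's input buffer */
    const char *tok_ptr = input_buffer + 4;   /* %pointer: yytext aims at "bar" */
    char tok_copy[8];
    strcpy(tok_copy, input_buffer + 4);       /* %array: token copied out */

    /* Stand-in for unput() rearranging the buffer around the current token. */
    memcpy(input_buffer + 4, "XYZ", 3);

    struct views v;
    strcpy(v.via_pointer, tok_ptr);
    strcpy(v.via_array, tok_copy);
    return v;
}
```

After the rewrite, the pointer view reads "XYZ" while the array copy still holds "bar", which is why the text above exempts `%array' (and `-l') from this problem.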

Pattern-matching of NUL's is substantially slower than matching other characters.

Dynamic resizing of the input buffer is slow, as it entails rescanning all the text matched so far
by the current (generally huge) token.

Due to both buffering of input and read-ahead, you cannot intermix calls to <stdio.h> routines,
such as, for example, `getchar()', with flex rules and expect it to work. Call `input()' instead.
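
The read-ahead problem can be reproduced with a toy input layer. `struct reader' and `reader_getc()' are hypothetical, not flex's actual input code, but they buffer input in the same general way: the first request reads a whole block, so the underlying FILE is already positioned past characters the scanner has not yet handed out, and a direct stdio call skips them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical block-buffered input layer over a stdio stream. */
struct reader {
    FILE *fp;
    char buf[8];
    size_t len;    /* bytes currently held in buf */
    size_t pos;    /* next byte of buf to hand out */
};

static int reader_getc(struct reader *r)
{
    if (r->pos == r->len) {                   /* buffer exhausted: refill */
        r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return (unsigned char)r->buf[r->pos++];
}
```

Reading "abcdefghij" through this 8-byte reader, the first `reader_getc()' returns 'a', but a subsequent `fgetc()' on the same FILE returns 'i', not 'b'. Inside a flex scanner, `input()' avoids this by taking characters from the scanner's own buffer.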

The total number of table entries reported by the `-v' flag excludes the number of table entries
needed to determine what rule has been matched. That number is equal to the number of DFA states
if the scanner does not use REJECT, and somewhat greater than the number of states if it does.

REJECT cannot be used with the `-f' or `-F' options.

The flex internal algorithms need documentation.

See also

lex(1), yacc(1), sed(1), awk(1).

John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and Associates. Be sure to get the 2nd edition.

M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.

Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles, Techniques and Tools; Addison-Wesley (1986). Describes the pattern-matching techniques used by flex (deterministic finite automata).


Vern Paxson, with the help of many ideas and much inspiration from Van Jacobson. Original version by Jef Poskanzer. The fast table representation is a partial implementation of a design done by Van Jacobson. The implementation was done by Kevin Gong and Vern Paxson.

Thanks to the many flex beta-testers, feedbackers, and contributors, especially Francois Pinard, Casey Leedom, Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe, Karl Berry, Peter A. Bigot, Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz, Kevin B. Kenny, Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have slipped my marginal mail-archiving skills but whose contributions are appreciated all the same.

Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various distribution headaches.

Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to Eric Hughes for support of multiple buffers.

This work was primarily done when I was with the Real Time Systems Group at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there for the support I received.

Send comments to `'.

This document was generated on 23 February 2001 using texi2html 1.56k.






