6.4 Lexical elements

cassper

Category: C语言学习 (C language learning). Tags: character constants, integer, translation, literals, string


Syntax
1 token:
keyword
identifier
constant
string-literal
punctuator
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
Constraints
2 Each preprocessing token that is converted to a token shall have the lexical form of a
keyword, an identifier, a constant, a string literal, or a punctuator.
Semantics
3 A token is the minimal lexical element of the language in translation phases 7 and 8. The
categories of tokens are: keywords, identifiers, constants, string literals, and punctuators.
A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6. The categories of preprocessing tokens are: header names,
identifiers, preprocessing numbers, character constants, string literals, punctuators, and
single non-white-space characters that do not lexically match the other preprocessing
token categories. If a ' or a " character matches the last category, the behavior is
undefined. Preprocessing tokens can be separated by white space; this consists of
comments (described later), or white-space characters (space, horizontal tab, new-line,
vertical tab, and form-feed), or both. As described in 6.10, in certain circumstances
during translation phase 4, white space (or the absence thereof) serves as more than
preprocessing token separation. White space may appear within a preprocessing token
only as part of a header name or between the quotation characters in a character constant
or string literal.

4 If the input stream has been parsed into preprocessing tokens up to a given character, the
next preprocessing token is the longest sequence of characters that could constitute a
preprocessing token. There is one exception to this rule: a header name preprocessing
token is only recognized within a #include preprocessing directive, and within such a
directive, a sequence of characters that could be either a header name or a string literal is
recognized as the former.
5 EXAMPLE 1 The program fragment 1Ex is parsed as a preprocessing number token (one that is not a
valid floating or integer constant token), even though a parse as the pair of preprocessing tokens 1 and Ex
might produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program
fragment 1E1 is parsed as a preprocessing number (one that is a valid floating constant token), whether or
not E is a macro name.
6 EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on
increment operators, even though the parse x ++ + ++ y might yield a correct expression.
Forward references: character constants (6.4.4.4), comments (6.4.9), expressions (6.5),
floating constants (6.4.4.2), header names (6.4.7), macro replacement (6.10.3), postfix
increment and decrement operators (6.5.2.4), prefix increment and decrement operators
(6.5.3.1), preprocessing directives (6.10), preprocessing numbers (6.4.8), string literals
(6.4.5).
6.4.1 Keywords
Syntax
1 keyword: one of
auto
break
case
char
const
continue
default
do
double
else
enum
extern
float
for
goto
if
inline
int
long
register
restrict
return
short
signed
sizeof
static
struct
switch
typedef
union
unsigned
void
volatile
while
_Bool
_Complex
_Imaginary
Semantics
2 The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as
keywords, and shall not be used otherwise.

6.4.2 Identifiers
6.4.2.1 General
Syntax
1 identifier:
identifier-nondigit
identifier identifier-nondigit
identifier digit
identifier-nondigit:
nondigit
universal-character-name
other implementation-defined characters
nondigit: one of
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
digit: one of
0 1 2 3 4 5 6 7 8 9
Semantics
2 An identifier is a sequence of nondigit characters (including the underscore _, the
lowercase and uppercase Latin letters, and other characters) and digits, which designates
one or more entities as described in 6.2.1. Lowercase and uppercase letters are distinct.
There is no specific limit on the maximum length of an identifier.
3 Each universal character name in an identifier shall designate a character whose encoding
in ISO/IEC 10646 falls into one of the ranges specified in annex D. The initial
character shall not be a universal character name designating a digit. An implementation
may allow multibyte characters that are not part of the basic source character set to
appear in identifiers; which characters and their correspondence to universal character
names is implementation-defined.
4 When preprocessing tokens are converted to tokens during translation phase 7, if a
preprocessing token could be converted to either a keyword or an identifier, it is converted
to a keyword.

Implementation limits
5 As discussed in 5.2.4.1, an implementation may limit the number of significant initial
characters in an identifier; the limit for an external name (an identifier that has external
linkage) may be more restrictive than that for an internal name (a macro name or an
identifier that does not have external linkage). The number of significant characters in an
identifier is implementation-defined.
6 Any identifiers that differ in a significant character are different identifiers. If two
identifiers differ only in nonsignificant characters, the behavior is undefined.
Forward references: universal character names (6.4.3), macro replacement (6.10.3).
6.4.2.2 Predefined identifiers
Semantics
1 The identifier __func__ shall be implicitly declared by the translator as if,
immediately following the opening brace of each function definition, the declaration
static const char __func__[] = "function-name";
appeared, where function-name is the name of the lexically-enclosing function.
2 This name is encoded as if the implicit declaration had been written in the source
character set and then translated into the execution character set as indicated in translation
phase 5.
3 EXAMPLE Consider the code fragment:
#include <stdio.h>
void myfunc(void)
{
    printf("%s\n", __func__);
    /* ... */
}
Each time the function is called, it will print to the standard output stream:
myfunc
Forward references: function definitions (6.9.1).

6.4.3 Universal character names
Syntax
universal-character-name:
\u hex-quad
\U hex-quad hex-quad
hex-quad:
hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
Constraints
A universal character name shall not specify a character whose short identifier is less than
00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through
DFFF inclusive.
Description
Universal character names may be used in identifiers, character constants, and string
literals to designate characters that are not in the basic character set.
Semantics
The universal character name \Unnnnnnnn designates the character whose eight-digit
short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal
character name \unnnn designates the character whose four-digit short identifier is nnnn
(and whose eight-digit short identifier is 0000nnnn).

6.4.4 Constants
Syntax
1 constant:
integer-constant
floating-constant
enumeration-constant
character-constant
Constraints
2 The value of a constant shall be in the range of representable values for its type.
Semantics
3 Each constant has a type, determined by its form and value, as detailed later.
6.4.4.1 Integer constants
Syntax
1 integer-constant:
decimal-constant integer-suffix(opt)
octal-constant integer-suffix(opt)
hexadecimal-constant integer-suffix(opt)
decimal-constant:
nonzero-digit
decimal-constant digit
octal-constant:
0
octal-constant octal-digit
hexadecimal-constant:
hexadecimal-prefix hexadecimal-digit
hexadecimal-constant hexadecimal-digit
hexadecimal-prefix: one of
0x 0X
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
octal-digit: one of
0 1 2 3 4 5 6 7

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix:
unsigned-suffix long-suffix(opt)
unsigned-suffix long-long-suffix
long-suffix unsigned-suffix(opt)
long-long-suffix unsigned-suffix(opt)
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
Description
2 An integer constant begins with a digit, but has no period or exponent part. It may have a
prefix that specifies its base and a suffix that specifies its type.
3 A decimal constant begins with a nonzero digit and consists of a sequence of decimal
digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the
digits 0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed
by a sequence of the decimal digits and the letters a (or A) through f (or F) with values
10 through 15 respectively.
Semantics
4 The value of a decimal constant is computed base 10; that of an octal constant, base 8;
that of a hexadecimal constant, base 16. The lexically first digit is the most significant.
5 The type of an integer constant is the first of the corresponding list in which its value can
be represented.

Suffix                Decimal Constant          Octal or Hexadecimal Constant
--------------------  ------------------------  ------------------------------
none                  int                       int
                      long int                  unsigned int
                      long long int             long int
                                                unsigned long int
                                                long long int
                                                unsigned long long int

u or U                unsigned int              unsigned int
                      unsigned long int         unsigned long int
                      unsigned long long int    unsigned long long int

l or L                long int                  long int
                      long long int             unsigned long int
                                                long long int
                                                unsigned long long int

Both u or U           unsigned long int         unsigned long int
and l or L            unsigned long long int    unsigned long long int

ll or LL              long long int             long long int
                                                unsigned long long int

Both u or U           unsigned long long int    unsigned long long int
and ll or LL

If an integer constant cannot be represented by any type in its list, it may have an
extended integer type, if the extended integer type can represent its value. If all of the
types in the list for the constant are signed, the extended integer type shall be signed. If
all of the types in the list for the constant are unsigned, the extended integer type shall be
unsigned. If the list contains both signed and unsigned types, the extended integer type
may be signed or unsigned.

6.4.4.2 Floating constants
Syntax
1 floating-constant:
decimal-floating-constant
hexadecimal-floating-constant
decimal-floating-constant:
fractional-constant exponent-part(opt) floating-suffix(opt)
digit-sequence exponent-part floating-suffix(opt)
hexadecimal-floating-constant:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffix(opt)
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffix(opt)
fractional-constant:
digit-sequence(opt) . digit-sequence
digit-sequence .
exponent-part:
e sign(opt) digit-sequence
E sign(opt) digit-sequence
sign: one of
+ -
digit-sequence:
digit
digit-sequence digit
hexadecimal-fractional-constant:
hexadecimal-digit-sequence(opt) . hexadecimal-digit-sequence
hexadecimal-digit-sequence .
binary-exponent-part:
p sign(opt) digit-sequence
P sign(opt) digit-sequence
hexadecimal-digit-sequence:
hexadecimal-digit
hexadecimal-digit-sequence hexadecimal-digit
floating-suffix: one of
f l F L

Description
2 A floating constant has a significand part that may be followed by an exponent part and a
suffix that specifies its type. The components of the significand part may include a digit
sequence representing the whole-number part, followed by a period (.), followed by a
digit sequence representing the fraction part. The components of the exponent part are an
e, E, p, or P followed by an exponent consisting of an optionally signed digit sequence.
Either the whole-number part or the fraction part has to be present; for decimal floating
constants, either the period or the exponent part has to be present.
Semantics
3 The significand part is interpreted as a (decimal or hexadecimal) rational number; the
digit sequence in the exponent part is interpreted as a decimal integer. For decimal
floating constants, the exponent indicates the power of 10 by which the significand part is
to be scaled. For hexadecimal floating constants, the exponent indicates the power of 2
by which the significand part is to be scaled. For decimal floating constants, and also for
hexadecimal floating constants when FLT_RADIX is not a power of 2, the result is either
the nearest representable value, or the larger or smaller representable value immediately
adjacent to the nearest representable value, chosen in an implementation-defined manner.
For hexadecimal floating constants when FLT_RADIX is a power of 2, the result is
correctly rounded.
4 An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has
type float. If suffixed by the letter l or L, it has type long double.
5 Floating constants are converted to internal format as if at translation-time. The
conversion of a floating constant shall not raise an exceptional condition or a floating-point
exception at execution time.
Recommended practice
6 The implementation should produce a diagnostic message if a hexadecimal constant
cannot be represented exactly in its evaluation format; the implementation should then
proceed with the translation of the program.
7 The translation-time conversion of floating constants should match the execution-time
conversion of character strings by library functions, such as strtod, given matching
inputs suitable for both conversions, the same result format, and default execution-time
rounding.

6.4.4.3 Enumeration constants
Syntax
1 enumeration-constant:
identifier
Semantics
2 An identifier declared as an enumeration constant has type int.
Forward references: enumeration specifiers (6.7.2.2).
6.4.4.4 Character constants
Syntax
1 character-constant:
' c-char-sequence '
L' c-char-sequence '
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except
the single-quote ', backslash \, or new-line character
escape-sequence
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
universal-character-name
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

Description
2 An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'. A wide character constant is the same, except prefixed by the
letter L. With a few exceptions detailed later, the elements of the sequence are any
members of the source character set; they are mapped in an implementation-defined
manner to members of the execution character set.
3 The single-quote ', the double-quote ", the question-mark ?, the backslash \, and
arbitrary integer values are representable according to the following table of escape
sequences:
single quote '            \'
double quote "            \"
question mark ?           \?
backslash \               \\
octal character           \octal digits
hexadecimal character     \x hexadecimal digits
4 The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\.
5 The octal digits that follow the backslash in an octal escape sequence are taken to be part
of the construction of a single character for an integer character constant or of a single
wide character for a wide character constant. The numerical value of the octal integer so
formed specifies the value of the desired character or wide character.
6 The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape
sequence are taken to be part of the construction of a single character for an integer
character constant or of a single wide character for a wide character constant. The
numerical value of the hexadecimal integer so formed specifies the value of the desired
character or wide character.
7 Each octal or hexadecimal escape sequence is the longest sequence of characters that can
constitute the escape sequence.
8 In addition, characters not in the basic character set are representable by universal
character names and certain nongraphic characters are representable by escape sequences
consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t,
and \v.

Constraints
9 The value of an octal or hexadecimal escape sequence shall be in the range of
representable values for the type unsigned char for an integer character constant, or
the unsigned type corresponding to wchar_t for a wide character constant.
Semantics
10 An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined. If an integer character constant contains
a single character or escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape sequence is converted to
type int.
11 A wide character constant has type wchar_t, an integer type defined in the
<stddef.h> header. The value of a wide character constant containing a single
multibyte character that maps to a member of the extended execution character set is the
wide character corresponding to that multibyte character, as defined by the mbtowc
function, with an implementation-defined current locale. The value of a wide character
constant containing more than one multibyte character, or containing a multibyte
character or escape sequence not represented in the extended execution character set, is
implementation-defined.
12 EXAMPLE 1 The construction '\0' is commonly used to represent the null character.
13 EXAMPLE 2 Consider implementations that use two’s-complement representation for integers and eight
bits for objects that have type char. In an implementation in which type char has the same range of
values as signed char, the integer character constant '\xFF' has the value −1; if type char has the
same range of values as unsigned char, the character constant '\xFF' has the value +255.
14 EXAMPLE 3 Even if eight bits are used for objects that have type char, the construction '\x123'
specifies an integer character constant containing only one character, since a hexadecimal escape sequence
is terminated only by a non-hexadecimal character. To specify an integer character constant containing the
two characters whose values are '\x12' and '3', the construction '\0223' may be used, since an octal
escape sequence is terminated after three octal digits. (The value of this two-character integer character
constant is implementation-defined.)
15 EXAMPLE 4 Even if 12 or more bits are used for objects that have type wchar_t, the construction
L'\1234' specifies the implementation-defined value that results from the combination of the values
0123 and '4'.
Forward references: common definitions <stddef.h> (7.17), the mbtowc function
(7.20.7.2).

6.4.5 String literals
Syntax
1 string-literal:
" s-char-sequence(opt) "
L" s-char-sequence(opt) "
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except
the double-quote ", backslash \, or new-line character
escape-sequence
Description
2 A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz". A wide string literal is the same, except prefixed by the
letter L.
3 The same considerations apply to each element of the sequence in a character string
literal or a wide string literal as if it were in an integer character constant or a wide
character constant, except that the single-quote ' is representable either by itself or by the
escape sequence \', but the double-quote " shall be represented by the escape sequence
\".
Semantics
4 In translation phase 6, the multibyte character sequences specified by any sequence of
adjacent character and wide string literal tokens are concatenated into a single multibyte
character sequence. If any of the tokens are wide string literal tokens, the resulting
multibyte character sequence is treated as a wide string literal; otherwise, it is treated as a
character string literal.
5 In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence; for wide string literals, the array elements have type wchar_t, and are
initialized with the sequence of wide characters corresponding to the multibyte character
sequence, as defined by the mbstowcs function with an implementation-defined current
locale. The value of a string literal containing a multibyte character or escape sequence
not represented in the execution character set is implementation-defined.
6 It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
7 EXAMPLE This pair of adjacent character string literals
"\x12" "3"
produces a single character string literal containing the two characters whose values are '\x12' and '3',
because escape sequences are converted into single members of the execution character set just prior to
adjacent string literal concatenation.
Forward references: common definitions <stddef.h> (7.17), the mbstowcs
function (7.20.8.1).
6.4.6 Punctuators
Syntax
1 punctuator: one of
[ ] ( ) { } . ->
++ -- & * + - ~ !
/ % << >> < > <= >= == != ^ | && ||
? : ; ...
= *= /= %= += -= <<= >>= &= ^= |=
, # ##
<: :> <% %> %: %:%:
Semantics
2 A punctuator is a symbol that has independent syntactic and semantic significance.
Depending on context, it may specify an operation to be performed (which in turn may
yield a value or a function designator, produce a side effect, or some combination thereof)
in which case it is known as an operator (other forms of operator also exist in some
contexts). An operand is an entity on which an operator acts.

3 In all aspects of the language, the six tokens
<: :> <% %> %: %:%:
behave, respectively, the same as the six tokens
[ ] { } # ##
except for their spelling.
Forward references: expressions (6.5), declarations (6.7), preprocessing directives
(6.10), statements (6.8).
6.4.7 Header names
Syntax
1 header-name:
< h-char-sequence >
" q-char-sequence "
h-char-sequence:
h-char
h-char-sequence h-char
h-char:
any member of the source character set except
the new-line character and >
q-char-sequence:
q-char
q-char-sequence q-char
q-char:
any member of the source character set except
the new-line character and "
Semantics
2 The sequences in both forms of header names are mapped in an implementation-defined
manner to headers or external source file names as specified in 6.10.2.
3 If the characters ', \, ", //, or /* occur in the sequence between the < and > delimiters,
the behavior is undefined. Similarly, if the characters ', \, //, or /* occur in the
sequence between the " delimiters, the behavior is undefined. A header name
preprocessing token is recognized only within a #include preprocessing directive.
4 EXAMPLE The following sequence of characters:
0x3<1/a.h>1e2
#include <1/a.h>
#define const.member@$
forms the following sequence of preprocessing tokens (with each individual preprocessing token delimited
by a { on the left and a } on the right).
{0x3}{<}{1}{/}{a}{.}{h}{>}{1e2}
{#}{include} {<1/a.h>}
{#}{define} {const}{.}{member}{@}{$}
Forward references: source file inclusion (6.10.2).
6.4.8 Preprocessing numbers
Syntax
1 pp-number:
digit
. digit
pp-number digit
pp-number identifier-nondigit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .
Description
2 A preprocessing number begins with a digit optionally preceded by a period (.) and may
be followed by valid identifier characters and the character sequences e+, e-, E+, E-,
p+, p-, P+, or P-.
3 Preprocessing number tokens lexically include all floating and integer constant tokens.
Semantics
4 A preprocessing number does not have type or a value; it acquires both after a successful
conversion (as part of translation phase 7) to a floating constant token or an integer
constant token.

6.4.9 Comments
1 Except within a character constant, a string literal, or a comment, the characters /*
introduce a comment. The contents of such a comment are examined only to identify
multibyte characters and to find the characters */ that terminate it.
2 Except within a character constant, a string literal, or a comment, the characters //
introduce a comment that includes all multibyte characters up to, but not including, the
next new-line character. The contents of such a comment are examined only to identify
multibyte characters and to find the terminating new-line character.
3 EXAMPLE
"a//b" // four-character string literal
#include "//e" // undefined behavior
// */ // comment, not syntax error
f = g/**//h; // equivalent to f = g / h;
//\
i(); // part of a two-line comment
/\
/ j(); // part of a two-line comment
#define glue(x,y) x##y
glue(/,/) k(); // syntax error, not comment
/*//*/ l(); // equivalent to l();
m = n//**/o
+ p; // equivalent to m = n + p;
