On sequence points and side effects in C

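An expression has a value, and often we write an expression only to obtain that value. Some expressions also have side effects and others have none; sometimes the side effect is exactly what we want. For example: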
int a = 10;
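int b = a;     /* The expression a has no side effect here; it is used only to */
/* obtain the value of the variable a, which is 10. The expression b = a */
/* does have a side effect: it changes the value of b to the value of a. */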
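That is what is meant by the side effect of an expression. Much of what a program does is accomplished through side effects. Some expressions both produce a value and have a side effect: i++ yields a value (the value of i before the increment) and, as a side effect, increments i.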
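
To make the distinction concrete, here is a minimal sketch in C. The helper name post_increment_value is made up for this illustration; it returns the value of the expression i++ and reports the effect of the increment through a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: returns the VALUE of the
   expression i++ and stores the variable's state after the SIDE EFFECT
   in *after. */
int post_increment_value(int start, int *after)
{
    int i = start;
    int value;

    assert(after != NULL);
    value = i++;       /* value is the old i; the side effect makes i = start + 1 */
    *after = i;        /* the statement above has ended, so the increment is complete */
    return value;
}
```

Called with start equal to 1, the function returns 1 (the value) while *after receives 2 (the result of the side effect).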
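Between two consecutive sequence points, modifying a variable more than once, or modifying it and also reading it for some other purpose, causes problems. A single modification such as the first snippet below is fine; the snippets after it are the classic offenders: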
int i = 1;
a = i++;

int i = 1;
printf("%d, %d, %d\n", i++, i++, i++);
i = 1;
printf("%d\n", i++ + i++ + i++);
i = 1;
printf("%d\n", ++i + ++i + ++i);
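Many university C instructors lecture on this problem, mine included, and I never understood it at the time. In fact it is not worth lecturing on: it amounts to wrestling with the compiler. Different compilers may produce different results (though the common ones often agree, which lets programmers privately draw the wrong conclusions), and code whose result depends on the implementation is of no use. i++ + i++ + i++ is a single expression that modifies the variable i several times, so its result is not determined. This raises another interesting question. One might think that once the statement has executed, i has been incremented three times and so must be 4. That is not guaranteed either. Many compilers may indeed produce 4, but the C standard leaves the behavior undefined when an object is modified more than once between two sequence points, so nothing about the outcome can be relied on. Suppose a compiler reads i only once in i++ + i++ + i++ and keeps using that value. For the first i++, i is 1, so i is set to 2. For the second, the compiler still uses the remembered 1 and stores 2 again (no sequence point has been passed, so it need not see i as 2). After three rounds of going from 1 to 2, i ends up as 2 rather than the expected 4. It all depends on how the compiler is implemented, and code that depends on how the compiler is implemented deserves to get fired.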
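
If the intent behind i++ + i++ + i++ is "add up three successive values of i", one well-defined way to write it is to give every increment its own statement. This is only a sketch of one possible intent (the original expression has no defined meaning to preserve), and sum_three_post_increments is a name invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Adds up three successive values of *i, incrementing after each read.
   Every statement ends in a sequence point, so each increment is
   complete before the next line reads *i. */
int sum_three_post_increments(int *i)
{
    int sum = 0;

    assert(i != NULL);
    sum += (*i)++;   /* sum and *i are different objects: well defined */
    sum += (*i)++;
    sum += (*i)++;
    return sum;
}
```

Starting from i equal to 1 this yields 1 + 2 + 3 and leaves i at 4 on every conforming compiler, which is exactly the guarantee the one-line version lacks.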

1. I found a very accessible description on chinaunix that explains this well.

In C, a statement that consists of a single expression is an expression statement, for example:
x = (i++) * 2;

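The sequence points are:
(1) in a function call, after all the arguments have been evaluated and before the first instruction of the function executes (note that the "," separating arguments is not a sequence point);
(2) at the end of the left operand of the && operator;
(3) at the end of the left operand of the || operator;
(4) at the end of the first operand of the ?: operator;
(5) at the end of the left operand of the comma operator;
(6) at the end of a full expression, which covers: the end of the initializer of an automatic object; the semicolon that ends an expression statement; the closing parenthesis of the controlling expression of a do/while/if/switch/for statement; the two semicolons in the control part of a for statement; and the end of the return value computation in a return statement (its closing semicolon).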

y = x++, x+1;
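
The statement above is easy to misread, because assignment binds more tightly than the comma operator: it parses as (y = x++), (x+1). The sketch below contrasts it with the parenthesized form. Both function names are invented for this illustration, and the cast to void is added only to silence "unused value" warnings.

```c
#include <assert.h>
#include <stddef.h>

/* y = x++, x + 1;  parses as  (y = x++), (x + 1);
   y receives the old x, and the value of x + 1 is discarded. */
int comma_without_parens(int *x)
{
    int y;

    assert(x != NULL);
    y = (*x)++, (void)(*x + 1);
    return y;
}

/* With parentheses the comma expression is the right-hand side.
   The sequence point at the comma guarantees that *x has already
   been incremented when *x + 1 is evaluated. */
int comma_with_parens(int *x)
{
    int y;

    assert(x != NULL);
    y = ((*x)++, *x + 1);
    return y;
}
```

With x equal to 2, the first form stores 2 in y and the second stores 4; in both cases x ends up as 3.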

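y = (x++) * (x++); with x = 2 beforehand, what is y? (No answer is guaranteed: x is modified twice with no sequence point in between.)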

However, the problem with standards manuals is that they only make sense if you already know what they mean. If people write them in English, the more precise they try to be, the longer, duller and more obscure they become. If they write them using mathematical notation to define the language, the manuals become inaccessible to too many people.

2. What is a side effect? An example:

int a = 5;
int b = a ++;


The "," operator creates a sequence point.

The "," operator joins several expressions into a single statement. For example:

int b = 5;
++ b;


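can be written as a single statement (in a declaration such as int b = 5, c; the comma is only a separator, not the comma operator, so the declaration is kept separate here):
int b;
b = 5, ++b;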


&& and || create sequence points

|| works the same way as &&.

The "?" in ?: creates a sequence point

int a = 5;
int b = a++ > 5? 0 : a;


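What is b? There is a sequence point at the "?", so the expression to its left must be fully evaluated first. When a++ > 5 performs the comparison with 5, a has not yet been incremented, so the expression evaluates to false. Because of the sequence point at the "?", the side effect of the left expression must also be complete at that point: a is incremented to 6. Since the expression to the left of "?" is false, the conditional operator ?: yields the operand to the right of ":", which is a. By now a is 6, so b is 6.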
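
The walk-through above can be checked with a small sketch. The function name pick_after_test is invented for this illustration; the expression is the same as in the example, applied through a pointer so that the caller can inspect a afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as  b = a++ > 5 ? 0 : a;
   The sequence point at "?" means the increment of *a is complete
   before either the second or the third operand is evaluated. */
int pick_after_test(int *a)
{
    assert(a != NULL);
    return (*a)++ > 5 ? 0 : *a;
}
```

With a equal to 5 the comparison is false, a becomes 6, and the function returns 6. With a equal to 6 the comparison is true, so the result is 0 and a is left at 7.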

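1). An important sequence point lies at the end of a full expression. A full expression is an expression that is not a subexpression of a larger expression; the definition rewards careful reading.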
int i = 1;
i++;     /* i++ is a full expression */
i++ + 1; /* here i++ is not a full expression, because it is part of the full expression i++ + 1 */
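For the exact list of full expressions, consult a reference; the C99 standard is a good choice.
2). The comma expression. A comma expression is evaluated strictly in order, and there is a sequence point between the expressions that the commas separate. So if the expression before a comma is i++, the expression after it can rely on i being the old value plus 1 (overflow aside). For example: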
int i = 1;
i++, i++, i++;
printf("%d\n", i);

3). The && and || operators. Short-circuit evaluation can be used to guard a division against division by zero, as follows:
int a = 10;
int b = 0;
if (b && a/b)
{ /* some code here */ }
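
As a rough sketch of why the guard works, the version below routes the division through a counting helper so that skipped evaluations become observable. The names divide_and_count, quotient_is_nonzero and division_count are all invented for this illustration.

```c
#include <assert.h>

static int right_operand_evaluations;   /* how many times the division really ran */

/* Hypothetical helper: performs the division and counts the call. */
static int divide_and_count(int a, int b)
{
    assert(b != 0);                     /* never reached with b == 0, thanks to && */
    right_operand_evaluations++;
    return a / b;
}

/* Same condition as  if (b && a/b) : the right operand of && is
   evaluated only when the left operand is nonzero. */
int quotient_is_nonzero(int a, int b)
{
    return b && divide_and_count(a, b);
}

int division_count(void)
{
    return right_operand_evaluations;
}
```

Calling quotient_is_nonzero(10, 0) returns 0 without ever performing the division, so the counter stays at zero.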

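4). The conditional operator ? :. There is also a sequence point at the question mark, and there is not much more to say: the same variable may be modified before the question mark and accessed after it, and that access is safe.
Finally, the order of evaluation within an expression is not fixed. Another manifestation of this is: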
funa() + funb() + func();
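The C standard does not specify which of these three functions runs first. If the order matters, temporary variables can be used to work around it.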
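
The temporary-variable workaround might look like the sketch below. The bodies of funa, funb and func are stand-ins invented here (they only record that they were called), as are record, sum_in_fixed_order and last_call_order.

```c
#include <assert.h>

static char call_log[4];   /* room for "abc" plus the terminating NUL */
static int  call_count;

/* Hypothetical helper: notes which function ran, then returns its value. */
static int record(char tag, int value)
{
    assert(call_count < 3);
    call_log[call_count] = tag;
    call_count++;
    call_log[call_count] = '\0';
    return value;
}

int funa(void) { return record('a', 1); }
int funb(void) { return record('b', 2); }
int func(void) { return record('c', 3); }

/* One call per statement: each semicolon is a sequence point, so the
   calls happen in exactly the order written. */
int sum_in_fixed_order(void)
{
    call_count = 0;
    int a = funa();
    int b = funb();
    int c = func();
    return a + b + c;
}

const char *last_call_order(void)
{
    return call_log;
}
```

After sum_in_fixed_order() returns, last_call_order() is "abc" on any conforming compiler, whereas the one-line sum gives no such guarantee about the call order.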

Order of execution between sequence points

int i = 3;
int ans = (++i)+(++i)+(++i);


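There is no sequence point among the operands of (++i)+(++i)+(++i), so in what order are they executed? Code compiled by gcc executes two of the ++i first, adds them, then evaluates the third ++i and adds it. Code compiled by Microsoft VC++ executes all three ++i first and then adds. The two give different results, so which one is wrong? Neither: the behavior is undefined, so the standard permits both.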

3. MISRA-C:2004 warns users as follows:

Rule 12.2 (required): The value of an expression shall be the same under any order of evaluation that the standard permits. [Unspecified 7–9; Undefined 18]
Apart from a few operators (notably the function call operator (), &&, ||, ?: and , (comma)) the order in which sub-expressions are evaluated is unspecified and can vary. This means that no reliance can be placed on the order of evaluation of sub-expressions, and in particular no reliance can be placed on the order in which side effects occur. Those points in the evaluation of an expression at which all previous side effects can be guaranteed to have taken place are called “sequence points”. Sequence points and side effects are described in sections 5.1.2.3, 6.3 and 6.6 of ISO 9899:1990 [2].
Note that the order of evaluation problem is not solved by the use of parentheses, as this is not a precedence issue.
The following notes give some guidance on how dependence on order of evaluation may occur, and therefore may assist in adopting the rule.
increment or decrement operators
As an example of what can go wrong, consider
x = b[i] + i++;
This will give different results depending on whether b[i] is evaluated before i++ or vice versa. The problem could be avoided by putting the increment operation in a separate statement. The example would then become:
x = b[i] + i;
i++;
function arguments
The order of evaluation of function arguments is unspecified.
x = func( i++, i );
This will give different results depending on which of the function’s two parameters is evaluated first.
function pointers
If a function is called via a function pointer there shall be no dependence on the order in which function designator and function arguments are evaluated.
function calls
Functions may have additional effects when they are called (e.g. modifying some global data). Dependence on order of evaluation could be avoided by invoking the function prior to the expression that uses it, making use of a temporary variable for the value.
For example
x = f(a) + g(a);
could be written as
x = f(a);
x += g(a);
As an example of what can go wrong, consider an expression to get two values off a stack, subtract the second from the first, and push the result back on the stack:
push( pop() - pop() );
This will give different results depending on which of the pop() function calls is evaluated first (because pop() has side effects).
nested assignment statements
Assignments nested within expressions cause additional side effects. The best way to avoid any chance of this leading to a dependence on order of evaluation is to not embed assignments within expressions.
For example, the following is not recommended:
x = y = y = z / 3 ;
x = y = y++;
accessing a volatile
The volatile type qualifier is provided in C to denote objects whose value can change independently of the execution of the program (for example an input register). If an object of volatile qualified type is accessed this may change its value. C compilers will not optimise out reads of a volatile. In addition, as far as a C program is concerned, a read of a volatile has a side effect (changing the value of the volatile). It will usually be necessary to access volatile data as part of an expression, which then means there may be dependence on order of evaluation. Where possible though it is recommended that volatiles only be accessed in simple assignment statements, such as the following:
volatile uint16_t v;
x = v;
The rule addresses the order of evaluation problem with side effects. Note that there may also be an issue with the number of times a sub-expression is evaluated, which is not covered by this rule. This can be a problem with function invocations where the function is implemented as a macro. For example, consider the following function-like macro and its invocation:
#define MAX(a, b) ( ((a) > (b)) ? (a) : (b) )
z = MAX( i++, j );
The definition evaluates the first parameter twice if a > b but only once if a ≤ b. The macro invocation may thus increment i either once or twice, depending on the values of i and j. It should be noted that magnitude-dependent effects, such as those due to floating-point rounding, are also not addressed by this rule. Although the order in which side-effects occur is undefined, the result of an operation is otherwise well-defined and is controlled by the structure of the expression. In the following example, f1 and f2 are floating-point variables; F3, F4 and F5 denote expressions with floating-point types.
f1 = F3 + (F4 + F5);
f2 = (F3 + F4) + F5;
The addition operations are, or at least appear to be, performed in the order determined by the position of the parentheses, i.e. first F4 is added to F5 then secondly F3 is added to give the value of f1. Provided that F3, F4 and F5 contain no side-effects, their values are independent of the order in which they are evaluated. However, the values assigned to f1 and f2 are not guaranteed to be the same because floating-point rounding following the addition operations will depend on the values being added.
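
The MAX example quoted above can be tried out with a small sketch. The macro is the one from the MISRA text; max_int, max_via_macro and max_via_function are names invented here, and using a function instead of the macro is just one possible remedy, not something the rule prescribes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX(a, b) ( ((a) > (b)) ? (a) : (b) )

/* A function evaluates each argument exactly once. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Expands to ((*i)++ > j) ? (*i)++ : j. The "?" is a sequence point,
   so this is well defined, but *i is incremented twice when the
   comparison is true and once when it is false. */
int max_via_macro(int *i, int j)
{
    assert(i != NULL);
    return MAX((*i)++, j);
}

/* Here (*i)++ is evaluated once, before the call. */
int max_via_function(int *i, int j)
{
    assert(i != NULL);
    return max_int((*i)++, j);
}
```

With i equal to 5 and j equal to 3, the macro version returns 6 and leaves i at 7, while the function version returns 5 and leaves i at 6.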
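3. gcc itself makes an effort to warn about expressions that violate the sequence point rules: the warning is enabled by -Wsequence-point, and -Wall turns it on as well.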
-Wsequence-point
Warn about code that may have undefined semantics because of violations of sequence point rules in the C standard. The C standard defines the order in which expressions in a C program are evaluated in terms of sequence points, which represent a partial ordering between the execution of parts of the program: those executed before the sequence point, and those executed after it. These occur after the evaluation of a full expression (one which is not part of a larger expression), after the evaluation of the first operand of a &&, ||, ? : or , (comma) operator, before a function is called (but after the evaluation of its arguments and the expression denoting the called function), and in certain other places. Other than as expressed by the sequence point rules, the order of evaluation of subexpressions of an expression is not specified. All these rules describe only a partial order rather than a total order, since, for example, if two functions are called within one expression with no sequence point between them, the order in which the functions are called is not specified. However, the standards committee have ruled that function calls do not overlap. It is not specified when between sequence points modifications to the values of objects take effect. Programs whose behavior depends on this have undefined behavior; the C standard specifies that “Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.”. If a program breaks these rules, the results on any particular implementation are entirely unpredictable.
Examples of code with undefined behavior are a = a++;, a[n] = b[n++] and a[i++] = i;. Some more complicated cases are not diagnosed by this option, and it may give an occasional false positive result, but in general it has been found fairly effective at detecting this sort of problem in programs. The present implementation of this option only works for C programs. A future implementation may also work for C++ programs. The C standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on the GCC readings page, at http://gcc.gnu.org/readings.html
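
For two of the undefined examples in the quoted text, a well-defined rewrite moves the increment into its own statement. The sketch below assumes one plausible intent for each (the originals have no defined meaning, so the intent is a guess), and both function names are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Instead of  a[n] = b[n++];
   Assumed intent: copy element n, then advance n. Returns the new n. */
size_t copy_then_advance(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    a[n] = b[n];
    n++;
    return n;
}

/* Instead of  a[i++] = i;
   Assumed intent: store the OLD index in its own slot, then advance.
   Returns the new i. */
size_t store_index_then_advance(int *a, size_t i)
{
    assert(a != NULL);
    a[i] = (int)i;
    i++;
    return i;
}
```

Neither function modifies a variable and reads it again within the same full expression, so a rewrite of this kind should no longer fall under the rule the quoted warning checks.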
4. This is how gcc implements the check:
Walk the tree X, and record accesses to variables.  If X is written by the parent tree, WRITER is the parent. We store accesses in one of the two lists: PBEFORE_SP, and PNO_SP.  If this  expression or its only operand forces a sequence point, then everything up to the sequence point is stored in PBEFORE_SP.  Everything else gets stored in PNO_SP.
Once we return, we will have emitted warnings if any subexpression before such a sequence point could be undefined.  On a higher level, however, the sequence point may not be relevant, and we'll merge the two lists.
Example: (b++, a) + b;
The call that processes the COMPOUND_EXPR will store the increment of B in PBEFORE_SP, and the use of A in PNO_SP.  The higher-level call that processes the PLUS_EXPR will need to merge the two lists so that eventually, all accesses end up on the same list (and we'll warn about the unordered subexpressions b++ and b.
A note on merging.  If we modify the former example so that our expression becomes
(b++, b) + a
care must be taken not simply to add all three expressions into the final PNO_SP list.  The function merge_tlist takes care of that by merging the before-SP list of the COMPOUND_EXPR into its after-SP list in a special way, so that no more than one access to B is recorded.
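5. However, gcc's implementation of this warning has four problems:
(1) It gives no warning for struct members (s->a++ = s->a + 5;). The reason is that it does not treat s->a as a single entity but decomposes it, so it cannot recognize that s->a is a read and s->a++ is a write.
(2) It decomposes a[i], so it can check "a[i] + i++" but can do nothing about "a[i]++ + a[i]".
(3) It does not call verify_sequence_points for return statements.
(4) It cannot handle aliases (for example p = q; *p++ = q++;), because the front end only does simple syntax-tree analysis and cannot go that far.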