What is the strict aliasing rule?

原帖: http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

===============================================================================================================================

When asking about common undefined behavior in C, souls more enlightened than I referred to the strict aliasing rule.
What are they talking about?

===============================================================================================================================

A typical situation you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

struct Msg
{
   unsigned int a;
   unsigned int b;
};

int main()
{
   // Get a 32-bit buffer from the system
   uint32_t* buff = malloc(sizeof(Msg));

   // Alias that buffer through message
   Msg* msg = (Msg*)(buff);

   // Send a bunch of messages    
   for (int i =0; i < 10; ++i)
   {
      msg->a = i;
      msg->b = i+1;
      SendWord(buff[0] );
      SendWord(buff[1] );   

   }

}

The strict aliasing rule makes this setup illegal, dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately you can still code this way, maybe* get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

*(gcc appears pretty inconsistent in its ability to give aliasing warnings, giving us a friendly warning here but nothere)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change at anytime from anywhere by anybody. So to get an extra performance edge, and assuming most people don't type pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, this might even happen if you're passing a buffer to another function doing the sending for you, if instead you have.

 void SendMessage(uint32_t* buff, size_t size32)
 {
    for (int i = 0; i < size32; ++i) 
    {
       SendWord(buff[i]);
    }
 }

And rewrote our earlier loop to take advantage of this convenient function

   for (int i =0; i < 10; ++i)
   {
      msg->a = i;
      msg->b = i+1;
      SendMessage(buff, 2);
   }

The compiler may or may not be able to or smart enough to try to inline SendMessage and it may or may not decide to load or not load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header only implementation that the compiler thinks it can inline. Or maybe its just something you wrote in your .c file for your own convenience. Anyway undefined behavior might still ensue. Even when we know some of whats happening under the hood, its still a violation of the rule so no well defined behavior is guaranteed. So just by wrapping in a function that takes our word delimited buffer doesn't necessarily help.

So how do I get around this?

  • Use a union. Most compilers support this without complaining about strict aliasing. Its usage, however, (write to the struct then read from the buffer) is technically in violation of the spec.
    union {
        Msg msg;
        unsigned int asBuffer[sizeof(Msg)];
    };
  • You can disable strict aliasing in your compiler (f[no-]strict-aliasing in gcc))

  • You can use char* for aliasing instead of your system's word. The rules allow an exception for char*. Its always assumed that char* aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn aboutendianessword alignment, and how to deal with alignment issues through packing structs correctly.

===============================================================================================================================

The best explanation I have found is by Mike Acton, Understanding Strict Aliasing. It's focused a little on PS3 development, but that's basically just GCC.

From the article: "Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

So basically if you have an int* and a float* they are not allowed to point to the same memory location. If your code does not respect this, then the compiler's optimizer will most likely break your code.

The exception to the rule is a char*, which is allowed to point to any type.

===============================================================================================================================

This is the strict aliasing rule, found in section 3.10 of the C++03 standard (other answers provide good explanation, but none provided the rule itself):

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • char or unsigned char type.

New C++11 wording:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • char or unsigned char type.

Two changes were small: glvalue instead of lvalue, and clarification of the aggregate/union case.

The third change makes a stronger guarantee (relaxes the strong aliasing rule): The new concept of similar types that are now safe to alias.


Also the C wording (C99; ISO/IEC 9899:1999 6.5/7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

===============================================================================================================================

Strict aliasing doesn't refer only to pointers, it affects references as well, I wrote a paper about it for the boost developer wiki and it was so well received that I turned it into a page on my consulting web site. It explains completely what it is, why it confuses people so much and what to do about it.  Strict Aliasing White Paper . In particular it explains why unions are risky behavior for C++, and why using memcpy is the only fix portable across both C and C++. Hope this is helpful.

===============================================================================================================================

Type punning  is a major example of breaking strict aliasing.

===============================================================================================================================

Strict aliasing is not allowing different pointer types to the same data.

This article should help you understand the issue in full detail.

===============================================================================================================================



  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值