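All data in memory is stored as a sequence of binary digits (0 or 1). Each 0 or 1 is one bit, and on x86 CPUs a byte is 8 bits. For example, if a 16-bit (2-byte) short int variable holds the value 1000, its binary representation is 00000011 11101000. Because Intel CPUs are little-endian, the bytes are stored in reverse order (least significant byte first), so in memory it looks like 11101000 00000011. This is the in-memory layout of the fixed-point (integer) number 1000.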


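Floating-point numbers use a different structure: a form of scientific notation built from a sign, an exponent and a mantissa, with base 2. A floating-point number is represented as the mantissa multiplied by 2 raised to the exponent, with a sign attached.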


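| Type | Sign bits | Exponent bits | Mantissa bits | Total bits |
|---|---|---|---|---|
| float | 1 | 8 | 23 | 32 |
| double | 1 | 11 | 52 | 64 |
| Temporary real (x87 extended precision) | 1 | 15 | 64 | 80 |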

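In C, an unsuffixed floating-point constant has type double by default, so the rest of this note uses double as the example: 64 bits in total, i.e. 8 bytes.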

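From highest to lowest, the bits are numbered 63, 62, 61, …, 0. Bit 63, the most significant, is the sign bit: 1 means the number is negative, 0 means positive. Bits 62-52 (11 bits in total) hold the exponent. Bits 51-0 (52 bits in total) hold the mantissa.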


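Take 38414.4 as the example and handle the integer and fractional parts separately. The integer part converts directly to hexadecimal: 38414 = 0x960E. For the fractional part: 0.4 = 0.5*0 + 0.25*1 + 0.125*1 + 0.0625*0 + …, i.e. binary 0.0110 0110 0110 …, and this never terminates! This is the well-known floating-point precision problem. So we only compute until the integer and fractional bits together give 53 significant bits, and round the remainder to nearest. This is the hidden-bit technique: the leading 1 is not written to memory, so 53 significant bits fit into the 52-bit mantissa field.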


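In scientific notation this is 1.001… × 2^15, so the exponent is 15. Now look at the exponent field. It has 11 bits, so it holds stored values from 0 to 2047. Because the exponent can be negative, the convention is to add a bias of 1023 before storing it. The stored values 0 and 2047 are reserved (for zero/subnormal numbers and for infinity/NaN), so normal numbers cover exponents -1022 to 1023. Here, 15 + 1023 = 1038.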

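In binary, 1038 is 100 00001110. The sign is positive, so the sign bit is 0. Putting sign, exponent and mantissa together (dropping the leading 1 of the binary significand) gives: 01000000 11100010 11000001 11001100 11001100 11001100 11001100 11001101, i.e. 40 E2 C1 CC CC CC CC CD in hexadecimal. The last byte is CD rather than CC because the bits beyond the 52nd mantissa bit are rounded up. Stored in reverse byte order, the bytes in memory are: CD CC CC CC CC C1 E2 40.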


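Required header: <string.h> (some older compilers also declare it in the non-standard <memory.h>)

Function prototype: void *memset(void *s, int ch, size_t n); It sets each of the first n bytes of s to the value ch converted to unsigned char, and returns s.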



    unsigned char (*p)[40] = new unsigned char[100][40];



    float (*p)[40] = new float[100][40];



Is it legal to use memset(,0,) on an array of doubles?

Is it legal to zero an array of doubles (using memset(,0,)) or a struct containing doubles?

The question implies two different things:

(1) From the point of view of the C standard, is this UB or not? (On a fixed platform, how can this be UB? It just depends on the floating-point representation, that's all.)

(2) From a practical point of view: is it OK on the Intel platform (no matter what the standard says)?


The C99 standard Annex F says:

This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove dependencies on radix and word length. IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex. Where a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified behavior is adopted by reference, unless stated otherwise.

And, immediately after,

The C floating types match the IEC 60559 formats as follows:

  • The float type matches the IEC 60559 single format.
  • The double type matches the IEC 60559 double format.

So, if IEC 60559 is basically IEEE 754-1985 and this specifies that 8 zero bytes mean 0.0 (as @David Heffernan said), this means that if you find __STDC_IEC_559__ defined you can safely do a 0.0 initialization with memset.

So it's IEEE 754 unless the compiler documentation explicitly states otherwise? –   Loki Astari  Jan 7 '11 at 23:27
Actually, as I understood it, it is the contrary: it may be anything, but if you find __STDC_IEC_559__ defined, you can be sure that it's IEC 60559 aka IEEE 754. –   Matteo Italia  Jan 7 '11 at 23:29

If you are talking about IEEE 754, then the standard defines +0.0 in double precision as 8 zero bytes. If you know that you are backed by IEEE 754 floating point, then this is well-defined.

As for Intel, I can't think of a compiler that doesn't use IEEE754 on Intel x86/x64.

What about all the mobile devices that people are developing for? Do they all use IEEE 754? –   Loki Astari  Jan 7 '11 at 23:27
@Martin I've no idea. –   David Heffernan  Jan 7 '11 at 23:31
Still, from a practical standpoint I don't think that anyone designing a floating-point format will ever make it such that setting a double to all zero will make it become problematic. :) –   Matteo Italia  Jan 7 '11 at 23:47 

David Heffernan has given a good answer for part (2) of your question. For part (1):

The C99 standard makes no guarantees about the representation of floating-point values in the general case. § says:

The representations of all types are unspecified except as stated in this subclause.

...and that subclause makes no further mention of floating point.

You said:

(On a fixed platform, how can this be UB? It just depends on the floating-point representation, that's all.)

Indeed, there is a difference between "undefined behaviour", "unspecified behaviour" and "implementation-defined behaviour":

  • "undefined behaviour" means that anything could happen (including a runtime crash);
  • "unspecified behaviour" means that the compiler is free to implement something sensible in any way it likes, but there is no requirement for the implementation choice to be documented;
  • "implementation-defined behaviour" means that the compiler is free to implement something sensible in any way it likes, and is supposed to document that choice (for example, the GCC manual has a section listing GCC's implementation-defined behaviour);

and so, as floating point representation is unspecified behaviour, it can vary in an undocumented manner from platform to platform (where "platform" here means "the combination of hardware and compiler" rather than just "hardware").

(I'm not sure how useful in practice the guarantee actually is that a double is represented such that all-bits-zero is +0.0 if __STDC_IEC_559__ is defined, as described in Matteo Italia's answer. For example, GCC never defines this, even though it uses IEEE 754 / IEC 60559 on many hardware platforms.)

I would just assume IEEE 754 since otherwise floating point behavior might as well be rand(). –   R..  Jan 7 '11 at 23:02
Do all handheld devices use IEEE 754 as their floating-point representation? Desktops may be relatively standard, but the mobile device market is a bit more splintered. –   Loki Astari  Jan 7 '11 at 23:32

As Matteo Italia says, that's legal according to the standard, but I wouldn't use it. Something like

    double *p = V, *last = V + N; // V: the array, N: element count
    while (p != last) *(p++) = 0;

is at least twice as fast.

I seriously doubt it's faster, and would expect it to be much slower. memset is typically the most-optimized function in any C library implementation, for good reason. –   R..  Jan 7 '11 at 23:03
It may not be faster, but it is safer and the speed difference will be negligible in most situations. –  Loki Astari  Jan 7 '11 at 23:30
I tested the code before posting it (for N = 10 ^ 7, on a Linux machine) and seemed to be faster than memset(V, 0, N * sizeof(double)). –   Andrei Sfrent  Jan 7 '11 at 23:44
A memset is often inlined directly by the compiler and also implemented in the VPU, which means it can clear 128 bits in each iteration. If your loop is faster, you are probably using a crappy compiler or have these optimizations switched off. –   onemasse  Jan 8 '11 at 8:13
Tested on several machines running Ubuntu / Debian and it seems memset is faster if the area contains a lot of zeros already. I compiled my code with "gcc -o memsettest memsettest.c -O2". –   Andrei Sfrent  Jan 8 '11 at 12:30

Well, I think the zeroing is "legal" (after all, it's zeroing a regular buffer), but I have no idea if the standard lets you assume anything about the resulting logical value. My guess would be that the C standard leaves it as undefined.

Even though it is unlikely that you will encounter a machine where this causes problems, you can avoid the issue relatively easily if you are really talking about arrays, as the question title indicates. If these arrays have a length known at compile time (that is, they are not VLAs), then just initializing them is probably even more convenient:

    double A[133] = { 0 };

should always work. If you have to zero such an array again later, and your compiler conforms to modern C (C99), you can do this with a compound literal:

    memcpy(A, (double const[133]){ 0 }, 133*sizeof(double));

on any modern compiler this should be as efficient as memset, but has the advantage of not relying on a particular encoding of double.

Are you sure it compiles to the same? I would expect the latter to allocate 133*8 bytes on the stack, memset them to 0 (either inline or with a call), then call memcpy in a naive implementation. –   R..  Jan 7 '11 at 23:04
@R..: The same, probably not. And with the 133 I was perhaps exaggerating a bit. But for more reasonable values, gcc and clang are able to transform this into just storing zeros. opencc and icc generate an extra copy, and opencc indeed seems to be the worst, generating a call to memcpy. I said "should", not "will" :) –   Jens Gustedt  Jan 8 '11 at 9:10

It's "legal" to use memset. The issue is whether it produces a bit pattern where array[x] == 0.0 is true. While the basic C standard doesn't require that to be true, I'd be interested in hearing examples where it isn't!

It appears memset is equivalent to 0.0 on IBM-AIX, HP-UX (PARISC), HP-UX (IA-64), Linux (IA-64, I think).

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        double dFloat1 = 0.0;
        double dFloat2 = 111111.1111111;

        memset(&dFloat2, 0, sizeof(dFloat2));   /* set all bytes to zero */

        if (dFloat1 == dFloat2)
            fprintf(stdout, "memset appears to be equivalent to = 0.0\n");
        else
            fprintf(stdout, "memset is NOT equivalent to = 0.0\n");
        return 0;
    }

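`memset` is a C library function whose main purpose is to set every byte of a block of memory to a given value. `memset` does not know about data types; it works at the byte level. That is why it can be used to initialize any kind of data structure, including integers (`int`), characters (`char`) and floating-point values (`float`), as long as the resulting byte pattern means what you want.

When you use `memset` to initialize an `int` variable or memory region, you are setting every byte of that block to the same value. For example, if you set the memory of an `int` variable to 0, its value is 0. An `int` usually occupies four bytes (the exact size is implementation-defined; it is 4 bytes on common 32-bit and 64-bit platforms), and if all of those bytes are 0, the integer's value is naturally 0.

Here is an example of initializing an `int` variable with `memset`:

```c
#include <string.h>
#include <stdio.h>

int main() {
    int a;
    memset(&a, 0, sizeof(a)); // set every byte of a to 0
    printf("%d\n", a);        // prints 0, because all bytes of a are 0
    return 0;
}
```

Note that `memset` stores the value `ch` (converted to `unsigned char`) into each byte. A non-zero value therefore only gives the intended result if that byte, repeated in every position, forms the bit pattern you want. For example, `memset(&a, 0xFF, sizeof(a));` sets every byte of `a` to `0xFF`, which is -1 for a two's-complement `int`. However, `memset(&a, 1, sizeof(a));` gives 0x01010101 (16843009 with a 4-byte `int`), not 1. Because every byte is identical, byte order (big- vs little-endian) does not change the result here. What matters is the size and representation of the type, so be careful when using `memset` to set non-zero values.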




