(Part of C++ FAQ Lite, Copyright © 1991-2006, Marshall Cline, cline@parashift.com)
FAQs in section [26]:
- [26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
- [26.2] What are the units of sizeof?
- [26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
- [26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
- [26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
- [26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
- [26.7] What is a "POD type"?
- [26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
- [26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
- [26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
- [26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
- [26.12] How can I tell if an integer is a power of two without looping?
[26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.
Even if you think of a "character" as a multi-byte thingy, char is not. sizeof(char) is always exactly 1. No exceptions, ever.
Look, I know this is going to hurt your head, so please, please just read the next few FAQs in sequence and hopefully the pain will go away by sometime next week.
[26.2] What are the units of sizeof?
Bytes.
For example, if sizeof(Fred) is 8, the distance between the starting addresses of two adjacent Fred objects in an array of Freds will be exactly 8 bytes.
As another example, this means sizeof(char) is one byte. That's right: one byte. One, one, one, exactly one byte, always one byte. Never two bytes. No exceptions.
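A quick way to check these claims on your own compiler is a handful of compile-time assertions plus a small helper that measures the byte distance between array elements. This is only a sketch: the Fred struct and the byte_distance() helper are hypothetical names chosen for illustration, and static_assert needs C++11 or later.

```cpp
#include <cstddef>

// Hypothetical class used only for illustration.
struct Fred {
    char   tag;
    double value;
};

// Guaranteed on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is always exactly 1");

// An array of N Freds occupies exactly N * sizeof(Fred) bytes; any padding
// Fred needs is already counted inside sizeof(Fred).
static_assert(sizeof(Fred[5]) == 5 * sizeof(Fred),
              "array elements are sizeof(Fred) bytes apart");

// Distance in bytes between the starting addresses of two Freds, measured
// by viewing their addresses through unsigned char pointers.
std::ptrdiff_t byte_distance(const Fred* a, const Fred* b) {
    return reinterpret_cast<const unsigned char*>(b) -
           reinterpret_cast<const unsigned char*>(a);
}
```

The exact value of sizeof(Fred) depends on the implementation's alignment rules, but whatever it is, byte_distance(&arr[0], &arr[1]) equals it.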
[26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
Yes that's right: the thing commonly referred to as a "character" might be different from the thing C++ calls a char.
I'm really sorry if that hurts, but believe me, it's better to get all the pain over with at once. Take a deep breath and repeat after me: "character and char might be different." There, doesn't that feel better? No? Well keep reading — it gets worse.
[26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
Yep, that's right: a C++ byte might have more than 8 bits.
The C++ language guarantees that a byte always has at least 8 bits. But there are implementations of C++ that have more than 8 bits per byte.
[26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
Wrong.
I have heard of one implementation of C++ that has 64-bit "bytes." You read that right: a byte on that implementation has 64 bits. 64 bits per byte. 64. As in 8 times 8.
And yes, you're right: since sizeof(char) is always 1, a char on that implementation would have 64 bits.
[26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
Here are the rules:
- The C++ language gives the programmer the impression that memory is laid out as a sequence of something C++ calls "bytes."
- Each of these things that the C++ language calls a byte has at least 8 bits, but might have more than 8 bits.
- The C++ language guarantees that a char* (pointer to char) can address individual bytes.
- The C++ language guarantees there are no bits between two bytes. This means every bit in memory is part of a byte. If you grind your way through memory via a char*, you will be able to see every bit.
- The C++ language guarantees there are no bits that are part of two distinct bytes. This means a change to one byte will never cause a change to a different byte.
- The C++ language gives you a way to find out how many bits are in a byte in your particular implementation: include the header <climits>, and the CHAR_BIT macro gives the actual number of bits per byte.
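The rules above can be checked with <climits> and <limits>. The sketch below (bits_in() and count_one_bits() are hypothetical helpers, and C++11 is assumed for static_assert and constexpr) asserts the at-least-8-bits guarantee, computes how many bits an object occupies, and walks an object's bytes through an unsigned char pointer to count its 1-bits. It uses unsigned char rather than plain char, since plain char may be signed and shifting a negative value is best avoided.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Guaranteed by the language: at least 8 bits per byte.
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// unsigned char has no padding bits, so all CHAR_BIT bits hold value.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "every bit of an unsigned char is a value bit");

// Number of bits in the object representation of T on this implementation.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Count the 1-bits in an object's representation by walking it one byte at
// a time through an unsigned char pointer. Since there are no bits between
// bytes, this visits every bit of the object, including any padding bits.
template <typename T>
std::size_t count_one_bits(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        for (unsigned byte = p[i]; byte != 0; byte >>= 1) {
            count += byte & 1u;
        }
    }
    return count;
}
```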
Let's work an example to illustrate these rules. The PDP-10 has 36-bit words with no hardware facility to address anything within one of those words. That means a pointer can point only at things on a 36-bit boundary: it is not possible for a pointer to point 8 bits to the right of where some other pointer points.
One way to abide by all the above rules is for a PDP-10 C++ compiler to define a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9 bits, and simulate a char* by two words of memory: the first could point to the 36-bit word, the second could be a bit-offset within that word. In that case, the C++ compiler would need to add extra instructions when compiling code using char* pointers. For example, the code generated for *p = 'x' might read the word into a register, then use bit-masks and bit-shifts to change the appropriate 9-bit byte within that word. An int* could still be implemented as a single hardware pointer, since C++ allows sizeof(char*) != sizeof(int*).
Using the same logic, it would also be possible to define a PDP-10 C++ "byte" as 12 bits or 18 bits. However, the above technique wouldn't allow us to define a PDP-10 C++ "byte" as 8 bits, since 8*4 is 32, meaning that after every fourth byte we would skip 4 bits. A more complicated approach could handle those 4 bits, e.g., by packing nine bytes (of 8 bits each) into two adjacent 36-bit words. The important point here is that memcpy() has to be able to see every bit of memory: there can't be any bits between two adjacent bytes.
Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5 bytes (of 7 bits each) into each 36-bit word. However, this won't work in C or C++ since 5*7 = 35, meaning that using char* pointers to walk through memory would "skip" a bit every fifth byte (and also because C++ requires bytes to have at least 8 bits).
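To make the 9-bit-byte scheme concrete, here is a small simulation that runs on an ordinary machine. It is a hypothetical sketch, not how any real PDP-10 compiler was written: memory is a std::vector of 36-bit words (each held in the low bits of a std::uint64_t), and SimCharPtr plays the role of the two-part char* (word index plus byte offset). Bytes are numbered from the low-order end of each word, which is an arbitrary choice. load_byte() and store_byte() show the shift-and-mask work the compiler would have to generate for *p and *p = 'x'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simulation of 9-bit bytes on a word-addressed machine.
// Each 36-bit word lives in the low 36 bits of a std::uint64_t.
constexpr unsigned kWordBits     = 36;
constexpr unsigned kByteBits     = 9;
constexpr unsigned kBytesPerWord = kWordBits / kByteBits;                  // 4
constexpr std::uint64_t kByteMask = (std::uint64_t{1} << kByteBits) - 1;   // low 9 bits

// 4 * 9 == 36, so consecutive bytes tile each word with no leftover bits.
static_assert(kBytesPerWord * kByteBits == kWordBits, "no bits between bytes");

// A simulated char*: which word, plus which 9-bit byte inside that word.
struct SimCharPtr {
    std::size_t word;
    unsigned    offset;  // 0 .. kBytesPerWord - 1
};

// Read one byte: shift the containing word right, then keep the low 9 bits.
unsigned load_byte(const std::vector<std::uint64_t>& mem, SimCharPtr p) {
    assert(p.offset < kBytesPerWord && p.word < mem.size());
    return static_cast<unsigned>((mem[p.word] >> (p.offset * kByteBits)) & kByteMask);
}

// Write one byte (the "*p = 'x'" case): read the whole word, clear the
// target 9 bits, OR in the new value, and store the word back.
// Neighboring bytes in the same word are left unchanged.
void store_byte(std::vector<std::uint64_t>& mem, SimCharPtr p, unsigned value) {
    assert(p.offset < kBytesPerWord && p.word < mem.size());
    const unsigned shift = p.offset * kByteBits;
    std::uint64_t word = mem[p.word];
    word &= ~(kByteMask << shift);
    word |= (static_cast<std::uint64_t>(value) & kByteMask) << shift;
    mem[p.word] = word;
}

// The simulated ++p: move to the next byte, wrapping into the next word.
SimCharPtr next_byte(SimCharPtr p) {
    if (++p.offset == kBytesPerWord) {
        p.offset = 0;
        ++p.word;
    }
    return p;
}
```

Because store_byte() rewrites only the 9 bits it targets, changing one simulated byte never changes another, which mirrors the no-shared-bits rule in the list above.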
[26.7] What is a "POD type"?
A type that consists of nothing but Plain Old Data.
A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing.
As an example, the C declaration struct Fred x; inside a function does not initialize the members of the Fred variable x. To make this same behavior happen in C++, Fred must not declare any constructors. Similarly, to make the C++ version of copying the same as the C version, the C++ Fred must not declare its own copy-assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static data members that are private or protected, or a user-declared destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions.
The actual definition of a POD type is recursive and gets a little gnarly. Here's a slightly simplified definition of POD: a POD type's non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, any pointer-to-member type, or any POD type, including arrays of any of these. Note: data pointers, pointers to functions, and pointers to members are all okay, but references are not. In addition, a POD type can't have user-declared constructors, virtual functions, base classes, a user-declared copy-assignment operator, or a user-declared destructor.
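If you want the compiler to tell you whether a type is POD, the <type_traits> header (C++11 and later) can do it. Note that C++11 redefined POD as "trivial and standard-layout", which relaxes a few of the rules above (for example, some classes with base classes qualify); the cases below were chosen so that both definitions agree. The helper is_pod_like() and all the struct names are hypothetical. std::is_pod would also work, but it is deprecated as of C++20.

```cpp
#include <type_traits>

// C++11 and later: POD == trivial && standard-layout.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// Hypothetical types; each comment gives the reason for the result.
struct Point { int x; int y; };                   // POD: a plain C-style struct
struct Tagged {                                   // POD: static members and
    static int instances;                         //   non-virtual member functions
    int value;                                    //   are allowed
    int twice() const { return 2 * value; }
};
struct Selector { int Point::* member; };         // POD: pointer-to-member is a scalar
struct WithCtor { int v; WithCtor() : v(0) {} };  // not POD: user-provided constructor
struct WithVirtual { virtual void f() {} };       // not POD: virtual function
struct WithDtor { int v; ~WithDtor() {} };        // not POD: user-provided destructor
struct WithRef { int& r; };                       // not POD: reference member

static_assert(is_pod_like<Point>(), "Point is POD");
static_assert(is_pod_like<Tagged>(), "Tagged is POD");
static_assert(is_pod_like<Selector>(), "Selector is POD");
static_assert(!is_pod_like<WithCtor>(), "WithCtor is not POD");
static_assert(!is_pod_like<WithVirtual>(), "WithVirtual is not POD");
static_assert(!is_pod_like<WithDtor>(), "WithDtor is not POD");
static_assert(!is_pod_like<WithRef>(), "WithRef is not POD");
```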
[26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
For symmetry, it is usually best to initialize all non-static data members in the constructor's "initialization list," even those that are of a built-in / intrinsic / primitive type. The FAQ shows you why and how.
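Here is a minimal sketch of the idea, using a hypothetical Fred with only built-in members. A const member like id_ shows one case where the initialization list isn't just a matter of style: it can't be assigned in the constructor body at all. (constexpr is used only so the example can be checked at compile time; it needs C++11.)

```cpp
// Hypothetical class: every non-static data member, built-in ones included,
// is initialized in the constructor's initialization list.
class Fred {
public:
    constexpr Fred(int id, int count, double ratio)
        : id_(id), count_(count), ratio_(ratio)  // listed in declaration order
    { }

    constexpr int id() const { return id_; }
    constexpr int count() const { return count_; }
    constexpr double ratio() const { return ratio_; }

private:
    const int id_;  // a const member can only be set in the initialization list
    int count_;     // these two could be assigned in the constructor body,
    double ratio_;  // but listing them too keeps all initialization in one place
};
```

Members are always initialized in the order they are declared in the class, not the order they appear in the list, so keeping the two orders the same avoids surprises.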
[26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
Yes, if you initialize your built-in / intrinsic / primitive variable with an expression that the compiler can't evaluate entirely at compile time. The FAQ provides several solutions for this (subtle!) problem.
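As a sketch of the difference, assume a hypothetical compute_base() that stands in for any initializer the compiler can't evaluate at compile time. A constant like kFixedBase has no ordering problem, while a global initialized from compute_base() does if code in another source file reads it during static initialization. One common remedy, the construct-on-first-use idiom, wraps the variable in a function with a local static.

```cpp
// Hypothetical stand-in for an initializer that is not a constant
// expression (for example, one that reads a file or calls into another
// translation unit).
int compute_base() {
    return 40;
}

// No fiasco: initialized with a constant expression, so it is set before
// any dynamic initialization runs.
const int kFixedBase = 40;

// Risky if read from another translation unit during static initialization:
// this global would be dynamically initialized, in an unspecified order
// relative to globals in other translation units.
//   int g_base = compute_base();

// Construct on first use: the local static is initialized the first time
// base() is called, so every caller sees an initialized value, even a
// caller that is itself a global's initializer.
int& base() {
    static int value = compute_base();
    return value;
}

// Another global that depends on base(); safe regardless of file order.
int g_derived = base() + 2;
```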
[26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
No, the C++ language requires that your operator overloads take at least one operand of a "class type" or enumeration type. The C++ language will not let you define an operator all of whose operands / parameters are of primitive types.
For example, you can't define an operator== that takes two char* parameters and uses string comparison. That's good news, because if s1 and s2 are of type char*, the expression s1 == s2 already has a well-defined meaning: it compares the two pointers, not the two strings pointed to by those pointers. You shouldn't be using char* for strings anyway. Use std::string instead: its operator== compares the characters.
If C++ let you redefine the meaning of operators on built-in types, you wouldn't ever know what 1 + 1 is: it would depend on which headers got included and whether one of those headers redefined addition to mean, for example, subtraction.
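A small sketch of what is and isn't allowed. The Perm enumeration and the has(), same_text(), and same_string() helpers are hypothetical, and enum class requires C++11. The commented-out operator== is the kind of declaration the compiler rejects; the operator| on an enumeration type is accepted because its operands are not primitive types.

```cpp
#include <cstring>
#include <string>

// Ill-formed: every parameter is a built-in type, so the compiler rejects it.
//   bool operator==(const char* a, const char* b);

// Hypothetical permission flags.
enum class Perm : unsigned { None = 0, Read = 1, Write = 2, Exec = 4 };

// Allowed: the operands have enumeration type.
constexpr Perm operator|(Perm a, Perm b) {
    return static_cast<Perm>(static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

// True if every bit of flag is present in set.
constexpr bool has(Perm set, Perm flag) {
    return (static_cast<unsigned>(set) & static_cast<unsigned>(flag)) ==
           static_cast<unsigned>(flag);
}

// Comparing the characters of two C strings has to be spelled out;
// s1 == s2 on two char* values compares addresses instead.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// std::string's operator== already compares contents.
bool same_string(const std::string& a, const std::string& b) {
    return a == b;
}
```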
[26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
Because you can't.
Look, please don't write me an email asking me why C++ is what it is. It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent book, "The Design and Evolution of C++" (Addison-Wesley). But if your real goal is to write some code, don't waste too much time figuring out why C++ has these rules, and instead just abide by them.
So here's the rule: if a points to an array of thingies that was allocated via new T[n], then you must, must, must delete it via delete[] a. Even if the elements in the array are built-in types. Even if they're of type char or int or void*. Even if you don't understand why.
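A minimal sketch of the rule, using hypothetical helper names: the first function pairs new int[n] with delete[] by hand (writing delete a there would be undefined behavior, even though the elements are plain ints), and the second lets std::unique_ptr<int[]> (C++11) issue the delete[] automatically.

```cpp
#include <cstddef>
#include <memory>

// Allocated with the array form of new, so it must be freed with delete[].
int sum_of_squares(std::size_t n) {
    int* a = new int[n];
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i * i);
    }
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    delete[] a;  // not "delete a", even for built-in element types
    return total;
}

// std::unique_ptr<int[]> calls delete[] when it goes out of scope.
int sum_of_squares_owned(std::size_t n) {
    std::unique_ptr<int[]> a(new int[n]);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i * i);
        total += a[i];
    }
    return total;
}
```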
[26.12] How can I tell if an integer is a power of two without looping?
inline bool isPowerOf2(int i)
{
  return i > 0 && (i & (i - 1)) == 0;
}
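Why this works: subtracting 1 from a positive integer flips its lowest set bit and every bit below it, so i & (i - 1) is i with its lowest set bit cleared. That is zero only if i had exactly one set bit. For example, 8 is 1000 and 7 is 0111, so 8 & 7 is 0; but 12 is 1100 and 11 is 1011, so 12 & 11 is 1000. The i > 0 test matters twice: it rejects 0 (for which i & (i - 1) is also 0), and because && short-circuits, it keeps i - 1 from being evaluated for INT_MIN, where it would overflow. Below is a sketch of an unsigned, constexpr variant (is_power_of_two() is a hypothetical name); C++20 also provides std::has_single_bit in <bit> for unsigned types.

```cpp
#include <cstdint>

// i & (i - 1) clears the lowest set bit of i. The result is zero exactly
// when i had a single set bit, i.e. when i is a power of two.
constexpr bool is_power_of_two(std::uint64_t i) {
    return i != 0 && (i & (i - 1)) == 0;  // exclude 0, which has no set bits
}
```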