CStrings are a useful data type. They greatly simplify a lot of operations in MFC, making string manipulation much more convenient. However, there are some special techniques for using CStrings that are particularly hard for people coming from a pure-C background to learn. This essay discusses some of these techniques.

Much of what you need to do is pretty straightforward. This is not a complete tutorial on CStrings, but captures the most common basic questions.

String Concatenation

One of the very convenient features of CString is the ability to concatenate two strings. For example, writing

CString gray("Gray");
CString cat("Cat");
CString graycat = gray + cat;

is a lot nicer than having to do something like:

char gray[] = "Gray";
char cat[] = "Cat";
char * graycat = malloc(strlen(gray) + strlen(cat) + 1);
strcpy(graycat, gray);
strcat(graycat, cat);

Formatting (including integer-to-CString)

Rather than using sprintf or wsprintf, you can do formatting for a CString by using the Format method:

CString s;
s.Format(_T("The total is %d"), total);

The advantage here is that you don't have to worry about whether or not the buffer is large enough to hold the formatted data; this is handled for you by the formatting routines.
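To make the "buffer is handled for you" idea concrete, here is a minimal sketch in standard C++ of how a Format-style routine can size its own buffer. The helper name format_string is hypothetical and this is not how MFC implements CString::Format; it only illustrates the measure-then-format technique using std::vsnprintf.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of MFC): formats into a std::string,
// growing the buffer to whatever size the result needs.
std::string format_string(const char* fmt, ...)
{
    assert(fmt != nullptr);

    va_list args;
    va_start(args, fmt);

    // First pass: with a null buffer of size 0, vsnprintf only reports how
    // many characters the output needs (not counting the terminating NUL).
    va_list measure;
    va_copy(measure, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);

    std::string result;
    if (needed > 0) {
        result.resize(static_cast<std::size_t>(needed) + 1); // room for the NUL
        std::vsnprintf(&result[0], result.size(), fmt, args); // second pass
        result.resize(static_cast<std::size_t>(needed));      // drop the NUL
    }
    va_end(args);
    return result;
}
```

A call such as format_string("The total is %d", total) never needs a caller-supplied buffer size, which is the same convenience the Format method gives you.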

Use of formatting is the most common way of converting from non-string data types to a CString, for example, converting an integer to a CString:

CString s;
s.Format(_T("%d"), total);

I always use the _T( ) macro because I design my programs to be at least Unicode-aware, but that's a topic for some other essay. The purpose of _T( ) is to compile a string for an 8-bit-character application as:

#define _T(x) x // non-Unicode version

whereas for a Unicode application it is defined as

#define _T(x) L##x // Unicode version

so in Unicode the effect is as if I had written

s.Format(L"%d", total);

If you ever think you might ever possibly use Unicode, start coding in a Unicode-aware fashion. For example, never, ever use sizeof( ) to get the number of characters in a character buffer, because it will be off by a factor of 2 in a Unicode application. We cover Unicode in some detail in Win32 Programming. When I need a character count, I have a macro called DIM, which is defined in a file dim.h that I include everywhere:

#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

This is not only useful for dealing with Unicode buffers whose size is fixed at compile time, but any compile-time defined table.

class Whatever { ... };
Whatever data[] = {
   { ... },
   { ... },
};

for(int i = 0; i < DIM(data); i++) // scan the table looking for a match

Beware of those API calls that want genuine byte counts; using a character count will not work.

TCHAR data[20];
lstrcpyn(data, longstring, sizeof(data) - 1); // WRONG!

lstrcpyn(data, longstring, DIM(data) - 1); // RIGHT

WriteFile(f, data, DIM(data), &bytesWritten, NULL); // WRONG!

WriteFile(f, data, sizeof(data), &bytesWritten, NULL); // RIGHT

This is because lstrcpyn wants a character count, but WriteFile wants a byte count.
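The difference between the two counts can be checked with a small standard C++ sketch. It assumes wchar_t as a stand-in for a Unicode-build TCHAR (on Windows wchar_t is 2 bytes; on other platforms it is often 4, so the factor is not always 2). The names wide_buffer, character_count and byte_count are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

// Stand-in for "TCHAR data[20]" in a Unicode build (assumption: wchar_t).
static wchar_t wide_buffer[20];

// Number of characters the buffer can hold: what lstrcpyn-style calls want.
std::size_t character_count() { return DIM(wide_buffer); }

// Number of bytes the buffer occupies: what WriteFile-style calls want.
std::size_t byte_count() { return sizeof(wide_buffer); }

// The two only coincide when each character is one byte wide.
void check_buffer_sizes()
{
    assert(character_count() == 20);
    assert(byte_count() == character_count() * sizeof(wchar_t));
}
```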

Using _T does not create a Unicode application. It creates a Unicode-aware application. When you compile in the default 8-bit mode, you get a "normal" 8-bit program; when you compile in Unicode mode, you get a Unicode (16-bit-character) application. Note that a CString in a Unicode application is a string that holds 16-bit characters.

Converting a CString to an integer

The simplest way to convert a CString to an integer value is to use one of the standard string-to-integer conversion routines.

While you might suspect that atoi is a good choice, it is rarely the right choice. If you plan to be Unicode-ready, you should call the function _ttoi, which compiles into atoi in ANSI code and _wtoi in Unicode code. You can also consider using _tcstoul (for unsigned conversion to any radix, such as 2, 8, 10 or 16) or _tcstol (for signed conversion to any radix). Here are some examples:

CString hex = _T("FAB");
CString decimal = _T("4011");
ASSERT(_tcstoul(hex, 0, 16) == _ttoi(decimal));
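One reason to prefer the strtoul family is that it lets you detect bad input, which atoi-style routines cannot report. The following is a minimal sketch in standard C++ using std::strtoul, the 8-bit function that _tcstoul maps to in an ANSI build; parse_unsigned is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts text in the given radix and reports failure
// instead of silently returning 0 the way atoi does.
bool parse_unsigned(const char* text, int radix, unsigned long& value)
{
    assert(text != nullptr);

    char* end = nullptr;
    errno = 0;
    unsigned long parsed = std::strtoul(text, &end, radix);

    // Reject: no digits consumed, trailing junk, or a value out of range.
    if (end == text || *end != '\0' || errno == ERANGE)
        return false;

    value = parsed;
    return true;
}
```

With this helper, "FAB" in radix 16 and "4011" in radix 10 both succeed and yield the same value, while input such as "12x" is rejected rather than quietly truncated.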

Converting between char * and CString

This is the most common set of questions beginners have on the CString data type. Due largely to serious C++ magic, you can largely ignore many of the problems. Things just "work right". The problems come about when you don't understand the basic mechanisms and then don't understand why something that seems obvious doesn't work.

For example, having noticed the above example you might wonder why you can't write

CString graycat = "Gray" + "Cat";

or

CString graycat("Gray" + "Cat");

In fact the compiler will complain bitterly about these attempts. Why? Because the + operator is defined as an overloaded operator on various combinations of the CString and LPCTSTR data types, but not between two LPCTSTR data types, which are underlying data types. You can't overload C++ operators on base types like int and char, or char *. What will work is

CString graycat = CString("Gray") + CString("Cat");

or even

CString graycat = CString("Gray") + "Cat";

If you study these, you will see that the + always applies to at least one CString; the other operand may be a CString or an LPCTSTR.

char * to CString

So you have a char *, or a string literal. How do you create a CString? Given a declaration such as

char * p = "This is a test";

or, in Unicode-aware applications,

TCHAR * p = _T("This is a test");

or

LPTSTR p = _T("This is a test");

you can write any of the following:

CString s = "This is a test";     // 8-bit only

CString s = _T("This is a test"); // Unicode-aware

CString s("This is a test");      // 8-bit only

CString s(_T("This is a test"));  // Unicode-aware

CString s = p;
CString s(p);

Any of these readily convert the constant string or the pointer to a CString value. Note that the characters assigned are always copied into the CString so that you can do something like

TCHAR * p = _T("Gray");
CString s(p);
p = _T("Cat");
s += p;

and be sure that the resulting string is "GrayCat".

There are several other methods for CString constructors, but we will not consider most of these here; you can read about them on your own.

CString to char * I: Casting to LPCTSTR

This is a slightly harder transition to find out about, and there is lots of confusion about the "right" way to do it. There are quite a few right ways, and probably an equal number of wrong ways.

The first thing you have to understand about a CString is that it is a special C++ object which contains three values: a pointer to a buffer, a count of the valid characters in the buffer, and a buffer length. The count of the number of characters can be any size from 0 up to the maximum length of the buffer minus one (for the NUL byte). The character count and buffer length are cleverly hidden.

Unless you do some special things, you know nothing about the size of the buffer that is associated with the CString. Therefore, even if you can get the address of the buffer, you cannot change its contents. You cannot shorten the contents, and you absolutely must not lengthen the contents. This leads to some at-first-glance odd workarounds.

The operator LPCTSTR (or more specifically, the operator const TCHAR *), is overloaded for CString. The definition of the operator is to return the address of the buffer. Thus, if you need a string pointer to the CString you can do something like

CString s("GrayCat");
LPCTSTR p =  s;

and it works correctly. This is because of the rules about how conversions are done in C++; when a conversion is required, C++ rules allow a user-defined conversion operator to be selected. For example, you could define (float) as a conversion on a complex number (a pair of floats) and define it to return only the first float (called the "real part") of the complex number so you could say

Complex c(1.2f, 4.8f);
float realpart = c;

and expect to see, if the (float) operator is defined properly, that the value of realpart is now 1.2.
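For readers who have not written a conversion operator before, here is one possible definition of such a Complex class in standard C++. It is a sketch for illustration only; the class layout and the helper real_part_of are assumptions, not code from any library.

```cpp
#include <cassert>

class Complex {
public:
    Complex(float re, float im) : re_(re), im_(im) {}

    // User-defined conversion: lets a Complex be used where a float is
    // required, yielding the real part. CString's operator LPCTSTR plays
    // the same role for string pointers.
    operator float() const { return re_; }

    float imaginary() const { return im_; }

private:
    float re_;
    float im_;
};

// No explicit cast is written here; the compiler selects operator float().
float real_part_of(const Complex& c)
{
    float realpart = c;
    return realpart;
}
```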

This works for you in all kinds of places. For example, any function that takes an LPCTSTR parameter will force this coercion, so that you can have a function (perhaps in a DLL you bought):

BOOL DoSomethingCool(LPCTSTR s);

and call it as follows

CString file("c:\\myfiles\\coolstuff");
BOOL result = DoSomethingCool(file);

This works correctly because the DoSomethingCool function has specified that it wants an LPCTSTR and therefore the LPCTSTR operator is applied to the argument, which in MFC means that the address of the string is returned.

But what if you want to format it?

CString graycat("GrayCat");
CString s;
s.Format("Mew! I love %s", graycat);

Note that because the value appears in the variable-argument list (the list designated by "..." in the specification of the function), there is no implicit coercion operator applied. What are you going to get?

Well, surprise, you actually get the string

"Mew! I love GrayCat"

because the MFC implementers carefully designed the CString data type so that an expression of type CString evaluates to the pointer to the string, so in the absence of any casting, such as in a Format or sprintf, you will still get the correct behavior. The additional data that describes a CString actually lives in the addresses below the nominal CString address.

What you can't do is modify the string. For example, you might try to do something like replace the "." by a "," (don't do it this way, you should use the National Language Support features for decimal conversions if you care about internationalization, but this makes a simple example):

CString v("1.00");  // currency amount, 2 decimal places

LPCTSTR p = v;
p[lstrlen(p) - 3] = ',';

If you try to do this, the compiler will complain that you are assigning to a constant string. This is the correct message. It would also complain if you tried

strcat(p, "each");

because strcat wants an LPTSTR as its first argument and you gave it an LPCTSTR.

Don't try to defeat these error messages. You will get yourself into trouble!

The reason is that the buffer has a count, which is inaccessible to you (it's in that hidden area that sits below the CString address), and if you change the string, you won't see the change reflected in the character count for the buffer. Furthermore, if the string happens to be just about as long as the buffer physical limit (more on this later), an attempt to extend the string will overwrite whatever is beyond the buffer, which is memory you have no right to write (right?) and you'll damage memory you don't own. Sure recipe for a dead application.

CString to char * II: Using GetBuffer

A special method is available for a CString if you need to modify it. This is the operation GetBuffer. What this does is return to you a pointer to the buffer which is considered writeable. If you are only going to change characters or shorten the string, you are now free to do so:

CString s(_T("File.ext"));
LPTSTR p = s.GetBuffer();
LPTSTR dot = _tcschr(p, _T('.')); // OK, should have used s.Find...

if(dot != NULL)
    *dot = _T('\0');
s.ReleaseBuffer();

This is the first and simplest use of GetBuffer. You don't supply an argument, so the default of 0 is used, which means "give me a pointer to the string; I promise to not extend the string". When you call ReleaseBuffer, the actual length of the string is recomputed and stored in the CString. Within the scope of a GetBuffer/ReleaseBuffer sequence, and I emphasize this: You Must Not, Ever, Use Any Method Of CString on the CString whose buffer you have! The reason for this is that the integrity of the CString object is not guaranteed until the ReleaseBuffer is called. Study the code below:

CString s(...);
LPTSTR p = s.GetBuffer();
//... lots of things happen via the pointer p

int n = s.GetLength(); // BAD!!!!! PROBABLY WILL GIVE WRONG ANSWER!!!

s.TrimRight();         // BAD!!!!! NO GUARANTEE IT WILL WORK!!!!

s.ReleaseBuffer();     // Things are now OK

int m = s.GetLength(); // This is guaranteed to be correct

s.TrimRight();         // Will work correctly

Suppose you want to actually extend the string. In this case you must know how large the string will get. This is just like declaring

char buffer[1024];

knowing that 1024 is more than enough space for anything you are going to do. The equivalent in the CString world is

LPTSTR p = s.GetBuffer(1024);

This call gives you not only a pointer to the buffer, but guarantees that the buffer will be (at least) 1024 characters in length.

Also, note that the value of a constant string may be stored in read-only memory. If you have a pointer to such memory, then even if you've done a GetBuffer, an attempt to store into the string will fail with an access error. I haven't verified this for CString, but I've seen ordinary C programmers make this error frequently.

A common "bad idiom" left over from C programmers is to allocate a buffer of fixed size, do a sprintf into it, and assign it to a CString:

char buffer[256];
sprintf(buffer, "%......", args, ...); // ... means "lots of stuff here"

CString s = buffer;

while the better form is to do

CString s;
s.Format(_T("%...."), args, ...);

Note that this always works; if your string happens to end up longer than 256 bytes you don't clobber the stack!

Another common error is to be clever and realize that a fixed size won't work, so the programmer allocates bytes dynamically. This is even sillier:

int len = lstrlen(parm1) + 13 + lstrlen(parm2) + 10 + 100;
char * buffer = new char[len];
sprintf(buffer, "%s is equal to %s, valid data", parm1, parm2);
CString s = buffer;
delete [] buffer;

when it can easily be written as

CString s;
s.Format(_T("%s is equal to %s, valid data"), parm1, parm2);

Note that the sprintf examples are not Unicode-ready (although you could use _stprintf and put _T() around the formatting string), but the basic idea is still that you are doing far more work than is necessary, and it is error-prone.

CString to char * III: Interfacing to a control

A very common operation is to pass a CString value in to a control, for example, a CTreeCtrl. MFC provides a number of convenient overloads for the operation, but in the most general situation you use the "raw" form of the update, and therefore you need to store a pointer to a string in the TVITEM which is included within the TVINSERTITEMSTRUCT:

TVINSERTITEMSTRUCT tvi;
CString s;
// ... assign something to s

tvi.item.pszText = s; // Compiler yells at you here

// ... other stuff

HTREEITEM ti = c_MyTree.InsertItem(&tvi);

Now why did the compiler complain? It looks like a perfectly good assignment! But in fact if you look at the structure, you will see that the member is declared in the TVITEM structure as shown below:

LPTSTR pszText;
int cchTextMax;

Therefore, the assignment is not assigning to an LPCTSTR and the compiler has no idea how to cast the right hand side of the assignment to an LPTSTR.

OK, you say, I can deal with that, and you write

tvi.item.pszText = (LPCTSTR)s; // compiler still complains!

What the compiler is now complaining about is that you are attempting to assign an LPCTSTR to an LPTSTR, an operation which is forbidden by the rules of C and C++. You may not use this technique to accidentally alias a constant pointer to a non-constant alias so you can violate the assumptions of constancy. If you could, you could potentially confuse the optimizer, which trusts what you tell it when deciding how to optimize your program. For example, if you do

const int i = ...;
//... do lots of stuff

     ... = a[i];  // usage 1

// ... lots more stuff

     ... = a[i];  // usage 2

Then the compiler can trust, because you said const, that the value of i at "usage 1" and "usage 2" is the same value, and it can even precompute the address of a[i] at usage 1 and keep the value around for later use at usage 2, rather than computing it each time. If you were able to write

const int i = ...;
int * p = &i;
//... do lots of stuff

     ... = a[i];  // usage 1

// ... lots more stuff

     (*p)++;      // mess over compiler's assumption

// ... and other stuff

     ... = a[i];  // usage 2

Then the compiler would believe in the constancy of i, and consequently the constancy of the location of a[i], and the place where the indirection is done destroys that assumption. Thus, the program would exhibit one behavior when compiled in debug mode (no optimizations) and another behavior when compiled in release mode (full optimization). This Is Not Good. Therefore, the attempt to assign the pointer to i to a modifiable reference is diagnosed by the compiler as being bogus. This is why the (LPCTSTR) cast won't really help.

Why not just declare the member as an LPCTSTR? Because the structure is used both for reading and writing to the control. When you are writing to the control, the text pointer is actually treated as an LPCTSTR but when you are reading from the control you need a writeable string. The structure cannot distinguish its use for input from its use for output.

Therefore, you will often find in my code something that looks like

tvi.item.pszText = (LPTSTR)(LPCTSTR)s;

This casts the CString to an LPCTSTR, thus giving me the address of the string, which I then force to be an LPTSTR so I can assign it. Note that this is valid only if you are using the value as data to a Set or Insert style method! You cannot do this when you are trying to retrieve data!

You need a slightly different method when you are trying to retrieve data, such as the value stored in a control. For example, consider a CTreeCtrl and its GetItem method. Here, I want to get the text of the item. I know that the text is no more than MY_LIMIT in size. Therefore, I can write something like

TVITEM tvi;
// ... assorted initialization of other fields of tvi

tvi.pszText = s.GetBuffer(MY_LIMIT);
tvi.cchTextMax = MY_LIMIT;
c_MyTree.GetItem(&tvi);
s.ReleaseBuffer();

Note that the code above works for any type of Set method also, but is not needed because for a Set-type method (including Insert) you are not writing the string. But when you are writing the CString you need to make sure the buffer is writeable. That's what the GetBuffer does. Again, note that once you have done the GetBuffer call, you must not do anything else to the CString until the ReleaseBuffer call.

CString to BSTR

When programming with ActiveX, you will sometimes need a value represented as a type BSTR. A BSTR is a counted string, a wide-character (Unicode) string on Intel platforms and can contain embedded NUL characters.

You can convert a CString to a BSTR by calling the CString method AllocSysString:

CString s;
s = ... ; // whatever

BSTR b = s.AllocSysString();

The pointer b points to a newly-allocated BSTR object which is a copy of the CString, including the terminal NUL character. This may now be passed to whatever interface you are calling that requires a BSTR. Normally, a BSTR is disposed of by the component receiving it. If you should need to dispose of a BSTR, you must use the call

::SysFreeString(b);

to free the string.

The story is that the decision of how to represent strings sent to ActiveX controls resulted in some serious turf wars within Microsoft. The Visual Basic people won, and the string type BSTR (acronym for "Basic String") was the result.

BSTR to CString

Since a BSTR is a counted Unicode string, you can use standard conversions to make an 8-bit CString. Actually, this is built-in; there are special constructors for converting ANSI strings to Unicode and vice-versa. You can also get BSTRs as results in a VARIANT type, which is a type returned by various COM and Automation calls.

For example, in an ANSI application,

BSTR b;
b = ...; // whatever

CString s(b == NULL ? L"" : b);

works just fine for a single-string BSTR, because there is a special constructor that takes an LPCWSTR (which is what a BSTR is) and converts it to an ANSI string. The special test is required because a BSTR could be NULL, and the constructors Don't Play Well with NULL inputs (thanks to Brian Ross for pointing this out!). This also only works for a BSTR that contains only a single string terminated with a NUL; you have to do more work to convert strings that contain multiple NUL characters. Note that embedded NUL characters generally don't work well in CStrings and generally should be avoided.

Remember, according to the rules of C/C++, if you have an LPWSTR it will match a parameter type of LPCWSTR (it doesn't work the other way!).

In UNICODE mode, this is just the constructor

CString::CString(LPCTSTR);

As indicated above, in ANSI mode there is a special constructor for

CString::CString(LPCWSTR);

This calls an internal function to convert the Unicode string to an ANSI string. (In Unicode mode there is a special constructor that takes an LPCSTR, a pointer to an 8-bit ANSI string, and widens it to a Unicode string!) Again, note the limitation imposed by the need to test for a BSTR value which is NULL.

There is an additional problem as pointed out above: BSTRs can contain embedded NUL characters; CString constructors can only handle single NUL characters in a string. This means that CStrings will compute the wrong length for a string which contains embedded NUL bytes. You need to handle this yourself. If you look at the constructors in strcore.cpp, you will see that they all do an lstrlen or equivalent to compute the length.
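The length problem is easy to reproduce outside MFC. The sketch below uses standard C++ only; the 8-character array stands in for a counted string with an embedded NUL, and the function names are made up for this illustration. It contrasts a NUL-scanning length, which is what an lstrlen-based constructor would compute, with a copy that honors the explicit count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A counted buffer holding "Gray", an embedded NUL, then "Cat": 8 characters.
static const char counted[] = { 'G', 'r', 'a', 'y', '\0', 'C', 'a', 't' };
static const std::size_t counted_length = sizeof(counted);

// Length as a NUL-scanning routine sees it: it stops at the first NUL.
std::size_t scanned_length()
{
    return std::strlen(counted);
}

// Copy that carries the explicit count along, so nothing is lost.
std::string copy_counted()
{
    return std::string(counted, counted_length);
}
```

Here the scanning routine sees only "Gray", while the counted copy keeps all 8 characters, which is the kind of extra work you have to do yourself for a BSTR with embedded NULs.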

Note that the conversion from Unicode to ANSI uses the ::WideCharToMultiByte conversion with specific arguments that you may not like. If you want a different conversion than the default, you have to write your own.

If you are compiling as UNICODE, then it is a simple assignment:

CString convert(BSTR b)
{
    if(b == NULL)
        return CString(_T(""));
    CString s(b); // in UNICODE mode

    return s;
}

If you are in ANSI mode, you need to convert the string in a more complex fashion. This will accomplish it. Note that this code uses the same argument values to ::WideCharToMultiByte that the implicit constructor for CString uses, so you would use this technique only if you wanted to change these parameters to do the conversion in some other fashion, for example, specifying a different default character, a different set of flags, etc.

CString convert(BSTR b)
{
    CString s;
    if(b == NULL)
       return s; // empty for NULL BSTR

#ifdef UNICODE
    s = b;
#else
    LPSTR p = s.GetBuffer(SysStringLen(b) + 1);
    ::WideCharToMultiByte(CP_ACP,            // ANSI Code Page
                          0,                 // no flags
                          b,                 // source widechar string
                          -1,                // assume NUL-terminated
                          p,                 // target buffer
                          SysStringLen(b)+1, // target buffer length
                          NULL,              // use system default char
                          NULL);             // don't care if default used
    s.ReleaseBuffer();
#endif
    return s;
}

Note that I do not worry about what happens if the BSTR contains Unicode characters that do not map to the 8-bit character set, because I specify NULL as the last two parameters. This is the sort of thing you might want to change.

VARIANT to CString

Actually, I've never done this; I don't work in COM/OLE/ActiveX where this is an issue. But I saw a posting by Robert Quirk on the microsoft.public.vc.mfc newsgroup on how to do this, and it seemed silly not to include it in this essay, so here it is, with a bit more explanation and elaboration. Any errors relative to what he wrote are my fault.

A VARIANT is a generic parameter/return type in COM programming. You can write methods that return a type VARIANT, and which type the function returns may (and often does) depend on the input parameters to your method. For example, in Automation, depending on which method you call, IDispatch::Invoke may return (via one of its parameters) a VARIANT which holds a BYTE, a WORD, a float, a double, a date, a BSTR, and about three dozen other types (see the specifications of the VARIANT structure in the MSDN). In the example below, it is assumed that the type is known to be a variant of type BSTR, which means that the value is found in the string referenced by bstrVal. This takes advantage of the fact that there is a constructor which, in an ANSI application, will convert a value referenced by an LPCWSTR to a CString (see the section BSTR to CString). In Unicode mode, this turns out to be the normal CString constructor. See the caveats about the default ::WideCharToMultiByte conversion and whether or not you find these acceptable (mostly, you will).

VARIANT vaData;

vaData = m_com.YourMethodHere();
ASSERT(vaData.vt == VT_BSTR);

CString strData(vaData.bstrVal);

Note that you could also make a more generic conversion routine that looked at the vt field. In this case, you might consider something like:

CString VariantToString(VARIANT * va)
{
    CString s;
    switch(va->vt)
      { /* vt */
       case VT_BSTR:
          return CString(va->bstrVal);
       case VT_BSTR | VT_BYREF:
          return CString(*va->pbstrVal);
       case VT_I4:
          s.Format(_T("%d"), va->lVal);
          return s;
       case VT_I4 | VT_BYREF:
          s.Format(_T("%d"), *va->plVal);
          return s;
       case VT_R8:
          s.Format(_T("%f"), va->dblVal);
          return s;
       ... remaining cases left as an Exercise For The Reader
       default:
          ASSERT(FALSE); // unknown VARIANT type (this ASSERT is optional)

          return CString(_T(""));
      } /* vt */
}

Loading STRINGTABLE values

If you want to create a program that is easily ported to other languages, you must not include native-language strings in your source code. (For these examples, I'll use English, since that is my native language (aber Ich kann ein bischen Deutsch sprechen).) So it is very bad practice to write

CString s = "There is an error";

Instead, you should put all your language-specific strings (except, perhaps, debug strings, which are never in a product deliverable) in a STRINGTABLE resource. This means that it is fine to write

s.Format(_T("%d - %s"), code, text);

in your program; that literal string is not language-sensitive. However, you must be very careful to not use strings like

// fmt is "Error in %s file %s"

// readorwrite is "reading" or "writing"

s.Format(fmt, readorwrite, filename); 

I speak of this from experience. In my first internationalized application I made this error, and in spite of the fact that I know German, and that German word order places the verb at the end of a sentence, I had done this. Our German distributor complained bitterly that he had to come up with truly weird error messages in German to get the format codes to do the right thing. It is much better (and what I do now) to have two strings, one for reading and one for writing, and load the appropriate one, making them string parameter-insensitive, that is, instead of loading the strings "reading" or "writing", load the whole format:

// fmt is "Error in reading file %s"

//          "Error in writing file %s"

s.Format(fmt, filename);

Note that if you have more than one substitution, you should make sure that the word order of the substitutions does not matter (for example, subject-object, subject-verb, or verb-object order in English).

For now, I won't talk about FormatMessage, which actually is better than sprintf/Format, but is poorly integrated into the CString class. It solves this by naming the parameters by their position in the parameter list and allows you to rearrange them in the output string.

So how do we accomplish all this? By storing the string values in the resource known as the STRINGTABLE in the resource segment. To do this, you must first create the string, using the Visual Studio resource editor. A string is given a string ID, typically starting IDS_. So you have a message, you create the string and call it IDS_READING_FILE and another called IDS_WRITING_FILE. They appear in your .rc file as

    IDS_READING_FILE "Reading file %s"
    IDS_WRITING_FILE "Writing file %s"

Note: these resources are always stored as Unicode strings, no matter what your program is compiled as. They are even Unicode strings on Win9x platforms, which otherwise have no real grasp of Unicode (but they do for resources!). Then you go to where you had stored the strings

// previous code

   CString fmt;
   if(...)
      fmt = "Reading file %s";
   else
      fmt = "Writing file %s";
   // much later

   CString s;
   s.Format(fmt, filename);

and instead do

// revised code

   CString fmt;
   if(...)
      fmt.LoadString(IDS_READING_FILE);
   else
      fmt.LoadString(IDS_WRITING_FILE);
   // much later

   CString s;
   s.Format(fmt, filename);

Now your code can be moved to any language. The LoadString method takes a string ID and retrieves the STRINGTABLE value it represents, and assigns that value to the CString.

There is a clever feature of the CString constructor that simplifies the use of STRINGTABLE entries. It is not explicitly documented in the CString::CString specification, but is obscurely shown in the example usage of the constructor! (Why this couldn't be part of the formal documentation and has to be shown in an example escapes me!). The feature is that if you cast a STRINGTABLE ID to an LPCTSTR it will implicitly do a LoadString. Thus the following two examples of creating a string value produce the same effect, and the ASSERT will not trigger in debug mode compilations:

CString s;
s.LoadString(IDS_READING_FILE);
CString t( (LPCTSTR)IDS_READING_FILE );
ASSERT(s == t);

Now, you may say, how can this possibly work? How can it tell a valid pointer from a STRINGTABLE ID? Simple: all string IDs are in the range 1..65535. This means that the high-order bits of the pointer will be 0. Sounds good, but what if I have valid data in a low address? Well, the answer is, you can't. The lower 64K of your address space will never, ever, exist. Any attempt to access a value in the address range 0x00000000 through 0x0000FFFF (0..65535) will always and forever give an access fault. These addresses are never, ever valid addresses. Thus a value in that range (other than 0) must necessarily represent a STRINGTABLE ID.
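The test described above amounts to checking whether anything is set above the low 16 bits of the pointer value. Here is a minimal sketch of that check in standard C++; the function names are hypothetical and this is not the MFC source, just an illustration of the reasoning.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check modeled on the reasoning above: a "pointer" whose value
// is in 1..65535 cannot be a real address, so treat it as a string ID.
bool looks_like_string_id(const char* p)
{
    std::uintptr_t value = reinterpret_cast<std::uintptr_t>(p);
    return value != 0 && (value >> 16) == 0;
}

// Packs a small integer ID into a pointer-typed value, in the spirit of
// MAKEINTRESOURCE (an illustration, not the Windows macro itself).
const char* make_string_id(unsigned id)
{
    assert(id >= 1 && id <= 65535);
    return reinterpret_cast<const char*>(static_cast<std::uintptr_t>(id));
}
```

A genuine string literal has an address well above 64K, so it fails the check, while a packed ID passes it; a NULL pointer is deliberately excluded.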

I tend to use the MAKEINTRESOURCE macro to do the casting. I think it makes the code clearer regarding what is going on. It is a standard macro which doesn't have much applicability otherwise in MFC. You may have noted that many methods take either a UINT or an LPCTSTR as parameters, using C++ overloading. This gets us around the ugliness of pure C where the "overloaded" methods (which aren't really overloaded in C) required explicit casts. This is also useful in assigning resource names to various other structures.

CString s;
s.LoadString(IDS_READING_FILE);
CString t( MAKEINTRESOURCE(IDS_READING_FILE) );
ASSERT(s == t);

Just to give you an idea: I practice what I preach here. You will rarely if ever find a literal string in my program, other than the occasional debug output messages, and, of course, any language-independent string.

CStrings and temporary objects

Here's a little problem that came up on the microsoft.public.vc.mfc newsgroup a while ago. I'll simplify it a bit. The basic problem was the programmer wanted to write a string to the Registry. So he wrote:

I am trying to set a registry value using RegSetValueEx() and it is the value that I am having trouble with. If I declare a variable of char[] it works fine. However, I am trying to convert from a CString and I get garbage. "ÝÝÝÝ...ÝÝÝÝÝÝ" to be exact. I have tried GetBuffer, typecasting to char*, LPCSTR. The return of GetBuffer (from debug) is the correct string but when I assign it to a char* (or LPCSTR) it is garbage. Following is a piece of my code:

char* szName = GetName().GetBuffer(20);
RegSetValueEx(hKey, "Name", 0, REG_SZ, 
                    (CONST BYTE *) szName,
                    strlen (szName + 1));

The Name string is less than 20 chars long, so I don't think the GetBuffer parameter is to blame. It is very frustrating and any help is appreciated.

Dear Frustrated,

You have been done in by a fairly subtle error, caused by trying to be a bit too clever. What happened was that you fell victim to knowing too much. The correct code is shown below:

CString Name = GetName();
RegSetValueEx(hKey, _T("Name"), 0, REG_SZ, 
                    (CONST BYTE *) (LPCTSTR)Name,
                    (Name.GetLength() + 1) * sizeof(TCHAR));

Here's why my code works and yours didn't. When your function GetName returned a CString, it returned a "temporary object". See the C++ Reference Manual §12.2.

In some circumstances it may be necessary or convenient for the compiler to generate a temporary object. Such introduction of temporaries is implementation dependent. When a compiler introduces a temporary object of a class that has a constructor it must ensure that a constructor is called for the temporary object. Similarly, the destructor must be called for a temporary object of a class where a destructor is declared.

The compiler must ensure that a temporary object is destroyed. The exact point of destruction is implementation dependent....This destruction must take place before exit from the scope in which the temporary is created.

Most compilers implement the implicit destructor for a temporary at the next program sequencing point following its creation, that is, for all practical purposes, the next semicolon. Hence the CString existed when the GetBuffer call was made, but was destroyed following the semicolon. (As an aside, there was no reason to provide an argument to GetBuffer, and the code as written is incorrect since there is no ReleaseBuffer performed). So what GetBuffer returned was a pointer to storage for the text of the CString. When the destructor was called at the semicolon, the basic CString object was freed, along with the storage that had been allocated to it. The MFC debug storage allocator then rewrites this freed storage with 0xDD, which is the symbol "Ý". By the time you do the write to the Registry, the string contents have been destroyed.

There is no particular reason to need to cast the result to a char * immediately. Storing it as a CString means that a copy of the result is made, so after the temporary CString is destroyed, the string still exists in the variable's CString. The casting at the time of the Registry call is sufficient to get the value of a string which already exists.
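
The lifetime rule can be observed without MFC. The sketch below uses a hypothetical `TrackedString` class (a stand-in for CString, not part of any library) that counts live objects; `getName` plays the role of GetName. The dangling pointer is deliberately never dereferenced.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CString that counts how many objects are alive.
struct TrackedString {
    static int& liveCount() { static int count = 0; return count; }
    std::string text;
    explicit TrackedString(const char* s) : text(s) { ++liveCount(); }
    TrackedString(const TrackedString& other) : text(other.text) { ++liveCount(); }
    ~TrackedString() { --liveCount(); }
    const char* buffer() const { return text.c_str(); }
};

// Plays the role of GetName(): returns a string by value.
inline TrackedString getName() { return TrackedString("Frustrated"); }

// Risky pattern: the temporary is destroyed at the end of the statement,
// so no object is alive afterwards and the pointer (unused here) dangles.
inline int liveAfterPointerGrab()
{
    const char* p = getName().buffer();
    (void)p;
    return TrackedString::liveCount();
}

// Safe pattern: the named copy keeps the characters alive while they are used.
inline int liveWhileNamedCopyExists()
{
    TrackedString name = getName();
    return TrackedString::liveCount();
}
```

In this sketch the first function reports that no string object survives the statement that grabbed the pointer, while the second reports one live object for as long as the named variable is in scope.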

In addition, my code is Unicode-ready. The Registry call wants a byte count. Note also that the call lstrlen(Name+1) returns a value that is too small by 2 for an ANSI string, since it doesn't start until the second character of the string. What you meant to write was lstrlen(Name) + 1 (OK, I admit it, I've made the same error!). However, in Unicode, where all characters are two bytes long, we need to cope with this. The Microsoft documentation is surprisingly silent on this point: is the value given for REG_SZ values a byte count or a character count? I'm assuming that their specification of "byte count" means exactly that, and you have to compensate.
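
The arithmetic in this paragraph can be checked with standard C++. This is a sketch under assumptions: `byteCountWithTerminator`, `wrongLength`, and `intendedLength` are hypothetical names, `std::basic_string` stands in for CString, and `wchar_t` stands in for a Unicode TCHAR (it is 2 bytes on Windows but may be wider elsewhere).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Size in bytes of a string including its terminating NUL, for any character
// width. This mirrors the expression (Name.GetLength() + 1) * sizeof(TCHAR).
template <typename CharT>
std::size_t byteCountWithTerminator(const std::basic_string<CharT>& text)
{
    return (text.length() + 1) * sizeof(CharT);
}

// The slip from the original post: strlen(s + 1) starts at the second
// character (and must not be called on an empty string).
inline std::size_t wrongLength(const char* s) { return std::strlen(s + 1); }

// What was presumably intended: strlen(s) + 1 counts the terminating NUL.
inline std::size_t intendedLength(const char* s) { return std::strlen(s) + 1; }
```

For the four-character string "Name" the intended length is 5 and the mistaken expression yields 3, which is the "too small by 2" mentioned above.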

CString Efficiency

One problem of CString is that it hides certain inefficiencies from you. On the other hand, it also means that it can implement certain efficiencies. You may be tempted to say of the following code

CString s = SomeCString1;
s += SomeCString2;
s += SomeCString3;
s += ",";
s += SomeCString4;

that it is horribly inefficient compared to, say

char s[1024];
lstrcpy(s, SomeString1);
lstrcat(s, SomeString2);
lstrcat(s, SomeString3);
lstrcat(s, ",");
lstrcat(s, SomeString4);

After all, you might think, first it allocates a buffer to hold SomeCString1, then copies SomeCString1 to it, then detects it is doing a concatenate, allocates a new buffer large enough to hold the current string plus SomeCString2, copies the contents to the buffer and concatenates the SomeCString2 to it, then discards the first buffer and replaces the pointer with a pointer to the new buffer, then repeats this for each of the strings, being horribly inefficient with all those copies.

The truth is, it probably never copies the source strings (the left side of the +=) for most cases.

In VC++ 6.0, in Release mode, all CString buffers are allocated in predefined quanta. These are defined as 64, 128, 256, and 512 bytes. This means that unless the strings are very long, the creation of the concatenated string is an optimized version of a strcat operation (since it knows the location of the end of the string it doesn't have to search for it, as strcat would; it just does a memcpy to the correct place) plus a recomputation of the length of the string. So it is about as efficient as the clumsier pure-C code, and one whole lot easier to write. And maintain. And understand.
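
A minimal sketch of quantum rounding, assuming only the four sizes named above. The function name `quantizedBufferSize` is hypothetical, and returning oversized requests unchanged is an assumption made for the sketch, not a statement about what strcore.cpp does.

```cpp
#include <cassert>
#include <cstddef>

// Assumed quanta, taken from the sizes named in the text. Requests larger than
// the biggest quantum are returned unchanged here; that choice is an assumption.
inline std::size_t quantizedBufferSize(std::size_t bytesNeeded)
{
    const std::size_t quanta[] = { 64, 128, 256, 512 };
    for (std::size_t q : quanta) {
        if (bytesNeeded <= q)
            return q;
    }
    return bytesNeeded;
}
```

Under this scheme a 5-byte request lands in a 64-byte buffer, so several later appends can be satisfied without asking the allocator for anything.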

Those of you who aren't sure this is what is really happening, look in the source code for CString, strcore.cpp, in the mfc/src subdirectory of your vc98 installation. Look for the method ConcatInPlace which is called from all the += operators.

Aha! So CString isn't really "efficient!" For example, if I create

CString cat("Mew!");

then I don't get a nice, tidy little buffer 5 bytes long (4 data bytes plus the terminal NUL). Instead the system wastes all that space by giving me 64 bytes and wasting 59 of them.

If this is how you think, be prepared to reeducate yourself. Somewhere in your career somebody taught you that you always had to use as little space as possible, and this was a Good Thing.

This is incorrect. It ignores some seriously important aspects of reality.

If you are used to programming embedded applications with 16K EPROMs, you have a particular mindset for doing such allocation. For that application domain, this is healthy. But for writing Windows applications on 500MHz, 256MB machines, it actually works against you, and creates programs that perform far worse than what you would think of as "less efficient" code.

For example, size of strings is thought to be a first-order effect. It is Good to make this small, and Bad to make it large. Nonsense. The effect of precise allocation is that after a few hours of the program running, the heap is cluttered up with little tiny pieces of storage which are useless for anything, but they increase the storage footprint of your application, increase paging traffic, can actually slow down the storage allocator to unacceptable performance levels, and eventually allow your application to grow to consume all of available memory. Storage fragmentation, a second-order or third-order effect, actually dominates system performance. Eventually, it compromises reliability, which is completely unacceptable.

Note that in Debug mode compilations, the allocation is always exact. This helps shake out bugs.

Assume your application is going to run for months at a time. For example, I bring up VC++, Word, PowerPoint, FrontPage, Outlook Express, Forté Agent, Internet Explorer, and a few other applications, and essentially never close them. I've edited using PowerPoint for days on end (on the other hand, if you've had the misfortune to have to use something like Adobe FrameMaker, you begin to appreciate reliability; I've rarely been able to use this application without it crashing four to six times a day! And always because it has run out of space, usually by filling up my entire massive swap space!) Precise allocation is one of the misfeatures that will compromise reliability and lead to application crashes.

By making CStrings be multiples of some quantum, the memory allocator will end up cluttered with chunks of memory which are almost always immediately reusable for another CString, so the fragmentation is minimized, allocator performance is enhanced, application footprint remains almost as small as possible, and you can run for weeks or months without problem.

Aside: Many years ago, at CMU, we were writing an interactive system. Some studies of the storage allocator showed that it had a tendency to fragment memory badly. Jim Mitchell, now at Sun Microsystems, created a storage allocator that maintained running statistics about allocation size, such as the mean and standard deviation of all allocations. If a chunk of storage would be split into a size that was smaller than the mean minus one σ (one standard deviation) of the prevailing allocations, he didn't split it at all, thus avoiding cluttering up the allocator with pieces too small to be usable. He actually used floating point inside an allocator! His observation was that the long-term saving in instructions by not having to ignore unusable small storage chunks far and away exceeded the additional cost of doing a few floating point operations on an allocation operation. He was right.

Never, ever think about "optimization" in terms of small-and-fast analyzed on a per-line-of-code basis. Optimization should mean small-and-fast analyzed at the complete application level (if you like New Age buzzwords, think of this as the holistic approach to program optimization, a whole lot better than the per-line basis we teach new programmers). At the complete application level, minimum-chunk string allocation is about the worst method you could possibly use.

If you think optimization is something you do at the code-line level, think again. Optimization at this level rarely matters. Read my essay on Optimization: Your Worst Enemy for some thought-provoking ideas on this topic.

Note that the += operator is special-cased; if you were to write:

CString s = SomeCString1 + SomeCString2 + SomeCString3 + "," + SomeCString4;

then each application of the + operator causes a new string to be created and a copy to be done (although it is an optimized version, since the length of the string is known and the inefficiencies of strcat do not come into play).
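
The difference between the two forms can be made visible with a toy class that counts buffer allocations. This is a sketch under stated assumptions, not MFC's implementation: `QuantumString` is a hypothetical class, the 64-byte quantum is assumed, and appending a string to itself is not supported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical illustration class; it is not MFC's CString.
class QuantumString {
public:
    QuantumString(const char* text = "") { append(text, std::strlen(text)); }

    // Copying needs a fresh buffer; this is the cost each '+' pays.
    QuantumString(const QuantumString& other)
    {
        append(other.chars_.data(), other.chars_.size());
    }
    QuantumString(QuantumString&&) noexcept = default;

    // Appends in place; a new buffer is needed only when the quantum is full.
    QuantumString& operator+=(const QuantumString& rhs)
    {
        append(rhs.chars_.data(), rhs.chars_.size());
        return *this;
    }

    friend QuantumString operator+(const QuantumString& lhs, const QuantumString& rhs)
    {
        QuantumString result(lhs);  // new string, new buffer, full copy
        result += rhs;
        return result;
    }

    std::string str() const { return std::string(chars_.begin(), chars_.end()); }

    // Number of times any QuantumString had to obtain a (larger) buffer.
    static int& bufferAllocations() { static int count = 0; return count; }

private:
    void append(const char* src, std::size_t n)
    {
        const std::size_t needed = chars_.size() + n + 1;  // room for a NUL
        if (needed > capacity_) {
            capacity_ = ((needed + kQuantum - 1) / kQuantum) * kQuantum;
            chars_.reserve(capacity_);
            ++bufferAllocations();
        }
        chars_.insert(chars_.end(), src, src + n);
    }

    static const std::size_t kQuantum = 64;  // assumed quantum
    std::vector<char> chars_;
    std::size_t capacity_ = 0;
};
```

With three short strings, building the result with one copy followed by two `+=` operations costs a single buffer in this sketch, while the chained `a + b + c` form costs one buffer per `+`.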


These are just some of the techniques for using CString. I use these every day in my programming. CString is not a terribly difficult class to deal with, but generally the MFC materials do not make all of this apparent, leaving you to figure it out on your own.



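A Guide to Working with CString

Original author: Joseph M. Newcomer


By reading this article you can learn how to use CString effectively.

CString is a useful data type. It greatly simplifies many operations in MFC and makes string manipulation much more convenient. However, there are a number of special techniques for using CString that are particularly hard to learn for programmers coming from a pure-C background. This article discusses those techniques.


  1. String concatenation
  2. Formatting (including integer-to-CString)
  3. Converting a CString to an integer
  4. Converting between CString and char *
  5. char * to CString
  6. CString to char * I: casting to LPCTSTR
  7. CString to char * II: using GetBuffer
  8. CString to char * III: interfacing to a control
  9. CString to BSTR
  10. BSTR to CString
  11. VARIANT to CString
  12. Loading STRINGTABLE resources
  13. CStrings and temporary objects
  14. CString efficiency
  15. Summary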


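1. String concatenation

One of the most convenient features of CString is string concatenation. With CString you can join two strings very easily, as in the following example:

CString gray("Gray");
CString cat("Cat");
CString graycat = gray + cat;

which is a lot nicer than having to write something like:

char gray[] = "Gray";
char cat[] = "Cat";
char * graycat = malloc(strlen(gray) + strlen(cat) + 1);
strcpy(graycat, gray);
strcat(graycat, cat);

2. Formatting (including integer-to-CString)

Rather than using sprintf() or wsprintf() to format a string, use the Format() method of CString:

CString s;
s.Format(_T("The total is %d"), total);

Formatting is also the most common way of converting a non-string value to a CString, for example converting an integer:

CString s;
s.Format(_T("%d"), total);

The _T( ) macro compiles a string literal for an 8-bit-character application as:

#define _T(x) x // non-Unicode version

whereas for a Unicode application it is defined as:

#define _T(x) L##x // Unicode version

so in a Unicode build the effect is as if the code had been written:

s.Format(L"%d", total);

If you think your program might ever run in a Unicode environment, start coding in a Unicode-aware fashion now. For example, never use sizeof() to get the length of a character buffer, because the result will be off by a factor of 2 in a Unicode build. There are ways to hide some of the Unicode details: when I need a size, I use a macro called DIM, defined in my dim.h file, which I include in every program I write: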

#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )


class Whatever { ... };
Whatever data[] = {
   { ... },
   { ... },
};

for(int i = 0; i < DIM(data); i++) // scan the table looking for a match


TCHAR data[20];
lstrcpyn(data, longstring, sizeof(data) - 1); // WRONG!
lstrcpyn(data, longstring, DIM(data) - 1); // RIGHT
WriteFile(f, data, DIM(data), &bytesWritten, NULL); // WRONG!
WriteFile(f, data, sizeof(data), &bytesWritten, NULL); // RIGHT


WriteFile(f, data, lstrlen(data), &bytesWritten, NULL); // WRONG


WriteFile(f, data, lstrlen(data) * sizeof(TCHAR), &bytesWritten, NULL); // RIGHT
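
The difference between a character count and a byte count is easy to check in standard C++. In this sketch `wideData`, `elementCount`, and `byteCount` are hypothetical names, and a `wchar_t` array stands in for `TCHAR data[20]` in a Unicode build.

```cpp
#include <cassert>
#include <cstddef>

// Same definition as the DIM macro in the text.
#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

// A wide-character buffer standing in for TCHAR data[20] in a Unicode build.
static wchar_t wideData[20];

// Character count: the kind of value lstrcpyn expects.
inline std::size_t elementCount() { return DIM(wideData); }

// Byte count: the kind of value WriteFile expects.
inline std::size_t byteCount() { return sizeof(wideData); }
```

The two values differ by a factor of `sizeof(wchar_t)`, which is why passing one where the other is expected fails only in wide-character builds.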

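Using the _T macro does not mean you have created a Unicode application; you have only created a Unicode-aware one. If you compile in the default 8-bit mode you get an ordinary 8-bit application (here "8-bit" refers to the character encoding, not to an 8-bit computer); only when you compile in Unicode mode do you get a Unicode application. Remember that in a Unicode build a CString holds 16-bit characters.

3. Converting a CString to an integer

The simplest way to convert a CString to an integer is to use one of the standard string-to-integer conversion routines.
Although atoi() may look like a good choice, it is rarely the right one. If you plan to be Unicode-ready, you should call _ttoi(), which compiles to atoi() in an ANSI build and to _wtoi() in a Unicode build. You can also consider _tcstoul() or _tcstol(), which convert a string to a long integer in any radix (such as 2, 8, 10, or 16); the difference is that the former yields an unsigned result and the latter a signed one. For example: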

CString hex = _T("FAB");
CString decimal = _T("4011");
ASSERT(_tcstoul(hex, 0, 16) == _ttoi(decimal));
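
The same check can be run with the standard library alone. This is a sketch for a `char` build, using `std::strtoul` and `std::atoi` as the counterparts of `_tcstoul` and `_ttoi`; the wrapper names `parseHex` and `parseDecimal` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Standard-library counterparts of the calls in the text, for a char build:
// strtoul for _tcstoul and atoi for _ttoi.
inline unsigned long parseHex(const char* text)
{
    return std::strtoul(text, nullptr, 16);  // radix 16
}

inline int parseDecimal(const char* text)
{
    return std::atoi(text);
}
```

Hexadecimal FAB is 15 * 256 + 10 * 16 + 11 = 4011, so both calls produce the same number.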

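4. Converting between CString and char *

This is the most common set of questions beginners have about CString. Thanks largely to C++, you can usually just use things without thinking deeply about them, but if you do not understand the underlying mechanism you will be baffled when code that looks perfectly reasonable does not work.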

CString graycat = "Gray" + "Cat";


CString graycat("Gray" + "Cat");

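In fact, the compiler will complain about both of these attempts. Why? Because the + operator is defined as an overloaded operator for various combinations of CString and LPCTSTR, but not for two LPCTSTR values, which are an underlying built-in data type. You cannot overload C++ operators on base types such as int, char, or char *. What you can write is: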

CString graycat = CString("Gray") + CString("Cat");


CString graycat = CString("Gray") + "Cat";

If you study these, you will see that + always applies to at least one CString and one LPCSTR.

Note that it is always better to write Unicode-aware code, for example:

CString graycat = CString(_T("Gray")) + _T("Cat");


5. char * to CString

So you have a char *, or a string. How do you create a CString from it? Here are some examples:

char * p = "This is a test";

or, in a more Unicode-aware fashion:

TCHAR * p = _T("This is a test");

LPTSTR p = _T("This is a test");


CString s = "This is a test"; // 8-bit only
CString s = _T("This is a test"); // Unicode-aware
CString s("This is a test"); // 8-bit only
CString s(_T("This is a test")); // Unicode-aware
CString s = p;
CString s(p);

Any of these easily converts a constant string or a pointer into a CString. Note that the characters assigned are always copied into the CString, so you can do something like:

TCHAR * p = _T("Gray");
CString s(p);
p = _T("Cat");
s += p;


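CString has several other constructors, but we will not consider them here; if you are interested, consult the documentation.

In fact, the CString constructors are more complex than I have shown. For example: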

CString s = "This is a test"; 

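This is sloppy programming, but it actually compiles correctly in a Unicode build. At run time the constructor performs a MultiByteToWideChar operation to convert the 8-bit string to a 16-bit string. In any case, this conversion is useful if the char * pointer refers to 8-bit data that arrived over a network.

6. CString to char * I: casting to LPCTSTR

First, understand that a CString is a rather special C++ object that contains three values: a pointer to a buffer, a count of the valid characters in the buffer, and a buffer length. The count of valid characters may be any value from 0 to the buffer length minus one (because the string ends with a NUL character). The character count and the buffer length are cleverly hidden.
Unless you do something special, you cannot know the size of the buffer allocated to a CString. So even if you obtain the address of the buffer, you must not change its contents: you cannot shorten the string, and you absolutely must not lengthen it, or you will overflow the buffer.
The LPCTSTR operator (or, more precisely, the const TCHAR * operator) is overloaded for CString. It is defined to return the address of the buffer. So if you need a string pointer to the contents of a CString, you can write: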


CString s("GrayCat");
LPCTSTR p = s;

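and it works correctly. This follows from the rules for casting in C++: when a cast is required, the C++ rules allow a user-defined conversion to be selected. For example, you could define (float) as a cast on a complex number (a pair of floats) that returns only the first float (the real part), so that you could write: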

Complex c(1.2f, 4.8f);
float realpart = c;

This conversion is applied wherever it is needed; for example, any function that takes an LPCTSTR parameter forces it. So you might have a function (perhaps in a DLL you bought):

BOOL DoSomethingCool(LPCTSTR s);


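CString file("c:\\myfiles\\coolstuff");
BOOL result = DoSomethingCool(file);

It works correctly, because DoSomethingCool has declared that it wants an LPCTSTR parameter, so the LPCTSTR operator is applied to the argument, which in MFC returns the address of the string.


CString graycat("GrayCat");
CString s;
s.Format("Mew! I love %s", graycat);


The result is the string "Mew! I love GrayCat".

Because the MFC designers were very careful when designing CString, the value of a CString expression evaluates to the pointer to the string, so even though no cast appears in a Format or sprintf call, you still get the correct behavior. The additional data that describes a CString actually lives in the addresses below the nominal CString address.
One thing you cannot do is modify the string. For example, you might try to replace the "." with a "," (do not do it this way; if you care about internationalization you should use the National Language Support features for decimal conversion). Here is a simple example: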

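CString v("1.00"); // currency amount, 2 decimal places
LPCTSTR p = v;
p[lstrlen(p) - 3] = ','; // rejected by the compiler: p points to const characters


strcat(p, "each");

This is rejected as well, because the first argument of strcat must be an LPTSTR and you have supplied an LPCTSTR.


The reason is that the buffer has a count, which is inaccessible to you (it lives in a hidden area below the CString address). If you change the string, the character count in the buffer will not reflect the change. Furthermore, if the string happens to be exactly as long as the physical buffer allows (more on this later), extending it will overwrite whatever lies beyond the buffer, which is memory you have no right to write (right?), and you will damage memory you do not own. That is a sure recipe for a dead application.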

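7. CString to char * II: using GetBuffer

If you need to modify the contents of a CString, there is a special method for this, GetBuffer, which returns a writable pointer to the buffer. If you only intend to change characters or shorten the string, you can do something like: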

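CString s(_T("File.ext"));
LPTSTR p = s.GetBuffer();
LPTSTR dot = _tcschr(p, _T('.')); // OK, should have used s.Find...
if(dot != NULL)
    *dot = _T('\0');
s.ReleaseBuffer();

This is the first and simplest use of GetBuffer. You pass no argument, so it takes the default of 0, which means "give me a pointer to the string; I promise not to extend it". When you call ReleaseBuffer, the actual length of the string is recomputed and stored in the CString.
It must be emphasized that between GetBuffer and ReleaseBuffer you must not use any method of the CString whose buffer you are manipulating, because the integrity of the CString object is not guaranteed until ReleaseBuffer is called. Study the following code:

CString s(...);
LPTSTR p = s.GetBuffer();
//... lots of things happen via the pointer p
int n = s.GetLength(); // BAD!!!!! May give the wrong answer!!!
s.TrimRight();         // BAD!!!!! No guarantee this will work!!!!
s.ReleaseBuffer();     // Things are now OK
int m = s.GetLength(); // This is guaranteed to be correct
s.TrimRight();         // Will work correctly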


char buffer[1024];

says that 1024 characters is enough space for anything you want to do. The equivalent in CString is:

LPTSTR p = s.GetBuffer(1024);

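After this call you have not only a pointer to the string buffer but also space for at least 1024 characters (note that I said "characters", not "bytes", because CString is implicitly Unicode-aware).
Also note that if you have a pointer to a constant string whose value is stored in read-only memory, an attempt to store into it will fail with an access fault, even if you have called GetBuffer and obtained a pointer into that read-only memory. I have not verified this for CString, but I have seen plenty of C programmers make this mistake.
A common bad habit of C programmers is to allocate a fixed-size buffer, sprintf into it, and then assign it to a CString: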

char buffer[256];
sprintf(buffer, "%......", args, ...); // ... 部分省略许多细节
CString s = buffer;


CString s;
s.Format(_T("%...."), args, ...);

The Format version does not corrupt the stack if the string ever turns out to be longer than 256 characters.


int len = lstrlen(parm1) + 13 + lstrlen(parm2) + 10 + 100;
char * buffer = new char[len];
sprintf(buffer, "%s is equal to %s, valid data", parm1, parm2);
CString s = buffer;
delete [] buffer;


CString s;

s.Format(_T("%s is equal to %s, valid data"), parm1, parm2);

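Note that none of the sprintf examples is Unicode-ready. You could use _stprintf and wrap the format strings in _T(), but the basic idea is still the long way around, and it is error-prone.

8. CString to char * III: interfacing to a control

We often need to pass a CString value to a control, for example a CTreeCtrl. MFC provides a number of convenient overloads for this, but in most situations you use the "raw" form of the update, and therefore need to store a string pointer into the TVITEM member of a TVINSERTITEMSTRUCT, as follows: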

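TVINSERTITEMSTRUCT tvi;
CString s;
// ... assign something to s
tvi.item.pszText = s; // Compiler yells at you here
// ... fill in other fields of tvi
HTREEITEM ti = c_MyTree.InsertItem(&tvi);

Why does the compiler complain? It looks like a perfectly good assignment! But if you look at the definition of TVITEM, you will see that the pszText member is declared as: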

LPTSTR pszText;
int cchTextMax;

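So the assignment is not to an LPCTSTR, and the compiler has no idea how to convert the right-hand side of the assignment to an LPTSTR. OK, you say, I'll change it to:

tvi.item.pszText = (LPCTSTR)s; // compiler still complains

The compiler still complains because you are trying to assign an LPCTSTR to an LPTSTR, which is forbidden in C and C++. You may not use this technique to blur the distinction between constant and non-constant pointers; if you could, it would confuse the optimizer, which would no longer know how to optimize your program. For example, if you write: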

const int i = ...;
//... do lots of stuff
... = a[i]; // usage 1
// ... lots more stuff
... = a[i]; // usage 2

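the compiler is entitled to assume that, since i is const, the values used at usage 1 and usage 2 are the same, and it may even compute the address of a[i] at usage 1 and keep it around for usage 2 rather than recomputing it. If you were able to write: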

const int i = ...;
int * p = &i;
//... do lots of stuff
... = a[i]; // usage 1
// ... lots more stuff
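(*p)++; // mess over compiler's assumption
// ... and other stuff
... = a[i]; // usage 2

the compiler would believe that i is constant, and hence that the location of a[i] is constant, yet the indirection destroys that assumption. Your program would then behave differently in a debug build (no optimization) and a release build (full optimization). This is not good, so the attempt to assign a pointer to i to a modifiable reference is diagnosed by the compiler as bogus. This is why the (LPCTSTR) cast does not help.
Why not declare the member as an LPCTSTR? Because the structure is used both for reading from and writing to the control. When you write to the control, the text pointer is in effect treated as an LPCTSTR, but when you read from the control you need a writable string. The structure cannot distinguish whether it is being used for input or for output.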


tvi.item.pszText = (LPTSTR)(LPCTSTR)s;

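This casts the CString to an LPCTSTR, which obtains the address of the string, and then casts that to an LPTSTR so it can be assigned. Note that this is valid only when you are using the value as data for a Set or Insert style method! You cannot do this when you are trying to retrieve data.
Retrieving data stored in a control takes a slightly different approach. For example, for a CTreeCtrl using the GetItem method, I want to get the text of the item. I know the text will be no longer than MY_LIMIT, so I can write:

TVITEM tvi;
// ... assorted initialization of other fields of tvi
tvi.pszText = s.GetBuffer(MY_LIMIT);
tvi.cchTextMax = MY_LIMIT;
c_MyTree.GetItem(&tvi);
s.ReleaseBuffer();

Note that the code above would work for any kind of Set method as well, but it is not needed there, because Set-style methods (including Insert) do not change the contents of the string. But when you need the control to write into a CString, you must make sure the buffer is writable, and that is what GetBuffer does. Again: once you have called GetBuffer, do nothing else with that CString until you have called ReleaseBuffer.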

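9. CString to BSTR

When programming with ActiveX, you often need a value represented as a BSTR. A BSTR is a counted string: a wide-character (Unicode) string on Intel platforms that can contain embedded NUL characters.

You can convert a CString to a BSTR by calling the CString method AllocSysString: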

CString s;
s = ... ; // whatever
BSTR b = s.AllocSysString();

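The pointer b now points to a newly allocated BSTR that is a copy of the CString, including the terminating NUL character. You can now pass it to any interface that requires a BSTR. Normally a BSTR is released by the component that receives it; if you need to free a BSTR yourself, you can do so as follows:

::SysFreeString(b);

How strings passed to ActiveX controls should be represented was once the subject of heated debate inside Microsoft; the Visual Basic people won, and BSTR (an acronym for "Basic String") is the result.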

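10. BSTR to CString

Since a BSTR is a counted Unicode string, you can use the standard conversions to create an 8-bit CString. In fact, this is built into CString: there are special constructors that convert ANSI to Unicode and Unicode to ANSI. You can also obtain a BSTR from a VARIANT, which is the type returned by various COM and Automation calls.


BSTR b;
b = ...; // whatever
CString s(b == NULL ? L"" : b);

This works well for a single-string BSTR, because CString has a special constructor that takes an LPCWSTR (which is what a BSTR is) and, in an ANSI build, converts it to ANSI. The explicit check is required because a BSTR may be NULL, and the CString constructors do not handle NULL gracefully (thanks to Brian Ross for pointing this out!). This technique also handles only a single string with a terminating NUL character; if you need to convert a string containing embedded NUL characters, you have more work to do. Embedded NUL characters generally do not behave well in a CString and are best avoided.
By the rules of C/C++, if you have an LPWSTR, it has no choice but to match an LPCWSTR parameter.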

In Unicode mode, the constructor is:

CString::CString(LPCTSTR);

As noted above, in ANSI mode there is a special constructor:

CString::CString(LPCWSTR);


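It calls an internal function to convert the Unicode string to an ANSI string. (In Unicode mode there is a special constructor that takes an LPCSTR, a pointer to an 8-bit ANSI string, and widens it to Unicode!) Once again: be sure to check whether the BSTR is NULL.
There is another problem, as mentioned above: a BSTR can contain embedded NUL characters, but the CString constructors can handle only a single NUL-terminated string. That is, if the string contains embedded NUL characters, CString will compute the wrong length, and you must deal with this yourself. If you look at the constructors in strcore.cpp, you will see that they all call lstrlen, that is, they compute the string length by scanning for the NUL.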
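
The length mismatch caused by an embedded NUL can be reproduced with the standard library. In this sketch a `std::wstring` built with an explicit count stands in for a BSTR, and `std::wcslen` stands in for lstrlen; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// A counted wide string with an embedded NUL, standing in for a BSTR.
// The explicit count of 8 covers "Gray", the NUL, and "Cat".
inline std::wstring makeCountedString()
{
    return std::wstring(L"Gray\0Cat", 8);
}

// Length as a NUL-scanning routine such as lstrlen would see it.
inline std::size_t scannedLength(const std::wstring& s)
{
    return std::wcslen(s.c_str());
}
```

The counted length is 8 but the scanned length stops at the first NUL, so any constructor that relies on scanning will silently drop everything after "Gray".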
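Note that the conversion from Unicode to ANSI uses ::WideCharToMultiByte with specific arguments; if you do not want this default conversion, you must write your own.
If you are compiling in UNICODE mode, you can simply write:

CString convert(BSTR b)
   {
    if(b == NULL)
        return CString(_T(""));
    CString s(b); // in UNICODE mode
    return s;
   }

In ANSI mode the conversion takes more work. Note that this code passes the same argument values to ::WideCharToMultiByte that the CString constructor uses, so you would use this technique only if you wanted to change those arguments, for example to specify a different default character or a different set of flags.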

CString convert(BSTR b)
   {
    CString s;
    if(b == NULL)
       return s; // empty for NULL BSTR
#ifdef UNICODE
    s = b;
#else
    LPSTR p = s.GetBuffer(SysStringLen(b) + 1); 
    ::WideCharToMultiByte(CP_ACP,            // ANSI Code Page
                          0,                 // no flags
                          b,                 // source widechar string
                          -1,                // assume NUL-terminated
                          p,                 // target buffer
                          SysStringLen(b)+1, // target buffer length
                          NULL,              // use system default char
                          NULL);             // don't care if default used
    s.ReleaseBuffer();
#endif
    return s;
   }

I do not worry about what happens if the BSTR contains Unicode characters that do not map to the 8-bit character set, because I specified NULL for the last two parameters of ::WideCharToMultiByte. This is the sort of thing you might need to change.

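11. VARIANT to CString

In fact, I have never done this, because I do not write COM/OLE/ActiveX programs. But I saw a posting by Robert Quirk on the microsoft.public.vc.mfc newsgroup on how to do this conversion. Rather than simply reproducing his post, I give some additional explanation and demonstration here. Any discrepancies from what he wrote are my own oversight.
The VARIANT type is often used to pass parameters to COM objects or to receive values returned from them. You can also write methods that return VARIANTs; what type the function returns can (and often does) depend on the input parameters to the method. For example, in Automation, depending on which method you call, IDispatch::Invoke may return, via one of its parameters, a VARIANT holding a BYTE, a WORD, a float, a double, a date, a BSTR, and so on (see the definition of the VARIANT structure in MSDN). In the example below, it is assumed that the VARIANT holds a BSTR, which means the string value is referenced through bstrVal. This takes advantage of the fact that in an ANSI application there is a constructor that converts a value referenced by an LPCWSTR to a CString (see the BSTR-to-CString section). In Unicode mode this becomes the normal CString constructor. See the caveats about the default ::WideCharToMultiByte conversion, and decide whether you find it acceptable (most of the time you will).

VARIANT vaData;
vaData = m_com.YourMethodHere();
ASSERT(vaData.vt == VT_BSTR);
CString strData(vaData.bstrVal);

You can also build a more general conversion routine that looks at the vt field. For that you might consider something like: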

CString VariantToString(VARIANT * va)
   {
    CString s;
    switch(va->vt)
      { /* vt */
       case VT_BSTR:
          return CString(va->bstrVal);
       case VT_BSTR | VT_BYREF:
          return CString(*va->pbstrVal);
       case VT_I4:
          s.Format(_T("%d"), va->lVal);
          return s;
       case VT_I4 | VT_BYREF:
          s.Format(_T("%d"), *va->plVal);
          return s;
       case VT_R8:
          s.Format(_T("%f"), va->dblVal);
          return s;
       ... remaining type conversions are left as an exercise for the reader
       default:
          ASSERT(FALSE); // unknown VARIANT type (this ASSERT is optional)
          return CString("");
      } /* vt */
   }


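12. Loading STRINGTABLE resources

If you want to create a program that is easily ported to other languages, you must not include native-language strings in your source code (for these examples I use English, since that is my native language). For instance, it is very bad practice to write: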

CString s = "There is an error";


s.Format(_T("%d - %s"), code, text);


// fmt is "Error in %s file %s"
// readorwrite is "reading" or "writing"
s.Format(fmt, readorwrite, filename); 

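I speak from experience here. In my first internationalized application I made this error, and even though I know German and know that in German grammar the verb comes at the end of the sentence, our German distributor complained bitterly that he had to come up with truly weird German error messages and reformat them to make them work. A much better approach (and the one I use now) is to have two strings, one for reading and one for writing, and to load the appropriate one, so that they are insensitive to string parameters; that is, instead of loading the strings "reading" or "writing", load the whole format: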

// fmt is "Error in reading file %s"
// "Error in writing file %s"
s.Format(fmt, filename);

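We will not discuss FormatMessage here, which actually has advantages over sprintf/Format but is less well integrated with CString. It solves this problem by identifying the parameters by their position in the parameter list, so that you can rearrange them in the output string.
Next, where do these separate strings live? We can put the string values into what is called a STRINGTABLE in the resource segment. The procedure is as follows: first create a string using the Visual Studio resource editor, and give each string an ID, typically a name starting with IDS_. So if you have a message, you create a string resource and call it IDS_READING_FILE, and another called IDS_WRITING_FILE. They appear in your .rc file as:

IDS_READING_FILE "Reading file %s"
IDS_WRITING_FILE "Writing file %s"

Note: these resources are always stored as Unicode strings, no matter what your program is compiled as. They are Unicode strings even on Win9x, which otherwise cannot really handle Unicode.

// Before using the string table, the program was written like this:

   CString fmt;
      if(...)
        fmt = "Reading file %s";
      else
        fmt = "Writing file %s";
    // much later
  CString s;
  s.Format(fmt, filename); 

// After switching to the string table, the program is written like this:

    CString fmt;
      if(...)
        fmt.LoadString(IDS_READING_FILE);
      else
        fmt.LoadString(IDS_WRITING_FILE);
      // much later
    CString s;
    s.Format(fmt, filename);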

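Now your code can be moved to any language. The LoadString method takes a string ID, retrieves the corresponding STRINGTABLE value, and assigns it to the CString. There is also a clever feature of the CString constructor that simplifies the use of STRINGTABLE entries. It is not pointed out in the CString::CString documentation, but it is used in the constructor's example code. (Why this feature could not be part of the formal documentation and had to be shown in an example escapes me!) [Translator's note: judging from this sentence, the author may be the designer of CString. There was a similar remark earlier, where he says that he had not verified whether the address returned by GetBuffer(0) is readable.] The feature is this: if you cast a STRINGTABLE ID to an LPCTSTR, a LoadString is done implicitly. Thus the following two ways of constructing a string have the same effect, and the ASSERT will not trigger in a debug build:

CString s;
s.LoadString(IDS_READING_FILE);
CString t( (LPCTSTR)IDS_READING_FILE );
ASSERT(s == t); // does not trigger, so s and t are equal

Now, you may ask: how can this possibly work? How can a STRINGTABLE ID be told apart from a pointer? Simple: all string IDs are in the range 1 to 65535, which means all the high-order bits are 0, and no pointer used in a program can have a value of 65535 or less, because the low 64K of the address space never exists. Any attempt to access memory between 0x00000000 and 0x0000FFFF causes an access fault. So a value from 1 to 65535 cannot be a memory address, and can therefore be used as a string resource ID.
I prefer to use the MAKEINTRESOURCE macro to do the cast explicitly. I think it makes the code easier to read. It is a standard macro that has little other use in MFC. Remember that many methods accept either a UINT or an LPCTSTR parameter, which is done with C++ overloading. This avoids the ugliness of pure C, where such "overloaded" functions required explicit casts. The same macro is also useful for assigning resource names to various other structures.

CString s;
s.LoadString(IDS_READING_FILE);
CString t( MAKEINTRESOURCE(IDS_READING_FILE) );
ASSERT(s == t);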


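13. CStrings and temporary objects

Here is a small problem that came up on the microsoft.public.vc.mfc newsgroup. I'll simplify it a bit. A programmer wanted to write a string to the Registry, and wrote:
I am trying to set a registry value using RegSetValueEx(), and it is the value that I am having trouble with. If I declare a variable of type char[] it works fine, but when I use a CString I get garbage: "ÝÝÝÝ...ÝÝÝÝÝÝ". To check whether my CString data was the problem, I tried GetBuffer and casting to char * and LPCSTR. The value returned by GetBuffer is correct, but when I assign it to a char * it turns into garbage. Here is a piece of my code: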

char* szName = GetName().GetBuffer(20);
RegSetValueEx(hKey, "Name", 0, REG_SZ, 
             (CONST BYTE *) szName,
             strlen (szName + 1));

The Name string is less than 20 characters long, so I don't think the GetBuffer parameter is to blame.


Dear Frustrated,


CString Name = GetName();
RegSetValueEx(hKey, _T("Name"), 0, REG_SZ, 
                    (CONST BYTE *) (LPCTSTR)Name,
                    (Name.GetLength() + 1) * sizeof(TCHAR));

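Why does my code work when yours does not? Mainly because the CString returned by your call to GetName is a temporary object. See the C++ Reference Manual, §12.2.
Most compilers arrange for the destructor of a temporary to be called implicitly at the next sequence point after the temporary is created, which in practice usually means the next semicolon. So the CString was destroyed right after the GetBuffer call (incidentally, there was no reason to pass an argument to GetBuffer, and omitting ReleaseBuffer is also incorrect). What GetBuffer returned was a pointer to the text storage of the temporary, but once the temporary was destroyed that storage was freed. The MFC debug allocator then fills the freed storage with 0xDD, which displays as the "Ý" character. By the time you write to the Registry, the string contents have been destroyed.
There is no need to convert the temporary to a char * immediately. Store it in a CString first, which makes a copy of the temporary, so after the temporary CString is destroyed the value still exists in the named CString. Writing it to the Registry at that point is not a problem.
In addition, my code is Unicode-aware. The Registry function wants a byte count. lstrlen(Name+1) returns a value that is too small, because it does not start counting until the second character of the string; what you probably meant was lstrlen(Name) + 1 (OK, I admit it, I have made the same mistake!). In Unicode, where every character is two bytes long, a character count is only half the required byte count, and we have to handle that. The Microsoft documentation is surprisingly silent on this point: is the size of a REG_SZ value a byte count or a character count? I assume "byte count" means exactly that, so you have to adjust your code to compute the size of the string in bytes.

14. CString efficiency

One problem with CString is that it hides certain inefficiencies from you. On the other hand, it also means that it can implement certain efficiencies. You might be tempted to say of the following code: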

CString s = SomeCString1;
s += SomeCString2;
s += SomeCString3;
s += ",";
s += SomeCString4;


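that it is horribly inefficient compared to, say:

char s[1024];
lstrcpy(s, SomeString1);
lstrcat(s, SomeString2);
lstrcat(s, SomeString3);
lstrcat(s, ",");
lstrcat(s, SomeString4);

After all, you might think, first it allocates a buffer for SomeCString1 and copies SomeCString1 into it; then it detects a concatenation, allocates a new buffer large enough to hold the current string plus SomeCString2, copies the contents to the new buffer, appends SomeCString2, discards the first buffer, and repoints to the new one; and then it repeats this for each string. How inefficient to join four strings that way! In fact, in most cases it never needs to copy the source string (the string on the left side of the +=).
In VC++ 6.0, in Release mode, all CString buffers are allocated in predefined quanta, namely 64, 128, 256, or 512 bytes. This means that unless the strings are very long, concatenation is effectively an optimized strcat (since it knows where the end of the string is, it does not have to search for it; it just copies the data to the right place) plus a recomputation of the string length. So it is about as efficient as the pure-C code, and a lot easier to write, maintain, and understand.
If you are still not sure what is really happening, look at the CString source code, strcore.cpp, in the mfc/src subdirectory of your vc98 installation. Look at the ConcatInPlace method, which is called from all the += operators.

Aha! So is CString really that "efficient"? For example, if I create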

CString cat("Mew!");

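If you are writing embedded applications that run from 16K EPROMs, you have good reason to use as little space as possible, and in that environment doing so makes your programs more robust. But if you write Windows programs for 500MHz, 256MB machines the same way, the result will only run worse than the code you consider "less efficient".
Remember that in debug mode, allocation is always exact; this helps shake out bugs.
Assume your application is going to run for months at a time. For example, I bring up VC++, Word, PowerPoint, FrontPage, Outlook Express, Forté Agent, Internet Explorer, and a few other programs, and essentially never close them. I have edited with PowerPoint for days on end (on the other hand, if you have had the misfortune to use something like Adobe FrameMaker, you begin to appreciate reliability; that program crashed on me four to six times nearly every day, each time because it had run out of space and filled my entire swap file). Precise allocation is therefore a bad idea: it compromises reliability and leads to application crashes.
By allocating string storage in multiples of a quantum, the allocator can recycle used blocks, and these recycled blocks are usually immediately reusable by other CString objects, so fragmentation is kept to a minimum. Allocator performance improves, the application's memory footprint stays as small as it can, and the program can run for weeks or months without problems.
An aside: many years ago at CMU we were writing an interactive system, and some studies of the storage allocator showed that it tended to fragment memory badly. Jim Mitchell, now at Sun Microsystems, created an allocator that kept running statistics about allocation sizes, a technique that differed from, and was ahead of, the mainstream allocators of the day. If splitting a block would leave a piece smaller than a certain threshold, he did not split it at all, which avoided littering the allocator with fragments too small to be useful. He actually used floating point inside the allocator! His observation was that the long-term saving in instructions from not having to skip over unusable small chunks far exceeded the additional cost of a few floating-point operations per allocation. He was right.
If you think optimization is the effort you put into each line of code, think again: optimization at the line level rarely matters. See my other essay on the subject, "Optimization: Your Worst Enemy", for some thought-provoking ideas.
Remember that the += operator is a special case; if you were to write: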

CString s = SomeCString1 + SomeCString2 + SomeCString3 + "," + SomeCString4;

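then each application of the + operator causes a new string to be created and a copy to be performed.

15. Summary

These are some of the techniques for using CString. I use them every day in my programming. CString is not a difficult class to use, but the MFC documentation does not make these features obvious, leaving you to explore and discover them on your own.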






