Office Source Code Style Guide

Dave Parker, 6/30/95

 

Abstract

This document outlines a general style guide for C and C++ source code in Office Development.  The main purpose here is to list features of C++ which we will use and which we will avoid, along with the basic rationale for doing so.  It also sets standards for basic coding issues, both for consistency within the code and for robust constructs.  This is not a complete list of C/C++ language features with commentary.  Rather, it mentions only the issues we consider important.  Knowledge of C++ is assumed.

Contents

1. General Goals
2. Classes
2.1 Class vs. Struct
2.2 Public, Private, and Protected members
2.3 Data Members
2.4 Virtual Functions
2.5 Constructors
2.6 Destructors
2.7 New and Delete
2.8 Operators
2.9 Inheritance
2.9.1 Inheritance of Interface vs. Implementation
2.9.2 Inheritance vs. Containment
2.9.3 Multiple Inheritance
3. Other C++ Features
3.1 Constants and Enumerations
3.2 References
3.3 Const Parameters and Functions
3.4 Default Arguments
3.5 Function Overloading
3.6 Operator Overloading
4. Common C/C++ Issues
4.1 #ifdefs
4.2 Global Variables
4.3 Macros and Inline Functions
4.4 Optimization
4.5 Warnings
4.6 Private Data and Functions
4.7 Typedefs
4.8 Basic Data Types
4.9 Pointers
4.10 Switch Statements
4.11 Asserts
4.12 Errors and Exceptions
5. Formatting Conventions
5.1 Naming Conventions
5.2 Function Prototypes
5.3 Variable Declarations
5.4 Class Declarations
5.5 Comments
5.5.1 File Headers and Section Separators
5.5.2 Function Headers
5.5.3 In-Code Comments
5.5.4 Attention Markers
5.6 Misc. Formatting Conventions
5.7 Source File Organization
5.7.1 Public Interface Files
5.7.2 Private Interface Files
5.7.3 Implementation Files
5.7.4 Base Filenames
6. Interfaces to DLLs
6.1 C Functions and Global Variables
6.2 Common C/C++ Public Header Files
6.3 Lightweight COM Objects and ISimpleUnknown
7. Appendix A: Basic Hungarian Reference
7.1 Making Hungarian Names
7.2 Standard Base Tags
7.3 Standard Prefixes
7.4 Standard Qualifiers

 


1.     General Goals

C++ is a complex language that provides many ways to do things, and going “whole hog” on all of its features can lead to confusion, inefficiency, or maintenance problems.  All Office developers need to become experts on the features we will use, and avoid the others in order to form solid conventions within the group that we are all comfortable with.  Our use of C++ features will be fairly conservative.  We’d much rather err on the side of just dealing with C, which we’re all used to, than screwing up our app with a new concept that not all of us are used to.

Underlying the choice of all of the style decisions are a few basic goals, as listed below.  When in doubt about a particular issue, always think about the spirit of these goals.  Sometimes these goals will conflict, of course, and in these cases we try to either prioritize the tradeoffs or use experience (either our own or from other groups that have used C++ extensively).

1.     Simplicity.  When in doubt, keep it simple.  Bugs are related mostly to complexity, not code.

2.     Clarity.  The code should do what it looks like it’s doing.  Other people need to be able to understand your code.

3.     Efficiency.  Speed and size are important.  Using C++ does not imply big and slow.  There are plenty of perfectly reasonable ways to make things as fast or faster than the normal C way.  Speed and size often trade off, and most people probably err on the side of choosing speed too often.  Remember that 20% of the code is responsible for 80% of the time.  In most cases, we’re more concerned about fitting comfortably in less RAM.

4.     Appropriateness.  Use the language construct that is appropriate for the abstraction or operation you are trying to do.  Do not abuse the language.  Don’t use a construct just because it happens to work.  Definitely don’t use a strange construct to amaze and confuse your friends to try to show how smart you are.

5.     Natural transition from C to C++.  We all used to be C programmers.  Others that look at our code are still C programmers (e.g. Word and Excel).  When possible, avoid C++ constructs where a C programmer’s instinct causes a wrong assumption.

6.     Catch Errors Early.  Having the compiler catch an error is ideal.  Having debug code (e.g. Asserts) catch it is the next best thing, etc.  Declare things in such a way as to give the compiler the best chance at catching errors.

7.     Fast builds.  Total generality and modularity can cause lots of inter-dependencies between files, which can have a dramatic impact on build times.  This is a constant time sink for everyone.  It is often worth rearranging things a little to make incremental builds faster.

8.     Consistency.  The whole point of having a style guide is that programmers are never totally autonomous, even when the group has strong code ownership.  Other people need to read and understand your code.  Everyone has to give a little to have a consistent style guide, but everyone gains it back when they read or debug other people’s code.

2.     Classes

C++ classes are a nice way to encapsulate code and data into a single unit, which provides a good paradigm for object-oriented implementations as well as other features such as flexible access control, convenient and type-safe polymorphism, and the possibility of code reuse via inheritance.

At the most general, classes are an extension to the built-in typing of C which allows you to define your own types along with the operations on that type.  Taken to the extreme, every piece of data in a program could be an instance of a class.  However, we will not go nearly this far in Office.  We will use classes when there is a good reason to, such as when the concept being implemented is inherently object-oriented or when polymorphism is required.  It has been the experience of many people that programs that use classes for everything evolve into systems that are complex and inefficient.  Although this may not be the fault of any particular class, complex class hierarchies can lead to needless complexity, and overly abstracted concepts can easily lead to inefficiency.

In general, we will avoid allocating classes on the stack and passing classes by value, because this is where the use of constructors and destructors gets you into the most trouble.  Most classes should be allocated via new, freed by delete, and passed by pointer.  In addition, we will never declare a global variable which is an instance of a class that has a constructor, because this causes a bunch of C runtime stuff to get linked in and stuff to happen at boot time to construct the thing, which is a big performance hit.  Using only heap-allocated classes implies we’ll probably use classes only for relatively complex objects that you would normally have in the heap anyway, not simple things like basic data types.  Beyond this, it is a judgment call when to use a class.  Use one if there is a good reason, but not if a more straightforward solution is just as good.

Summary:

·         Use classes to encapsulate the implementation of an object-oriented concept.

·         Use classes to implement polymorphism.

·         Avoid allocating class instances on the stack and passing them by value.  Use new and delete, and pass them by pointer.  This implies not using classes for simple data types.

·         Never declare a global instance of a class that has a constructor.

·         Not everything is a class.  Use them only when you gain something.
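The rules above (heap allocation, pass by pointer, no global instance with a constructor) can be sketched as follows.  This is a minimal illustration, not Office code.  The class Doc and the helpers CPagesOf, PdocGet, and FreeDoc are hypothetical.  Creating the shared object lazily on first use is one assumed way to avoid startup construction; the guide itself only forbids the global instance.  Standard C++ new throws on failure, so the sketch uses new (std::nothrow) to get the "returns 0" behavior the guide's own allocator provides.

```cpp
#include <cassert>
#include <new>

// Hypothetical heap-only class used for illustration.
class Doc
   {
public:
   explicit Doc(int cPages) { m_cPages = cPages; }
   int CPages() const { return m_cPages; }
private:
   int m_cPages;
   };

// A global pointer has no constructor, so nothing runs at startup.
// Declaring "Doc g_doc(1);" here instead would force startup construction.
Doc *g_pdoc = nullptr;

// Objects are passed by pointer, never by value.
int CPagesOf(const Doc *pdoc)
{
   return (pdoc != nullptr) ? pdoc->CPages() : 0;
}

// Creates the shared Doc on first use rather than at program startup.
// std::nothrow makes new return nullptr on failure instead of throwing.
Doc *PdocGet()
{
   if (g_pdoc == nullptr)
      g_pdoc = new (std::nothrow) Doc(1);
   return g_pdoc;
}

// Frees the shared Doc and resets the pointer so it can be recreated.
void FreeDoc()
{
   delete g_pdoc;
   g_pdoc = nullptr;
}
```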

2.1     Class vs. Struct

In C++, a struct can also have member functions and operators and everything else that a class can have.  In fact, the only difference between a class and a struct is that all members default to public access in a struct but private access in a class.  However, we will not use this as the deciding point between using a class vs. a struct.  To match the normal intuition, we will use a class if and only if there are member functions included.

Summary:

·         Use a class instead of a struct if and only if there are member functions.
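As a sketch of this rule, here is a data-only aggregate declared as a struct, with its members implicitly public, next to a type with member functions declared as a class.  The names SPAN and SpanList and the fixed capacity of 8 are hypothetical choices for illustration.

```cpp
#include <cassert>

// Data only, no member functions: a struct, relying on default public access.
struct SPAN
   {
   int iFirst;    // first index in the span
   int iLim;      // one past the last index
   };

// Has member functions: a class, with every section labeled explicitly.
class SpanList
   {
public:
   SpanList() { m_cSpans = 0; }
   bool FAdd(SPAN span);
   int CSpans() const { return m_cSpans; }
private:
   enum { cSpanMax = 8 };          // hypothetical fixed capacity
   SPAN m_rgspan[cSpanMax];
   int m_cSpans;
   };

// Appends a span; returns false when the list is full.
bool SpanList::FAdd(SPAN span)
{
   if (m_cSpans >= cSpanMax)
      return false;
   m_rgspan[m_cSpans++] = span;
   return true;
}
```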

2.2     Public, Private, and Protected members

As stated above, structs default to public access and classes default to private access.  However, we will depend on the default only in the case of structs (where we leave all the data implicitly public).  For a class, we will declare all members (both data and code) explicitly as public, protected, or private, and group them into sections in that order.  For example:

class Foo
   {
public:
   Foo();
   ~Foo();
   void Hey(int i);
   void Ack();
protected:
   int m_iValue;
private:
   int m_iStuff;
   void LocalHelperSub();
   };

Summary:

·         Declare all class members explicitly as public, protected, or private, in groups in that order.

2.3     Data Members

Data members should use the naming convention m_name where name is a normal Hungarian local variable name.  This makes member function implementations easier to read (no confusion about member vs. local data), and allows the use of the same Hungarian name for, e.g., parameters and members.  See the example below.

Data members should normally not be declared public because this usually defeats the purpose of the class abstraction.  To efficiently export a data member, declare inline get and set member functions.  This will get optimized into the same code as a public data member.  For example:

class Counter
   {
public:
   int CItems() const { return m_cItems; }
   void SetCItems(int cItems) { m_cItems = cItems; }
private:
   int m_cItems;
   };

Summary:

·         Data members use the naming convention m_name.

·         Do not declare public data members.  Use inline accessor functions for performance.

2.4     Virtual Functions

Virtual functions are used to allow derived classes to override a method in a base class by providing their own implementation in a way that always causes the most-derived version to be called whenever a method is called through an object pointer, even if that pointer is declared as a pointer to the base class.  This is usually done to implement polymorphism, and that’s when we’ll use them.  For example, all COM interface methods are virtual because you are always going for polymorphism via a standard interface.

Unlike simple member functions, virtual functions incur some overhead due to the need to call through the vtable.  If a class contains at least one virtual function, then each instantiated object will be 4 bytes larger (in 32-bit code) than the combined size of the declared data, in order to hold the vtable pointer.  After the first virtual function, each additional one only adds another entry to the class vtable, which is static and per-class (nothing per object), so the main concern here is whether a class has any virtual functions at all.  In addition to the memory overhead, there is the overhead of indirecting through two pointers before calling the function.  This is fairly fast and compact in 32-bit code, but it affects speed and size nevertheless.  Perhaps the worst part is that a virtual function called through a pointer cannot be inlined, so there will always be a function call, even when the work is trivial.

Because they have overhead, you should not use virtual functions in a class unless you need to.  However, make sure you do use them when it makes sense.  In particular, if you have a base class which requires a destructor, then the destructor should definitely be virtual to allow derived classes to destruct any added members properly.  If the destructor were not virtual, then in a context where polymorphism is being used (so the object pointer is declared as a pointer to the base class), deleting the object would at best call only the base class destructor (strictly speaking, the behavior is undefined), even for an object of a derived class that added data members and declared its own destructor in an attempt to free them.  The derived class’s destructor will only get called if the base class destructor is declared virtual.  This scenario applies to many other kinds of methods that you will add to your classes.  In fact, most of the methods in a base class might be this way if polymorphism is intended.  This issue is discussed in more detail in the Inheritance section below.
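The following minimal sketch shows the virtual-destructor case.  A derived class adds a member that its destructor frees, and the object is deleted through a base-class pointer.  Shape, Circle, DestroyShape, and the counter g_cCircleDtors are hypothetical.  The counter exists only so a caller can observe that the derived destructor actually ran.

```cpp
#include <cassert>

// Hypothetical instrumentation so a caller can see which destructor ran.
int g_cCircleDtors = 0;

class Shape
   {
public:
   virtual ~Shape() {}             // virtual, so delete through Shape* reaches ~Circle
   virtual int CSides() const = 0;
   };

class Circle : public Shape
   {
public:
   explicit Circle(char *pbCache) { m_pbCache = pbCache; }   // takes ownership
   virtual ~Circle() { delete[] m_pbCache; g_cCircleDtors++; }
   virtual int CSides() const { return 0; }
private:
   char *m_pbCache;                // added member the base class knows nothing about
   };

// Polymorphic cleanup: the caller only knows it has a Shape.
void DestroyShape(Shape *pshape)
{
   delete pshape;                  // runs ~Circle(), then ~Shape()
}
```

If the virtual keyword is removed from ~Shape, the delete in DestroyShape no longer reaches ~Circle, and the buffer leaks.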

Note that although virtual functions have a performance penalty over regular member functions, they are often the most efficient way to implement a concept such as polymorphism where the alternative would be large switch statements (not to mention the benefits of the object-oriented encapsulation).
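To make the comparison concrete, here is a rough sketch of the same question ("how many sides?") answered once with a switch on a kind tag and once with a virtual function.  All names (SK, CSidesOfKind, Shape, Square, Triangle, CSidesTotal) are hypothetical.  The switch has to be edited for every new kind.  The virtual version needs one indirect call, and CSidesTotal keeps working for subclasses added later.

```cpp
#include <cassert>

// Switch-based dispatch: every new kind means editing this function.
enum SK { skSquare, skTriangle };

int CSidesOfKind(SK sk)
{
   switch (sk)
      {
   case skSquare:
      return 4;
   case skTriangle:
      return 3;
      }
   return -1;                      // unknown kind
}

// Virtual dispatch: each class answers for itself through the vtable.
class Shape
   {
public:
   virtual ~Shape() {}
   virtual int CSides() const = 0;
   };

class Square : public Shape
   {
public:
   virtual int CSides() const { return 4; }
   };

class Triangle : public Shape
   {
public:
   virtual int CSides() const { return 3; }
   };

// Works unchanged for any future Shape subclass.
int CSidesTotal(Shape *const *rgpshape, int cpshape)
{
   int cSides = 0;
   for (int i = 0; i < cpshape; i++)
      cSides += rgpshape[i]->CSides();
   return cSides;
}
```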

Summary:

·         Use virtual functions to implement polymorphism.

·         Virtual functions have overhead, so don’t use them unless you really should.

·         A destructor in a base class should always be virtual if polymorphism is intended.

2.5     Constructors

Ah, constructors.  Every new C++ programmer’s nightmare.  This is one reason to try to minimize the use of constructors -- C programmers aren’t used to them and will get confused.  Another reason is the infamous performance overhead of calling a function (unless it’s inline) and doing work at possibly unexpected and/or redundant times.

However, using constructors can eliminate the dangers of uninitialized data and can also make the code simpler to read (if you’re used to it).  Judicious use of destructors (see below) which match the constructors can also help prevent memory leaks and other resource management problems.

Fortunately, the issue arises mainly when classes are declared on the stack or passed by value, both of which we will avoid.  Most of our classes should be dynamic memory objects which will be passed around by pointer.  In this case, the constructor is essentially just a helper function for the functions that create these dynamic objects.  Using a constructor for this purpose is reasonable to ensure a clean and consistent initialization (if you make sure to initialize all data members), but to prevent potential performance problems due to redundant initialization the constructor should not do anything expensive.  Simply assigning a constant or a parameter value to each data field is about right.  Very simple constructors can be made inline.

Most importantly, a constructor should never be able to fail, because lacking a fancy exception handling mechanism, the caller has no way to handle this in some cases.  Any initialization that can fail (e.g. memory allocations) should be put in a separate initialization member function (called, e.g., FInit).  When this is the case, it is often useful to encapsulate the creation of an object in a function (a global function or a member of another class) that calls new and then FInit for the object, and returns the result of FInit.  For example:

class Foo
   {
public:
   Foo(int cLines) { m_hwnd = NULL; m_cLines = cLines; }
   virtual ~Foo();
   BOOL FInit();
   void DoSomething();
private:
   HWND m_hwnd;
   int m_cLines;
   };

BOOL FCreateFoo(int cLines, Foo **ppfoo)
{
   if ((*ppfoo = new Foo(cLines)) == NULL)
      return FALSE;
   if ((*ppfoo)->FInit())
      return TRUE;
   delete *ppfoo;
   *ppfoo = NULL;
   return FALSE;
}

BOOL Foo::FInit()
{
   m_hwnd = CreateWindow(...);
   return (m_hwnd != NULL);
}

Summary:

·         Do not do expensive work in a constructor.

·         If you do make a constructor, make sure to initialize all data members.

·         Very simple constructors can be made inline.

·         A constructor should never fail.  Do memory allocations and other potential failures in an FInit method.

·         Consider making a creation function that encapsulates the new and FInit operations.
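The Foo example above depends on Win32 (HWND, CreateWindow).  For a version that can be compiled and stepped through anywhere, here is a portable sketch of the same FInit and creation-function pattern.  The window is swapped for a heap buffer, and the "cLines must be positive" failure condition is a hypothetical stand-in for a real failure.  It uses new (std::nothrow) because standard C++ new throws rather than returning 0.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

class Foo
   {
public:
   Foo(int cLines) { m_rgch = nullptr; m_cLines = cLines; }   // cheap, cannot fail
   virtual ~Foo() { std::free(m_rgch); }                      // free(nullptr) is a no-op
   bool FInit();
   int CLines() const { return m_cLines; }
private:
   char *m_rgch;
   int m_cLines;
   };

// All work that can fail lives here, not in the constructor.
bool Foo::FInit()
{
   if (m_cLines <= 0)
      return false;                // hypothetical failure condition for the demo
   m_rgch = static_cast<char *>(std::malloc(static_cast<std::size_t>(m_cLines) * 80));
   return m_rgch != nullptr;
}

// Creation function: new + FInit, cleaning up on any failure.
bool FCreateFoo(int cLines, Foo **ppfoo)
{
   *ppfoo = new (std::nothrow) Foo(cLines);
   if (*ppfoo == nullptr)
      return false;
   if ((*ppfoo)->FInit())
      return true;
   delete *ppfoo;
   *ppfoo = nullptr;
   return false;
}
```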

2.6     Destructors

If a class has resources that need to be freed, then the destructor is a convenient place to put this.  The normal case for us will be that this is just the central place to free resources for an object that is freed via delete (see below).  The trickier use of destructors is for stack-allocated classes, but we’re going to avoid that by not using classes on the stack. 

A destructor should be careful to destroy an object properly regardless of how it was created or used.  Furthermore, if you choose to implement a method that frees any resources before the actual destruction, make sure to reset those fields (e.g. set pointers to NULL) so that a destructor will not try to free them twice.  It is not necessary for the destructor to reset any fields, though, because the object cannot be used after it is destructed.

Like a constructor, a destructor can never fail.  Also, as stated above, a destructor in a base class should always be declared virtual to make polymorphism work.

The destructor for the above example would be defined as:

Foo::~Foo()
{
   if (m_hwnd != NULL)
      DestroyWindow(m_hwnd);
}

Summary:

·         Use a destructor to centralize the resource cleanup of a class which is freed via delete.

·         If resources are freed before destruction, make sure the fields are reset (e.g. set pointers to NULL) so that a destructor will not try to free them again.

·         A destructor should never fail.

·         A destructor in a base class should always be declared virtual if polymorphism might be used.
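The "reset fields after an early free" rule can be sketched like this.  Buffer, FAlloc, Free, and the counter g_cFrees are hypothetical names.  The counter only lets a caller confirm that the memory is released exactly once, even though the destructor calls Free again after an explicit Free.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical instrumentation: how many times memory was actually freed.
int g_cFrees = 0;

class Buffer
   {
public:
   Buffer() { m_pb = nullptr; }
   virtual ~Buffer() { Free(); }   // safe even after an explicit Free()
   bool FAlloc(std::size_t cb);
   void Free();
   bool FHasData() const { return m_pb != nullptr; }
private:
   char *m_pb;
   };

// Allocates a fresh block, releasing any previous one first.
bool Buffer::FAlloc(std::size_t cb)
{
   Free();
   m_pb = static_cast<char *>(std::malloc(cb));
   return m_pb != nullptr;
}

// Frees early and resets the field so the destructor will not free it again.
void Buffer::Free()
{
   if (m_pb != nullptr)
      {
      std::free(m_pb);
      m_pb = nullptr;
      g_cFrees++;
      }
}
```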

2.7     New and Delete

The operators new and delete should be used to allocate and free classes (instead of the low-level malloc-like function in your app) so that the constructor and destructor, if any, are called properly.  We will implement our own global new and delete so that they in turn call our favorite low-level memory manager, so the only difference is really that new does the sizeof automatically and also calls a constructor, and delete calls the destructor.

Note that there must be some mechanism for detecting failed memory allocations. For new, the calling code is responsible for checking.  Our memory manager simply returns 0 for a failed allocation, and this will in turn be returned from new (and the constructor will not be called).  It is therefore up to the caller of new to check for a 0 return value, as in the example above in the Constructors section.

You should avoid defining any other new and delete (i.e. class-level) operators and stick to the global ones to avoid mixed memory models, which complicate things like heap optimization and memory leak checking, and make it risky to have the routines that allocate and free a block be different (although this is normally bad structure anyway).

Summary:

·         Use new and delete to allocate and free classes.

·         We will implement our own global new and delete in terms of the Office infrastructure memory manager.

·         Check the return value of new for failure.

·         Avoid defining any other new and delete operators (use the global ones defined by Office).
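Here is a rough sketch of global new and delete routed to a single low-level allocator, with a live-block counter of the kind that makes leak checking easy.  MemAlloc and MemFree are hypothetical stand-ins for the Office memory manager (here they simply wrap malloc and free), and g_cLiveBlocks is hypothetical instrumentation.  One difference from the text is deliberate.  Standard C++ requires the plain operator new to throw std::bad_alloc on failure, so the "returns 0" behavior the guide relies on is provided by the nothrow form.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical instrumentation: number of blocks currently allocated.
long g_cLiveBlocks = 0;

// Hypothetical stand-ins for the low-level memory manager.
static void *MemAlloc(std::size_t cb)
{
   void *pv = std::malloc(cb == 0 ? 1 : cb);   // never request 0 bytes
   if (pv != nullptr)
      g_cLiveBlocks++;
   return pv;
}

static void MemFree(void *pv)
{
   if (pv != nullptr)
      {
      g_cLiveBlocks--;
      std::free(pv);
      }
}

// Replacement global operators.  The plain form must throw on failure;
// the nothrow form returns nullptr, matching "check new for 0".
void *operator new(std::size_t cb)
{
   void *pv = MemAlloc(cb);
   if (pv == nullptr)
      throw std::bad_alloc();
   return pv;
}

void *operator new(std::size_t cb, const std::nothrow_t &) noexcept
{
   return MemAlloc(cb);
}

void operator delete(void *pv) noexcept { MemFree(pv); }
void operator delete(void *pv, std::size_t) noexcept { MemFree(pv); }
void operator delete(void *pv, const std::nothrow_t &) noexcept { MemFree(pv); }

// Hypothetical class to allocate in the usage example.
class Widget
   {
public:
   Widget() { m_cUses = 0; }
   int CUses() const { return m_cUses; }
private:
   int m_cUses;
   };
```

Because every allocation in the program goes through MemAlloc, a nonzero g_cLiveBlocks at shutdown points to a leak.  Class-level operators would bypass this single choke point.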

2.8     Operators

Ideally, you will never need to define an operator for a class.  If you do define one, it might be operator=, but don’t define one unless you really think you want this capability.  Next might be operator== and operator!=, but the same applies here, only define them if really needed.  We’re not in the business of providing a general class library that might be used in a certain way, we’re just implementing code that we actually expect to use ourselves (as explained in a later section, we will not export a real C++ class to anyone).  And if you do define these operators, make sure they are efficient so that you are not hiding an expensive operation.

By all means, never define standard operators such as operator+ to do anything other than the standard semantics for built-in types.  Don’t even push it by doing things like defining, say, operator+ to do a union or concatenation operation.  In addition to causing confusion, this hides potentially expensive work behind an innocent-looking operator.

Summary:

·         Ideally, classes shouldn’t need to define any operators.

·         Define “reasonable” operators such as =, ==, and != only if you really want and use this capability yourself, and if you do they should be super-efficient (ideally inline).

·         Never define an operator to do anything other than the standard semantics for built-in types.

·         Never hide expensive work behind an operator.  If it’s not super efficient then make it an explicit method call.

·         When in doubt, just make a member function to do the work so that the operation is explicit.
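The summary can be illustrated with a short sketch.  A cheap, obvious comparison gets inline operator== and operator!=.  An operation with hidden cost (string concatenation) becomes a named member function instead of operator+.  Version, Text, FAppend, and the 64-character capacity are hypothetical.

```cpp
#include <cassert>
#include <cstring>

class Version
   {
public:
   Version(int nMajor, int nMinor) { m_nMajor = nMajor; m_nMinor = nMinor; }
   // Cheap and unsurprising, so inline operators are acceptable here.
   bool operator==(const Version &ver) const
      { return m_nMajor == ver.m_nMajor && m_nMinor == ver.m_nMinor; }
   bool operator!=(const Version &ver) const { return !(*this == ver); }
private:
   int m_nMajor;
   int m_nMinor;
   };

class Text
   {
public:
   Text() { m_cch = 0; m_rgch[0] = '\0'; }
   // Costs a strlen and a copy, so it is an explicit method, not operator+.
   bool FAppend(const char *sz);
   const char *Sz() const { return m_rgch; }
private:
   enum { cchMax = 64 };           // hypothetical fixed capacity
   char m_rgch[cchMax];
   int m_cch;
   };

bool Text::FAppend(const char *sz)
{
   std::size_t cch = std::strlen(sz);
   if (static_cast<std::size_t>(m_cch) + cch >= static_cast<std::size_t>(cchMax))
      return false;                               // would overflow the buffer
   std::memcpy(m_rgch + m_cch, sz, cch + 1);      // copy including the '\0'
   m_cch += static_cast<int>(cch);
   return true;
}
```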

2.9     Inheritance

Inheritance is a powerful technique, but it is often misused and can lead to complex class hierarchies that are hard to understand and hard to change.  The following sections describe the various uses of inheritance, compare them to other techniques, and try to provide rules of thumb about when to use it.

However, just because inheritance is appropriate in a particular case does not mean it should be used everywhere.  A deep or wide inheritance tree gets hard to understand, hard to browse, and eventually hard to maintain.  Keep inheritance limited to a few “silver bullets” where you really win from it.

Summary:

·         Don’t use inheritance just because it will work.  Use it sparingly and judiciously.

2.9.1     Inheritance of Interface vs. Implementation

Most people think about inheritance as a way to share code.  However, one of the most useful ways to use it is simply as a way to ensure working polymorphism by inheriting interface only.  The classic example is to have an interface class which is entirely abstract (all methods are pure virtual), and then one or more implementation classes that inherit the interface and implement it in different ways.  The OLE COM model is an example of this.  A COM interface is expressed in C++ as an abstract base class, and then a separate implementation class inherits from the interface class and implements the interface methods for that object.  Here the inheritance is simply a convenient way to ensure that the object speaks the exact interface it is supposed to (has the right methods in the right order in the vtable with the right parameters and the right return types).  This is ensured by having each implementation class inherit from the same interface class, which is only declared once in a common header file.  Note that when an implementation class implements an inherited pure virtual method, it must redeclare it, because from a language point of view it is still considered to “override” the base method.  For example:

// The interface base class provides interface only

class FooInterface

   {

public:

   virtual void DoThis() = 0;      // pure virtual

   virtual void DoThat(int i) = 0; // pure virtual

   };

 

// The implementation class implements the FooInterface interface

class FooImplementation: public FooInterface

   {

public:

   virtual void DoThis();

   virtual void DoThat();

   }

 

void FooImplementation::DoThis()

{

   ...

}

...
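Here is a minimal self-contained sketch of the same pattern that you can compile. It uses two hypothetical implementation classes, FooDoubler and FooSquarer, which are invented for illustration, and a caller that sees only the interface. DoThat returns int here rather than void purely so the behavior is observable. The virtual destructor is an addition so that an object can be deleted through an interface pointer.

```cpp
// Interface class: pure virtual methods only (interface, no code).
class FooInterface
   {
public:
   virtual ~FooInterface() {}          // allows delete through an interface pointer
   virtual void DoThis() = 0;          // pure virtual
   virtual int DoThat(int i) = 0;      // pure virtual (int return is for the sketch)
   };

// Two hypothetical implementation classes that provide the same interface
// in different ways.
class FooDoubler : public FooInterface
   {
public:
   virtual void DoThis() {}
   virtual int DoThat(int i) { return 2 * i; }
   };

class FooSquarer : public FooInterface
   {
public:
   virtual void DoThis() {}
   virtual int DoThat(int i) { return i * i; }
   };

// Caller code depends only on the interface class, so either
// implementation can be passed in.
int CallDoThat(FooInterface *pfoo, int i)
{
   pfoo->DoThis();
   return pfoo->DoThat(i);
}
```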

 

The above example shows the case where the entire base class is interface only.  However, inheritance of interface only can also happen at the level of an individual member function in a base class which also includes some implementation.  This is the case when any member function is declared pure virtual.  An example of this is shown below with the DrawObj::Draw method.

The above example does not use inheritance to share code.  However, inheritance can also be used for this, by providing implementations of methods in a base class that derived classes can use.  There are two interesting cases here.  If the base class defines an implementation of a method which can either be used or overridden, then the base method is defining an interface with a default implementation.  In this case, the method should be declared virtual so that any class which overrides the method will get the right result when polymorphism is used.  Alternatively, if the base class method provides an implementation which is not meant to be overridden (because it does a standard action on data which is private to the base class), then the base method is defining an interface and a required implementation.  In this case, the method should not be declared virtual.  The corollary is that when inheriting from a class, you should not redefine any non-virtual functions.  A call through a base-class pointer will still reach the base version, and this can lead to maintenance problems when the base class is changed.
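A small sketch shows the pitfall of redefining a non-virtual function. Base, Derived, and the Id functions are hypothetical names. The function that runs depends on the static type of the pointer, not on the actual object.

```cpp
// Hypothetical base class with one of each kind of member function.
class Base
   {
public:
   virtual ~Base() {}
   int IdNonVirtual() { return 1; }          // required implementation
   virtual int IdVirtual() { return 1; }     // interface plus default implementation
   };

class Derived : public Base
   {
public:
   int IdNonVirtual() { return 2; }          // hides, but does NOT override, Base's version
   virtual int IdVirtual() { return 2; }     // overrides Base's version
   };
```

Through a Base pointer to a Derived object, IdVirtual returns 2 but IdNonVirtual still returns 1. Through a Derived object directly, IdNonVirtual returns 2. The same call can therefore give different answers depending on how the object is reached.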

In general, the two cases of inheritance of implementation outlined above as well as the case of inheritance of interface only can all be combined in a single class by having different methods do different things.  The key is to decide, for each method, whether the goal of the base method is to provide interface only, interface plus default implementation, or interface plus required implementation.  For example:

// A base class for drawing objects

class DrawObj

   {

public:

   virtual void Draw() = 0;          // interface only

   virtual BOOL FHitTest(POINT pt);  // default implementation

   void GetBounds(RECT *pr);         // required implementation

private:

   RECT m_rBounds;   // bounding rectangle

   };

 

BOOL DrawObj::FHitTest(POINT pt)

{

   return PtInRect(&m_rBounds, pt);

}

 

void DrawObj::GetBounds(RECT *pr)

{

   *pr = m_rBounds;

}

 

In this example, the Draw method is pure virtual because it is only specifying an interface for polymorphic use.  Any derived class that can be instantiated must define the Draw method.  The FHitTest method is defining interface (for polymorphism) as well as a default implementation.  Any derived classes that don't need non-rectangular hit-testing can just use the default implementation (no code or declaration required), but other classes can simply override this method and do special hit-testing.  The GetBounds method is an example of a required implementation.  The base class requires that "bounds" be defined in the same way for all objects, and it doesn't make sense for anyone to change it.  In this case, the member does not need to be virtual (and should not be for clarity and efficiency) because the base class implementation is always used.
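Below is a compilable sketch of the DrawObj design. It assumes portable stand-ins for the Win32 POINT and RECT types and for PtInRect (called FPtInRect here), plus two hypothetical derived classes, RectObj and OvalObj. RectObj relies on the default hit test. OvalObj overrides it and reaches the private bounds only through the required GetBounds implementation.

```cpp
// Stand-ins for the Win32 types (assumption: real code includes <windows.h>).
typedef int BOOL;
struct POINT { long x; long y; };
struct RECT { long left; long top; long right; long bottom; };

// Stand-in for Win32 PtInRect: right and bottom edges are excluded.
static BOOL FPtInRect(const RECT *pr, POINT pt)
{
   return pt.x >= pr->left && pt.x < pr->right &&
          pt.y >= pr->top && pt.y < pr->bottom;
}

class DrawObj
   {
public:
   explicit DrawObj(const RECT *prBounds) : m_rBounds(*prBounds) {}
   virtual ~DrawObj() {}
   virtual void Draw() = 0;             // interface only
   virtual BOOL FHitTest(POINT pt);     // default implementation
   void GetBounds(RECT *pr) const;      // required implementation
private:
   RECT m_rBounds;                      // bounding rectangle
   };

BOOL DrawObj::FHitTest(POINT pt)
{
   return FPtInRect(&m_rBounds, pt);
}

void DrawObj::GetBounds(RECT *pr) const
{
   *pr = m_rBounds;
}

// Uses the default rectangular hit test: no FHitTest declaration needed.
class RectObj : public DrawObj
   {
public:
   explicit RectObj(const RECT *pr) : DrawObj(pr) {}
   virtual void Draw() { /* ... draw a rectangle ... */ }
   };

// Overrides the hit test for the ellipse inscribed in the bounds.
class OvalObj : public DrawObj
   {
public:
   explicit OvalObj(const RECT *pr) : DrawObj(pr) {}
   virtual void Draw() { /* ... draw an ellipse ... */ }
   virtual BOOL FHitTest(POINT pt)
      {
      RECT r;
      GetBounds(&r);                                     // required implementation
      long long a = r.right - r.left;                    // ellipse width
      long long b = r.bottom - r.top;                    // ellipse height
      long long dx = 2LL * pt.x - (r.left + r.right);    // 2 * (x - center x)
      long long dy = 2LL * pt.y - (r.top + r.bottom);    // 2 * (y - center y)
      // Inside when (dx/a)^2 + (dy/b)^2 <= 1, multiplied out to stay in integers.
      return dx * dx * b * b + dy * dy * a * a <= a * a * b * b;
      }
   };
```

RectObj and OvalObj share the DrawObj base rather than one inheriting from the other. Callers can hold either object through a DrawObj pointer and call FHitTest without knowing which shape it is.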

Summary:

·         Inheritance of interface can be used for ensuring a consistent (e.g. polymorphic) interface.

·         An implementation class can inherit its interface from an interface class where the interface class has only pure virtual methods.

·         When using inheritance of implementation to share code in a base class,

·  Use pure virtual functions to provide interface only.

·  Use virtual functions to provide interface and a default implementation.

·  Use non-virtual functions to provide interface and a required implementation.

2.9.2     Inheritance vs. Containment

The most common misuse of inheritance is to view inheritance as a way to share code among “similar” objects and to use it in a context where there is no real “is a” relationship.  There are several ways to share code, and the simpler technique of containment and delegation (one class contains another and delegates the relevant functionality to it), which we’re all used to from traditional structured programming, works fine in most cases.  In this case, the relationship is described as “has a”.

The primary reason to use inheritance instead of containment is to achieve polymorphism (in conjunction with the use of virtual functions).  The easiest way to test for an “is a” relationship is to think whether polymorphism is what is desired.  If so, then inheritance could be appropriate (assuming any other practical concerns are met).  Another way to test “is a” vs. “has a” is to ask yourself if it could make sense to have more than one of the base class in the derived class.  If so, then “has a” (containment) is the right model.  For example, if you were implementing a scrolling window and you already have a scrollbar class, you would notice that a window could have two scrollbars (horizontal and vertical) even if you weren’t planning on that feature in the first version, so a window should contain (“has”) a scrollbar, not inherit from (“is”) one. 
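Here is a minimal sketch of the scrollbar case, using hypothetical Scrollbar and ScrollWindow classes. The window contains two scrollbars and delegates to them. Single inheritance from Scrollbar could not express this.

```cpp
// Hypothetical scrollbar class with just enough behavior for the sketch.
class Scrollbar
   {
public:
   Scrollbar() : m_pos(0) {}
   void SetPos(int pos) { m_pos = pos; }
   int Pos() const { return m_pos; }
private:
   int m_pos;
   };

// A scrolling window "has a" horizontal and a vertical scrollbar.
class ScrollWindow
   {
public:
   void ScrollTo(int x, int y)
      {
      m_sbHorz.SetPos(x);          // delegate to the contained objects
      m_sbVert.SetPos(y);
      }
   int XScroll() const { return m_sbHorz.Pos(); }
   int YScroll() const { return m_sbVert.Pos(); }
private:
   Scrollbar m_sbHorz;
   Scrollbar m_sbVert;
   };
```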

Even when you do decide to use inheritance from another class, it is often the case that you should split the original class into a base class and a derived class and inherit only from the base class.  This allows you to split off only the stuff that is really shared.  For example, say you had a Rectangle drawing object, and now you want an “Oval” object.  You convince yourself that polymorphism is desired (e.g. drawing and hit-testing code in the caller wants to treat all objects the same), and that an Oval would never want two Rectangles.  Now you might decide to have the Oval inherit from the Rectangle, but probably what you really want is to split the Rectangle class into a base DrawingObject class and a separate derived Rectangle class, and then Oval would inherit from DrawingObject, not Rectangle.  This allows later changes to the Rectangle object that are specific only to it, even if this isn’t needed now.  As in the example from the previous section, the DrawingObject base class will probably have a combination of pure virtual methods to enforce the polymorphic interface, virtual methods to provide a standard interface as well as a default implementation for all “simple objects”, and non-virtual methods to provide required implementation of stuff that is common to all objects and assumed to be constant in the common code.

Note that containment forces you to use the contained object’s public interface, whereas inheritance allows use of protected members also.  This is another way of saying that containment is more encapsulated than inheritance.  In fact, it is often said that inheritance breaks encapsulation because it can create dependencies on the implementation of the base class.  This is particularly true in the case of overridden functions, where a change to the base class might not have the right effect on all derived classes.

Summary:

·         Be careful with inheritance vs. containment.  When in doubt, use containment.

·         Inheritance is an “is a” relationship, whereas containment is a “has a” relationship.

·         Test for “is a” by seeing if polymorphism is desired or makes sense.

·         Test for “has a” by asking yourself if one class could ever use more than one of the other class.

2.9.3     Multiple Inheritance

We will avoid multiple inheritance altogether.  Multiple inheritance has a number of problems including resolution of name conflicts, efficiency concerns of some operations (functionality is hidden from you), maintenance problems, and general confusion about what the heck is going on.

If you are building a large and complex inheritance hierarchy (to be avoided as noted above), you might find yourself wanting multiple inheritance to share code from two different places. In the case of literally sharing code from two different places, this is the most dangerous form of multiple inheritance because it leads to the trickiest dependencies.  There are other forms of multiple inheritance, though.  The safest is multiple inheritance of only interfaces (no code from any base class), but even this has problems with things like name conflicts.  So, we will avoid it altogether.

Every time you think you need multiple inheritance, you should consider that maybe you are over-using inheritance and you should switch to containment in some cases.  Inheritance is a silver bullet that you have only one of.  Once you’ve used it for a given class, you need to use containment to get anything else.  Note that you can use containment as much as you want within a given class with no problems.

Summary:

·         Don’t use multiple inheritance.

·         Given only single inheritance, inheritance is a “silver bullet” which you have only one of, so use it sparingly and judiciously.

3.     Other C++ Features

The following sections comment on various new features of C++ that aren’t directly related to classes.

3.1     Constants and Enumerations

C++ adds the concept of true constants to C.  In C, you had the choice of using a #define or declaring a "const" global variable.  However, the #define will not be type safe, and the const variable takes up real memory and isn't optimized.  For example:

// C alternatives:

#define dxMin  0     // not type safe

const DX dxMin = 0; // just a real global variable

 

In C++, the const syntax declares a real constant of the specified type that the compiler will type-check, and then substitute the actual value in-line and optimize (fold constants, etc.).  As a bonus, the debugger will even know about this symbol.  For example,

// C++ solution:

const DX dxMin = 0; // type safe, optimized, and debug symbol

 

So true C++ constants are preferred to the traditional C #define.  Note that they cannot be used in shared C/C++ header files, though, because a C compiler will just allocate memory for them.

C++ also makes the existing C concept of an enum type safe.  An enum in C++ defines a type and declares constants of that type.  You can then use that type as, say, a parameter to a function, and the compiler can then enforce that you pass one of the symbols defined in the enumeration (or you can type cast to get around this if you need to).  An enum can also be made local to a class so that its scope is limited.  For example,

class Foo

   {

public:

   enum GMODE { gmNo = 0, gmYes = 1, gmMaybe = -1 };

   void InsertGNode(GMODE gm);

   };
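The following sketch assumes DX is a plain int, since its real definition is not shown here. It shows a typed constant used where C requires a true constant expression, and the class-scoped enumeration above in use. The GmLast accessor and the m_gmLast member are hypothetical additions so that the stored mode can be checked.

```cpp
typedef int DX;                  // assumption: DX is an integer distance type

const DX dxMin = 0;              // typed, folded by the compiler, known to the debugger
const int cItemsMax = 8;         // a true constant expression in C++

int rgItems[cItemsMax];          // legal C++; a C const could not size a file-scope array

class Foo
   {
public:
   enum GMODE { gmNo = 0, gmYes = 1, gmMaybe = -1 };
   Foo() : m_gmLast(gmNo) {}
   // Passing a plain int, e.g. InsertGNode(1), does not compile:
   // an int does not convert implicitly to GMODE.
   void InsertGNode(GMODE gm) { m_gmLast = gm; }
   GMODE GmLast() const { return m_gmLast; }
private:
   GMODE m_gmLast;
   };
```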

 

Summary:

·         Use const or enum instead of #define for constants that are only used in C++.

3.2     References

C++ adds the ability to express references to objects, and their primary use is to pass classes as parameters without the overhead of calling the copy constructor.  This is a worthy goal, but a more straightforward way to do this is simply to pass classes by pointer, which is what we’re all used to from C.  For someone used to C, seeing something being passed by reference looks like it’s being passed by value, so you might wonder whether the copy constructor is being called, or whatever.  Furthermore, when using a reference, the illusion is that you have a local copy of the object that you can reference cheaply, but in fact you just have a pointer to the object (this is how the compiler does it), and every access is an indirection.  We should just make this indirection explicit by actually using pointers.  Typing “*” or “->” instead of “.” is not a big deal, and it makes it clearer what is going on.  The one real advantage of references over pointers is that they are guaranteed to be initialized (a reference must be bound to an object when it is created, so it cannot legally be NULL).  But this advantage is not worth the above problems for us.

Also note that when you do pass objects by pointer, use const to mark formal parameters that are read-only (see the "Const" section below).  This is related to references because some C++ programmers use the convention that read-only objects are passed by reference and other objects are passed by pointer (to make this safe you still need to declare the reference const, because C++ will let you change a parameter through a reference).  This is a reasonable convention, but it still has the problem of looking foreign and confusing to programmers with a C background.  So, we will use const to get the safety but pass every object by pointer.

There are other more exotic uses of references, such as being able to return an lvalue from an operator function, and sometimes this is necessary if you've defined such an operator.  But since we don’t plan to use operators much if at all (because we won’t use stack-based classes), we should be able to avoid references in most cases.

Summary:

·         Avoid references.  Pass objects that are larger than an “int” by pointer.

3.3     Const Parameters and Functions

As mentioned above, you should use const to mark formal parameters that are read-only.  This allows the compiler to check that you actually obey this, serves as a form of documentation to users of your function, and also allows the compiler to produce better code in some cases.  For example:

/* Copy the contents of 'fooSrc' into 'fooDst'. */

void CopyFoo(const FOO *fooSrc, FOO *fooDst);

 

You can also declare non-pointer formal parameters as const (as well as the pointer portion of a pointer parameter, rather than what it points at, in which case the word "const" may appear twice for that parameter), but this is not as much of a win and it can make the prototype harder to read, so it's optional.  This just makes sure that you don't reuse the parameter itself as a local variable and change its value.  Of course, sometimes a function will do this as a way of avoiding declaring and setting up a local variable, and in that case you can't use const (this is not great programming style, but we're not going to disallow it).  On the other hand, if you don't change the value of the parameter within the function, declaring it as const may allow the compiler to generate better code.  Note that doing this does not give any useful documentation to the caller, though.  For example:

/* Copy 'cb' bytes of the contents of 'fooSrc' into 'fooDst'.

   In addition to not changing what 'fooSrc' points at, my implementation

   promises not to change the values of any of the local parameters

   within the function (like you care...). */

void CopyFooCb(const FOO *const fooSrc, FOO *const fooDst, const int cb);

 

In addition to declaring parameters const, you can also declare a member function const to indicate that the function does not modify the object.  Again, this allows compiler checks and possible optimization as well as a form of documentation.  For example:

class Counter

   {

public:

   int CItems() const { return m_cItems; }

   void SetCItems(int cItems) { m_cItems = cItems; }

private:

   int m_cItems;

   };
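One practical effect of marking CItems const is shown in this small sketch. CItemsTotal is a hypothetical helper, and the constructor is added so the count starts at zero. Through a pointer to a const Counter, only const member functions can be called.

```cpp
class Counter
   {
public:
   Counter() : m_cItems(0) {}
   int CItems() const { return m_cItems; }
   void SetCItems(int cItems) { m_cItems = cItems; }
private:
   int m_cItems;
   };

// Read-only pointer parameters: only const members may be called.
int CItemsTotal(const Counter *pcounter1, const Counter *pcounter2)
{
   // pcounter1->SetCItems(0);   // would not compile: SetCItems is not const
   return pcounter1->CItems() + pcounter2->CItems();
}
```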

 

Summary:

·         Use const to mark read-only pointer parameters (what the pointer points at, not the pointer itself).

·         Use const to mark member functions that don't change the object

·         Use const to mark parameters themselves only if you care about the possible performance gains in the implementation.

3.4     Default Arguments

Having default arguments seems like a cool feature.  It seems like a way to add a parameter which only some calls to a function will need to pass, so that the simple cases will be kept simple.  Well unfortunately, there is no efficiency gain here, and instead the compiler is just hiding work from you.  If you have a function with one required argument and four optional arguments, every call to this function will push all five arguments, so you are getting the code size and time hit in every case.  Furthermore, you can’t even use default arguments just to try something new without bothering with the old calls because you still have to find all the old calls in order to rebuild those files (if you do an incremental build of just one use after adding a default argument, the other calls will screw up).  Finally, default arguments can be easily confused with overloading (which we’ll also avoid).

There are cases, however, where a certain parameter is totally irrelevant in a call (because, for example, the value of another parameter tells you all you need to know).  Note that this is somewhat different from a default argument because there is no real default value for this parameter.  In these cases, it is nice to have the untyped constant "NA", #defined to be 0, to stand for "not applicable", which can be passed to indicate this for any actual parameter.  This is better than passing, say, NULL for a pointer or FALSE for a Boolean because it makes it clear that the value is not important at all.  For example:

#define NA  0    // universal "not applicable" parameter value

 

/* If fShow then show the object, else hide it.  If showing then

   redraw it only if fRedraw. */

void Show(BOOL fShow, BOOL fRedraw);

 

void Ack()

{

   Show(TRUE, TRUE);

   ...

   Show(FALSE, NA);

}

 

Summary:

·         Don't use default arguments.

·         Use "NA" (#defined to be 0) for "not applicable" parameters in a call.

3.5     Function Overloading

Overloading functions is just a lazy naming scheme.  It seems like a form of polymorphism, but it shouldn’t be confused with real polymorphism because all the decisions must be made statically at compile time.  It’s just a way to keep “simple” function names and reuse them for different cases.  Such a lazy naming scheme just causes more confusion than it’s worth (trying to determine which function is relevant, changing the wrong one by accident, etc.) and can also interfere with proper use of Hungarian in some cases.  Finally, the combination of function overloading and type coercion can be quite confusing indeed.
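This small sketch shows the overloading-plus-coercion trap, using a hypothetical Describe function. A string literal picks the bool overload. The reason is that the built-in pointer-to-bool conversion is preferred over the user-defined conversion to std::string.

```cpp
#include <string>

// Hypothetical overload pair, kept only to illustrate the pitfall.
int Describe(bool)          { return 1; }   // meant for Boolean flags
int Describe(std::string)   { return 2; }   // meant for text
```

Describe("text") returns 1, not 2. Only Describe(std::string("text")) reaches the string version. Giving the two functions distinct names removes the surprise for both the compiler and the reader.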

Summary:

·         Don’t overload functions

3.6     Operator Overloading

The main use of operators is in classes.  This is discussed in a previous section.  Operators can also be overloaded at global scope.  For example, you can define what operator+ should do when it finds a Foo on the left side and a Bar on the right side.  Unlike the use within a class, this allows control over the left-hand side operand.  Anyway, all the same problems apply and more (due to the larger scope). Functionality is hidden (a possible efficiency problem) and confusion can result, so we will avoid this.

Summary:

·         Don’t overload operators, especially at global scope.

4.     Common C/C++ Issues

The following sections comment on features of regular C whose use we also try to keep consistent.

4.1     #ifdefs

First and foremost, everyone should try really hard to minimize the use of #ifdefs.  Programs with lots of #ifdefs are really hard to read.  It is often possible to either invent the right abstraction or to isolate the #ifdefs to small places in header files, or make the right definitions in target-specific headers so as to make #ifdefs unnecessary in the main code.  The main argument for #ifdefs (over, say, a regular “if”) is to minimize the code size.  However, everyone should be aware that the optimizer is perfectly capable of simplifying statements that evaluate to a constant.  For example,

// Wrong:

#ifdef MAC

if (x == 3 || Foo() || FSomeMacMode())

#else

if (x == 3 || Foo())

#endif

 

// Right:

// In a header file for each non-Mac platform, there is

#define FSomeMacMode()  FALSE

 

// Then the using code can just be

if (x == 3 || Foo() || FSomeMacMode())

 

In this example, the compiler is perfectly capable of eliminating what amounts to (“|| FALSE”) at compile-time.  Furthermore, if the entire "if" were to always evaluate to FALSE at compile time, then the optimizer will also remove all of the code inside the "if" and remove the test.
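Here is the same pattern as a self-contained sketch. It assumes a non-Mac build and stand-in definitions of BOOL and FALSE, which would normally come from the Windows headers. FShouldAct and Foo are hypothetical names.

```cpp
typedef int BOOL;              // stand-in for the Win32 definition
#define FALSE 0                // stand-in for the Win32 definition

// In a header for each non-Mac platform (assumption: this is a non-Mac build).
#define FSomeMacMode()  FALSE

// Hypothetical helper from the example above.
static BOOL Foo() { return FALSE; }

BOOL FShouldAct(int x)
{
   // No #ifdef needed: on non-Mac builds "|| FALSE" is folded away by the optimizer.
   return x == 3 || Foo() || FSomeMacMode();
}
```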

If you must use an #ifdef, we prefer to use #if instead because it's shorter and allows logical operations, as in:

int Foo()

{

   int x = 3;

   #if MAC && !DEBUG

      x = 0;

   #endif

   return x;

}

 

Note that we will still leave flags such as DEBUG undefined when false, but the compiler does the right thing here (treats it the same as being defined to be 0).  Leaving these flags undefined means that #ifdef will also work, in case this is used by accident anywhere.

Also, as this example shows, #ifs should be properly indented so that they read easily when nested.  Yes, this works; C compilers have accepted this for years.

Aside from the standard identifiers defined by our build process (e.g. DEBUG, MAC), we will also use the identifiers UNUSED and LATER, which are never defined, to mark code which is currently unused but kept in for some reason, and code which cannot be activated yet but will eventually, respectively.
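This sketch shows how the undefined flags behave. It assumes a non-Mac build where MAC, DEBUG, UNUSED, and LATER are all left undefined. Each one is treated as 0 inside #if, so only the plain code survives. FooOld and FooNew are hypothetical.

```cpp
// MAC, DEBUG, UNUSED, and LATER are deliberately left undefined here.

int Foo()
{
   int x = 3;
   #if MAC && !DEBUG
      x = 0;                  // Mac ship builds only
   #endif
   return x;
}

#if UNUSED
// No longer used but kept for reference; never compiled.
int FooOld() { return -1; }
#endif

#if LATER
// Cannot be turned on yet, but will be eventually; not compiled today.
int FooNew() { return 42; }
#endif
```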

Summary:

·         Minimize #ifdefs by defining good abstractions, partitioning files better, or defining appropriate constants or macros in header files for the compiler to figure out.

·         Prefer #if over #ifdef.

·         Indent an #if with the code so that it reads better when nested within others.

·         Use #if UNUSED for code that is no longer used but kept in for some reason.

·         Use #if LATER for code that cannot be used yet but will eventually.

4.2     Global Variables

Global variables are usually bad programming style.  More specifically, they are often trouble when used to cache “current” state because it is so easy to get them out of sync.  Everybody knows this from their past experience, so there’s no point in going into gory detail.  In addition to these problems, globals make things such as multiple uses of a DLL in the same process, reentrancy, and multi-threading very hard to achieve, and we need to worry about all of these in Office. 

Due to the DLL/process/thread problem (all instances of the same DLL in a process, as well as all threads, share the same copy of the globals), most things that you might normally think of as global should go in an instance data structure.  In Office, this structure is the MSOINST struct, which is allocated and returned for each caller of MsoFInitOffice.

When you do use a global (e.g. for data which is truly shared among all uses in a given process), use the Hungarian “v” prefix before the type tag to make the global clear.
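This minimal sketch shows the split between instance data and true globals. INSTDATA, FInitInst, OpenDoc, and vcClients are hypothetical stand-ins. The real per-client structure is MSOINST, whose layout is not shown here. The shared counter is not thread-safe as written.

```cpp
typedef int BOOL;              // stand-in for the Win32 definition
#define TRUE 1                 // stand-in for the Win32 definition

// Hypothetical per-client instance data (playing the role of MSOINST).
struct INSTDATA
   {
   int cDocsOpen;
   };

// Truly process-wide data, marked with the Hungarian "v" prefix.
// Not thread-safe as written; real shared data would need synchronization.
static int vcClients = 0;

BOOL FInitInst(INSTDATA *pinst)
{
   pinst->cDocsOpen = 0;
   vcClients++;
   return TRUE;
}

void OpenDoc(INSTDATA *pinst)
{
   pinst->cDocsOpen++;         // per-client state lives in the instance, not a global
}
```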

Summary:

·         Minimize global variables.  Most per-client read/write storage should go in the MSOINST structure.

·         Use the Hungarian “v” prefix at the beginning of global variables.

4.3     Macros and Inline Functions

In C++, what would be a functional or procedural macro in C is usually better expressed as an inline function because this makes it type safe and avoids problems with multiple evaluation of macro parameters.  For example:

//Wrong

#define XOutset(x)  ((x) + 1)

#define XMin(x1, x2)  ((x1) < (x2) ? (x1) : (x2))

 

//Right

inline XY XOutset(XY x)  { return x + 1; }

inline XY XMin(XY x1, XY x2)  { return x1 < x2 ? x1 : x2; }

 

In addition, inline functions can be limited to the scope of a class if appropriate in order to make them more specific.

Note that #define must still be used in public header files that are exported to C clients.  Some C compilers do implement inline functions (via __inline), so this may be possible in C as well if all of your relevant compilers support it.

When you do use #define to write functional macros with arguments, be sure to remember to enclose each instance of an argument in parentheses, and also the entire expression if necessary, to avoid precedence problems.
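The multiple-evaluation and precedence problems are easy to demonstrate. This sketch assumes XY is an int. XMinMacro and XOutsetNoParens are hypothetical macros written to show the bugs.

```cpp
typedef int XY;                // assumption: XY is an integer coordinate type

// Macro versions, for comparison only.
#define XMinMacro(x1, x2)    ((x1) < (x2) ? (x1) : (x2))
#define XOutsetNoParens(x)   x + 1           // deliberately missing parentheses

// Inline versions: type checked, and each argument is evaluated exactly once.
inline XY XMin(XY x1, XY x2)  { return x1 < x2 ? x1 : x2; }
inline XY XOutset(XY x)       { return x + 1; }
```

Start with i = 1. XMinMacro(i++, 5) evaluates i++ twice, so it yields 2 and leaves i at 3. The inline XMin(i++, 5) yields 1 and leaves i at 2. Likewise, 2 * XOutsetNoParens(3) expands to 2 * 3 + 1, which is 7 rather than 8.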

Summary:

·         Use inline functions instead of #define macros when possible.

·         In functional macros, enclose all argument instances in parentheses as well as the entire expression if necessary.

4.4     Optimize

Always know what code the compiler generates for your favorite constructs, and what things typically get optimized.  Trying to hand-optimize your code can cause bugs and can also actually make the generated code worse.  For example, introducing temporary variables may cause something that would have been pre-computed and cached in a register to be moved into a stack variable.  It is worth it for everyone to play around with sample constructs, compile with /Fc, and look at the .cod file to see what gets generated.  Finally, remember that the expanded code you see when debugging is not the optimized code; it is totally unoptimized for debugging purposes (e.g. to make it line up exactly with the source statements).

Another issue to be aware of is that in order to get maximum benefit from our optimizer, we want to enable lots of different kinds of optimization.  Some coding constructs can confuse the optimizer and cause bugs (for example, depending on subtle aliasing behavior).  The rule of thumb here is that the more straightforward your code is, the better the optimizer will be able to improve it and the less chance of an optimization bug.  This is always good practice anyway, because it makes your code easier for people too.

Summary:

·         Know what your compiler and optimizer produce.  Compile with /Fc and look at the .cod files to see for yourself.  Don’t make theoretical arguments about what you think is generated, and don’t base it on the debug (unoptimized) version.  Check it for yourself in a ship compile.

·         Don’t use strange coding techniques (e.g. depending on aliasing) that confuse the optimizer.

4.5     Warnings

We will compile our code at warning level 3, and all code should be free of warnings.  If you absolutely have to code something that generates a warning, disable the warning with a #pragma.  We may disable some that we don’t like globally (e.g. unused parameter).

Summary:

·         Make your code warning level 3 clean.

4.6     Private Data and Functions

Data and functions that are local to a module or class should be kept as such and not exported from the module or class.  This prevents name conflicts (and therefore allows for shorter, simpler names), and also makes the linker do less work, which speeds up the builds.  In C, data and functions that are local to a module should be marked with “static”.  In C++, many “module” local data and functions can be encapsulated in classes.  In this case, class static data should be declared with static and use the s_name naming convention.  Local functions should just be member functions if they logically operate on an instance of the object.
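This sketch shows both techniques in one C++ file, with hypothetical names. It has a file-local helper marked static and module data kept as class-static data using the s_ convention.

```cpp
// File-local helper: "static" keeps it out of the linker's view of other modules.
static int CchClamp(int cch) { return cch < 0 ? 0 : cch; }

// Module data encapsulated as class-static data with the s_ naming convention.
class IdGen
   {
public:
   static int IdNext() { return ++s_idLast; }
private:
   static int s_idLast;        // one copy shared by all users of IdGen
   };

int IdGen::s_idLast = 0;       // definition of the class-static member
```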

Summary:

·         Prototypes for module-local data and functions in C files are marked with “static”.

·         In C++, module-specific data can often be encapsulated in classes with class-static data and member functions.

4.7     Typedefs

In C++, a struct or class tag defines a new type automatically, so don’t explicitly typedef it.

Don’t use typedefs to define pointers to other types.  Just use the normal ‘*’ notation itself.  This keeps the Hungarian simpler and makes the pointer explicit.

We will use “int” for integer data whose exact size is not important in memory (we will assume an “int” is at least 32 bits).  However, you should use typedefs to define types for data where the size is important because it is being saved to disk or packed in a tight data structure.
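This sketch shows size-sensitive typedefs for an on-disk header. It assumes a C++11 compiler with <cstdint>. VER, FC, FILEHDR, and FcFirst are hypothetical names. The struct tag is used directly as a type, and the pointer is written with a plain '*'.

```cpp
#include <cstdint>

typedef std::uint16_t VER;     // hypothetical: 16-bit file format version
typedef std::uint32_t FC;      // hypothetical: 32-bit file offset

// The struct tag is already a type in C++, so there is no typedef here.
struct FILEHDR
   {
   VER ver;
   VER verMin;
   FC  fcFirst;
   };

// Plain '*' rather than a pointer typedef.
FC FcFirst(const FILEHDR *phdr)
{
   return phdr->fcFirst;
}
```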

Summary:

·         Don’t typedef classes or structs in C++

·         Don’t define typedefs for pointers to other types, just use the ‘*’ notation.

·         Use “int” for “natural sized” machine integers (assume at least 32 bits), but use typedefs for size-sensitive data.

4.8     Basic Data Types

Use the basic data types provided by the language, by Windows, and by OLE where appropriate.  Try not to define special basic types such as “Integer” or “Boolean” that are redundant with (and possibly different from) the basic system types, unless there is a good reason to.  Win32 is our API, so we don’t need “portability” abstractions on top of it.  Defining our own types also makes it harder for client apps to integrate our header files and APIs (they have to figure out what each one is, and there is the potential for name conflicts with their definitions).

Following is a list of the most common basic types that we will use, along with the default Hungarian tag.  Note that the actual Hungarian tag used will usually depend on the usage (see Appendix A).

Type           Defined by  Default tag   Meaning
void           C++         v             No type
int            C++         (none)        Signed natural machine-size integer, at least 32 bits
BOOL           Win32       f             Natural machine-size integer holding TRUE or FALSE
UINT           Win32       (none)        Unsigned natural machine-size integer, at least 32 bits
BYTE           Win32       b             Unsigned 8-bit value
CHAR           Win32       ch            Signed 8-bit value (usually used for ANSI characters)
WCHAR          Win32       wch           16-bit (wide) character used for UNICODE characters
USHORT, WORD   Win32       w             Unsigned 16-bit value; USHORT is preferred when the type is a number
SHORT          Win32       w             Signed 16-bit value
ULONG, DWORD   Win32       u, dw         Unsigned 32-bit value; ULONG is preferred when the type is a number
LONG           Win32       l             Signed 32-bit value
HRESULT        OLE         hr            OLE general purpose error/result code

 

For bitfields, the type given to each field will depend on how the struct and bitfields are used.  Usually a bitfield will be used in a memory-only data structure because this eliminates concerns over bit-ordering in a file format.  If your usage is in memory and you don't care about the exact size of the bitfield, each field can simply be declared as "int".  UINT can also be used for multi-bit fields that are to be interpreted as unsigned integers.  The most common type, BOOL, should be declared as BOOL (which is the same as int) for increased readability.  If you care about the total size of the bitfield, then you can take advantage of the Microsoft C extension and declare each field as a sized type such as WORD.  For example:

// A typical bitfield

struct foo

   {

   BOOL fVisible :1;

   BOOL fEnabled :1;

   BOOL fPushed  :1;

   int unused :13;       // used to ensure reasonable alignment

   int iBar      :16;

   };

 

// A size-sensitive bitfield

struct bar

   {

   WORD fVisible :1;

   WORD fEnabled :1;

   WORD fPushed  :1;

   WORD unused   :13;       // to help enforce proper size

   };
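You can check the size claims directly. This sketch assumes stand-in typedefs for BOOL and WORD and renames the structs FOOBITS and BARBITS. The expected sizes of 4 and 2 bytes hold for MSVC and for GCC and Clang on x86. Bitfield layout is implementation-defined, however, so check your own compiler.

```cpp
typedef int BOOL;              // stand-ins for the Win32 definitions
typedef unsigned short WORD;

// Memory-only bitfield: natural int-sized fields, 32 bits in total.
struct FOOBITS
   {
   BOOL fVisible :1;
   BOOL fEnabled :1;
   BOOL fPushed  :1;
   int  unused   :13;          // pads the flags out to 16 bits
   int  iBar     :16;
   };

// Size-sensitive bitfield: every field uses the same 16-bit type.
struct BARBITS
   {
   WORD fVisible :1;
   WORD fEnabled :1;
   WORD fPushed  :1;
   WORD unused   :13;          // fills out the 16 bits
   };
```

One caution applies to compilers where int bitfields are signed, MSVC among them. There, a 1-bit BOOL field set to TRUE reads back as -1, so test such flags for non-zero rather than comparing them with TRUE.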

 

Summary:

·         Use the basic types provided by C/C++, Windows, and OLE.  See the table above.

·         Use "BOOL", "int", or "UINT" in most bitfields, unless size is important, then use a size-specific type such as WORD for all fields.  Use "unused" fields to force alignment or to help enforce a specified size.

4.9     Pointers

We’re 32-bit flat everywhere now (finally!), so use just plain ‘*’ everywhere.  Don’t use “near”, “far”, or “huge”.

Summary:

·         Use flat 32-bit pointers everywhere (declared using just plain “*”).

4.10     Switch Statements

Some switch statements can be avoided by using classes and virtual functions in C++, and often the result will be smaller and faster as well as simpler.  A switch statement can take a long time to find its target (depending on the values involved, the compiler may not generate a good lookup table).

When you do use a switch statement, cases that do something and then fall through to the next case should be explicitly marked with a “// Fall through” comment, since this is a common source of mis-read code.
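Here is a short sketch of the convention, using a hypothetical CSetupSteps function. Each higher level does its own step and then deliberately falls into the lower levels.

```cpp
// Hypothetical: level 3 (full) includes the level 2 (standard) and
// level 1 (minimal) setup steps.
int CSetupSteps(int level)
{
   int cSteps = 0;

   switch (level)
      {
   case 3:
      cSteps++;            // full-setup step
      // Fall through
   case 2:
      cSteps++;            // standard-setup step
      // Fall through
   case 1:
      cSteps++;            // minimal-setup step
      break;
   default:
      break;               // unknown level: nothing to do
      }

   return cSteps;
}
```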

Summary:

·         Consider using classes and virtual functions instead of switch statements in some cases.

·         Switch cases that do something and then fall through to the next case should be explicitly marked with a “// Fall through” comment.

4.11     Asserts

Use plenty of asserts to verify correct program operation.  It’s better to catch errors early.  We will have a smart assert layer that allows you to break, continue, or cancel all asserts and continue. 

There are two kinds of asserts we will use.  The first, named “Assert”, evaluates its expression only in DEBUG compiles, so don’t put any needed side-effects inside!  The second, named "AssertDo", always evaluates its expression, but only checks the final condition and possibly generates an error in DEBUG builds.  Note that in ship builds the “compare” portion of an AssertDo statement, if any, will be removed by the optimizer.  For example:

Assert(hwnd != NULL);

Assert(HwndTop() == hwnd); // don't want HwndTop() call in ship

AssertDo(FInitFooBar());       // must init in ship too!

AssertDo(pObj->Release() == 1);   // release and verify ref count

 

Summary:

·         Use Assert for debug-only checks, use AssertDo to execute always but check only in debug.
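One way the two macros could be defined is sketched below. It assumes that debug builds define DEBUG. AssertFailed is a hypothetical stand-in that prints and aborts; it is not the break/continue/cancel layer described above. The point of the sketch is the evaluation rules: Assert drops its expression entirely in ship builds, while AssertDo keeps it and drops only the check.

```cpp
#include <cstdio>
#include <cstdlib>

/* Hypothetical stand-in for the smart assert layer: report the failed
   expression 'szExpr' at line 'li' of 'szFile', then stop. */
inline void AssertFailed(const char *szExpr, const char *szFile, int li)
{
   std::fprintf(stderr, "Assert failed: %s (%s, line %d)\n",
                szExpr, szFile, li);
   std::abort();
}

#ifdef DEBUG
// Debug builds: both macros evaluate the expression and check it.
#define Assert(f)    ((f) ? (void)0 : AssertFailed(#f, __FILE__, __LINE__))
#define AssertDo(f)  Assert(f)
#else
// Ship builds: Assert disappears, side-effects and all.  AssertDo keeps
// the expression but discards its result, so the optimizer can drop any
// comparison that has no side-effects.
#define Assert(f)    ((void)0)
#define AssertDo(f)  ((void)(f))
#endif
```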

4.12     Errors and Exceptions

There are two basic ways to handle unexpected errors (e.g. memory allocation failure) in code.  One is to return failure codes and check them (bubbling them back to the caller as necessary), and another is to catch and throw exceptions.  Exceptions can be useful and more efficient in some cases, but they can also be harder to understand and cause unexpected results.  The root of the whole problem is that you often can’t tell what exception handling code will do by just looking at it.  There may be places where we will use exceptions (unclear at this point whether it will be C++ exceptions or something else we create), but the default should be to use normal return codes in order to make the code more straightforward.  In any case, it is not possible to throw an exception over a DLL boundary, so all DLL-exported APIs or methods that can fail must return error codes.

When doing many things in a row that can fail, it is perfectly reasonable to use a goto to direct all the return code-based functions to a common error handler (which recovers in a robust way based on consistent initialization of the data).  This is probably the best use of a goto in C/C++.  For example:

{

   BOOL fRet = FALSE;

   FOO *pfoo = NULL;

   BAR *pbar = NULL;

   ACK *pack = NULL;

 

   if (!FCreateFoo(&pfoo))

      goto Error;

   //...

   if (!FCreateBar(&pbar))

      goto Error;

   //...

   if (!FCreateAck(&pack))

      goto Error;

 

   // Use data in normal way...

   fRet = TRUE;  // Success return flag

 

Error:

   /* This is common error and success cleanup code. */

   if (pack != NULL)

      DeleteAck(pack);

   if (pbar != NULL)

      DeleteBar(pbar);

   if (pfoo != NULL)

      DeleteFoo(pfoo);

   return fRet;

}

 

A convention we will use for functions that return pointers and can fail (e.g. because the object is being created and returned) is that the function will return a BOOL, and return the pointer in a parameter.  This makes explicit the fact that the function can fail and prevents errors where NULL is returned and indirected without checking by a client.  To contrast, pointer-fetching functions that cannot fail can return the pointer directly.  For example,

BOOL FCreateFoo(FOO **ppfoo);     // can fail

FOO *PfooGet(BAR *pbar);       // get a FOO from a BAR, cannot fail

 

Summary:

·         Prefer error code return values to exception handling in most cases, for simplicity.

·         Always use error codes from DLL-exported APIs or methods.

·         Use goto to implement common cleanup code when using return values in complex cases.

·         Functions that return pointers but can fail should return a BOOL and return the pointer in a parameter instead of returning NULL to indicate failure.
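The sketch below fills in the FOO and BAR types from the examples above to show how the can-fail and cannot-fail conventions might look. BOOL is a local stand-in for the Windows typedef, and the type bodies are hypothetical. The sketch uses new (std::nothrow) so that an allocation failure comes back as NULL and becomes a FALSE return rather than an exception.

```cpp
#include <cstddef>
#include <new>

typedef int BOOL;              // stand-ins for the Windows typedefs
#define TRUE  1
#define FALSE 0

struct FOO                     // hypothetical object
   {
   int cRef;
   };

struct BAR                     // hypothetical container that owns a FOO
   {
   FOO foo;
   };

/* Create a new FOO and return it in '*ppfoo'.  Return TRUE on success;
   on failure, return FALSE and set '*ppfoo' to NULL. */
BOOL FCreateFoo(FOO **ppfoo)
{
   *ppfoo = new (std::nothrow) FOO;    // NULL instead of throwing
   if (*ppfoo == NULL)
      return FALSE;
   (*ppfoo)->cRef = 1;
   return TRUE;
}

/* Free the FOO at 'pfoo', which must have come from FCreateFoo. */
void DeleteFoo(FOO *pfoo)
{
   delete pfoo;
}

/* Return the FOO held by the BAR at 'pbar'.  Cannot fail. */
FOO *PfooGet(BAR *pbar)
{
   return &pbar->foo;
}
```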

5.     Formatting Conventions

The following sections describe various formatting conventions that we will use to produce common-looking code for easy readability throughout the group.

5.1     Naming Conventions

We will use the Microsoft Hungarian naming conventions for identifiers (see Appendix A below).  We will also keep an alphabetized reference text file of the Hungarian tags that we invent ourselves, checked into the project.  Whenever you add a Hungarian tag, add it to this file.  It may also be helpful to show the suggested Hungarian naming for variables of a class or type at its declaration, but this is not required.

When using Hungarian, remember to preserve abstraction where appropriate.  Don’t say that something is a pointer to an integer if you really want to say it’s a pointer to, say, a “cell index”.  Invent abstract tags where appropriate and use them consistently.  For real global variables, use the Hungarian ‘v’ prefix before the type tag to make the global nature clear.

In addition to the Hungarian convention, we will add a standard prefix “Mso” (Microsoft Office) to most identifiers that we export from our DLLs.  This gives the apps a convenient way to tell where a function comes from and prevents name conflicts with symbols they already define.  The only exceptions are interface member function names and parameter names, because these don’t need their scope limited.  This “Mso” prefix goes before the normal Hungarian tag, and its capitalization depends on the kind of identifier (for example, “Mso” for a function, “MSO” for a typedef, and “mso” for a constant).  Variable and parameter names derived from types prefixed with “MSO” usually do not include “mso” in the name, however, because it is not usually necessary (except perhaps to disambiguate special cases) and it clutters the names.  For the same reason, when the Hungarian tag for such a type is used within a function name, it does not include “mso” either (see examples below).

COM interface names are prefixed with “IMso”.  Variables that hold interface pointers are named with “pi” for “pointer to interface”, then a short abbreviation of the interface name (using the first letter of each word is good if this will work).  The variable does not include “mso” (see examples below).

The following examples show this prefix for various kinds of identifiers:

void *MsoPvAlloc(int cb);  // exported API has Mso

MSOTBCR tbcr;              // exported type has MSO, but variable doesn’t

#define msotbbsPushed 1 // enum/bit constant has mso and another tag

#define msoCUsersMax  99    // plain constant has mso then hungarian

IMsoToolbarSet *pits;      // interface name has IMso, variable doesn’t

pits->FUpdate();        // interface member has plain Hungarian

pit->PitsGetToolbarSet();  // member returning IMsoToolbarSet

 

One additional note about typedefs is that they should be minimized in exported headers.  When possible, use already-defined types from Windows such as BOOL and RECT (see the Basic Data Types section).  Also, do not define types for simple enumerations -- just use “int” as the type and #define constants.  The constants should have a meaningful tag in addition to the “mso” prefix, though (e.g. “msotbbsPushed”).

For C++ class data members (internally in our implementation), we will use the m_name convention because this makes it easier to distinguish members from locals and allows consistent use of Hungarian (the name portion is the normal Hungarian name and often matches parameter or local names).  A variation on this rule applies to static class members: use a name of the form s_name to make their static (global) nature clear.

Summary:

·         Use Hungarian naming conventions.  Invent Hungarian tags and use them consistently where appropriate to preserve abstraction.  Add all new tags to the checked-in Hungarian tag reference file.

·         Use the ‘v’ prefix for global variables.

·         Use the “Mso” prefix before a Hungarian name when the identifier is exported to clients in a header file at global scope, as follows:

        1.   Exported API functions use “Mso” (but interface member functions don’t).

        2.   Typedefs use “MSO” (but variables and parameters of that type don’t).

        3.   Constants use “mso” in addition to any enum/bit tag they may have.

        4.   COM-style interface names use “IMso”.  Interface pointer variables use “pi” (no “mso”) and then a short abbreviation of the interface name (such as the first letter of each word).

·         Minimize use of typedefs in exported headers.  Use basic types when possible, and “int” for enumerations.

·         Use m_name for C++ class member data, and s_name for static class data.
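The sketch below applies these rules to one small, hypothetical module. It contains an exported function with the “Mso” prefix (a made-up body for the MsoPvAlloc name used in the examples above), an “mso” constant, a “v” global, and a class with m_ and s_ members. UserList and vcUsers are invented for illustration, and BOOL is a local stand-in.

```cpp
#include <cstdlib>

typedef int BOOL;              // stand-ins for the Windows typedefs
#define TRUE  1
#define FALSE 0

#define msoCUsersMax  99       // exported constant: "mso" + Hungarian

int vcUsers = 0;               // real global: 'v' prefix; users in all lists

/* Allocate 'cb' bytes and return a pointer to them, or NULL on failure.
   Exported API: "Mso" goes before the Hungarian name. */
void *MsoPvAlloc(int cb)
{
   return std::malloc(cb);
}

// Internal class: data members use m_name, static members use s_name.
class UserList
   {
public:
   UserList();
   ~UserList();

   /* Add one user to the list.  Return FALSE if the list already holds
      msoCUsersMax users. */
   BOOL FAddUser();

   /* Return the number of UserList objects that currently exist. */
   static int CLists();

private:
   int m_cUsers;               // users in this list
   static int s_cLists;        // UserList objects that currently exist
   };

int UserList::s_cLists = 0;

UserList::UserList() : m_cUsers(0)
{
   s_cLists++;
}

UserList::~UserList()
{
   vcUsers -= m_cUsers;
   s_cLists--;
}

BOOL UserList::FAddUser()
{
   if (m_cUsers >= msoCUsersMax)
      return FALSE;
   m_cUsers++;
   vcUsers++;
   return TRUE;
}

int UserList::CLists()
{
   return s_cLists;
}
```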

5.2     Function Prototypes

Always declare full function prototypes so that you get good type checking and code that is clean at warning level 3.  Put the names and types of the parameters inside the argument list in both the declaration and the definition, because this makes it particularly easy to keep the two in sync.  Every function prototype should also be preceded by a comment that describes what the function does, how it uses its parameters, and what it returns.  This is best done in complete sentences that reference each parameter by name (use single quotes around the name to make it easier to read if necessary), rather than in a big block comment with a line for each parameter, because those never seem to be kept up to date as well.  For example,

/* Return the index of the foo element at position (x, y) in the

   fooset at 'pfs', or 0 if no element is found.  If more than one

   is found, return the index of the most recently created one. */

int IFooFromXY(const FOOSET *pfs, int x, int y);

 

An exception to the normal prototype comment is for the declaration of overridden virtual functions in C++ classes.  The declaration of the override does not have to repeat the whole interface contract, but it should mention that it is overriding a base method (see Class Declarations below).  An additional point here is that, although the overridden virtual function will automatically be made virtual, the virtual keyword is specified again anyway for clarity.

For exported functions, the prototype will appear in a header (.h) file.  Prototypes for internal functions can appear at the top of the implementation (.c or .cpp) file, or in an internal interface (.i) file (see below under Source File Organization).

Summary:

·         Declare full function prototypes with argument names and types.

·         Define function implementation using argument names and types directly in the argument list to match the prototype (it’s exactly the same minus the semicolon).

·         Precede each function prototype with a comment describing the function (operation, parameters, return type, side effects, etc.) in complete sentences.  Mention each parameter by name somewhere in the comment.

·         The declaration of an overridden virtual function in a derived class does not have to repeat the whole interface comment, but should mention that it is overriding a base method.

·         An overridden virtual method is explicitly declared virtual.
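The sketch below fills in the IFooFromXY example. FOO and FOOSET are hypothetical. The 1-based indexing, which lets 0 mean "not found", is an assumption made to satisfy the comment's contract. The definition repeats the prototype's argument list exactly, minus the semicolon.

```cpp
struct FOO                     // hypothetical element type
   {
   int x;
   int y;
   };

struct FOOSET                  // hypothetical set of elements
   {
   int cfoo;                   // number of elements in use
   FOO rgfoo[16];              // elements in creation order, oldest first
   };

/* Return the index of the foo element at position (x, y) in the
   fooset at 'pfs', or 0 if no element is found.  If more than one
   is found, return the index of the most recently created one. */
int IFooFromXY(const FOOSET *pfs, int x, int y);


int IFooFromXY(const FOOSET *pfs, int x, int y)
{
   // Indices are 1-based so that 0 can mean "not found".  Scan from the
   // newest element so the first match is the most recently created.
   for (int ifoo = pfs->cfoo; ifoo >= 1; ifoo--)
      {
      const FOO *pfoo = &pfs->rgfoo[ifoo - 1];

      if (pfoo->x == x && pfoo->y == y)
         return ifoo;
      }
   return 0;
}
```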

5.3     Variable Declarations

Variables can be declared either C-style (at the top of a function or block) or C++ style (in with the code).  The C style is preferred for variables that are used throughout the function or used for more than one thing in the function.  The C++ style is preferred for temporary variables or those limited in scope.  When using the C++ style, always provide an initialization for the variable.  For the C style, it is optional, but if you do provide one, make it something simple and fast such as just assigning to FALSE or NULL.  Don’t do anything complex or expensive because code at a C-style declaration is sometimes overlooked when reading the code.

Variables should be declared one per line, to prevent bugs with pointers, arrays, and initializers doing the wrong thing by accident.  Exceptions can be made in extremely simple cases such as int x, y;

Summary:

·         Declare and initialize variables C-style (top of block) if they are used throughout the function or for more than one thing; use C++ style (declare and initialize at first use) for temporary and limited-scope variables.

·         Don’t do anything expensive in a C-style variable initialization.

·         Declare variables one per line, except in extremely simple cases such as int x, y;
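The sketch below shows both declaration styles in one hypothetical function, CAbove. The counter is used throughout the function, so it is declared C-style with a trivial initializer. The loop index and the temporary are declared C++ style at first use. Declaring one variable per line also avoids misreadings such as int *pv, cv;, where only pv is a pointer.

```cpp
/* Return how many of the 'cv' values in 'rgv' are greater than 'vMin'. */
int CAbove(const int *rgv, int cv, int vMin)
{
   int cAbove = 0;             // C style: used throughout, cheap initializer

   for (int iv = 0; iv < cv; iv++)     // C++ style: limited to the loop
      {
      const int v = rgv[iv];           // C++ style: initialized at first use

      if (v > vMin)
         cAbove++;
      }
   return cAbove;
}
```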

5.4     Class Declarations

A class declaration should provide prototypes with comments for function members and comments for data members.  Function members that are overriding those in a base class which are declared and commented elsewhere in our code or in a standard place such as OLE need not have a duplicate comment, but they should be grouped into sections with a comment stating where they came from. The constructor(s) and destructor should be declared first, followed by any overridden inherited methods, followed by other methods.  The constructor(s) and destructor do not require comments.  For example,

// The Ellipse object implements an elliptical drawing object

class Ellipse: public DrawObj

   {

public:

   Ellipse();

   virtual ~Ellipse();

 

   // DrawObj methods

   virtual BOOL FHitTest(int x, int y);

   virtual void Draw();

 

   /* Return the x radius of the ellipse. */

   int XRadius();

 

   /* Return the y radius of the ellipse. */

   int YRadius();

 

   /* Set the shading parameters for the ellipse to use shading type

      'st' and color slopes 'xSlope' and 'ySlope' in the x and y

      directions, respectively. */

   void SetShading(ShadeType st, int xSlope, int ySlope);

 

private:

   RECT m_rBounds;   // bounding rectangle of ellipse

   SHADING m_sd;     // the shading parameters for 3D effects

   };
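The sketch below shows the same declaration order in a form that compiles. It fills in a hypothetical DrawObj base and gives Ellipse made-up bodies. RECT is a local stand-in for the Windows type, and Draw and SetShading are left out to keep the example short. The hit test uses the standard ellipse equation, multiplied out to stay in integers.

```cpp
typedef int BOOL;              // stand-ins for the Windows types
struct RECT
   {
   int left;
   int top;
   int right;
   int bottom;
   };

// Hypothetical abstract base class for drawing objects
class DrawObj
   {
public:
   virtual ~DrawObj() {}

   /* Return TRUE if the point (x, y) lies on this object. */
   virtual BOOL FHitTest(int x, int y) = 0;
   };

// The Ellipse object implements an elliptical drawing object
class Ellipse: public DrawObj
   {
public:
   Ellipse(const RECT &rBounds);
   virtual ~Ellipse();

   // DrawObj methods
   virtual BOOL FHitTest(int x, int y);

   /* Return the x radius of the ellipse. */
   int XRadius();

   /* Return the y radius of the ellipse. */
   int YRadius();

private:
   RECT m_rBounds;   // bounding rectangle of ellipse
   };

Ellipse::Ellipse(const RECT &rBounds) : m_rBounds(rBounds)
{
}

Ellipse::~Ellipse()
{
}

// Overrides DrawObj::FHitTest.  Tests (dx/rx)^2 + (dy/ry)^2 <= 1,
// multiplied through by (rx*ry)^2 to stay in integer arithmetic.
BOOL Ellipse::FHitTest(int x, int y)
{
   long rx = XRadius();
   long ry = YRadius();
   long dx = x - (m_rBounds.left + rx);
   long dy = y - (m_rBounds.top + ry);

   return dx * dx * ry * ry + dy * dy * rx * rx <= rx * rx * ry * ry;
}

int Ellipse::XRadius()
{
   return (m_rBounds.right - m_rBounds.left) / 2;
}

int Ellipse::YRadius()
{
   return (m_rBounds.bottom - m_rBounds.top) / 2;
}
```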

5.5     Comments

Contrary to some old Microsoft notions, comments are not bad.  Comments that are out of date are bad, but they only get that way because the programmers didn’t think maintaining them was important in the first place.  Comments are part of the code, so maintain them.  Comments can appear at several levels and serve different purposes.  Some serve to explain the contract of an interface (e.g. in a header file). Others serve to summarize the following code so that it can be read and understood faster.  And some just explain a strange construct or the reason for doing something. All of these types are important.  The following sections give examples of some different kinds of comments.

Summary:

·         Write comments both to define interface contracts and to summarize and explain implementation.

·         Keep comments up to date.  They are part of the code.

5.5.1     File Headers and Section Separators

Each file has a header at the top that specifies the name of the file, the owner, a copyright, and an explanation of the contents.  The header is done with a banner comment of asterisk characters, for example:

/***********************************************************************

   Foo.c

 

   Owner: DavePa

   Copyright (c) 1994 Microsoft Corporation

 

   The implementation of the Foo object

***********************************************************************/

 

In addition, to separate major parts of a file, banner comments of asterisk characters with an explanation of the following section are used.  These bars can also include the email name of the primary owner of that section of code.  For example:

/***********************************************************************

   The Foo object

************************************************************** DAVEPA */

 

void Foo::Hey()

{

}

 

void Foo::Ack()

{

}

...

 

 

/***********************************************************************

   The Bar object

************************************************************** DAVEPA */

 

void Bar::Glorp()

{

}

 

void Bar::Blob()

{

}

...

 

Summary:

·         Each file starts with a file header comment made with a banner of asterisk characters.  The header includes the filename, owner, Microsoft copyright, and an explanation of the contents.

·         Separate major parts of a file with banner comments of asterisk characters with an explanation of the following section.

5.5.2     Function Headers

In addition to the comment at a function prototype in a header file (see the above section “Function Prototypes”), the definition of a function also has a comment.  This can be exactly the same comment as in the interface file (this is usually the most convenient thing), but it may differ in that it may reference some implementation-specific interactions.  This is a controversial issue because of the resulting duplicated comment that needs to be maintained, but this effort is worth it to get a clear description of the function in the implementation itself, which is what you spend most of your time reading (and the description in the header file is required when you are exporting the interface to people who don’t normally read your implementation code).

The comment at the function definition has a more pronounced banner form to help visually separate the various function implementations.  The banner includes the email name of the person who owns the function for easy reference.  We will have editor macros to generate the standard function header comment templates.

The bars of the banner comment will be a series of ‘-’ characters.  For example,

/*-----------------------------------------------------------------

   IFooFromXY

 

   Return the index of the foo element at position (x, y) in the

   fooset at 'pfs', or 0 if no element is found.  If more than one

   is found, return the index of the most recently created one.

-------------------------------------------------------- DAVEPA -*/

int IFooFromXY(const FOOSET *pfs, int x, int y)

{

   // ...

}

 

Summary:

·         A function definition has a comment similar to (or identical to) the function prototype comment.

·         Function definition comments are marked with a distinctive banner using ‘-’ characters (see example above) which includes the name of the function, the description, and the email name of the owner.

5.5.3     In-Code Comments

Comments that appear directly in the code are usually trying to make the code easier to understand.  Summarize big operations by preceding them with a blank line and a comment on a line (or more) by itself.  Explain strange code with line-trailing comments or larger ones if necessary.  Feel free to reference RAID bug numbers in a comment if the code is directly related to a bug and not obviously correct otherwise.  Comments that take more than one line are often easier to write using /*...*/ rather than //, but // is clearly easier and better for end-of-line comments.  Single-line comments can go either way, but // is preferred.  For example:

pTarget = PObjGetTarget();

 

/* Find the beginning of the sub-list where the target object and

   its children start. */

for (p = pFirst; p != NULL; p = p->next)

   {

   if (p == pTarget)

      break;

   if (p == pSpecial)  // bug 1234: pSpecial denotes gag mode 

      {

      pTarget = pSpecial;

      break;

      }

   }

 

// Process target node if found.

if (p != NULL)

   ...

 

Summary:

·         Use in-code comments to summarize big sections of code and also to explain weird code.

·         Reference RAID bug numbers in comments that are totally specific to that bug fix.

5.5.4     Attention Markers

It is useful to have standard markers in comments that call attention to incomplete or wrong code and can be searched for.  Use “// TODO:” to mark incomplete code and “// BUG:” to mark wrong code.  Follow each with an appropriate comment.  Adding your email name to the comment is optional, but it makes it easier for others to see who added it.

Summary:

·         Use “// TODO: comment” to mark incomplete code.

·         Use “// BUG: comment” to mark wrong code.

·         Adding your email name to the comment in an attention marker is an optional addition that makes it easier for others to see who added it.

5.6     Misc. Formatting Conventions

Low-level details of code formatting such as where the spaces go aren’t that important, but a group that can standardize on these details saves time by being able to read code faster, and no time is spent reformatting copied code to match another environment.  Like Hungarian, less time spent thinking about details that don’t matter means more time to think about the things that do.

Code should be indented with tabs set at 3 space intervals (so that typing 3 spaces at the beginning of a line is equivalent to typing a tab).  This keeps the files smaller than using spaces and makes some editing operations easier.  Ideally, you would also avoid using tabs after the first non-blank character on a line (switch to spaces) to ensure alignment even when tabs change, but this is difficult to maintain so it is not required.  Make sure your editor does not change existing lines by replacing tabs with spaces or vice-versa, or deleting trailing blanks because this makes bogus SLM diffs.  Line length should be kept within 78 characters when possible so that horizontal scrolling is not required in text editors that support 80 columns plus borders.  Blank lines should be used before large code blocks (usually with a comment before the next block), and two blank lines are used between function implementations. 

Curly braces are done in the Word and Excel style (braces on their own line, flush with the indented code), and spaces are placed as in English (e.g. space after comma, no space after parentheses, space after keywords, space before and after most operators).  Labels for switch statements line up with the switch keyword.  These conventions are shown in the code below.

x = XGetFirst(p);

y = YGetFirst(p);

 

// Determine the z value for this coordinate

if (x == 0 && FBar(x, y + 1))

   z = -1;

else

   {

   for (p = pFirst; p != NULL; p = p->next)

      {

      x = 3;

      if (FAck(p))

         z = 0;

      }

   x = 0;

   }

 

// Process according to the face type

switch (z)

   {

case 0:

   ft = ftNone;

   RemoveFace(p, x, y, z);

   break;

case -1:

   ft = ftReverse;

   FlipFace(p, x, y, z);

   break;

default:

   DrawFace(p, x, y, z);

   break;

   }

 

Summary:

·         Use a blank line between blocks of code separated by a summary comment.

·         Try to keep line length to within 78 characters.

·         Indent using 3 space tabs. 

·         Make sure your editor doesn't muck with blank characters on lines that you didn't change.

·         Curly braces go on their own line, flush with the indented code (like Word and Excel).

·         Labels for switch statements line up with the switch keyword.

·         Place spaces as in English (after keywords and commas, not after open parentheses, between many operators).

5.7     Source File Organization

The following sections describe a basic methodology for organizing source code into different files based on interface and implementation.  The basic idea here is to organize the files into modules where each module has its own interface file.  This makes it easier to publish and learn the interface for a module and has other benefits such as reducing checkin merges (the owner of a module is much more likely to be the only one to change its interface file).

5.7.1     Public Interface Files

Public interface files are those exported to (and #included by) clients who use a module.  The public interface contains constants, typedefs, function prototypes, etc.  For Office, there may sometimes be two levels of public interface.  An external public interface is exported to users of our DLLs and must compile correctly for both C and C++ (meaning it must use only regular C declarations, or macros such as the OLE COM declarations in objbase.h that expand to different things in C and C++).  Another possible type of public interface is an internal public interface.  This would be a formal interface to a module for use by other modules, but only within Office.  This interface could include C++ constructs such as class declarations.

Public interface files end in a .h extension.  This applies to both exported and inter-Office interface files.  Exported header files will be put in an inc subdirectory so that it's clear which ones are exported.  In addition, files within Office that #include public headers will use #include <file.h> for exported headers (as well as system headers), and #include "file.h" for non-exported headers.

A public interface should not declare access to data structures that it doesn’t have to.  Don’t just put every declaration in the public header file.  The whole point of an interface is to provide controlled and documented access to the functionality.  On the other hand, an overly strict and controlled interface can cause performance problems by introducing overhead.  Export what you have to in order to make the interface simple and fast, but no more.  Inline functions can often help you have your cake and eat it too in C++ interfaces when efficient access to member data is of concern.

When efficiency is not as important in an interface, another worthy goal (besides clarity) is to reduce compile dependencies.  By moving the declaration of data members into an implementation file or internal interface and only exporting a pointer type, you allow an easy incremental compile of the implementation file when the data members change.  This sometimes sacrifices a small amount of speed due to the extra level of indirection in the using modules, but often this doesn’t make any real difference.

Summary:

·         Public interface files end in a .h extension.

·         Exported public header files (#included by consuming applications) must compile properly for both C and C++, and are put in the inc subdirectory.

·         Use #include <file.h> for exported headers (as well as system headers), and #include "file.h" for non-exported headers.

·         Export what you have to in order to make the interface simple and fast, but no more.

·         Reduce compile dependencies by moving non-speed critical data declarations into implementation files and only export an opaque pointer in the public interface.
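Here is a minimal sketch of the opaque-pointer technique, with both halves in one listing for brevity. The comment markers show which part would live in a hypothetical public header drwid.h and which in drwid.cpp. WIDGET and its functions are invented for illustration, and BOOL is a local stand-in. The header half uses only C declarations, so it would also compile for C clients.

```cpp
#include <cstddef>
#include <new>

typedef int BOOL;              // stand-ins for the Windows typedefs
#define TRUE  1
#define FALSE 0

/* ---- drwid.h (public interface): clients see an incomplete type ---- */

typedef struct WIDGET WIDGET;  // opaque: no data members exported

/* Create a widget 'dx' units wide and return it in '*ppwid'.  Return
   FALSE if memory runs out. */
BOOL FCreateWidget(WIDGET **ppwid, int dx);

/* Return the width of the widget at 'pwid'. */
int DxGetWidget(const WIDGET *pwid);

/* Free the widget at 'pwid'. */
void DeleteWidget(WIDGET *pwid);

/* ---- drwid.cpp (implementation): the only file that sees the layout,
   so changing it recompiles just this file ---- */

struct WIDGET
   {
   int dx;                     // width
   };

BOOL FCreateWidget(WIDGET **ppwid, int dx)
{
   *ppwid = new (std::nothrow) WIDGET;
   if (*ppwid == NULL)
      return FALSE;
   (*ppwid)->dx = dx;
   return TRUE;
}

int DxGetWidget(const WIDGET *pwid)
{
   return pwid->dx;
}

void DeleteWidget(WIDGET *pwid)
{
   delete pwid;
}
```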

5.7.2     Private Interface Files

There are two common instances where it becomes necessary to factor out declarations that are used by more than one implementation file but not exported outside of the module to users.  One is when there is more than one .c or .cpp file involved in the implementation, which is common for complex modules or classes.  Another is when inheritance of classes is used in C++ to inherit the implementation class of one module into another.  In these cases, the common declarations should be factored out into an internal interface file. 

Internal interface files end in a .i extension.  This is done in order to easily distinguish internal and external interface files, and it also makes it easy to use the same base name in both interfaces (e.g. foo.h and foo.i).  Internal interface files always live in the normal source directory (along with the cpp files) and are included with #include "file.i".

Summary:

·         An internal interface file contains declarations that are common to multiple .c or .cpp files implementing the same module or inheriting from the same base implementation class.

·         An internal interface file ends in a .i extension, lives in the cpp source directory, and is included by #include "file.i".

5.7.3     Implementation Files

Implementation files contain the actual function bodies for global functions, local functions, and class method functions.  An implementation file has the extension .c or .cpp.  Note that an implementation file does not have to implement an entire module.  It can be split up and #include a common internal interface.  Also note that declarations that are not explicitly exported to users of the module should be kept inside the implementation file itself for increased encapsulation and fewer cases where you have to search for and/or fix and recompile callers of a module when you change something.

Summary:

·         Implementation files end in a .c or .cpp extension.

·         Split large implementations into multiple files which all include a common internal interface (.i) file.

·         Keep declarations which don’t have to be exported inside the implementation file.

5.7.4     Base Filenames

The filename extension conventions are given above.  Then there’s the question of the base part of the filename.  First of all, the name should be 8 or fewer characters and obey all other DOS restrictions.  Even though we’re using NT for everything, some tools (e.g. SLM) have a DOS legacy and will not handle long filenames.  As for the name itself, we will use a two-character prefix denoting the major feature team (e.g. “tb” for Toolbars, “dr” for Drawing, etc.) followed by some abbreviation of the functionality of the module.  Use the same base filename for the implementation, public interface, and internal interface (if any) when possible (e.g. tb.h, tb.i, tb.cpp).

Summary:

·         Use at most 8 characters for the base filename, and obey other DOS restrictions even on NT.

·         Use a two-character prefix denoting the major feature team (e.g. “tb” for Toolbars, “dr” for Drawing, etc.) followed by some abbreviation of the functionality of the module.

·         Use the same base filename for the implementation, public interface, and internal interface (if any) when possible (e.g. tb.h, tb.i, tb.cpp).

6.     Interfaces to DLLs

It is possible but difficult and impractical to export classes from a DLL.  But more importantly, we need to be able to deliver to C applications.  This means that we must use a standard C external public interface even for implementations that are in C++.  The following sections describe how to do this.

6.1     C Functions and Global variables

All function interfaces that go across the DLL boundary must be standard C functions.  In C++, this is done by declaring the function as extern "C", which prevents the normal “decorated” function names.  In Office, we do this with the MSOAPI macro, which is similar to the STDAPI macro in OLE’s objbase.h.  MSOAPI ensures both extern "C" and the __stdcall calling convention.  DLL-exported functions that are part of a COM-style interface will be declared with the MSOMETHOD and MSOMETHODIMP macros, which are just like OLE's STDMETHOD and STDMETHODIMP macros, except that they explicitly indicate exporting from the DLL.

Global variables also need to be declared as extern "C" when exported from the DLL, but this should be rare, if it happens at all, because we don’t plan to export global data from our DLL.

Summary:

·         Declare DLL-exported functions using MSOAPI if global, and MSOMETHOD(IMP) inside COM interfaces.

·         DLL-exported data should be avoided, but must use extern "C" if needed.
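OLE's STDAPI macro combines an EXTERN_C macro with a __stdcall calling-convention macro. The sketch below shows a macro built in the same spirit. MSOAPI_, MSOEXTERNC, and MSOAPICALL are invented names, and the real Office MSOAPI definition may differ (for example, by also marking the function for DLL export). The __stdcall part is applied only when compiling for Windows, so the sketch also builds elsewhere.

```cpp
#include <cstdlib>

// Hypothetical macros modeled on OLE's EXTERN_C / STDAPI_ pattern.
#ifdef __cplusplus
#define MSOEXTERNC extern "C"
#else
#define MSOEXTERNC extern
#endif

#ifdef _WIN32
#define MSOAPICALL __stdcall
#else
#define MSOAPICALL             // no __stdcall off Windows; default convention
#endif

// Declares a function with C linkage and the API calling convention.
#define MSOAPI_(t) MSOEXTERNC t MSOAPICALL

/* Allocate 'cb' bytes; return NULL on failure. */
MSOAPI_(void *) MsoPvAlloc(int cb);

MSOAPI_(void *) MsoPvAlloc(int cb)
{
   return std::malloc(cb);
}
```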

6.2     Common C/C++ Public Header Files

Since some of our clients are using C (e.g. Word and Excel), and some are using C++ (e.g. PowerPoint, Access, and Ren), our header files must compile properly for both kinds of client, and also produce consistent names and calling conventions.  This is normally done by using a block extern "C" declaration around the whole file, protected by #ifdef __cplusplus, as shown below.  Note that this is necessary even for plain C header files, because when #included by a C++ client, the names would otherwise become decorated in the client.

#ifdef __cplusplus

extern "C" {

#endif

 

void Foo(int x);

int IBar(int y);

...

 

#ifdef __cplusplus

}

#endif

 

The OLE header file objbase.h also defines a number of macros that are convenient for this purpose (especially when declaring COM object interfaces).  See objbase.h for details.

Finally, note that you can’t declare anything using new C++ features (e.g. classes) in a shared header file.

Summary:

·         Exported public header files need to work for both C and C++ clients.

·         Use the #ifdef __cplusplus extern "C" construct to declare an entire header file with the C naming convention.

·         Use the macros defined in OLE’s objbase.h where appropriate to assist in shared C/C++ declarations, especially when declaring COM interfaces.

·         Don’t use C++ specific declarations in a shared header file (unless specifically protected and declared both ways via #ifdefs).

6.3     Lightweight COM Objects and ISimpleUnknown

For C++ fans it seems like a pain to have to export all functionality in standard C form.  Furthermore, standard C function APIs are not a great way to export object-oriented functionality.  In addition, exposing lots of separate functions makes the DLL load more expensive due to the larger number of dynamic fixups required.  Fortunately, there is a pretty good solution to all of these problems.  That solution is part of what the OLE COM (Component Object Model) specifies.

Object-oriented functionality lends itself to using object pointers and vtables inside the object.  Since most of the functions are put in the vtable, the only global APIs you need are basic creation functions to get the object in the first place.  In fact, some objects can be created by methods on other objects, which reduces the need for global APIs even further.  The OLE COM model specifies a standard way to define objects and vtables so that declarations and implementations can easily be mixed and matched between C and C++.  The OLE header file objbase.h defines a number of macros that make it easy to declare object interfaces (and also plain functions and data) in a common header file that can be shared by C and C++.  See objbase.h for details.

The OLE COM model defines more than just C/C++ object/vtable interactions, however.  The full COM model also includes three further features:

·         Support for multiple interfaces per object, with dynamic interface negotiation (QueryInterface).

·         Multiple references to an object via reference counts (AddRef and Release).

·         External creation of objects via the system registry (CoCreateInstance), with automatic loading of the DLL if necessary.

Not all of these will be of interest to parts of Office that are just trying to implement and declare a standard object-oriented piece of functionality.  In particular, creation via the registry is often not desired because it is much slower than a creation API, and reference counts are often not desired for simple objects.  If you take COM and remove the registry creation support, you have what I call “lightweight COM” or just an “IUnknown object”.  If you go further and remove reference counts, then you have what we call an “ISimpleUnknown” object.  ISimpleUnknown is a base interface that has only QueryInterface (no AddRef or Release) and no reference-counting semantics:

/**************************************************************************
   The ISimpleUnknown interface is a variant on IUnknown which supports
   QueryInterface but not reference counts.  All objects of this type
   are owned by their primary user and freed in an object-specific way.
   Objects are allowed to extend themselves by supporting other interfaces
   (or other versions of the primary interface), but these interfaces
   cannot be freed without the knowledge and cooperation of the object's
   owner.  Hey, it's just like a good old-fashioned data structure except
   now you can extend the interfaces.
**************************************************************** DAVEPA **/

#undef  INTERFACE
#define INTERFACE  ISimpleUnknown

DECLARE_INTERFACE(ISimpleUnknown)
{
   /* ISimpleUnknown's QueryInterface has the same semantics as the one
      in IUnknown, except that QI(IUnknown) succeeds if and only if the
      object also supports any real IUnknown interfaces,
      QI(ISimpleUnknown) always succeeds, and there is no implicit AddRef
      when a non-IUnknown-derived interface is requested.  If an object
      supports both IUnknown-derived and ISimpleUnknown-derived
      interfaces, then it must implement a reference count, but all active
      ISimpleUnknown-derived interfaces count as a single reference
      count. */
   MSOMETHOD(QueryInterface) (THIS_ REFIID riid, void **ppvObj) PURE;
};
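For a C++ client, DECLARE_INTERFACE and a STDMETHOD-style method macro expand to roughly an abstract struct with pure virtual methods.  This assumes MSOMETHOD behaves like objbase.h's STDMETHOD.  The sketch below spells that expansion out by hand so it compiles without objbase.h:

·         HRESULT, IID, the IID constants, and FEqualIid are simplified stand-ins.  Real IIDs are 128-bit GUIDs.

·         IToolbarSketch and CToolbarSketch are hypothetical.

The object has a single interface, QueryInterface hands out pointers without any AddRef, and the owner frees the object directly.

```cpp
#include <cassert>

// Simplified stand-ins for the OLE types (real IIDs are 128-bit GUIDs).
typedef long HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002L);
struct IID { int nId; };
typedef const IID &REFIID;
inline bool FEqualIid(REFIID riid1, REFIID riid2) { return riid1.nId == riid2.nId; }
const IID IID_ISimpleUnknown = { 1 };
const IID IID_IToolbarSketch = { 2 };

// Roughly what DECLARE_INTERFACE(ISimpleUnknown) declares for C++ clients.
struct ISimpleUnknown
{
    virtual HRESULT QueryInterface(REFIID riid, void **ppvObj) = 0;
};

// A hypothetical interface derived from ISimpleUnknown.
struct IToolbarSketch : public ISimpleUnknown
{
    virtual int CButtons() = 0;
};

// Implementation class: inherits the interface and has no reference count.
class CToolbarSketch : public IToolbarSketch
{
public:
    explicit CToolbarSketch(int cButtons) : m_cButtons(cButtons) {}

    HRESULT QueryInterface(REFIID riid, void **ppvObj) override
    {
        assert(ppvObj != nullptr);
        if (FEqualIid(riid, IID_ISimpleUnknown))
            *ppvObj = static_cast<ISimpleUnknown *>(this);
        else if (FEqualIid(riid, IID_IToolbarSketch))
            *ppvObj = static_cast<IToolbarSketch *>(this);
        else
        {
            *ppvObj = nullptr;
            return E_NOINTERFACE;
        }
        return S_OK;    // no implicit AddRef: the owner controls the lifetime
    }

    int CButtons() override { return m_cButtons; }

private:
    int m_cButtons;
};
```

There is no reference count, so a client must not use any pointer obtained through QueryInterface after the owner has freed the object.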

 

An example of some simple IUnknown objects with a C++ implementation and shared C/C++ interface files is in //daddev/filesys/usr/davepa/com/*.*.  See the README.TXT file for information.  There are plenty of examples of ISimpleUnknown objects in the Office ‘96 code (e.g. the Toolbar objects).  There are some general points about COM interfaces that should be stated, though, and these are summarized below.

First, it is more efficient, and also easier to understand, if you make fewer, more complete interfaces rather than lots of small ones.  Each interface an object supports costs one vtable pointer per object (4 bytes on a 32-bit system), and this can get out of control quickly if lots of small interfaces are used.  For example, an OCX object has 20 interfaces and therefore 80 bytes of overhead per object just for vtable pointers!  Ideally, an object will have only one interface.  This makes the implementation much cleaner, as shown in the example implementation referenced above.  The "reuse of interface" gained by using lots of small, standard interfaces is not a big benefit unless explicit polymorphism is desired between your object and others via one of the interfaces.  Having only one interface doesn't always make sense, however.  For example, polymorphism may be desired based on one interface while the object still has other functionality needed by other objects.  Sometimes it is possible to handle this by implementing a set of objects connected by internal interfaces (or friend classes), but this may be less efficient and more complex to arrange.

Second, use a C++ class to implement a COM object, and use inheritance of interface from an interface class to get the interface right.  When a single object implements more than one interface, use nested classes instead of multiple inheritance in the implementation class.  The multiple inheritance declaration is shorter, but it has several drawbacks:

·         You have to do special work to resolve naming conflicts between interfaces, and such conflicts are common when interfaces are versioned.

·         You cannot have interface-level reference counts, which are useful for debugging and can make the release process more efficient.

·         There is simply general confusion about multiple inheritance (how it works and what it's doing for you).

As stated earlier, we want to avoid multiple inheritance altogether.  Also, an implementation using nested classes can be as efficient as desired, especially when an inline function computes the backpointer to the outer object instead of storing an explicit backpointer, as shown in the example implementation.
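The following sketches the nested-class approach.  As before, HRESULT and IID are replaced by simplified stand-ins rather than taken from objbase.h, and the interfaces IDrawSketch and IPersistSketch and the class CShape are hypothetical.  Each interface is implemented by its own nested class, and every QueryInterface call is forwarded to a single implementation in the outer object.  For simplicity and portability, the sketch stores an explicit backpointer in each nested object.  The more efficient variant mentioned above would compute that pointer with an inline function instead (for example, from the nested member's offset within the outer object), which saves one pointer per interface.

```cpp
#include <cassert>

// Simplified stand-ins for the OLE types (real IIDs are 128-bit GUIDs).
typedef long HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002L);
struct IID { int nId; };
typedef const IID &REFIID;
inline bool FEqualIid(REFIID riid1, REFIID riid2) { return riid1.nId == riid2.nId; }
const IID IID_ISimpleUnknown = { 1 };
const IID IID_IDrawSketch = { 2 };
const IID IID_IPersistSketch = { 3 };

struct ISimpleUnknown { virtual HRESULT QueryInterface(REFIID riid, void **ppvObj) = 0; };
struct IDrawSketch : public ISimpleUnknown { virtual int Draw() = 0; };
struct IPersistSketch : public ISimpleUnknown { virtual int Save() = 0; };

class CShape
{
public:
    CShape() : m_xDraw(this), m_xPersist(this), m_cDraws(0), m_cSaves(0) {}

    // The one real QueryInterface; every nested interface forwards here.
    HRESULT QueryInterface(REFIID riid, void **ppvObj)
    {
        assert(ppvObj != nullptr);
        if (FEqualIid(riid, IID_ISimpleUnknown))
            *ppvObj = static_cast<ISimpleUnknown *>(&m_xDraw);
        else if (FEqualIid(riid, IID_IDrawSketch))
            *ppvObj = static_cast<IDrawSketch *>(&m_xDraw);
        else if (FEqualIid(riid, IID_IPersistSketch))
            *ppvObj = static_cast<IPersistSketch *>(&m_xPersist);
        else
        {
            *ppvObj = nullptr;
            return E_NOINTERFACE;
        }
        return S_OK;
    }

private:
    // Nested class implementing IDrawSketch.
    class XDraw : public IDrawSketch
    {
    public:
        explicit XDraw(CShape *pshape) : m_pshape(pshape) {}
        HRESULT QueryInterface(REFIID riid, void **ppvObj) override
            { return m_pshape->QueryInterface(riid, ppvObj); }
        int Draw() override { return ++m_pshape->m_cDraws; }
    private:
        CShape *m_pshape;   // explicit backpointer to the outer object
    };

    // Nested class implementing IPersistSketch.
    class XPersist : public IPersistSketch
    {
    public:
        explicit XPersist(CShape *pshape) : m_pshape(pshape) {}
        HRESULT QueryInterface(REFIID riid, void **ppvObj) override
            { return m_pshape->QueryInterface(riid, ppvObj); }
        int Save() override { return ++m_pshape->m_cSaves; }
    private:
        CShape *m_pshape;   // explicit backpointer to the outer object
    };

    XDraw m_xDraw;
    XPersist m_xPersist;
    int m_cDraws;
    int m_cSaves;
};
```

Because each interface lives in its own nested class, two interfaces could declare methods with the same name without any conflict in CShape itself.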

Finally, use an IUnknown-based COM object only if you want reference counts, and an ISimpleUnknown-based object (described above) if you don’t.  Be very careful with reference counts if you use them.  Reference counts are the root of much evil because they are easy to get wrong and cause bugs that are hard to track down.  Some reference count bugs are crashing bugs (serious), and some are memory leaks (hard to find).
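The following is a minimal sketch of the usual AddRef/Release pattern, written as a plain C++ class rather than a real IUnknown implementation.  CRefCounted and the global leak counter vcObjectsLive are hypothetical.  The sketch shows two habits that avoid common reference-count bugs.  First, Release returns a local copy of the count, because no member may be touched after delete this.  Second, a live-object counter lets a test detect a leak (a missing Release).

```cpp
#include <cassert>

// Hypothetical global ("v" prefix) count of live objects, used to spot leaks.
int vcObjectsLive = 0;

class CRefCounted
{
public:
    // The creator owns the first reference.
    CRefCounted() : m_cRef(1) { vcObjectsLive++; }

    unsigned long AddRef() { return ++m_cRef; }

    unsigned long Release()
    {
        assert(m_cRef > 0);   // sanity check only; cannot catch use after free
        unsigned long cRef = --m_cRef;
        if (cRef == 0)
            delete this;      // no member may be touched after this line
        return cRef;          // so return the local copy, not m_cRef
    }

private:
    // Private destructor: only Release can destroy the object.
    ~CRefCounted() { vcObjectsLive--; }

    unsigned long m_cRef;
};
```

Every AddRef must be balanced by exactly one Release.  One Release too many is a crashing bug, and one too few is a leak that shows up here as a nonzero vcObjectsLive.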

Summary:

·         A COM object is a convenient way to export object-oriented functionality from a DLL to both C and C++ clients at the same time.

·         A "lightweight COM" or just an “IUnknown” object is a COM object that does not support CoCreateInstance.

·         Use a C++ class to implement a COM object, and use inheritance of interface from an interface class to get the interface right.

·         Use a small number of interfaces that are generally complete rather than using a bunch of small ones.  Ideally, an object would have only one interface.

·         When a COM object does support multiple interfaces, use nested classes in the implementation object to implement it rather than multiple inheritance.

·         Use ISimpleUnknown instead of IUnknown if you don’t need reference counts.  Be very careful with reference counts if you do use them.

·         Study the sample implementation of IUnknown-based COM objects in C++ in //daddev/filesys/usr/davepa/com/*.*.  There are plenty of examples of ISimpleUnknown-based objects in the Office ‘96 code (e.g. the Toolbar objects).


7.     Appendix A: Basic Hungarian Reference

This section gives a brief summary of the basic Hungarian naming elements that all implementations will use.  The Office project itself will invent abstract type tags where necessary and keep a record of these in a text file checked into the project.  The rationale and exact mechanism for Hungarian naming are not described here.  See the vintage 1988 document HGARIAN.DOC by Doug Klunder, or read chapter 2 of Charles Simonyi's even finer vintage "Meta-Programming" thesis if interested.

7.1     Making Hungarian Names

A general Hungarian name is formed by appending together one or more prefixes, a base tag, and a qualifier.  The base tag indicates the type of the variable (e.g. "co" for a color), the prefixes modify that type (e.g. "rg" for an array, giving "rgco" for an array of colors), and the qualifier describes the use of this particular type (e.g. "rgcoGray", for an array of colors used as grays).  Not all of these are used for all names.  A prefix is often not relevant, and the qualifier can be omitted if the use is obvious, such as when there is only one of the relevant type involved in the code.

It is important to note that many (perhaps most) base tags and qualifiers will be application-specific because they are denoting application-defined types and uses.  There are various standard tags for basic types (several are given below), but it is a mistake to use these when the abstract type is more appropriate.  For example, if a color happened to be implemented as a long, then one might say "rgl" for an array of colors, but this breaks the abstraction of a color.
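The color example can be sketched as follows.  The sketch assumes a hypothetical application type CO (a color packed into an unsigned long) and a hypothetical helper CoGrayFromB.  The names use the abstract tag "co" even though the representation is an integer, so "rgcoGray" stays meaningful if the representation changes.

```cpp
#include <cassert>

// Hypothetical application-defined type with base tag "co" (a color).
// It happens to be stored as a packed 0x00RRGGBB unsigned long.
typedef unsigned long CO;

// Make a gray color from one intensity value in the range 0..255.
inline CO CoGrayFromB(unsigned b)
{
    assert(b <= 0xFF);
    return (static_cast<CO>(b) << 16) | (static_cast<CO>(b) << 8) | b;
}

// "rg" (array) + "co" (color) + "Gray" (use): an array of colors used as grays.
// Avoid "rglGray": it names the representation and breaks the abstraction.
// Fills the array with evenly spaced grays from black (first) to white (last).
void InitRgcoGray(CO rgcoGray[], int cco)
{
    for (int ico = 0; ico < cco; ico++)
        rgcoGray[ico] = CoGrayFromB(cco > 1 ? static_cast<unsigned>(ico * 0xFF / (cco - 1)) : 0);
}
```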

When Hungarian is used for function names, the qualifier usually just describes the action of the function.  After the basic qualifier, it is sometimes useful to describe the parameters to the function by appending Hungarian types for them.  And, of course, the first letter of a function name is always capitalized.  For example, "FInitFooFromHwndXY" might be the name of a function which initializes a "foo" structure from parameters of type "Hwnd", "X", and "Y", and returns a Boolean success code.  This is not a requirement, though.  Use it only when it makes the name easier to understand.
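For instance, a function named this way might look like the sketch below.  HWND and FOO are stand-ins defined here.  HWND is a real Windows type, but it is redefined here so that the sketch compiles anywhere.  FInitFooFromHwndXY is hypothetical and simply records its parameters.

```cpp
#include <cassert>

typedef void *HWND;                       // stand-in for the Windows handle type
struct FOO { HWND hwnd; int x; int y; };  // hypothetical "foo" structure

// "F": returns a Boolean flag (true on success).
// "InitFoo": the action.  "FromHwndXY": the parameter types, in order.
bool FInitFooFromHwndXY(FOO *pfoo, HWND hwnd, int x, int y)
{
    assert(pfoo != nullptr);              // caller must supply a foo
    if (hwnd == nullptr)
        return false;                     // nothing to initialize from
    pfoo->hwnd = hwnd;
    pfoo->x = x;
    pfoo->y = y;
    return true;
}
```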

7.2     Standard Base Tags

The following list gives standard base tags for basic types.  As stated above, note that an application will define many of its own tags for its own internal types.

f

A flag (BOOL).  The value should be TRUE or FALSE.  The qualifier should describe when the flag is TRUE, for example "fError" is true if an error occurred.

ch

A one-byte character (a CHAR).

sz

A zero-terminated string of CHARs (a classic C string).  Sometimes a maximum length is specified, as in "sz255", which indicates that the actual string part can have 255 characters, so that there must be 256 characters allocated.

st

A length-preceded string of CHARs.  This type of string is limited to 255 characters because the length must fit in a byte.  As with sz, the length might be specified as in "st32", which would need to have 33 CHARs allocated in order to hold the length and the specified length of real characters.

stz

A length-preceded string of CHARs that is also zero-terminated.  Note that if a length is specified, then there must be (length + 2) CHARs allocated.

wch, wz, wt, wtz

Wide-character (WCHAR) versions of ch, sz, st, and stz.  Wide strings (UNICODE) should be used for all user-visible strings.

fn

A function.  This is usually used with the "p" prefix (see below) to make "pfn" because in C you can only hold the address of a function in a variable.

b

A BYTE.

w

A 16-bit quantity (WORD, SHORT, or USHORT).

dw

A 32-bit unsigned double-word (DWORD).

l

A LONG (a signed 32-bit quantity).

u

An unsigned long (ULONG).  In classic Hungarian, this was an unsigned word.  In Office, this will be an unsigned 32-bit quantity and is therefore the same as "dw", but the "u" tag (and the ULONG type) is preferred when the type is a number.

v

A void.  This is always used with the "p" prefix to make "pv", a pointer to an unknown type.

sc

An OLE SCODE.

hr

An OLE HRESULT.

var

An OLE VARIANT.

varg

An OLE VARIANTARG.
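The following sketch makes the allocation rules for sz, st, and stz concrete by converting an sz into an stz.  The function names FStzFromSz and CchSt are hypothetical.  CHARs are treated as unsigned char so that the length byte can hold values up to 255.

```cpp
#include <cassert>
#include <cstring>

// Length of an st or stz string: the first CHAR holds the count.
inline int CchSt(const unsigned char *st)
{
    return st[0];
}

// Copy the zero-terminated string sz into stz as a length-preceded,
// zero-terminated string.  A string of cch characters needs cch + 2 CHARs
// (length byte + characters + terminating 0), and cch must fit in a byte.
// Returns false if sz does not fit in cchAlloc CHARs.
bool FStzFromSz(unsigned char *stz, std::size_t cchAlloc, const char *sz)
{
    assert(stz != nullptr && sz != nullptr);
    std::size_t cch = std::strlen(sz);
    if (cch > 255 || cch + 2 > cchAlloc)
        return false;
    stz[0] = static_cast<unsigned char>(cch);
    std::memcpy(stz + 1, sz, cch);
    stz[cch + 1] = '\0';
    return true;
}
```

By the same rules, an "sz255" buffer needs 256 CHARs and an "st32" buffer needs 33.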

7.3     Standard Prefixes

The following list gives standard prefixes that are used to modify a base tag type.  More than one prefix can be used.  It is possible, but probably unusual, for an application to define its own prefixes (typically an application will define only base tags and qualifiers).

p

A pointer.  For example, "pch" for a pointer to a character.  In classic Microsoft Hungarian, "p" meant a near pointer, and "lp" and/or "hp" were used for long (far) pointer and huge pointer, respectively.  In a 32-bit world this is no longer an issue.

rg

An array (comes from "range").  For example, "rgch" for an array of characters.  As used in C/C++, this may be the name of an allocated array, or just a pointer to an array.

i

An index into an array of the base type.  For example, "ich" for an index into an array of characters.

c

A count of items of the base type.  For example, "cch" for a count of characters.

n

Another prefix for a count of items (for "number of"), but "c" is preferred.

d

A difference or delta between values of the base type.  For example, "dx" for a difference between two values of type x.

h

A handle.  An opaque reference to an item of the base type that cannot be indirected by the user (this definition has been loosened in the past due to a somewhat different use for a movable memory block).  For example, "hwnd" is a handle to a window ("wnd") that you are not allowed to indirect and access the fields of because it's not in your address space (this also preserves the abstraction of the opaque reference).  An application module might export a handle to an abstract data type (typically declared as a void *) so that clients can only use the reference and never see or know about the fields.  However, C++ makes this less necessary because you can export a pointer to a class with private data members.

pl

A “plex” of objects.  This is an alternative to a simple array (“rg”) that is resizable in a standard way using the plex abstraction (see inc/msoalloc.h).

mp

An array used to map an index or other scalar to a value.  The index and value tags are appended, as in "mpchdx" to map a character value (used as the array index) to a dx value for that character.

v

A global variable.  When used, this is always the first prefix.
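The following sketch combines several of these prefixes:

·         "rgch" is an array of characters and "cch" a count of them.

·         "ich" is an index and "pch" a pointer to a character.

·         "dx" is a difference in x (a width).

·         "mpchdx" maps a character value to its width.

The function names DxFromRgch and DxFromPchCch and the width data are hypothetical.

```cpp
#include <cassert>

// Sum the widths of the first cch characters of rgch.
// mpchdx maps each character value (the index) to a width (the value).
int DxFromRgch(const unsigned char *rgch, int cch, const int *mpchdx)
{
    assert(cch >= 0);
    int dx = 0;
    for (int ich = 0; ich < cch; ich++)
        dx += mpchdx[rgch[ich]];
    return dx;
}

// The same loop written with a pointer ("pch") instead of an index ("ich").
int DxFromPchCch(const unsigned char *pch, int cch, const int *mpchdx)
{
    int dx = 0;
    for (; cch > 0; cch--)
        dx += mpchdx[*pch++];
    return dx;
}
```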

 

In addition, there are the following special prefixes that we will add before any Hungarian prefix when appropriate:

m_

A C++ class data member.

s_

A C++ class static data member.

Mso

An exported global function.

MSO

An exported typedef.

mso

An exported global constant or global variable.
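The following sketch shows the special prefixes together with ordinary Hungarian names.  MSOPT, msocptMax, CPtList, and MsoCptTotal are hypothetical names chosen for illustration, not real Office declarations.

```cpp
#include <cassert>

// "MSO" prefix: an exported typedef.
typedef struct { int x; int y; } MSOPT;

// "mso" prefix: an exported global constant (capacity of a point list).
const int msocptMax = 4;

class CPtList
{
public:
    CPtList() : m_cpt(0) {}

    // Add a point; returns false if the list is full.
    bool FAdd(MSOPT pt)
    {
        assert(m_cpt <= msocptMax);
        if (m_cpt == msocptMax)
            return false;
        m_rgpt[m_cpt++] = pt;
        s_cptTotal++;
        return true;
    }

    int Cpt() const { return m_cpt; }
    static int CptTotal() { return s_cptTotal; }

private:
    MSOPT m_rgpt[msocptMax];    // "m_": data member (array of points)
    int m_cpt;                  // "m_": data member (count of points)
    static int s_cptTotal;      // "s_": static data member, shared by all lists
};

int CPtList::s_cptTotal = 0;

// "Mso" prefix: an exported global function.
int MsoCptTotal()
{
    return CPtList::CptTotal();
}
```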

7.4     Standard Qualifiers

Most qualifiers are defined by the situation where the name is used.  However, there are a few standard ones, as given below.

First

The first item in a set, or the first one of interest (e.g. pchFirst).

Last

The last item in a set, or the last one of interest (e.g. pchLast).  When used as an index, Last represents the last valid/desired value, so a loop might read:
for (ich = ichFirst; ich <= ichLast; ich++)

Lim

The upper limit of elements in a set.  Unlike Last, Lim does not represent a valid value; Lim is one beyond the last valid value, so a loop might read:
for (ich = ichFirst; ich < ichLim; ich++)

Min

The minimum element in a set.  Similar to First, but usually represents the first valid value, not just the first one to be dealt with.

Max

The upper limit of elements in a set (same as Lim).  Unfortunately, a normal English reading of "Max" usually implies the last valid value, but the Max qualifier is not a valid value; it is one beyond.  Like Lim, a typical use would be:

for (ich = ichMin; ich < ichMax; ich++)

Be very careful with this one.

Mac

Like Max, but sometimes used when the "current" maximum can change over time.  Note that Mac is also one beyond the "last" valid value.

Mic

Like Min, but sometimes used when the "current" minimum can change over time.

T

A temporary value.  This qualifier is probably overused as a way to avoid coming up with a good new name for something, but sometimes a brief temporary is OK, such as in a classic swap operation.

TT, T3, etc.

Further abuses of the T = temporary convention when more unique names are needed.  These should be avoided.

Sav

A temporary value used to save a value so that it can be restored later.  For example:
hwndSav = hwnd; ...; hwnd = hwndSav;

Null

The special 0 value, always equal to 0 but used for documentation purposes (e.g. hwndNull).

Nil

A special invalid value, not necessarily equal to 0 (might be -1, or anything else).  To avoid confusion, it's best to not have both Null and Nil values for the same type.

Src

The source of an operation, typically paired with Dest as the destination, as in:

*pchDest = *pchSrc;

Dest

A destination.  See Src.
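The following sketch contrasts the inclusive Last qualifier with the exclusive Lim qualifier, so that ichLast == ichLim - 1.  It also uses a T temporary in a swap and Src/Dest naming.  ReverseRgch and CopyRgch are hypothetical helpers.

```cpp
#include <cassert>

// Reverse rgch[ichFirst..ichLast] in place.  "Last" is the last valid
// index (inclusive); chT is a brief temporary, as in the classic swap.
void ReverseRgch(char *rgch, int ichFirst, int ichLast)
{
    while (ichFirst < ichLast)
    {
        char chT = rgch[ichFirst];
        rgch[ichFirst++] = rgch[ichLast];
        rgch[ichLast--] = chT;
    }
}

// Copy rgchSrc[ichFirst..ichLim) to the start of rgchDest.  "Lim" is one
// beyond the last valid index, so the loop test is ich < ichLim.
void CopyRgch(char *rgchDest, const char *rgchSrc, int ichFirst, int ichLim)
{
    assert(ichFirst <= ichLim);
    char *pchDest = rgchDest;
    for (int ich = ichFirst; ich < ichLim; ich++)
        *pchDest++ = rgchSrc[ich];
}
```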

 
