Using Header Files To Enhance Portability

Rainer Gerhards

--------------------------------------------------------------------------------


Rainer Gerhards specializes in systems programming and has a strong interest in C. He has written some large-scale control systems and many small utilities in C. He owns his own small software company in addition to managing the computing center of a mid-sized company. He may be contacted at Petronellastrasse 6, 5112 Baesweiler, West Germany.

C is known for its efficient code, rich set of features and portability. While portability is not built in, you can avoid possible portability problems by anticipating them. Let's look at a few problem areas, suggest some solutions, and examine one method in detail.

One important portability issue is the C dialect that your compiler implements. Although there have always been C language standards, until recently they have been too imprecise to preclude varying interpretations. Early, less powerful machines also forced compiler writers to limit features, which produced additional variant dialects. Thus, some compilers can't compile valid C code if it contains unsupported features.

Bit fields are a good example. A number of modern compilers still don't support bit fields. Of course, you could avoid using bit fields, but what if you write for one compiler which doesn't support structure and union assignment and for several others which do? You might avoid these constructs too. Otherwise you may learn, while porting a 50,000 line program that makes extensive use of structure assignment, that the target environment doesn't support it. The challenge is to know which features to avoid.

Now nearly all commercially-used compilers support C in its entirety. But these compilers offer extra features, especially in the preprocessor area. Though you may simply avoid these features, you may not know which features are non-standard, especially if you are new to C or if you work in just one environment. Some compiler vendors don't flag such features.

Even an experienced C programmer determined to avoid the problems outlined above by using only standardized constructs still faces the difficulty of deciding which "standard" to use: the original Kernighan and Ritchie (K&R) standard defined in The C Programming Language, or the forthcoming ANSI standard.

The ANSI standard resolves many portability problems not addressed by K&R and provides a good base for the future. The ANSI standard is mostly upwardly compatible with K&R; most K&R programs can be moved to ANSI compilers without any problems. But to move code successfully in the opposite direction (from ANSI to K&R), you need special preprocessor tricks, which I'll describe later.

The standard library poses similar problems. Compiler writers have restricted and extended the library rather than the language. Some compilers don't even have a standard library; many libraries include numerous extensions. MS-DOS compilers in particular tend to offer extensions covering graphics, interrupts, and operating system interfaces. Porting code which uses one compiler's extensions to a different compiler can be very difficult.

Operating system differences are among the hardest problems to address because they are the hardest to hide. Operating systems differ greatly: some do multi-tasking, some are multi-user, and some are single-tasking systems. The file-naming conventions are anything but standardized. Yet these problems are minor compared to the variations in file organization. For example, while most operating systems consider text files to have variable-length records (if any), some use fixed-length records. Records may be delimited by \n, \n\r or record-length fields. Some OSs use special blocking mechanisms, others don't.

Fortunately most standard libraries can hide these differences, but only by distinguishing between text and binary mode, introducing subtle, non-standard features.

In addition to processing files, a program usually has to interact with the user through the operating system, which leads to additional problems if you use special system features like asynchronous communication or sophisticated display manipulation.

Hardware differences can cause programs that compile and link without error, and run well in one environment, to crash in another. Often these problems are caused by different word lengths. It's hard for a UNIX programmer working with the portable C compiler (PCC) on 68xxx machines to learn that the same PCC on 80x86-based machines uses 16 instead of 32 bits for integers. A program that uses integers to index some two million database records on a 68xxx machine may require a major rewrite before it can access more than 32,767 records on the 80x86 machine.

Hardware differences can also affect the portability of pointer casts. Many programmers assume that pointers can simply be cast from one type to another, a reasonable assumption on most byte machines. However, on word machines (like the Unisys 1100), pointers to word-aligned items differ significantly from pointers to non-aligned items. This is true for some so-called byte machines too. Still other problems arise when you port code from a machine with a segmented address space to one with a linear address space.

The last problem is machine resources. Many programmers assume that if their code is portable and standardized, their program will run on all machines supporting a standard C compiler. While this is basically true, some programs require so much memory or processing time that they simply can't be run on some smaller machines.


Designing For Portability

In spite of these problems, it is possible to write C programs that can be compiled and executed in different environments. To be portable, a program must be designed and coded in a fashion that hides environmental differences.

C's own design hides many environmental differences. The standard library is a successful attempt to hide some very environment-specific information, such as the way in which file system calls (and some others) are done on the target operating system. Without the standard library, every programmer would have to write the interface code himself. Even worse, he would have to rewrite it again and again for each new environment.

You can hide other large environment differences by creating your own "standard libraries" for other tasks: extract the non-portable operations to a separate source module, define a general interface for this module and build a different implementation for every environment you want to work with. Many of the high quality portable support library products available do this for you. Such a library provides "instant" portability, lower cost, and more functionality than an equivalent product written by a single programmer.

While system-specific libraries are appropriate for horrible, non-portable tasks like dealing with the user console, a standardized function call might not make sense for smaller tasks that require only slightly different coding in limited areas of the source code. For example, it would not make sense to define a one-line function that sets a signal handler under one environment only, especially if it is called from inside a tight loop where the calling overhead could cause performance problems.

The C preprocessor is the obvious tool for these smaller coding differences: just use conditional compilation to enable the code which sets the signal handler in the one environment where it's needed. You don't have to define a large number of functions, and there is no unnecessary calling overhead.
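
As a minimal sketch of this idea, the fragment below compiles the handler setup only when a configuration constant is 1. The constant NEED_SIGINT_HANDLER and the names on_interrupt and run_job are assumptions made for illustration; they do not come from environ.h.

```c
#include <signal.h>

/* Hypothetical switch: set to 1 only in the environment that needs the handler. */
#ifndef NEED_SIGINT_HANDLER
#define NEED_SIGINT_HANDLER 1
#endif

#if NEED_SIGINT_HANDLER
static volatile sig_atomic_t interrupted = 0;

/* Records that SIGINT arrived; does nothing else. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}
#endif

/* Returns 1 if the job was interrupted, 0 otherwise. */
int run_job(void)
{
#if NEED_SIGINT_HANDLER
    signal(SIGINT, on_interrupt);   /* compiled only where it is needed */
#endif

    /* ... portable work goes here ... */

#if NEED_SIGINT_HANDLER
    return interrupted ? 1 : 0;
#else
    return 0;
#endif
}
```

In an environment where the constant is 0, neither the handler nor the call to signal() reaches the compiler, so no wrapper function and no extra call are involved.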

The preprocessor can also help solve problems that arise simply because different names are used for the same thing. For example, nearly every MS-DOS compiler uses its own names for the machine-level I/O (port) functions (for example inp and outp versus inportb and outportb). Fortunately these functions have the same calling conventions. In this situation, rather than wrap every function call in conditional compilation, use conditional compilation once to define a macro that in turn calls the function with the right name. Everywhere else, the code uses the macro to call the function.
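
The name-mapping technique might look like the sketch below. Because inp and inportb exist only in MS-DOS compiler libraries, the two functions here are stand-ins, and the constants COMPILER_A and COMPILER_B as well as the macro name PORT_IN are hypothetical.

```c
/* Hypothetical compiler-selection constants; exactly one is 1. */
#define COMPILER_A 1
#define COMPILER_B 0

/* Stand-ins for two vendors' port-input functions. They share one
   calling convention and differ only in name. */
int fake_inp(unsigned port)     { return (int)(port & 0xFFu); }
int fake_inportb(unsigned port) { return (int)(port & 0xFFu); }

/* The only place where the compiler difference is visible. */
#if COMPILER_A
#define PORT_IN(port) fake_inp(port)
#else
#define PORT_IN(port) fake_inportb(port)
#endif

/* Application code uses the macro and never names a vendor function. */
int read_status(void)
{
    return PORT_IN(0x3F8u);
}
```

Moving to another compiler then means changing the selection constants, not the call sites.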

Macro and constant definitions can also completely hide slight differences in standard library parameters. For example, when working under two different operating systems where the standard libraries have different open modes for text and binary files, you could use the following call to open a binary file for writing:

fp = fopen("file", OPM_WB)

Under UNIX, OPM_WB would be defined as "w" and the call would expand to

fp = fopen("file", "w")

Under MS-DOS (Microsoft C), OPM_WB would be defined as "wb" and the call would expand to

fp = fopen("file", "wb")

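
The definition behind such a call could be sketched as follows. The OS constants follow the 0/1 convention used by environ.h, but this fragment is an illustration rather than an excerpt from the listing, and the helper names open_binary_for_write and binary_write_mode are assumptions.

```c
#include <stdio.h>

/* Select exactly one target OS (0 or 1). */
#undef MSDOS
#undef UNIX
#define MSDOS 0
#define UNIX  1

#if MSDOS
#define OPM_WB "wb"   /* Microsoft C: binary mode must be requested */
#else
#define OPM_WB "w"    /* UNIX: no text/binary distinction */
#endif

/* Opens a file for binary writing using the environment's mode string. */
FILE *open_binary_for_write(const char *path)
{
    return fopen(path, OPM_WB);
}

/* Exposes the selected mode string, for example for diagnostics. */
const char *binary_write_mode(void)
{
    return OPM_WB;
}
```
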
Sometimes a simple define can also hide significant hardware differences. Different data type sizes can be hidden by defining your own data types with a guaranteed minimum and maximum precision. For example, type int32 (integer containing at least 32 bits) would be mapped to int for 68xxx machines and to long for 80x86 machines. If int32 has been used in every spot requiring a 32-bit integer, nothing but the definition needs to be changed to adjust for the alternate name. (Please note that a data type redefinition can be done either with the preprocessor or a compiler typedef. While the former is potentially more portable, so far I have not seen a compiler which does not implement typedef. Thus I prefer using typedef because sophisticated compilers can do better error checking with it. However, if you want to be absolutely sure that your data type redefinition will be accepted by all old compilers, you must use preprocessor defines.)

By now it is obvious that the preprocessor can help make programs more portable. What would make more sense than to combine all these preprocessor-based aids? This can be done in a single header file. For nearly two years I have been using such a file, working mainly with four different MS-DOS compilers and the UNIX PCC. The idea developed because of minor standard-library differences between MS-DOS compilers, but it soon became clear that the header file could help when porting to UNIX, too. The still incomplete result will be described below.


environ.h

All necessary preprocessor statements and typedefs are included in one single file named environ.h (Listing 1). It should be the very first file included. Before including environ.h, you should define which other standard include files you need. This is done by defining some preprocessor constants which correspond to standard include file functionality. You read that right: functionality, not names. For example, if you define INCL_ASSERT, not only will the file assert.h be included but also the file process.h, which is necessary for MS-DOS/MSC. If you compile under UNIX, only assert.h is included. Defining these constants in terms of functionality hides the include file name differences, an important feature that saves you many conditional directives in the source modules. Microsoft uses a similar system for their OS/2 header files in MSC 5.1.

When completely defined for your environment, environ.h should #include all include files needed by your application. If you find it necessary to explicitly include other files, you should extend the definitions in environ.h, which are still incomplete (see lines 274 - 401).

environ.h begins by preventing the accidental inclusion of a header file more than once. Multiple inclusion may cause damage to some preprocessor defines. At best, it will cause additional overhead, and at worst, program errors may occur. To prevent these problems environ.h checks preprocessor constant ENVIRON_H. If this constant is defined, environ.h assumes that it has been previously included and takes no further steps (via the #ifndef ENVIRON_H in line 26). If ENVIRON_H is not defined, then this is the first inclusion of environ.h and processing takes place. First ENVIRON_H is defined, ensuring that no second inclusion will be possible.
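
The guard described above follows a common pattern, sketched here with the header body written out twice to imitate a double inclusion. The struct env_info and the function env_guard_defined are hypothetical placeholders, not part of the listing.

```c
/* First "inclusion": ENVIRON_H is not yet defined, so the body is processed. */
#ifndef ENVIRON_H
#define ENVIRON_H
struct env_info { int os_id; };
#endif

/* Second "inclusion": ENVIRON_H is already defined, so the body is skipped.
   Without the guard, the repeated struct definition would be an error. */
#ifndef ENVIRON_H
#define ENVIRON_H
struct env_info { int os_id; };
#endif

/* Reports whether the guard constant is defined at this point. */
int env_guard_defined(void)
{
#ifdef ENVIRON_H
    return 1;
#else
    return 0;
#endif
}
```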

Next, based on which compiler and operating system are active, environ.h defines the target environment. Information about the environment is acquired in a relatively straightforward way (lines 29 - 165). Operating-system specific constants that may be defined automatically by the compiler are purged; they will be replaced with your own. The #undef of the default definitions is not actually necessary, but it prevents possible warning messages when the compiler default constants are redefined.

The #undefs are followed by defines which select the target OS. Only one may be active at one time. Note that each constant is defined to 0 or 1. You could also define only one OS constant and use #ifdef instead of

#if CONSTANT == 1

but this has the disadvantage that K&R compilers have no "#if defined(CONSTANT)". Without this construct it is hard to build complex preprocessor conditions using #ifdef and #ifndef because you can't use Boolean operators. If you define the constants to 0 and 1, you can build normal conditional expressions. This is an advantage if you consider that you must often ask questions like

#if MSDOS && USE_BIOS

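
A small sketch of the 0/1 convention follows. The values chosen for the constants and the function console_backend are assumptions for illustration. Nested #if/#else is used instead of #elif, since #elif may be missing from older preprocessors.

```c
/* Purge any compiler-supplied defaults, then select exactly one OS. */
#undef MSDOS
#undef UNIX
#define MSDOS 1
#define UNIX  0

/* Assumed meaning: 1 selects BIOS calls under MS-DOS. */
#define USE_BIOS 0

/* Names the console access method chosen by the constants above. */
const char *console_backend(void)
{
#if MSDOS && USE_BIOS
    return "bios";
#else
#if MSDOS && !USE_BIOS
    return "dos";
#else
    return "stdio";
#endif
#endif
}
```

Because every constant always has a value, the conditions combine with &&, || and ! just like ordinary C expressions.
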
Following the OS definition there are some auxiliary definitions used only under specific OSs to identify the target machine. Currently these apply only to certain generic MS-DOS machines without compatible hardware or BIOS, which require actual MS-DOS calls (as opposed to BIOS calls or direct hardware manipulation). The only common example is the early Wang PCs, for which there is a separate definition.

The operating system definitions are followed by the compiler definitions. A specific compiler selection is only necessary if more than one is available under one OS. In my case this is only needed for MS-DOS. But as you can see in environ.h there is only a definition for MSC. All other compilers I use identify themselves by automatically defining a constant upon startup (e.g., __TURBOC__ for Borland's Turbo C). Note that the MSC constant is overridden if one of the other predefined constants is detected or an OS other than MS-DOS is active (lines 88 - 106). This feature simplifies proper configuration of the header file.

Separate constants for each compiler allow conditional compilation for small compiler differences. To avoid code like "#if MSC || DLC || LC || __TURBOC__ || ...", we introduce some language set selection constants (lines 70 - 76). Each define corresponds to one language feature. If the constant is set to true (1), that language feature can be used; otherwise it cannot. All other decisions are based on these feature selection constants and are much more readable. Now the example given above takes the more intelligible form

#if USE_VOID

To avoid modifying all language selection constants each time you change compilers, environ.h includes an automatic language set selection which redefines the language set constants based on the compiler and OS definitions. While auto selection is currently only functional in the MS-DOS environment, it can easily be expanded to work under different operating systems (lines 129 - 164).

To complete the environment definition, environ.h defines the constant ANSI_C to 0 or 1 according to the compiler's C standard (K&R/ANSI) (lines 119 - 127). This constant is currently set based on the state of a language feature selection (like USE_VOID), but could become more important in the future.

The example header file still lacks one feature, a definition check. All definitions are accepted as entered. If, for example, the programmer defines two or more operating systems to 1, the behavior of environ.h is undefined but clearly erroneous. This could be avoided by checking the entered definitions to see if two or more are true and aborting compilation if so:

#if MSDOS && UNIX
"Error: Both MSDOS and UNIX selected"
#endif

This code tests for the error condition and generates a compile-time error if it detects one. The error message generated by the compiler points at the real error message in the source module. Examples can be found in CUG library volume 227 (compatible graphics) in file graphics.h. This file contains extensive definition checking.

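
With 0/1 constants, such a check can also be written as a sum, which catches both "more than one" and "none selected". The sketch below uses the ANSI #error directive. A K&R preprocessor without #error would need the string trick shown above instead. The constant name STARSYS and the function selected_os_count are assumptions.

```c
/* Purge defaults and select the target OS; exactly one must be 1. */
#undef MSDOS
#undef UNIX
#undef STARSYS
#define MSDOS   1
#define UNIX    0
#define STARSYS 0

/* Abort compilation unless exactly one OS constant is set. */
#if (MSDOS + UNIX + STARSYS) != 1
#error "Exactly one operating system must be selected"
#endif

/* Returns the number of selected operating systems (always 1 if this compiles). */
int selected_os_count(void)
{
    return MSDOS + UNIX + STARSYS;
}
```
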
So far environ.h has supplied definitions that allow conditional compilation in the source units but no automatic porting aids. The balance of the file addresses this second need. Different compiler data types and modifiers can be hidden largely by preprocessor defines. For example, if the compiler doesn't support the void keyword, just define void to nothing, and the void keyword will disappear. Since you didn't use void originally when writing for that compiler, this disappearance will cause no problems. Your code can now be used with compilers that support void without any additional work.

That is the key feature of modifier definition: you can hide all data type and modifier differences by simply defining the data type in question to nothing (as in lines 167 - 195 in environ.h).

Here's another example: if a compiler doesn't support the volatile modifier, it normally doesn't do the strange optimizations that force you to use volatile (or they can be turned off), so there is no problem in purging all volatile modifiers in your source.
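
Purging a modifier can be sketched as follows. The feature constant USE_VOLATILE is hypothetical and is set to 0 here to imitate a compiler without volatile. The fragment includes no standard headers after the define, because redefining a keyword around standard headers is not safe.

```c
/* Hypothetical feature constant: 0 pretends the compiler lacks volatile. */
#define USE_VOLATILE 0

#if !USE_VOLATILE
#define volatile        /* the modifier simply vanishes from the source */
#endif

volatile int tick_count = 0;   /* reads as plain "int tick_count" here */

/* Increments the counter and returns its new value. */
int next_tick(void)
{
    tick_count = tick_count + 1;
    return tick_count;
}
```

On a compiler that supports the modifier, setting USE_VOLATILE to 1 leaves the source untouched and the declaration keeps its full meaning.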

This kind of type redefinition allows you to use the types on machines supporting them without losing backward compatibility. If an older compiler doesn't support these type modifiers, their extra value is gone but your program still runs without problems.

Most data types and modifiers can be treated in this manner. (In some cases you may instead redefine the type to something different — e.g. define void to int instead of purging it). However, some types and modifiers, like enum, can't simply be redefined to nothing or to some other value. If you try to redefine these types, your program won't compile due to the syntax differences between defining a "normal" data item and an enum one. Defining an enum is a process nearly identical to defining a structure or union. Special definitions are required. You can't hide them by one general define.

You can still use enum on supporting and non-supporting compilers, but you must define all your enum types using conditional compilation. If the compiler supports enum, you can use it without difficulty. If not, you define an int type and use the preprocessor to define the enum tags:

#if USE_ENUM
typedef enum { A, B } enumtype;
#else
typedef int enumtype;
#define A 0
#define B 1
#endif

This clearly entails more programming work but allows the use of extended error checking features of compilers that support enum.

You can define your own data types to hide hardware differences, especially machine word length differences. They ("personal types") have a guaranteed minimum and maximum precision and are mapped to the actual hardware data type. By relying on these "personal types," you can write programs that work on different machines in an expected manner, and you can take memory requirements into account because there is a guaranteed MAXIMUM precision.

This problem wasn't critical to me, so the example header file contains only very limited support (lines 258 - 261). Please note that typedefs are used instead of preprocessor defines.
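
One way to define such a personal type is sketched below. It is not the definition from the listing: it assumes the ANSI header limits.h is available and lets INT_MAX decide the mapping, whereas a K&R environment would have to select the typedef from the machine or compiler constants. The function record_total is hypothetical.

```c
#include <limits.h>

/* int32: an integer type holding at least 32 bits. */
#if INT_MAX >= 2147483647
typedef int int32;     /* e.g. machines with 32-bit int */
#else
typedef long int32;    /* e.g. machines with 16-bit int */
#endif

/* A record count that would overflow a 16-bit int. */
int32 record_total(void)
{
    return (int32)2000000L;
}
```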

The next problem area is that of standard library function names and calling conventions. For example, calling exit() in C will commonly terminate your program gracefully. Under the Starsys OS, exit() is an OS call something like abort(). The real exit() function is called dx_exit(). This causes problems for all but a few programs and would normally require text modifications. But that's exactly what the preprocessor can do for you: if you're running under Starsys, just define a macro named exit which takes one parameter (the return value). It will expand to a call to dx_exit() with the given parameter (lines 234 - 236).
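
The exit() mapping might be sketched like this. Since the Starsys library is not available here, dx_exit() is a stand-in that only records the status instead of terminating. The constant STARSYS and the function finish are assumptions.

```c
#include <stdlib.h>

#define STARSYS 1   /* hypothetical OS constant, 0 or 1 */

static int last_status = -1;

/* Stand-in for the Starsys dx_exit(); it only records the status. */
void dx_exit(int status)
{
    last_status = status;
}

#if STARSYS
#define exit(status) dx_exit(status)
#endif

/* Application code keeps calling exit() as usual. */
int finish(int status)
{
    exit(status);            /* expands to dx_exit(status) under STARSYS */
    return last_status;
}
```

The header is included before the macro is defined so that the library's own declaration of exit is not rewritten by the macro.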

A similar technique hides the variations among library functions with different names but identical calling parameters and functionality. Example macro definitions can be found a few lines above the exit() macro.

File open modes are addressed in lines 241 - 253. Please note that not all open modes are supported, but the definitions can be easily expanded.


Function Prototyping

Unfortunately, ANSI function prototyping is not supported in every environment. Rather than sacrificing the extended error checking features that prototyping offers by not using it at all, you can use prototyping when the compiler supports it and turn it off when it does not.

Turning off function prototyping is a little harder than turning off an unknown modifier. First you must build two classes of function prototypes, external and internal, corresponding to external and static functions. The external prototype macros appear in lines 197 - 211. For a function func taking one int argument, the external prototype expands to extern func() for a K&R compiler and to extern func(int) for ANSI compilers.

Please note the extra parentheses around int in the PROTT definition. These parentheses become part of the macro argument and reappear in the expansion. After expansion, they are the function parentheses of extern func(int). These parentheses are especially important if you want to prototype a function with more than one argument. If there were no inner parentheses, the macro would have two arguments, which would force you to write one prototyping macro for every number of function arguments you will ever use. Given these inner parentheses, the whole prototype is one macro argument and only one prototyping macro will satisfy all needs.
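
The double-parenthesis technique can be sketched as follows. The macro body is a reconstruction consistent with the description, not a copy of the listing, and the function scale is a hypothetical example. An explicit return type is written because current compilers reject implicit int.

```c
#define ANSI_C 1   /* 1: compiler understands prototypes, 0: K&R only */

#if ANSI_C
#define PROTT(args) args     /* keep the parenthesized argument list */
#else
#define PROTT(args) ()       /* drop it, leaving an old-style declaration */
#endif

/* One macro argument, although the prototype lists two parameters. */
extern long scale PROTT((int value, long factor));

long scale(int value, long factor)
{
    return value * factor;
}
```

With ANSI_C set to 0, the same declaration line becomes extern long scale (); so one source text serves both dialects.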

Normally you write a function header only once for each internal function. It is more difficult to hide these prototypes: the ANSI style is to write argument types and names in the function header (e.g. static func(int a)), while the K&R style is to write the argument names only (static func(a)). Fortunately ANSI compilers accept function headers written in K&R style, but usually don't build prototypes for such headers. One solution is to write the prototype first and then the actual function header, that is, STATICPT(func, (int)); on one line and static func() on the next. In this case the prototype macro first declares the function as extern to prototype it (just as is done in application header files). While this has worked well with all ANSI compilers I know of, I'm not certain that it is guaranteed to be legal under the ANSI standard.

At first glance you may wonder why the prototype does not have the form static func PROTT((int)), and in fact I am not sure whether such constructs are legal. Most compilers accept functions that are declared extern and later redefined as static. However, the MSC compiler doesn't accept this construct and generates error messages (at least QC does; CL accepts it with warnings). Instead, MSC allows both the function prototype and the actual function header to be declared static, the approach used in environ.h. If MSC is active, the prototype attribute is redefined to static. To do this the macro must have control over the whole prototype line, not just part of it. So a new construct has been created. The macro has two parameters: the function name and the prototype. It expands to the correct modifier followed by the function name and (if selected) the function prototype.
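
A two-parameter macro of this kind might be sketched as follows. It is an adaptation rather than the listing's definition: the sketch always emits static (the MSC path described above), because current compilers such as GCC reject an extern declaration followed by a static definition, and it passes the return type together with the function name because implicit int is no longer accepted. The functions twice and call_twice are hypothetical.

```c
#define ANSI_C 1   /* 1: emit the prototype, 0: emit an old-style declaration */

#if ANSI_C
#define STATICPT(name, proto) static name proto
#else
#define STATICPT(name, proto) static name ()
#endif

/* Expands to: static int twice (int n); */
STATICPT(int twice, (int n));

/* Uses the internal function before its definition appears. */
int call_twice(int n)
{
    return twice(n);
}

static int twice(int n)
{
    return 2 * n;
}
```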

This may be a somewhat unusual macro construct, but remember that the C preprocessor is mainly a text substitution tool and not part of the actual compilation process. This allows the preprocessor to make some very strange modifications to the C source code, including constructs like the static function prototyping which cannot be done by any C statement. Building such unusual constructs can give very simple solutions to otherwise intractable problems. The STATICPT() macro can be found between lines 197 and 211.


Conclusions

As you can see, the environmental header file environ.h can aid in writing portable programs, especially in the problem areas of data type, modifier and name differences. In addition, some machine specifics can be hidden and some newer constructs mapped to work with older compilers.

On the other hand, the header file can't hide some differences (e.g. different mechanisms for interacting with the user console). Such differences require special coding that normally should be contained in external modules. But the header file can help you write these modules too by precisely defining the target environment. Precise functional definitions are the basis for selecting the right code sequences in the low-level driver modules (assuming that coding for more than one environment can be contained in one source unit). The definitions will aid you in activating slightly different source lines which you may have in your program.

Thus, a larger porting system is built using three modules. First, the environment header file describes the environment and hides all differences possible using the preprocessor and typedefs (mainly text substitutions). Second, libraries of standardized functions handle larger problem areas that actually require different coding. Third, conditional compilation within the source modules hides very small differences where the text-substitution capabilities of the preprocessor are insufficient and a special function call makes no sense.

This last option should be limited to cases where it is absolutely necessary, because conditional compilation is not really portable programming; rather, it means carrying code for all known environments. If you switch to a new environment, you must not only write new code but also hunt for the problem areas in the source files. To make them easy to find, I recommend flagging these lines with special comments (e.g. /*PORT*/).

Related code can be found in the CUG library holdings. Volume CUG227 contains a compatible graphics system which makes extensive use of the preprocessor's text substitution capabilities. Volume CUG265, the cpio starter kit, contains a header file similar to the one discussed here. It also contains programs using it.

