how to C

Latest recommended article published 2021-05-19 11:39:36

adream307

How to C in 2016

This is a draft I wrote in early 2015 and never got around to publishing. Here's the mostly unpolished version because it wasn't doing anybody any good sitting in my drafts folder. The simplest change was updating year 2015 to 2016 at publication time.

Feel free to submit fixes/improvements/complaints as necessary. -Matt

Adrián Arroyo Calle provides a Spanish translation at ¿Cómo programar en C (en 2016)?

Japanese POSTD provides a Japanese translation at 2016年、C言語はどう書くべきか (前編) and 2016年、C言語はどう書くべきか (後編).

Chinese InfoQ provides a Chinese translation at C语言的2016.

Programmer Magazine provides a Chinese translation at 2016年，C语言该怎样写 (as PDF too).

Keith Thompson provides a nice set of corrections and alternative opinions at howto-c-response.

Rob Graham provides a response covering other avenues out of scope here at Some notes C in 2016.

Now on to the article...

The first rule of C is don't write C if you can avoid it.

If you must write in C, you should follow modern rules.

C has been around since the early 1970s. People have "learned C" at various points during its evolution, but knowledge usually gets stuck after learning, so everybody has a different set of things they believe about C based on the year(s) they first started learning.

It's important to not remain stuck in your "things I learned in the 80s/90s" mindset of C development.

This page assumes you are on a modern platform conforming to modern standards and you have no excessive legacy compatibility requirements. We shouldn't be globally tied to ancient standards just because some companies refuse to upgrade 20 year old systems.

Preflight

Standard c99 (c99 means "C Standard from 1999"; c11 means "C Standard from 2011", so 11 > 99).

clang, default
- clang uses an extended version of C11 by default (GNU C11 mode), so no extra options are needed for modern features.
- If you want standard C11, you need to specify -std=c11; if you want standard C99, use -std=c99.
- clang compiles your source files faster than gcc
gcc releases before gcc-5 require you to specify -std=c99 or -std=c11
- gcc builds source files slower than clang, but sometimes generates faster code. Performance comparisons and regression testing are important.
- gcc-5 defaults to GNU C11 mode (same as clang), but if you need exactly c11 or c99, you should still specify -std=c11 or -std=c99.
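One way to confirm which standard mode a given set of flags actually selects is to inspect the predefined __STDC_VERSION__ macro. The following is a minimal sketch, not a definitive check; c_standard_name and describe_build are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns a short label for the language standard this file was compiled
 * under, based on the predefined __STDC_VERSION__ macro. */
const char *c_standard_name(void) {
#if !defined(__STDC_VERSION__)
    return "C89/C90";
#elif __STDC_VERSION__ >= 201112L
    return "C11 or newer";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 with Amendment 1";
#endif
}

/* Hypothetical helper: writes a one-line build description into buf and
 * returns the number of characters snprintf wanted to write. */
int describe_build(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "built as %s", c_standard_name());
}
```

Compiling the same file with -std=c99 and then with -std=c11 should change the label, which makes it easy to spot a build system that silently drops the flag.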

Optimizations

-O2, -O3
- generally you want -O2, but sometimes you want -O3. Test under both levels (and across compilers) then keep the best performing binaries.
-Os
- -Os helps if your concern is cache efficiency (which it should be)

Warnings

-Wall -Wextra -pedantic
- newer compiler versions have -Wpedantic, but they still accept the ancient -pedantic as well for wider backwards compatibility.
during testing you should add -Werror and -Wshadow on all your platforms
- it can be tricky deploying production source using -Werror because different platforms and compilers and libraries can emit different warnings. You probably don't want to kill a user's entire build just because their version of GCC on a platform you've never seen complains in new and wondrous ways.
extra fancy options include -Wstrict-overflow -fno-strict-aliasing
- Either specify -fno-strict-aliasing or be sure to only access objects as the type they have at creation. Since so much existing C code aliases across types, using -fno-strict-aliasing is a much safer bet if you don't control the entire underlying source tree.
as of now, Clang reports some valid syntax as a warning, so you should add -Wno-missing-field-initializers
- GCC fixed this unnecessary warning after GCC 4.7.0
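The strict-aliasing advice above ("only access objects as the type they have at creation") can be illustrated with a small sketch. It assumes a 32-bit IEEE 754 float, and float_bits is a hypothetical name; the point is only that copying bytes with memcpy avoids reading an object through a pointer of an unrelated type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Reinterprets the bytes of a float as a uint32_t without breaking the
 * strict aliasing rule. Compilers typically turn this memcpy into a plain
 * register move. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The cast below is the pattern that needs -fno-strict-aliasing to be safe:
 * it reads a float object through a uint32_t pointer.
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */
```

If you control the whole source tree, the memcpy form lets you keep strict aliasing enabled; if you don't, -fno-strict-aliasing remains the safer bet, as noted above.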

Building

Compilation units
- The most common way of building C projects is to compile every source file into its own object file then link all the objects together at the end. This procedure works great for incremental development, but it is suboptimal for performance and optimization. Your compiler can't detect potential optimizations across file boundaries this way.
LTO — Link Time Optimization
- LTO fixes the "source analysis and optimization across compilation units problem" by annotating object files with intermediate representation so source-aware optimizations can be carried out across compilation units at link time.
- LTO can slow down the linking process noticeably, but make -j helps if your build includes multiple non-interdependent final targets (.a, .so, .dylib, testing executables, application executables, etc).
- clang LTO (guide)
- gcc LTO
- As of 2016, clang and gcc releases support LTO by just adding -flto to your command line options during object compilation and final library/program linking.
- LTO still needs some babysitting though. Sometimes, if your program has code not used directly but used by additional libraries, LTO can evict functions or code because it detects, globally when linking, some code is unused/unreachable and doesn't need to be included in the final linked result.

Arch

-march=native
- give the compiler permission to use your CPU's full feature set
- again, performance testing and regression testing (then comparing the results across multiple compilers and/or compiler versions) are important to make sure any enabled optimizations don't have adverse side effects.
-msse2 and -msse4.2 may be useful if you need to target not-your-build-machine features.

Writing code

Types

If you find yourself typing char or int or short or long or unsigned into new code, you're doing it wrong.

For modern programs, you should #include <stdint.h> then use standard types.

For more details, see the stdint.h specification.

The common standard types are:

int8_t, int16_t, int32_t, int64_t — signed integers
uint8_t, uint16_t, uint32_t, uint64_t — unsigned integers
float — standard 32-bit floating point
double - standard 64-bit floating point

Notice we don't have char anymore. char is actually misnamed and misused in C.

Developers routinely abuse char to mean "byte" even when they are doing unsigned byte manipulations. It's much cleaner to use uint8_t to mean a single unsigned-byte/octet-value and uint8_t * to mean sequence-of-unsigned-byte/octet-values.
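A short sketch of what "uint8_t * as a sequence of octets" looks like in practice. byte_checksum is a hypothetical toy function, not a real checksum algorithm; it only shows byte manipulation written without char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds up every octet in data, keeping only the low 8 bits of the sum. */
uint8_t byte_checksum(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* The cast documents the intended wraparound at 8 bits. */
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}
```

The signature alone tells a reader the input is raw bytes rather than text, which a char * parameter would not.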

Special Standard Types

In addition to standard fixed-width types like uint16_t and int32_t, we also have fast and least types defined in the stdint.h specification.

Fast types are:

int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t — signed integers
uint_fast8_t, uint_fast16_t, uint_fast32_t, uint_fast64_t — unsigned integers

Fast types provide a minimum of X bits, but there is no guarantee the underlying storage size is exactly what you request. If a larger type has better support on your target platform, a fast type will automatically use the better supported larger type.

The best example here is, on some 64-bit systems, when you request uint_fast16_t you actually get a uint64_t because operating on word-sized integers will be faster than operating on half of a 32-bit integer.

The fast guidelines aren't followed on every system though. One standout is OS X, where fast types are defined exactly as their corresponding fixed width counterparts.

Fast types can be useful for self-documenting code as well. If you know your counters only need 16 bits, but you prefer your math use 64 bit integers because they are faster on your platform, that's where uint_fast16_t would help. Under 64-bit Linux platforms, uint_fast16_t gives you a fast 64-bit counter while maintaining the code-level inline documentation of "we only need 16 bits here."

One thing to be aware of for fast types: it can impact certain test cases. If you need to test for storage width edge cases, having uint_fast16_t be 16 bits on some platforms (OS X) and 64 bits on other platforms (Linux) can increase the minimum number of platforms where your tests need to pass.

Fast types do introduce the same uncertainty as int not being a standard size across platforms, but with fast types, you can limit your uncertainty to known-safe locations in your code (counters, temporary values with checked bounds, etc).
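The following sketch shows a fast type used as a bounded counter. count_matches and fast16_storage_bytes are hypothetical names; the only guarantee relied on is that uint_fast16_t holds at least 16 bits, while its real storage size is whatever the platform picked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint_fast16_t) >= sizeof(uint16_t),
              "a fast type is never narrower than the width requested");

/* Reports how many bytes the platform actually uses for uint_fast16_t. */
size_t fast16_storage_bytes(void) {
    return sizeof(uint_fast16_t);
}

/* Counts how many of the first n items equal needle. The parameter type
 * documents that callers never pass more than 65535 items, while leaving
 * the platform free to use a wider, faster register for the loop. */
uint_fast16_t count_matches(const int32_t *items, uint_fast16_t n,
                            int32_t needle) {
    assert(items != NULL || n == 0);
    uint_fast16_t hits = 0;
    for (uint_fast16_t i = 0; i < n; i++) {
        if (items[i] == needle) {
            hits++;
        }
    }
    return hits;
}
```

Calling fast16_storage_bytes() on each platform you test is a quick way to see where the storage-width edge cases mentioned above may differ.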

Least types are:

int_least8_t, int_least16_t, int_least32_t, int_least64_t — signed integers
uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t — unsigned integers

Least types provide you with the most compact number of bits for the type you request.

The least guidelines, in practice, mean least types are just defined to standard fixed width types, since standard fixed width types already provide the exact minimum number of bits you request.
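A compile-time check can make that relationship visible. This is a sketch assuming a platform where a 16-bit type exists; least16_is_exactly_16_bits is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uint_least16_t) * CHAR_BIT >= 16,
              "a least type holds at least the requested number of bits");

/* True when uint_least16_t has the same value range as a 16-bit unsigned
 * integer, which is the usual case wherever uint16_t exists. */
bool least16_is_exactly_16_bits(void) {
    return UINT_LEAST16_MAX == 0xFFFF;
}
```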

to `int` or not to `int`

Some readers have pointed out they truly love int and you'll have to pry it from their cold dead fingers. I'd like to point out it is technically impossible to program correctly if the sizes of your types change out from under you.

Also see RATIONALE included with inttypes.h for reasons why using non-fixed-width types is unsafe. If you are truly smart enough to conceptualize int being 16 bits on some platforms and 32 bits on other platforms throughout your development while also testing all 16 bit and 32 bit edge cases for every place you use int, please feel free to use int.

For the rest of us who can't hold entire multi-level decision tree platform specification hierarchies in our heads while writing fizzbuzz, we can use fixed width types and automatically have more correct code with much less conceptual hassle and much less required testing overhead.

Or, said more concisely in the specification: "the ISO C standard integer promotion rule can produce silent changes unexpectedly."

Good luck with that.
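To make the quoted warning about integer promotion concrete, here is a small sketch. Both function names are hypothetical. The naive version looks like a wraparound check on 8-bit values, but its operands are promoted to int before the addition, so the sum never wraps and the comparison is always false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(int) > sizeof(uint8_t),
              "uint8_t operands are promoted to a wider int");

/* Looks like a wraparound check, but a + b is computed as int, so the
 * result is never smaller than a. */
bool sum_wraps_naive(uint8_t a, uint8_t b) {
    return (a + b) < a;
}

/* Casting the sum back to uint8_t restores the 8-bit wraparound the check
 * was meant to detect. */
bool sum_wraps(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b) < a;
}
```

With a = 200 and b = 100 the naive version compares 300 against 200, while the cast version compares 44 against 200.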

One Exception to never-`char`

The only acceptable use of char in 2016 is if a pre-existing API requires char (e.g. strncat, printf'ing "%s", ...) or if you're initializing a read-only string (e.g. const char *hello = "hello";) because the C type of string literals ("hello") is char [].
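A sketch of how the two worlds can meet: char stays at the API boundary, and the bytes are examined as uint8_t. count_non_ascii is a hypothetical name, and the cast assumes uint8_t is a typedef for unsigned char, which holds on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts bytes with the high bit set in a NUL-terminated string. The
 * parameter is const char * because strlen and string literals require it;
 * the inspection itself is done on unsigned octets. */
size_t count_non_ascii(const char *text) {
    assert(text != NULL);
    const uint8_t *bytes = (const uint8_t *)text;
    size_t len = strlen(text);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] >= 0x80) {
            count++;
        }
    }
    return count;
}
```

Doing the comparison on plain char would depend on whether char is signed on the target, which is exactly the ambiguity uint8_t removes.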

ALSO: In C11 we have native unicode support, and the type of UTF-8 string literals is still char [] even for multibyte sequences like const char *abcgrr = u8"abc