I recently stumbled upon a method to reduce compilation times in C/C++ projects: so-called Unity Builds (UB). The technique behind UB is quite simple: compilation time is reduced by reducing the amount of disk access during compilation.
During compilation, included files are opened and parsed multiple times, once for each translation unit that includes them. Unless you own a solid-state disk, your hard drive uses moving parts to read, write and seek, which slows down compilation.
In the demo project I’ve created (see links at the bottom of the post) there are two dummy translation units, t1.cpp and t2.cpp, that both include the same file ub/b.h, which in turn includes ub/a.h. When compiled and instrumented (in MSVC you can use the /showIncludes switch to list the files parsed during compilation), you’ll see output similar to
Compiling...
t2.cpp
Note: including file: b.h
Note: including file: a.h
t1.cpp
Note: including file: b.h
Note: including file: a.h
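The post doesn’t list the demo headers themselves. A minimal, assumed pair that would produce output like the above could look as follows; the file names come from the post, but the contents are my guesses, flattened here into a single listing (in the real demo, ub/b.h would contain `#include "a.h"`, which is why a.h appears after b.h in the /showIncludes output):

```cpp
// --- ub/a.h (assumed contents) ---
#ifndef UB_A_H
#define UB_A_H
inline int a_value() { return 42; }
#endif

// --- ub/b.h (assumed contents; a separate file in the real demo,
// pulling in a.h via an #include directive) ---
#ifndef UB_B_H
#define UB_B_H
inline int b_value() { return a_value(); }
#endif
```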
As you can see, the same include files are opened for each translation unit. The idea behind UB is to eliminate this redundant work (i.e. re-including files) by generating a unity translation unit that simply includes all of the other translation units.
So we add a file ub.cpp with the following content
#include <t1.cpp>
#include <t2.cpp>
remove the translation units t1.cpp and t2.cpp from the project (or we will hit multiply defined symbols) and compile. The compiler output this time is
Compiling...
ub.cpp
Note: including file: t1.cpp
Note: including file: b.h
Note: including file: a.h
Note: including file: t2.cpp
The careful reader will spot four includes now, the same number as in the original example. So, did we win anything? Well yes: the first time b.h is encountered it is opened and parsed, which leads to the inclusion of a.h. The next time we hit b.h it is not parsed anymore, and neither is a.h.
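The saving comes from ordinary include guards (or #pragma once): once the guard macro is defined, a second textual inclusion of the same header body compiles to nothing, and compilers like MSVC additionally detect the guard pattern and avoid re-opening the file at all. A minimal single-file sketch of that mechanism (the macro and variable names are mine):

```cpp
// First "inclusion" of a guarded header body: B_H is not yet
// defined, so the content is compiled and B_H gets defined.
#ifndef B_H
#define B_H
static const int parsed_value = 1;
#endif

// Second "inclusion" of the same body: B_H is already defined,
// so the preprocessor skips everything inside the guard.
#ifndef B_H
#define B_H
static const int parsed_value = 2; // never compiled
#endif

// Reports which inclusion actually got compiled.
int times_seen() { return parsed_value; }
```

Since only the first copy of the guarded body survives preprocessing, `times_seen()` returns 1.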
For major software projects with deeply nested include hierarchies and plenty of translation units, unity builds spare the compiler a lot of work.
Unfortunately, the manual procedure taken above is suitable only for the smallest software projects, so we need a way of automating unity build generation. I’ll provide a solution for those who use CMake in the next section.
Automating Unity Builds
Unity Builds can easily be automated with CMake. Here’s a function that, when called, enables unity builds for a list of provided translation units.
function(enable_unity_build UB_SUFFIX SOURCE_VARIABLE_NAME)
  set(files ${${SOURCE_VARIABLE_NAME}})
  # Generate a unique filename for the unity build translation unit
  set(unit_build_file ${CMAKE_CURRENT_BINARY_DIR}/ub_${UB_SUFFIX}.cpp)
  # Exclude all translation units from compilation
  set_source_files_properties(${files} PROPERTIES HEADER_FILE_ONLY true)
  # Open the ub file
  FILE(WRITE ${unit_build_file} "// Unity Build generated by CMake\n")
  # Add an include statement for each translation unit
  foreach(source_file ${files})
    FILE(APPEND ${unit_build_file} "#include <${CMAKE_CURRENT_SOURCE_DIR}/${source_file}>\n")
  endforeach(source_file)
  # Complement the list of translation units with the name of the ub
  set(${SOURCE_VARIABLE_NAME} ${${SOURCE_VARIABLE_NAME}} ${unit_build_file} PARENT_SCOPE)
endfunction(enable_unity_build)
To use it, simply invoke enable_unity_build:
set(INCLUDES inc/ub/a.h inc/ub/b.h)
set(SRCS src/t1.cpp src/t2.cpp src/main.cpp)
# Comment out the next line to disable the unity build
enable_unity_build(UnityBuildDemo SRCS)
add_executable(UnityBuildDemo ${INCLUDES} ${SRCS})
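For the invocation above, the generated ub_UnityBuildDemo.cpp in the build directory would contain lines of this shape (the absolute path prefix is a placeholder; it depends on your source directory):

```cpp
// Unity Build generated by CMake
#include <C:/path/to/project/src/t1.cpp>
#include <C:/path/to/project/src/t2.cpp>
#include <C:/path/to/project/src/main.cpp>
```

Because the original sources were marked HEADER_FILE_ONLY, only this generated file is actually compiled.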
Performance Improvements
Here is a diagram showing the performance improvement for a set of libraries developed at company-site. Each column corresponds to a library or to a unit-test project for a library. Performance was measured on a quad-core PC with parallel project builds enabled.
Best Practices
Continuous Integration
In contrast to the opinion of The Magic of Unity Builds, we do not use UB for our daily developer work, as minor changes quite frequently cause a rebuild of the entire unity. Instead, unity builds are the compilation method of choice for our continuous integration servers, where the reduced compilation time helps us get feedback from automated test suites more quickly.
Split Unities
Having too many translation units collected in a single unity build file can lead to compilation problems such as out-of-memory issues. When such an issue occurs, you should split the monolithic unity into several smaller unities. Here’s how this can be done using enable_unity_build:
set(INCLUDES inc/ub/a.h inc/ub/b.h)
set(SRCS_A src/t1.cpp)
set(SRCS_B src/t2.cpp src/main.cpp)
# Comment out the next two lines to disable the unity builds
enable_unity_build(A SRCS_A)
enable_unity_build(B SRCS_B)
add_executable(UnityBuildDemo ${INCLUDES} ${SRCS_A} ${SRCS_B})
Most of the time, however, a single unity build file per project works without any problems.
Code Quality
Introducing unity builds can also reveal code quality issues such as duplicated code across translation units. Since ‘all’ translation units are collected in a single unity, identically named but previously separate type declarations/definitions now share one translation unit, leading to redefinition and ‘multiply defined symbol’ errors.
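A sketch of the kind of collision a unity build surfaces, and one way to resolve it: if t1.cpp and t2.cpp each defined a helper named `answer`, concatenating them into one translation unit would produce a redefinition error. Note that even an anonymous namespace no longer isolates them, because both files now live in the same translation unit. Wrapping each file’s internals in a distinctly named namespace restores the separation (the namespace names here are my own):

```cpp
// from t1.cpp: helper moved into a file-specific namespace
namespace t1_detail {
int answer() { return 1; }
}

// from t2.cpp: the same helper name, now isolated from t1's
namespace t2_detail {
int answer() { return 2; }
}

// Both helpers coexist in the single unity translation unit.
int combined() { return t1_detail::answer() + t2_detail::answer(); }
```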