____cacheline_aligned
instructs the compiler to instantiate a struct or variable at an address corresponding to the beginning of an L1 cache line, for the specific architecture, i.e., so that it is L1 cache-line aligned. ____cacheline_aligned_in_smp
is similar, but is actually L1 cache-line aligned only when the kernel is compiled in SMP configuration (i.e., with option CONFIG_SMP
). These are defined in file include/linux/cache.h
These definitions are useful for variables (and data structures) that are not allocated dynamically, via some allocator, but are global, compiler-allocated variables (a similar effect can be accomplished by dynamic memory allocators that can allocate memory at specific alignment).
The reason for cache-line aligned variables is to manage the cache-to-cache transfers of these variables, by hardware cache coherence mechanisms, in SMP systems, so that their movement does not implicitly occur when other variables are moved. This is for performance critical code, where one expects contention in the access of variables by multiple cpus (cores). The usual problem one tries to avoid, in this case, is false sharing.
A variable's memory starting at the beginning of a cache line is half the work for this purpose; one also needs to "pack with it" only variables that should move together. An example is an array of variables, where each element of the array is to be accessed by only one cpu (core):
struct my_data {
long int a;
int b;
} ____cacheline_aligned_in_smp cpu_data[ NR_CPUS ];
This kind of definition will require from the compiler (in an SMP configuration of the kernel) that each cpu's struct will begin at a cache line boundary. The compiler will, implicitly, allocate extra space after each cpu's struct, so that the next cpu's struct will begin at a cache line boundary, also.
An alternative is to pad the data structure with a cache line's size of dummy, unused bytes:
struct my_data {
long int a;
int b;
char dummy[L1_CACHE_BYTES];
} cpu_data[ NR_CPUS ];
In this case, only dummy, unused data will be moved unintentionally and those actually accessed by each cpu will only move from cache to memory and vise versa, due to cache capacity misses.