The kernel's CPU scheduler has to balance a wide range of objectives. The tasks in the system must be scheduled fairly, with latency for any given task kept within bounds. All of the CPUs in the system should be kept busy if there is enough work to do, but unneeded CPUs should be shut down to reduce power consumption. A task should also run on the CPU that is most likely to have cached the memory that task is using. This patch series from Chen Yu aims to improve how the scheduler handles cache locality for multi-threaded processes.
RAM is fast, but it is still unable to provide data at anything resembling the rate that a CPU can consume it. For this reason, systems are built with multiple layers of cache that are meant to hold frequently used data and make it available more quickly. Reading a value from cache is relatively fast; a read that goes all the way to RAM, instead, can stall a CPU for the time it takes to execute hundreds of instructions. Making effective use of cache is, thus, important for an application to perform well. Well-written applications are implemented with cache behavior in mind, but the kernel has a role to play as well.
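To see why this matters, consider a small user-space sketch (not part of Yu's series, just an illustration) that sums the same array twice: once in order, and once with a cache-line-sized stride that defeats the hardware's caching. Both passes perform the same number of additions, but on most systems the strided one takes noticeably longer, purely because of cache misses:

    /* A minimal sketch of cache effects: sum the same array sequentially
     * and with a stride of one cache line. The array is chosen to be
     * larger than any on-chip cache. Build with: cc -O2 -o stride stride.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N      (16 * 1024 * 1024)   /* 16M ints = 64MB, exceeds the caches */
    #define STRIDE 16                   /* 16 ints = 64 bytes = one cache line */

    static volatile long long sink;     /* keep the compiler from removing the loops */

    static double time_sum(const int *a, size_t step)
    {
        struct timespec t0, t1;
        long long sum = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* visit every element exactly once, in a step-dependent order */
        for (size_t start = 0; start < step; start++)
            for (size_t i = start; i < N; i += step)
                sum += a[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        sink = sum;
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        int *a = calloc(N, sizeof(*a));

        if (!a)
            return 1;
        printf("sequential: %.3fs\n", time_sum(a, 1));
        printf("strided:    %.3fs\n", time_sum(a, STRIDE));
        free(a);
        return 0;
    }

The exact numbers will vary from one machine to the next, but the gap between the two passes is, in essence, the cost of going past the cache.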
Each layer of cache is accessible by a different number of CPUs; the closest (L1) cache may be specific to a single CPU, while the subsequent (slower, but often larger) layers of cache will be shared by a group of CPUs.
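The kernel describes this topology to user space through the cacheinfo files in sysfs; a quick sketch along these lines (again, not from the patch series) will print which CPUs share each of CPU 0's caches on a given machine:

    /* Print the sharing of each of CPU 0's caches, using the kernel's
     * cacheinfo interface under /sys/devices/system/cpu/cpu0/cache/. */
    #include <stdio.h>

    int main(void)
    {
        for (int idx = 0; ; idx++) {
            char path[128], shared[256];
            int level;
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
            f = fopen(path, "r");
            if (!f)
                break;                  /* no more cache indexes */
            if (fscanf(f, "%d", &level) != 1)
                level = 0;
            fclose(f);

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list",
                     idx);
            f = fopen(path, "r");
            if (!f)
                break;
            if (!fgets(shared, sizeof(shared), f))
                shared[0] = '\0';
            fclose(f);

            /* shared_cpu_list already ends with a newline */
            printf("L%d cache (index%d) shared with CPUs: %s", level, idx, shared);
        }
        return 0;
    }

On a typical system, the L1 and L2 entries will list only a single CPU (or a pair of SMT siblings), while the last-level cache will list a larger group.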