Multicore Storage Allocation

by Charles Leiserson

When multicore-enabling a C/C++ application, it's common to discover that malloc() (or new) is a bottleneck that limits the speedup your parallelized application can obtain.  This article explains the four basic problems that a good parallel storage allocator solves:

 

  1. thread safety,
  2. overhead,
  3. contention,
  4. memory drift.

Thread safety

Basic storage allocators are not thread safe, although recent efforts have started to remedy this problem for many concurrency platforms.  In other words, if two parallel threads attempt to allocate or deallocate at the same time, races on the storage allocator's internal data structures can cause improper behavior.  When threads have unrestricted access to the storage allocator, as shown below, they may end up "stomping on each other's toes," leading to anomalous behavior.

[Figure 2: threads making unsynchronized calls into a shared storage allocator]

The simple solution to this problem is for applications to acquire a mutex (mutual exclusion) lock on the allocator before calling malloc() or free(), as illustrated below, which lets only one thread access the allocator's internal data structures at a time.

[Figure 3: threads acquiring a mutex lock before accessing the shared allocator]

 

If the storage allocator is thread safe, the locking protocol is incorporated into the logic of the storage allocator itself.
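To make the locking approach concrete, here is a minimal sketch in C++ of a thread-safe wrapper around the ordinary C allocator.  The names locked_malloc and locked_free are illustrative, not part of any standard API:

#include <cstdlib>
#include <mutex>

static std::mutex g_alloc_mutex;   // serializes all access to the allocator

void* locked_malloc(std::size_t size) {
    std::lock_guard<std::mutex> guard(g_alloc_mutex);   // one thread at a time
    return std::malloc(size);
}

void locked_free(void* ptr) {
    std::lock_guard<std::mutex> guard(g_alloc_mutex);
    std::free(ptr);
}

A thread-safe allocator does essentially the same thing internally, so the application never sees the lock.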

Overhead and contention

Two problems may arise when an allocator is made thread safe by locking.  The first is that allocation and deallocation may now be slower due to the overhead of locking.  The second is that contention may arise in accessing the storage allocator, which can slow down the application and limit its scalability.  Contention may not be a big problem for 2 or 4 cores, but as Moore's Law brings us dozens and even hundreds of cores per chip, contention can threaten scalability.

Both problems can be solved using a distributed allocator, which provides a local storage pool per thread, as illustrated below.

[Figure 4: a distributed allocator with a local storage pool per thread backed by a global pool]

A distributed allocator allows allocation and deallocation to run out of the local storage pool most of the time.  In the uncommon case that a thread's local pool is exhausted, the thread can obtain additional storage, typically in large blocks, from the global pool.  The contention problem is solved, because threads only rarely access the global pool.  The overhead problem is solved as well, because no locking is needed to access the local pool.
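As a rough illustration of this design (a sketch, not the implementation of any particular allocator), the code below keeps a thread-local free list of fixed-size blocks and falls back to a locked global pool only when that list is empty.  All of the names and constants are assumptions made for the example:

#include <cstdlib>
#include <mutex>
#include <vector>

constexpr std::size_t BLOCK_SIZE = 64;    // single object size, for simplicity
constexpr std::size_t BATCH      = 128;   // blocks fetched from the global pool at once

static std::mutex g_pool_mutex;
static std::vector<void*> g_pool;              // global pool: requires the lock
thread_local std::vector<void*> local_pool;    // per-thread pool: no lock needed

void* pool_alloc() {
    if (local_pool.empty()) {
        // Uncommon case: refill the local pool from the global pool in one large batch.
        std::lock_guard<std::mutex> guard(g_pool_mutex);
        for (std::size_t i = 0; i < BATCH; ++i) {
            void* block;
            if (!g_pool.empty()) {
                block = g_pool.back();
                g_pool.pop_back();
            } else {
                block = std::malloc(BLOCK_SIZE);   // global pool exhausted: get fresh memory
            }
            local_pool.push_back(block);
        }
    }
    // Common case: serve the request from the local pool with no locking.
    void* block = local_pool.back();
    local_pool.pop_back();
    return block;
}

void pool_free(void* block) {
    local_pool.push_back(block);   // common case: no locking
}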

Memory drift

Unfortunately, local pools introduce yet another problem, especially in concurrency platforms where storage is actively shared among threads or which load-balance a computation across the threads.  One thread A may continually allocate storage out of its local pool and pass it off to another thread B, which frees it into its own local pool.  When thread A's local pool runs out, it allocates more storage from the global pool.  This storage is passed to B, which proceeds to free it into its local pool.  Over time, B's local pool grows unboundedly, creating something akin to a memory leak, where the virtual-memory footprint of the application continues to grow.

This memory drift problem can be solved in two ways.  One solution is for a thread whose local pool becomes too large to return some of its storage to the global pool.  The other is for each thread to return freed storage to the local pool of the thread that originally allocated it.  Either method can be implemented with low overhead, and both provide satisfactory solutions to the memory drift problem.
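A minimal sketch of the first remedy, building on the hypothetical local/global pools from the previous example: when a thread's local pool grows past a threshold, the thread returns a batch of blocks to the global pool.  The threshold is an arbitrary value chosen for illustration:

constexpr std::size_t LOCAL_LIMIT = 1024;   // illustrative cap on the local pool

void pool_free_bounded(void* block) {
    local_pool.push_back(block);
    if (local_pool.size() > LOCAL_LIMIT) {
        // The local pool has drifted too large: return half of it to the
        // global pool in a single locked batch, keeping the common case lock-free.
        std::lock_guard<std::mutex> guard(g_pool_mutex);
        while (local_pool.size() > LOCAL_LIMIT / 2) {
            g_pool.push_back(local_pool.back());
            local_pool.pop_back();
        }
    }
}

Returning blocks in batches keeps the lock acquisitions rare, so the common free path remains as cheap as before.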

Conclusion

There are other problems that can arise with parallel storage allocators.  For example, false sharing is a particularly pernicious problem in which two threads access independent blocks of storage that happen to lie on the same cache line, causing the processor's cache-coherence protocol to thrash.  A storage allocator that fails to respect cache-line boundaries and hands blocks that share a cache line to different threads may induce false sharing.  The problem is hard to detect, because the logic of the code shows the threads accessing independent locations.
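One way an allocator can help is to hand out cache-line-aligned blocks whose sizes are rounded up to whole cache lines, so that blocks given to different threads never share a line.  The sketch below assumes a 64-byte cache line; real code should query the hardware:

#include <cstdlib>

constexpr std::size_t CACHE_LINE = 64;   // assumed line size; query the hardware in practice

void* line_aligned_malloc(std::size_t size) {
    // Round the request up to a whole number of cache lines, then ask for
    // cache-line-aligned memory (std::aligned_alloc, available since C++17,
    // requires the size to be a multiple of the alignment).
    std::size_t rounded = (size + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
    return std::aligned_alloc(CACHE_LINE, rounded);
}

The trade-off is some internal fragmentation, since every small request now consumes a full cache line.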

Two examples of parallel storage allocators are Hoard, written by Emery Berger of the University of Massachusetts, and the Miser allocator, shipped by Cilk Arts as part of our Cilk++ distribution.  (More on Miser in an upcoming post - stay tuned!)

 
