A primer on mobile systems used for heterogeneous computing

Source: https://blog.imaginationtech.com/introduction-mobile-systems-heterogeneous-computing/

In the mobile and embedded market, the design constraints on electronic products can be tight and contradictory: the market demands higher performance yet lower power consumption, and reduced cost yet shorter time-to-market.

These constraints have created a trend for more specialized hardware designs that fit a particular application; if each task is well matched to a functional unit, fewer transistors are wasted and power efficiency is better. As a result, application processors have become increasingly heterogeneous over time, integrating multiple components into a single System-on-Chip (SoC).

The diagram below presents the architecture of a modern SoC. Such a chip typically includes a CPU (with optional multi-core and SIMD capabilities), a GPU for both 3D graphics acceleration and high-performance vector computation, an ISP (Image Signal Processor) for acquiring image sensor data, a VDE (Video Decoder and Encoder) for codec acceleration and an RPU (Radio Processing Unit) for connectivity. Each of these components has its own advantages and combinations of these can be used to implement many applications efficiently.

Today many application developers rely on the CPU to meet the performance requirements of their advanced computational photography and computer vision algorithms. However, these CPU-centric solutions frequently struggle to deliver sustained video-rate processing of high-resolution images, largely due to the thermal limits of mobile devices.

As shown in more detail in the figure below, a CPU combines a small number of cores with a large data cache, optimized for efficient execution of general-purpose control code with low memory latency. The GPU, on the other hand, dedicates its transistors to ALUs (arithmetic logic units) rather than data caches and control flow. This arrangement of hardware enables efficient processing of large data sets with little branching that require many repetitive arithmetic calculations, such as an image processing algorithm operating on many pixels.
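To make the distinction concrete, consider a per-pixel operation such as a grayscale (luma) conversion. Each output pixel depends only on the corresponding input pixel, with no branching and no dependency between iterations, so the loop maps naturally onto the GPU's wide array of ALUs. A minimal sketch in plain C (the pixel format and integer weights are illustrative assumptions, not taken from the article):

```c
#include <stdint.h>
#include <stddef.h>

/* Convert interleaved 8-bit RGB to 8-bit luma using integer
 * BT.601-style weights. Every iteration is independent, so a GPU
 * could execute one work-item per pixel instead of this serial loop. */
void rgb_to_luma(const uint8_t *rgb, uint8_t *luma, size_t pixels)
{
    for (size_t i = 0; i < pixels; ++i) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];
        /* (77*R + 150*G + 29*B) / 256 approximates 0.299R + 0.587G + 0.114B */
        luma[i] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
    }
}
```

On a GPU the serial loop disappears: the body becomes a compute kernel and the hardware schedules one invocation per pixel across its ALUs.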

Furthermore, because the GPU is designed to run at lower clock speeds than a CPU, offloading image processing workloads from the CPU to the GPU can lead to both an increase in performance and a reduction in power consumption and generated heat. The resulting implementation is also likely to be more balanced and more responsive, as the CPU has more free cycles to respond to the demands of the operating system and user interface.

In the context of mobile and embedded software, heterogeneous computing is the process of combining different types of processing units to meet an application’s performance requirements within a limited power and thermal budget. By partitioning the application into multiple workloads and running each workload on the hardware unit that can execute it most efficiently, the overall performance and power efficiency of the implementation can be improved.

When partitioning an application, serial tasks should usually be allocated to the CPU, whereas data-parallel tasks are good candidates for offloading to the GPU. If the SoC provides dedicated hardware accelerators such as an ISP or VDE, related tasks such as image de-noising and video playback should usually be allocated to these accelerators in order to maximise power-efficiency.
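This rule of thumb can be sketched as a simple mapping from workload type to preferred unit. The task categories, unit names, and fallback order below are illustrative assumptions, not a real scheduler API:

```c
#include <stdbool.h>

typedef enum { UNIT_CPU, UNIT_GPU, UNIT_ISP, UNIT_VDE } unit_t;
typedef enum { TASK_CONTROL, TASK_DATA_PARALLEL,
               TASK_IMAGE_DENOISE, TASK_VIDEO_DECODE } task_t;

/* Pick the most power-efficient unit for a task, falling back to
 * GPU compute when the SoC lacks the matching fixed-function block. */
unit_t choose_unit(task_t task, bool has_isp, bool has_vde)
{
    switch (task) {
    case TASK_IMAGE_DENOISE: return has_isp ? UNIT_ISP : UNIT_GPU;
    case TASK_VIDEO_DECODE:  return has_vde ? UNIT_VDE : UNIT_GPU;
    case TASK_DATA_PARALLEL: return UNIT_GPU;
    default:                 return UNIT_CPU; /* serial control code */
    }
}
```

A real system would also weigh data-transfer cost and current load, but the priority order (fixed-function first, then GPU, then CPU) captures the power-efficiency argument above.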

However, in some cases it may be desirable to implement these tasks in software instead, for example using GPU compute, to trade efficiency for a higher-quality algorithm than may be provided by the fixed-function accelerator. The use of GPU compute is particularly common in the field of computer vision where active research is continually leading to refinements of existing algorithms as well as entirely new vision algorithms. Fast deployment of these algorithms into products requires both programmability and a high-performance compute capability.
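The kind of task one might move from a fixed-function block into GPU compute is typically a small, regular filter. As an illustrative example (a deliberately simple stand-in for a real de-noising algorithm), a 3x3 box filter over a grayscale image is written here as serial C loops; in a GPU compute implementation each output pixel would be one independent work-item:

```c
#include <stdint.h>
#include <stddef.h>

/* 3x3 box filter over an 8-bit grayscale image (interior pixels only;
 * a full implementation would also handle the borders). Outputs are
 * independent of each other, so the two outer loops parallelize trivially. */
void box3x3(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    for (size_t y = 1; y + 1 < h; ++y)
        for (size_t x = 1; x + 1 < w; ++x) {
            unsigned sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[(y + dy) * w + (x + dx)];
            dst[y * w + x] = (uint8_t)(sum / 9);
        }
}
```

Because the algorithm lives in software rather than silicon, it can be swapped for a better filter (bilateral, non-local means, a learned model) as research advances, which is exactly the flexibility argument made above.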

Join us next time for an example use case of heterogeneous computing and the existing bandwidth constraints SoCs currently face.

Please let us know if you have any feedback on the materials published on the blog and leave a comment on what you’d like to see next. Make sure you also follow us on Twitter (@ImaginationTech) for more news and announcements from Imagination.
