[system-track][computing][GPU][Intel HD 530 Gen9] Architecture and performance issues

xcy6666

Column: Distributed Systems and Parallel Computing

Article link: https://blog.csdn.net/giantpoplar/article/details/88727760

Gen9 architecture

Subslice data port
- SIMD gather and scatter
- can access shared local memory
- dynamically coalesces scattered memory operations into fewer 64-byte cache-line requests // memory access pattern matters
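
A simplified model of coalescing can make the access-pattern point concrete. It assumes each of the 16 lanes of a SIMD-16 gather reads one 4-byte element and that one request is issued per distinct 64-byte cache line touched (the line size comes from the L3 notes). Real hardware may differ in detail, and the function name is hypothetical.

```python
# Simplified model: one SIMD-16 gather, 4-byte elements, 64-byte cache lines.
LINE_BYTES = 64
ELEM_BYTES = 4
SIMD_WIDTH = 16

def lines_touched(indices, base=0):
    """Number of distinct 64-byte cache lines a gather of element `indices` touches."""
    return len({(base + i * ELEM_BYTES) // LINE_BYTES for i in indices})

unit_stride = list(range(SIMD_WIDTH))                   # lane k reads a[k]
stride_16 = [16 * lane for lane in range(SIMD_WIDTH)]   # lane k reads a[16 * k]
broadcast = [0] * SIMD_WIDTH                            # every lane reads a[0]

print(lines_touched(unit_stride))           # 1 request
print(lines_touched(unit_stride, base=32))  # 2 requests: the base is not line-aligned
print(lines_touched(stride_16))             # 16 requests: no coalescing possible
print(lines_touched(broadcast))             # 1 request
```

Unit-stride, line-aligned accesses collapse into a single request, while a stride of 16 elements (64 bytes) puts every lane on its own line.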
L3 data cache on the GPU
- banked data cache
- highly banked shared local memory // beware of bank conflicts; OpenCL calls it work-group local memory; its data is programmer-managed
- also supports atomics and barriers; the capacity split among the three parts is configurable
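
Bank conflicts can be reasoned about by counting how many distinct words land in the same bank. The bank count and width below (16 banks, 4 bytes each) are assumptions chosen for illustration, not figures from these notes; the counting rule is the point. Lanes reading the same word are treated as a broadcast, not a conflict.

```python
from collections import defaultdict

NUM_BANKS = 16   # assumption for illustration, not a documented Gen9 figure
BANK_BYTES = 4   # assumption: each bank serves one 4-byte word per cycle

def conflict_degree(byte_addrs):
    """Max number of distinct words any single bank must serve (1 = conflict-free)."""
    words_per_bank = defaultdict(set)
    for addr in byte_addrs:
        word = addr // BANK_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())

lanes = range(16)
print(conflict_degree([4 * i for i in lanes]))   # stride 1 word: 1
print(conflict_degree([8 * i for i in lanes]))   # stride 2 words: 2
print(conflict_degree([64 * i for i in lanes]))  # stride 16 words: 16 (all in one bank)
print(conflict_degree([68 * i for i in lanes]))  # stride 17 words (padded row): 1
```

The last case is the classic padding trick: reading a column of a 16-wide tile stored with rows padded to 17 words spreads the lanes across all banks.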
Shared LLC
* shared between the CPU cores and Intel HD Graphics // how to make use of it?
* distributed shared cache // coherence overhead? cache-line ping-pong?
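
A minimal sketch of the usual mitigation for cache-line ping-pong caused by false sharing: give each writer its own 64-byte line. Whether and how this applies to CPU/GPU sharing through the LLC depends on coherence details not covered in these notes; the counter layout and function names are hypothetical.

```python
LINE_BYTES = 64  # cache-line size, matching the 64-byte figures in these notes

def counter_offsets(num_threads, counter_bytes=8, pad=True):
    """Byte offset of each thread's counter in a shared array (pad = one line each)."""
    stride = LINE_BYTES if pad else counter_bytes
    return [t * stride for t in range(num_threads)]

def threads_sharing_a_line(offsets):
    """Largest number of counters that fall into the same cache line."""
    per_line = {}
    for off in offsets:
        line = off // LINE_BYTES
        per_line[line] = per_line.get(line, 0) + 1
    return max(per_line.values())

print(threads_sharing_a_line(counter_offsets(8, pad=False)))  # 8: one line bounces between writers
print(threads_sharing_a_line(counter_offsets(8, pad=True)))   # 1: each writer owns its line
```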
Shared DRAM
- zero copy: CPU and GPU can work on the same buffer without copying it
- bandwidth contention between CPU and GPU
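
Zero copy usually depends on how the host buffer is allocated. The sketch below assumes the rule commonly given in Intel's OpenCL optimization guidance for `CL_MEM_USE_HOST_PTR` buffers: a 4096-byte-aligned base address and a size that is a multiple of 64 bytes. Check the current driver documentation before relying on it. The helper names are hypothetical.

```python
PAGE_ALIGN = 4096    # assumed base-address alignment required for zero copy
SIZE_MULTIPLE = 64   # assumed size granularity (one cache line)

def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple

def zero_copy_eligible(host_addr, size_bytes):
    """True if the buffer meets the assumed alignment and size rules."""
    return host_addr % PAGE_ALIGN == 0 and size_bytes % SIZE_MULTIPLE == 0

size = round_up(1000 * 4, SIZE_MULTIPLE)   # 1000 floats = 4000 bytes
print(size)                                # 4032
print(zero_copy_eligible(0x10000, size))   # True
print(zero_copy_eligible(0x10040, size))   # False: base is only 64-byte aligned
```

Padding the allocation costs a few bytes but can avoid a hidden copy of the whole buffer on every map.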
eDRAM: acts as a memory-side cache, or can be bypassed
64-byte-wide data paths in many places:
- (1) a SIMD-16 instruction can source 64-byte-wide operands from 64-byte-wide registers
- (2) such 64-byte-wide registers are read from or written to L3 over a 64-byte-wide data bus
- (3) within the L3 data cache, the cache line is 64 bytes
- (4) the L3 cache's bus to the SoC-shared LLC is also 64 bytes wide
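
The 64-byte width lines up with one SIMD-16 operand of 32-bit values (16 × 4 bytes). A simplified model that counts one 64-byte transfer per cache line touched shows how element width and alignment change the count for a contiguous SIMD-16 load; the function name is hypothetical.

```python
BUS_BYTES = 64  # width of the register, L3, cache-line and LLC paths listed above

def transfers_for_simd16(elem_bytes, base_addr=0):
    """64-byte lines covered by a contiguous 16-element load starting at base_addr."""
    first = base_addr // BUS_BYTES
    last = (base_addr + 16 * elem_bytes - 1) // BUS_BYTES
    return last - first + 1

print(transfers_for_simd16(4))                # float32, aligned: 1
print(transfers_for_simd16(4, base_addr=32))  # float32, misaligned: 2
print(transfers_for_simd16(2))                # half, aligned: 1 (uses half a line)
print(transfers_for_simd16(8))                # double, aligned: 2
```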
EU: flexible SIMD width; 4 KB register file per thread; 28 KB per EU (7 threads × 4 KB)
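
The register-file figures follow from Gen9's 128 general registers of 32 bytes per hardware thread and 7 threads per EU. Under the simplifying assumption that a thread's whole register file is split evenly among its SIMD lanes, the same arithmetic shows why a wider SIMD width leaves fewer 32-bit registers per work-item.

```python
GRF_COUNT = 128      # general registers per hardware thread (Gen9)
GRF_BYTES = 32       # bytes per register
THREADS_PER_EU = 7   # hardware threads per EU (Gen9)

per_thread = GRF_COUNT * GRF_BYTES     # 4096 bytes = 4 KB
per_eu = per_thread * THREADS_PER_EU   # 28672 bytes = 28 KB

def fp32_regs_per_work_item(simd_width):
    """32-bit values per SIMD lane, assuming the file is split evenly across lanes."""
    return per_thread // (simd_width * 4)

print(per_thread, per_eu)   # 4096 28672
for width in (8, 16, 32):
    print(width, fp32_regs_per_work_item(width))   # 8 128, then 16 64, then 32 32
```

This is one reason a compiler may prefer a narrower SIMD width for register-heavy kernels: fewer lanes per thread means more register space per work-item.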
16-bit float (FP16) support: enables mixed-precision computing
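
Why mixed precision usually means "store in FP16, accumulate in wider precision" can be shown on the CPU, because Python's `struct` module supports the IEEE 754 half-precision format code `"e"`. The sketch emulates FP16 and FP32 rounding after each addition; it says nothing about Gen9 throughput.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

values = [to_fp16(0.1)] * 4096   # inputs stored in FP16 (0.1 becomes 0.0999755859375)

acc16 = 0.0
for v in values:
    acc16 = to_fp16(acc16 + v)   # FP16 accumulation

acc32 = 0.0
for v in values:
    acc32 = to_fp32(acc32 + v)   # FP32 accumulation of FP16 inputs (mixed precision)

print(acc16)   # 256.0: above 256 the FP16 spacing is 0.25, so adding ~0.1 rounds away
print(acc32)   # close to the exact sum, 409.5
```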
coherence/consistency mechanisms at many levels may affect performance
the same virtual address can be shared seamlessly across devices; programmable via Shared Virtual Memory (SVM) in OpenCL 2.0
- net effect: pointer-rich data structures can be shared directly between code running on the CPU and code running on the GPU
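
A toy model can show why a shared virtual address space matters for pointer-rich structures. "Memory" here is a dict from integer addresses to nodes; it is not an OpenCL API, and all names and addresses are hypothetical. When both sides use the same addresses, the list can be followed as-is; when the data is copied into a separate address space, every embedded pointer has to be rebased.

```python
NODE_BYTES = 16  # hypothetical node size

def build_list(values, base):
    """Lay out a singly linked list at `base`; returns (memory, head_address)."""
    mem = {}
    for i, value in enumerate(values):
        addr = base + i * NODE_BYTES
        nxt = addr + NODE_BYTES if i + 1 < len(values) else None
        mem[addr] = (value, nxt)
    return mem, base

def walk(mem, head):
    """Follow next pointers from `head` and collect the values."""
    out, p = [], head
    while p is not None:
        value, p = mem[p]
        out.append(value)
    return out

def copy_to_separate_space(mem, head, delta, fix_pointers):
    """Copy nodes to an address space shifted by `delta` (no shared virtual addressing)."""
    def rebase(p):
        if p is None or not fix_pointers:
            return p
        return p + delta
    new_mem = {addr + delta: (value, rebase(nxt)) for addr, (value, nxt) in mem.items()}
    return new_mem, head + delta

host_mem, head = build_list([3, 1, 4], base=0x1000)
print(walk(host_mem, head))        # [3, 1, 4]: with SVM, the GPU follows these same addresses

dev_mem, dev_head = copy_to_separate_space(host_mem, head, 0x7000, fix_pointers=True)
print(walk(dev_mem, dev_head))     # [3, 1, 4], but only because every pointer was rebased

stale_mem, stale_head = copy_to_separate_space(host_mem, head, 0x7000, fix_pointers=False)
try:
    walk(stale_mem, stale_head)
except KeyError as err:
    print("stale pointer:", hex(err.args[0]))   # stale pointer: 0x1010
```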