Evolution of Intel Architectures, Part 3: The Core Microarchitecture

lingqi1818

Category: Linux kernel study

Article link: https://blog.csdn.net/lingqi1818/article/details/30777565

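Intel introduced the Core microarchitecture in 2006. It is used in the Intel Xeon 3000, 3200, 5100, 5300, and 7300 series, as well as in the Pentium Dual-Core, Core 2 Extreme, Core 2 Quad, and Core 2 Duo families of 64-bit processors.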

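The Core microarchitecture introduces the following features, which raise performance and lower power consumption for single-threaded as well as multi-threaded workloads: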

• Intel® Wide Dynamic Execution increases the fetch, dispatch, and execution bandwidth of each processor core and allows up to four instructions to be retired per cycle.

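• Intel® Advanced Smart Cache delivers higher bandwidth from the second-level cache to the core, and better performance and flexibility for single-threaded and multi-threaded applications.
- Large second-level cache of up to 4 MB with 16-way associativity
- Optimized for multicore and single-threaded execution environments
- 256-bit internal data path to improve bandwidth from the L2 to the first-level data cache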
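• Intel® Smart Memory Access prefetches data from memory in response to data access patterns and reduces the exposure of the out-of-order execution engine to cache misses.
- Hardware prefetchers to reduce the effective latency of second-level cache misses
- Hardware prefetchers to reduce the effective latency of first-level data cache misses
- Memory disambiguation to improve the efficiency of the speculative execution engine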
• Intel® Advanced Digital Media Boost gives most 128-bit SIMD instructions single-cycle throughput and improves floating-point performance.
- Single-cycle throughput for most 128-bit SIMD instructions
- Up to eight floating-point operations per cycle
- Three issue ports available for dispatching SIMD instructions for execution
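
The Advanced Smart Cache figures in the list above lend themselves to a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes a 64-byte cache line (a value the list does not state) and reads "4 MB" as 4 MiB, and the function names are made up for this example.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


def transfers_per_line(line_bytes: int, path_bits: int) -> int:
    """Minimum number of transfers needed to move one line over the data path."""
    path_bytes = path_bits // 8
    return -(-line_bytes // path_bytes)  # ceiling division


L2_SIZE = 4 * 1024 * 1024  # up to 4 MB of L2, read here as 4 MiB
L2_WAYS = 16               # 16-way set associativity
LINE_BYTES = 64            # assumption: 64-byte cache line
PATH_BITS = 256            # 256-bit internal path from L2 to the L1 data cache

sets = cache_sets(L2_SIZE, L2_WAYS, LINE_BYTES)
transfers = transfers_per_line(LINE_BYTES, PATH_BITS)

print(f"L2 sets: {sets}")
print(f"Transfers per cache line: {transfers}")
```

Under these assumptions the L2 is organized as 4096 sets, and moving one full line to the first-level data cache takes at least two transfers over the 256-bit path.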

The figure below shows the functional blocks and subsystems of the microarchitecture:

Front End

The front end of the Core microarchitecture provides several enhancements that feed the Intel Wide Dynamic Execution engine.

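Execution Core

The execution core of the Core microarchitecture is superscalar and can process instructions out of order to increase the overall number of instructions executed per cycle (IPC). It improves throughput and efficiency through the following features: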

• Up to six micro-ops can be dispatched for execution per cycle
• Up to four instructions can be retired per cycle
• Three full arithmetic logic units
• SIMD instructions can be dispatched through three issue ports
• Most SIMD instructions have 1-cycle throughput (including 128-bit SIMD instructions)
• Up to eight floating-point operations per cycle
• Many long-latency computation operations are pipelined in hardware to increase overall throughput
• Reduced exposure to data access delays through Intel Smart Memory Access
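
To give a feel for what these widths imply, the sketch below computes two theoretical upper bounds from the figures in the list. The 2.66 GHz clock and the two-core count are hypothetical example values, not numbers from this article, and the helper names are invented for illustration. Real workloads rarely approach these peaks.

```python
RETIRE_WIDTH = 4     # up to four instructions retired per cycle
FLOPS_PER_CYCLE = 8  # up to eight floating-point operations per cycle


def min_cycles(instructions: int, retire_width: int = RETIRE_WIDTH) -> int:
    """Lower bound on the cycles needed to retire `instructions` instructions."""
    return -(-instructions // retire_width)  # ceiling division


def peak_gflops(clock_ghz: float, cores: int,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical peak floating-point rate in GFLOPS."""
    return clock_ghz * cores * flops_per_cycle


CLOCK_GHZ = 2.66  # hypothetical clock frequency
CORES = 2         # hypothetical core count

print(f"Cycles to retire 1000 instructions: at least {min_cycles(1000)}")
print(f"Peak rate: {peak_gflops(CLOCK_GHZ, CORES):.2f} GFLOPS")
```

The retirement bound gives a best-case IPC of 4, so any measured IPC well below that points at stalls elsewhere in the pipeline, such as cache misses or dependency chains.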