Computer architecture optimization: Computer Architecture and Program Optimization.ppt

Latest recommended article published 2021-11-30 21:50:09

十贰十二

Views: 139

Tags: computer architecture optimization

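"Computer Architecture and Program Optimization.ppt" was shared by a member and can be read online. For more material related to "Computer Architecture and Program Optimization.ppt" (116 pages), search 人人文库网 (Renrendoc).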

Computer Architecture and Program Optimization
Introduction to Intel 64 Architectures Optimization

Main Purpose
Overview of the processor architecture
Introduction to SIMD instructions (SSE / ...)

max(A,B)
With branches:
    cmp A, B      ; Condition
    jbe L30       ; Conditional branch
    mov ebx, A    ; ebx holds the result
    jmp L31       ; Unconditional branch
L30:
    mov ebx, B
L31:
Without branches:
    xor ebx, ebx  ; Clear ebx
    cmp A, B
    setbe bl      ; ebx = 0 or 1
                  ; (or the complement condition)
    sub ebx, 1    ; ebx = 11...11 or 00...00
    and ebx, A-B  ; ebx = A-B or 0
    add ebx, B    ; ebx = A or B

Branch Prediction
Spin-Wait and Idle Loops
All branch targets should be 16-byte aligned.
Unroll small loops until the overhead of the branch and the induction variable accounts (generally) for less than 10% of the loop's execution time.

Fetch
    for (i = 0; i < BUFF_SIZE; i++) sum += buff[i];
(Sandy Bridge only)

Traversing through pointers
L1D Cache Bank Conflict
L1D Cache Bank Conflict (continued)
Minimize Register Spills

Data Layout Optimizations
Pad data structures defined in the source code so that every data element is aligned to a natural operand-size address boundary.
Decomposing an Array

Locality Enhancement
Optimization techniques such as blocking, loop interchange, loop skewing, and packing are best done by the compiler. Optimize data structures to fit either in one half of the first-level cache or in the second-level cache, and turn on the compiler's loop optimizations to enhance locality for nested loops.

Minimizing Bus Latency
If there is a blend of reads and writes on the bus, changing the code to separate these bus transactions into read phases and write phases can help performance. Software should favor data access patterns that result in higher concentrations of cache-miss patterns.

Non-Temporal Store Bus Traffic
The data transfer rate for bus write transactions is higher if 64 bytes are written out to the bus at a time.

Prefetching
First-level data cache prefetching
Avoid fetching unneeded lines
Prefetching for the second-level cache

1st-Level DCache Prefetching
Avoid Fetching Unneeded Lines
For the L1 hardware prefetcher:
Method 1: Organize the data so that consecutive accesses can usually be found in the same 4-KByte page. Access the data in constant strides, forward or backward (IP prefetcher).
Method 2: Organize the data in consecutive lines. Access the data in increasing addresses, in sequential cache lines.

Prefetching for the 2nd-Level Cache
The streamer loads data or instructions from memory into the second-level cache. To use the streamer, organize the data in blocks of 128 bytes, aligned on 128 bytes.

Example of Latency Hiding: memory access latency and execution without prefetch
Example of Latency Hiding: memory access latency and execution with prefetch

Spread Prefetch Instructions
Rearranging PREFETCH instructions may yield a noticeable speedup for code that stresses the cache resources.

Multi-core
2950 ticks (48 bit); max latency 15000 ticks

Using bit wizardry
References: Matters Computational: Ideas, Algorithms, Source Code, Jörg Arndt; Hacker's Delight, Henry S. Warren, Jr.; HAKMEM, AIM-239, MIT
QuadCore Intel Core 2 Quad Q9550, 2833 MHz: throughput 3.12 Gbit/s; break-out throughput 1090 ticks (288 bit), 212 ticks (48 bit); max latency 1200 ticks
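The max(A,B) slide can be read in C as follows. This is a sketch, assuming 32-bit unsigned operands so that the comparison matches the unsigned jbe; the function names are hypothetical. Unsigned arithmetic wraps, so (a - b) + b == a holds even when a - b overflows.

```c
#include <stdint.h>

/* Branching version, mirroring the cmp/jbe/jmp sequence. */
uint32_t max_branch(uint32_t a, uint32_t b) {
    if (a <= b)      /* jbe L30 */
        return b;    /* L30: mov ebx, B */
    return a;        /* mov ebx, A */
}

/* Branch-free version, mirroring xor/cmp/setbe/sub/and/add. */
uint32_t max_branchless(uint32_t a, uint32_t b) {
    uint32_t le   = (uint32_t)(a <= b);  /* setbe: 1 if a <= b, else 0 */
    uint32_t mask = le - 1u;             /* sub 1: 0x00000000 or 0xFFFFFFFF */
    return (mask & (a - b)) + b;         /* and/add: (a - b) + b == a, or 0 + b == b */
}
```

Optimizing compilers often turn the branching version into a conditional move on their own, so it is worth checking the generated assembly before hand-writing the mask trick.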
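The unrolling advice and the `sum += buff[i]` loop from the slides can be sketched like this. The unroll factor of 4 and the function names are assumptions for illustration; four independent accumulators also shorten the add dependency chain, and because the sums are integers the reordering gives exactly the same result.

```c
#include <stddef.h>

/* The loop from the slide: per element, one add plus compare, branch and i++. */
long sum_simple(const int *buff, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buff[i];
    return sum;
}

/* Unrolled by 4 (hypothetical factor): branch and induction-variable overhead
   is paid once per four elements, and s0..s3 can be updated in parallel. */
long sum_unrolled4(const int *buff, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += buff[i];
        s1 += buff[i + 1];
        s2 += buff[i + 2];
        s3 += buff[i + 3];
    }
    for (; i < n; i++)   /* remaining 0..3 elements */
        s0 += buff[i];
    return s0 + s1 + s2 + s3;
}
```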
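The slides only title "Decomposing an Array"; one common form of it is splitting an array of structures into a structure of arrays. The sketch below assumes that reading and a 4-byte int; the type and field names are hypothetical. Each AoS element holds 14 bytes of data padded to 16, while the SoA layout stores each field contiguously without per-element padding.

```c
#include <stddef.h>

#define N 100

/* Array of structures: 3 ints + 2 chars, padded so each element's ints stay aligned. */
struct elem { int a, c, e; char b, d; };
struct elem array_of_struct[N];

/* Structure of arrays: each field is contiguous, so a loop that reads only
   field a touches only the cache lines holding a. */
struct {
    int a[N], c[N], e[N];
    char b[N], d[N];
} struct_of_array;

long sum_a_aos(void) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += array_of_struct[i].a;   /* stride = sizeof(struct elem) */
    return s;
}

long sum_a_soa(void) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += struct_of_array.a[i];   /* stride = sizeof(int) */
    return s;
}
```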
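Blocking, listed under Locality Enhancement, can be illustrated with a matrix transpose. This is a hand-written sketch under assumptions: square row-major matrices of double and a hypothetical tile size `BLOCK`, which in practice would be tuned so a source tile and a destination tile fit in the first-level cache together.

```c
#include <stddef.h>

#define BLOCK 16  /* hypothetical tile size */

/* Naive: dst is written with a stride of n elements, so each write may touch a new line. */
void transpose_naive(const double *src, double *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked: finish one BLOCK x BLOCK tile before moving on, so the lines of
   src and dst used by the tile stay cached while they are reused. */
void transpose_blocked(const double *src, double *dst, size_t n) {
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)      /* clamp at edge */
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```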
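Software prefetching and "Spread Prefetch Instructions" might look like the sketch below. It assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C) and 64-byte cache lines; the prefetch distance is a hypothetical value to tune. It issues one prefetch per cache line rather than per element. For a purely sequential loop like this the hardware prefetchers usually already suffice, so the pattern pays off mainly when the addresses are predictable in software but not to the hardware.

```c
#include <stddef.h>

#define INTS_PER_LINE  (64 / sizeof(int))  /* assumes 64-byte cache lines */
#define PREFETCH_LINES 8                   /* hypothetical distance, in lines */

long sum_with_prefetch(const int *buff, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Roughly one prefetch per cache line, PREFETCH_LINES lines ahead;
           args: address, 0 = read, 3 = keep in all cache levels. */
        if (i % INTS_PER_LINE == 0 && i + PREFETCH_LINES * INTS_PER_LINE < n)
            __builtin_prefetch(&buff[i + PREFETCH_LINES * INTS_PER_LINE], 0, 3);
        sum += buff[i];
    }
    return sum;
}
```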

Look-up table
QuadCore Intel Core 2 Quad Q9550, 2833 MHz: throughput 19.1 Gbit/s; break-out throughput 280 ticks (288 bit), 68 ticks (48 bit); max latency 500 ticks
Reference: A Painless Guide to CRC Error Detection Algorithms, Index V3.00, Ross N. Williams

Decoder
Viterbi Algorithm: original program, C optimization, SIMD optimization

Viterbi Algorithm
Original Program
QuadCore Intel Core 2 Quad Q9550, 2833 MHz: throughput 11.1 Mbit/s; break-out throughput 280K ticks (288 bit), 68K ticks (48 bit); max latency 300K ticks
SIMD Optimization
SIMD Optimization (continued)

The End
Thank you
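The slides do not say which checksum the "bit wizardry" and "look-up table" stages compute; the citation of Williams' CRC guide suggests a CRC. Purely as a stand-in, the sketch below assumes standard CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) and shows a bit-at-a-time version next to a 256-entry table version. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32. 0u - (crc & 1u) is all ones when the low bit is set,
   so the conditional XOR needs no branch. */
uint32_t crc32_bitwise(const uint8_t *p, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t crc_table[256];

/* Precompute the effect of 8 bit steps for every possible low byte. */
void crc32_init_table(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        crc_table[n] = c;
    }
}

/* Table-driven CRC-32: one look-up per byte instead of eight bit steps. */
uint32_t crc32_table(const uint8_t *p, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc_table[(crc ^ p[i]) & 0xFFu];
    return ~crc;
}
```

Call `crc32_init_table()` once before using `crc32_table`. Both functions should return 0xCBF43926 for the ASCII string "123456789", the usual CRC-32 check value.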
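The Viterbi slides do not give the code parameters, so this scalar sketch assumes a rate 1/2 convolutional code with constraint length K = 3 and generators 7 and 5 (octal), hard-decision Hamming metrics, and a hypothetical message-length limit `MAXBITS`. It shows the add-compare-select step and the traceback, the parts a C or SIMD optimization would target. It is not the presenter's program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define K       3
#define NSTATES (1u << (K - 1))   /* state = last K-1 input bits, newest in LSB */
#define G0      07u
#define G1      05u
#define MAXBITS 1024u             /* hypothetical upper bound on n */
#define INF     1000000u          /* metric of unreachable states */

/* Parity of the low K bits of x. */
static unsigned parity(unsigned x) {
    unsigned p = 0;
    for (int i = 0; i < K; i++)
        p ^= (x >> i) & 1u;
    return p;
}

/* Encode n bits (values 0/1) into 2*n bits, starting from state 0. */
void conv_encode(const uint8_t *in, size_t n, uint8_t *out) {
    unsigned state = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned reg = ((state << 1) | in[i]) & ((1u << K) - 1);
        out[2 * i]     = (uint8_t)parity(reg & G0);
        out[2 * i + 1] = (uint8_t)parity(reg & G1);
        state = reg & (NSTATES - 1);
    }
}

/* Hard-decision Viterbi decoding of n <= MAXBITS bits from 2*n received bits. */
void viterbi_decode(const uint8_t *rx, size_t n, uint8_t *out) {
    static uint8_t survivor[MAXBITS][NSTATES];  /* best predecessor per state */
    unsigned metric[NSTATES], next[NSTATES];

    for (unsigned s = 0; s < NSTATES; s++)
        metric[s] = (s == 0) ? 0 : INF;          /* encoder starts in state 0 */

    for (size_t i = 0; i < n; i++) {
        for (unsigned s = 0; s < NSTATES; s++)
            next[s] = INF;
        /* Add-compare-select: extend each state with input bit 0 and 1. */
        for (unsigned s = 0; s < NSTATES; s++) {
            for (unsigned b = 0; b < 2; b++) {
                unsigned reg = ((s << 1) | b) & ((1u << K) - 1);
                unsigned ns  = reg & (NSTATES - 1);
                unsigned d   = (parity(reg & G0) != rx[2 * i]) +
                               (parity(reg & G1) != rx[2 * i + 1]);
                unsigned m   = metric[s] + d;
                if (m < next[ns]) {
                    next[ns] = m;
                    survivor[i][ns] = (uint8_t)s;
                }
            }
        }
        memcpy(metric, next, sizeof metric);
    }

    /* Trace back from the state with the smallest final metric. */
    unsigned best = 0;
    for (unsigned s = 1; s < NSTATES; s++)
        if (metric[s] < metric[best]) best = s;
    for (size_t i = n; i-- > 0; ) {
        out[i] = (uint8_t)(best & 1u);           /* newest input bit = LSB of state */
        best = survivor[i][best];
    }
}
```

Appending K-1 zero bits to the message before encoding drives the encoder back to state 0, which makes errors near the end of a block easier to correct.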
