Improving a Multi-Core Processor Architecture

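This article explores Design Space Exploration for optimizing an EEG application on a multi-core processor, using Snipersim for simulation. By finding parallel regions, choosing the number of cores, and fine-tuning the cache structure, it achieves a significant improvement in EDAP. The study finds that a 4-core structure gives the best EDAP under a specific configuration, and that the optimized architecture is 3.4 times faster than the default structure while consuming 28% less energy.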

Design Space Exploration: Multi Core  

Introduction

In this assignment, I map an application onto a multi-core x86 platform simulated by Snipersim. The goal is to optimize the Energy-Delay-Area Product (EDAP). The application is the same EEG application we used for the single-core assignment, but this time it runs on a multi-core processor, which I simulate with Snipersim.


Step 1: Find the parallel region. 

The first step is to find the parallel region. At first I used #pragma omp parallel for, but the for loop appeared to execute multiple times. This may be because the loop contains recursive calculations, or because the calculation in one thread needs operands from other threads. I then tried the sections syntax, splitting the for loop into several sections, and that worked.
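As a rough illustration of the two constructs mentioned above, here is a minimal sketch. The functions process_parallel_for, process_sections and square are hypothetical and do not come from the EEG application; the sketch assumes every iteration is independent, which may not hold for the original loop. Without an OpenMP flag such as -fopenmp the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element work; stands in for the real EEG computation. */
static double square(double x) { return x * x; }

/* Work-sharing loop: the iterations are divided among the threads,
 * so each iteration runs exactly once. */
void process_parallel_for(const double *in, double *out, int n)
{
    assert(in != NULL && out != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = square(in[i]);
    }
}

/* Sections: the index range is split by hand into two halves,
 * and each half is handed to one section. */
void process_sections(const double *in, double *out, int n)
{
    assert(in != NULL && out != NULL);
    int half = n / 2;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < half; i++)
                out[i] = square(in[i]);
        }
        #pragma omp section
        {
            for (int i = half; i < n; i++)
                out[i] = square(in[i]);
        }
    }
}
```

Both functions should fill out with the same values. The sections variant fixes the number of parallel tasks at two, whereas the parallel for variant lets the runtime divide the iterations across however many threads are available.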

I first looked for a parallel region in main.c. Its structure is very simple, though, and I could not find any region that would improve the EDAP sharply. I did find one region that can run in parallel, but it only reduces the delay slightly. I gave this region two cores, because with more cores the overhead would exceed the gain from parallelization.

I then noticed that main.c includes Analysis.h and graphics.h, so I expected a better chance of finding parallel regions in Analysis.c and graphics.c. Analysis.c contains a large for loop, and parallelizing it should improve performance considerably. After some small changes to the code, it worked and reduced the delay sharply. It gave a worse EDAP, though, which I think can be fixed in the following steps. Even better, the instructions are distributed evenly across the cores: the master core runs only slightly more instructions than the others.
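The changes made to Analysis.c are not shown here. As one common example of the kind of small change a large loop can need, the sketch below assumes a loop that accumulates into a shared variable and uses a reduction clause to make it safe to parallelize. The function total_energy is hypothetical, and schedule(static) is an assumed choice that gives each thread an equal share of iterations, in line with the even distribution described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: sum of squared samples over one buffer.
 * Without the reduction clause, all threads would update sum
 * concurrently and the result would be unreliable. */
double total_energy(const double *samples, int n)
{
    assert(samples != NULL || n == 0);
    double sum = 0.0;
    /* Each thread accumulates a private partial sum over its static
     * chunk of iterations; the partial sums are combined at the end. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += samples[i] * samples[i];
    }
    return sum;
}
```

A static schedule is a reasonable default when every iteration costs about the same. If the iterations vary in cost, a dynamic schedule may balance the load better at the price of more scheduling overhead.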

I then worked on graphics.c, fft.c and other files (some of these efforts are visible in the program comments), but I could not find any other parallel regions.

    

Step 2: Decide the number of cores.

Next I tried to decide how many cores to use. I tested 1, 4, 8 and 16 cores. As Figure 1 shows, the EDAP increases almost linearly with the number of cores. It seems that more cores mean a higher EDAP, but this may be because the architecture has not yet been optimized for multiple cores. No conclusion can be drawn from this figure, so I decided to do some optimization first and hopefully get more meaningful results.

                               

 

Figure 1: EDAP vs core number
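For reference, EDAP is the product of energy, delay and area, so a lower value is better. The sketch below shows how the tested configurations could be compared. The design_point struct, its field names and its units are assumptions made for illustration; in practice the values would be taken from the simulator output for each run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one simulated configuration. */
typedef struct {
    int cores;        /* number of cores in the configuration */
    double energy_j;  /* total energy, assumed to be in joules */
    double delay_s;   /* execution time, assumed to be in seconds */
    double area_mm2;  /* chip area, assumed to be in mm^2 */
} design_point;

/* EDAP = energy * delay * area. */
double edap(const design_point *p)
{
    assert(p != NULL);
    return p->energy_j * p->delay_s * p->area_mm2;
}

/* Returns the index of the configuration with the lowest EDAP,
 * or -1 if the list is empty. */
int best_design(const design_point *points, int n)
{
    if (points == NULL || n <= 0)
        return -1;
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (edap(&points[i]) < edap(&points[best]))
            best = i;
    }
    return best;
}
```

Because area grows with the number of cores, adding cores only lowers the EDAP if the delay and energy fall faster than the area rises.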

 
