How to implement parallelism in Java: sequential Java implementation is 4x faster than the parallel implementation

Latest recommended article published 2024-02-23 13:02:02

weixin_39960319

Tags: parallel computing, multicore processors, cache optimization, thread pools, execution efficiency

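The biggest overhead is the time spent starting and stopping threads. If I reduce the size of the array from 10000 to 10, the run takes roughly the same time.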
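The startup cost described above can be observed with a rough sketch like the following. It is not the original poster's code: the class name `ThreadStartupCostSketch`, both method names, and the round and thread counts are assumptions chosen for illustration. It does the same trivial work twice, once on freshly started threads and once inline, so that the time difference is mostly thread creation and joining. Timings depend on the machine and the JVM, so none are quoted here.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadStartupCostSketch {

    /** Starts and joins `threads` new threads in each round; returns the number of tasks run. */
    static int runWithFreshThreads(int rounds, int threads) {
        AtomicInteger completed = new AtomicInteger();
        for (int r = 0; r < rounds; r++) {
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                // The task itself is almost free: a single atomic increment.
                workers[t] = new Thread(completed::incrementAndGet);
                workers[t].start();
            }
            for (Thread worker : workers) {
                try {
                    worker.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        }
        return completed.get();
    }

    /** Performs the same number of increments on the calling thread. */
    static int runInline(int rounds, int threads) {
        int completed = 0;
        for (int r = 0; r < rounds; r++) {
            for (int t = 0; t < threads; t++) {
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        int rounds = 2000; // assumed value, small enough to finish quickly
        int threads = 4;   // assumed value

        long t0 = System.nanoTime();
        int viaThreads = runWithFreshThreads(rounds, threads);
        long t1 = System.nanoTime();
        int inline = runInline(rounds, threads);
        long t2 = System.nanoTime();

        System.out.println("fresh threads: " + viaThreads + " tasks, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("inline:        " + inline + " tasks, " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```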

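If you keep the thread pool alive and give each thread its own share of the work, written to a thread-local data set, the run is 4 times faster on a machine with 6 cores.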

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelImplementationOptimised {

    // One worker per available core.
    static final int numberOfThreads = Runtime.getRuntime().availableProcessors();

    // The pool is created once and reused by every call to update().
    final ExecutorService exec = Executors.newFixedThreadPool(numberOfThreads);

    private int numberOfCells;

    public ParallelImplementationOptimised(int numberOfCells) {
        this.numberOfCells = numberOfCells;
    }

    public void update() throws ExecutionException, InterruptedException {
        List<Future<?>> futures = new ArrayList<>();
        for (int thread = 0; thread < numberOfThreads; thread++) {
            futures.add(exec.submit(new Runnable() {
                @Override
                public void run() {
                    // Integer division: any remainder cells are not processed.
                    int num = numberOfCells / numberOfThreads;
                    // Each task allocates its own arrays, so no data is shared between threads.
                    double[] h0 = new double[num],
                            h1 = new double[num],
                            h2 = new double[num],
                            h3 = new double[num],
                            h4 = new double[num],
                            h5 = new double[num],
                            h6 = new double[num],
                            h7 = new double[num],
                            h8 = new double[num],
                            h9 = new double[num];
                    for (int i = 0; i < num; i++) {
                        h0[i] = h0[i] + 1;
                        h1[i] = h1[i] + 1;
                        h2[i] = h2[i] + 1;
                        h3[i] = h3[i] + 1;
                        h4[i] = h4[i] + 1;
                        h5[i] = h5[i] + 1;
                        h6[i] = h6[i] + 1;
                        h7[i] = h7[i] + 1;
                        h8[i] = h8[i] + 1;
                        h9[i] = h9[i] + 1;
                    }
                }
            }));
        }
        // Wait for every task to finish before returning.
        for (Future<?> future : futures) {
            future.get();
        }
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        ParallelImplementationOptimised si = new ParallelImplementationOptimised(10);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            if (i % 1000 == 0) {
                System.out.println(i);
            }
            si.update();
        }
        long stop = System.currentTimeMillis();
        System.out.println("Time: " + (stop - start));
        // Without shutdown() the pool's non-daemon threads keep the JVM alive.
        si.exec.shutdown();
    }
}
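One detail of the listing above: `numberOfCells / numberOfThreads` is an integer division, so when the cell count is not a multiple of the thread count the leftover cells are assigned to no thread. A minimal sketch of one way to hand out the remainder follows. The class `CellPartitionSketch` and the method `sliceFor` are hypothetical names introduced here, not part of the original code.

```java
final class CellPartitionSketch {

    /**
     * Returns {start, endExclusive} for the slice of cells handled by worker `workerId`.
     * The first (cells % workers) workers each receive one extra cell.
     */
    static int[] sliceFor(int workerId, int workers, int cells) {
        int base = cells / workers;
        int extra = cells % workers;
        int start = workerId * base + Math.min(workerId, extra);
        int length = base + (workerId < extra ? 1 : 0);
        return new int[] { start, start + length };
    }

    public static void main(String[] args) {
        int workers = 6; // assumed: the 6-core machine mentioned in the text
        int cells = 10;  // the cell count used in the listing's main method
        for (int w = 0; w < workers; w++) {
            int[] slice = sliceFor(w, workers, cells);
            System.out.println("worker " + w + ": [" + slice[0] + ", " + slice[1] + ")");
        }
    }
}
```

Inside a task, the loop would then run from `slice[0]` to `slice[1]` instead of from `0` to `num`.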

SequentialImplementation: 3.3 seconds.

ParallelImplementationOptimised: 0.8 seconds.

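You appear to be writing to data that sits on the same cache line. Each such write then incurs an L3 cache miss, which takes about 20 times longer than an access to the L1 cache. I suggest you try completely separate data structures that are at least 128 bytes apart, to make sure you never touch the same cache line.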

Note: even if you intend to overwrite an entire cache line, an x64 CPU will first pull in the previous contents of that cache line.
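The 128-byte spacing suggested above can be tried with a sketch along these lines. It is an illustration under assumptions, not the original poster's code: the class `CacheLinePaddingSketch`, the method `count`, and the thread and iteration counts are invented for the example. Each thread increments its own slot of a shared `AtomicLongArray`. With a stride of 1 the slots are adjacent and likely share a cache line. With a stride of 16 longs (16 * 8 = 128 bytes) they are at least 128 bytes apart. Timings vary by hardware and JVM, so none are quoted.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class CacheLinePaddingSketch {

    // 16 longs * 8 bytes = 128 bytes between slots owned by different threads.
    static final int PADDED_STRIDE = 16;

    /** Each thread increments its own slot `iterations` times; returns the per-thread totals. */
    static long[] count(int threads, int stride, long iterations) {
        AtomicLongArray slots = new AtomicLongArray(threads * stride);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int index = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    slots.incrementAndGet(index); // atomic update, so every write reaches the cache line
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long[] totals = new long[threads];
        for (int t = 0; t < threads; t++) {
            totals[t] = slots.get(t * stride);
        }
        return totals;
    }

    public static void main(String[] args) {
        int threads = 4;                // assumed value
        long iterations = 20_000_000L;  // assumed value

        long t0 = System.nanoTime();
        count(threads, 1, iterations);              // adjacent slots
        long t1 = System.nanoTime();
        count(threads, PADDED_STRIDE, iterations);  // slots 128 bytes apart
        long t2 = System.nanoTime();

        System.out.println("adjacent slots: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("padded slots:   " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both runs compute the same totals. Only the memory layout differs, which is why any timing gap can be attributed to cache-line contention rather than to the amount of work.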

Another question might be:

Why isn’t this 20x slower?

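The CPU core that has grabbed the cache line may be running two hyper-threads (that is, two threads can access the data locally), and that core may go around the loop several times before it loses the cache line to another core that is requesting it. So the 20x penalty is not paid on every access or every loop iteration, but it is paid often enough to make the result much slower.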
