C rand() is slow with multiple threads: multithreaded random_r is slower than the single-threaded version

Latest recommended article published 2021-09-24 17:57:36

NT xing

A very simple change that spaces the per-thread data out in memory:

    /* Index both arrays with t*64 so that the state used by neighbouring
       threads ends up on different cache lines. */
    struct random_data* rand_states = (struct random_data*)calloc(NTHREADS * 64, sizeof(struct random_data));
    char* rand_statebufs = (char*)calloc(NTHREADS * 64, PRNG_BUFSZ);
    pthread_t* thread_ids;
    int t = 0;

    thread_ids = (pthread_t*)calloc(NTHREADS, sizeof(pthread_t));

    /* create threads */
    for (t = 0; t < NTHREADS; t++) {
        initstate_r(random(), &rand_statebufs[t*64], PRNG_BUFSZ, &rand_states[t*64]);
        pthread_create(&thread_ids[t], NULL, &thread_run, &rand_states[t*64]);
    }

results in a much shorter running time on my dual-core machine.

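This confirms the suspicion the change was meant to test: you are mutating values on the same cache line from two separate threads, so the threads contend for that cache line. Herb Sutter's talk "Machine Architecture: Things Your Programming Language Never Told You" is well worth watching if you have the time and do not know this material yet; he demonstrates false sharing starting at around 1:20.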
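To make the effect described above concrete, here is a minimal sketch, not taken from the original program. The names `packed_counters`, `padded_counters`, `bump`, `run_pair` and the value of `INCREMENTS` are all hypothetical, and the 64-byte line size is an assumption. Each thread writes only its own counter, so the two layouts compute the same result; only the placement in memory differs.

```c
#include <pthread.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumption: common line size on x86-64 */
#define INCREMENTS 1000000L     /* hypothetical workload size */

/* Two counters packed together: very likely on the same cache line. */
struct packed_counters {
    volatile long a;
    volatile long b;
};

/* The same counters, with b pushed a full cache line away from a. */
struct padded_counters {
    volatile long a;
    char pad[CACHE_LINE_SIZE - sizeof(long)];
    volatile long b;
};

/* Worker: increments only the counter it was given. */
static void *bump(void *arg)
{
    volatile long *counter = arg;
    for (long i = 0; i < INCREMENTS; i++)
        (*counter)++;
    return NULL;
}

/* Runs one thread per counter and returns the sum of both counters. */
long run_pair(volatile long *first, volatile long *second)
{
    pthread_t t1, t2;
    *first = 0;
    *second = 0;
    pthread_create(&t1, NULL, bump, (void *)first);
    pthread_create(&t2, NULL, bump, (void *)second);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return *first + *second;
}
```

Calling `run_pair` once on a `packed_counters` and once on a `padded_counters`, and timing each call, is one way to see whether false sharing matters on a given machine. The size of the difference depends on the hardware, so no figures are claimed here.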

Work out your cache line size, and lay out each thread's data so that it is aligned to it.
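One way to find the line size at run time, sketched under the assumption of Linux with glibc: `sysconf` accepts `_SC_LEVEL1_DCACHE_LINESIZE` there as an extension. The helper name `cache_line_size` and the fallback value of 64 are hypothetical choices for this example.

```c
#include <unistd.h>

#define FALLBACK_CACHE_LINE_SIZE 64   /* assumption used when the query fails */

/* Returns the L1 data cache line size in bytes, or the fallback above
   when the system cannot report it. */
long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (size > 0)
        return size;
#endif
    return FALLBACK_CACHE_LINE_SIZE;
}
```

Because array sizes in a struct must be compile-time constants, a value obtained this way is mainly useful as a check against a hard-coded constant such as 64. On glibc systems, `getconf LEVEL1_DCACHE_LINESIZE` should report the same number from the shell.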

It is a bit cleaner to put all of a thread's data into one struct and align that:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* NTHREADS, PRNG_BUFSZ and thread_run are assumed to be defined
       elsewhere in the program (not shown here). */
    #define CACHE_LINE_SIZE 64

    /* One thread's PRNG state, padded out to exactly one cache line. */
    struct thread_data {
        struct random_data random_data;
        char statebuf[PRNG_BUFSZ];
        char padding[CACHE_LINE_SIZE - sizeof ( struct random_data ) - PRNG_BUFSZ];
    };

    int main ( int argc, char** argv )
    {
        printf ( "%zu\n", sizeof ( struct thread_data ) );

        void* apointer;
        /* the struct is one cache line long, so its size doubles as the alignment */
        if ( posix_memalign ( &apointer, sizeof ( struct thread_data ), NTHREADS * sizeof ( struct thread_data ) ) )
            exit ( 1 );

        struct thread_data* thread_states = apointer;
        /* zero the state before initstate_r uses it */
        memset ( apointer, 0, NTHREADS * sizeof ( struct thread_data ) );

        pthread_t* thread_ids;
        int t = 0;

        thread_ids = ( pthread_t* ) calloc ( NTHREADS, sizeof ( pthread_t ) );

        /* create threads */
        for ( t = 0; t < NTHREADS; t++ ) {
            initstate_r ( random(), thread_states[t].statebuf, PRNG_BUFSZ, &thread_states[t].random_data );
            pthread_create ( &thread_ids[t], NULL, &thread_run, &thread_states[t].random_data );
        }

        /* wait for all threads to finish */
        for ( t = 0; t < NTHREADS; t++ ) {
            pthread_join ( thread_ids[t], NULL );
        }

        free ( thread_ids );
        free ( thread_states );
        return 0;
    }

With CACHE_LINE_SIZE set to 64:

    refugio:$ gcc -O3 -o bin/nixuz_random_r src/nixuz_random_r.c -lpthread
    refugio:$ time bin/nixuz_random_r
    64
    63499495
    944240966

    real 0m1.278s
    user 0m2.540s
    sys 0m0.000s

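Alternatively, you can pad each thread's data to twice the cache line size and allocate it with plain malloc. The extra padding ensures that the memory each thread mutates sits on separate cache lines, since malloc returns memory that is 16-byte aligned (IIRC) rather than 64-byte aligned.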

(I reduced ITERATIONS by a factor of ten; the short run time above does not come from an unusually fast machine.)
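The malloc-based alternative mentioned above could be sketched as follows. This is an illustration under assumptions, not code from the original program: the union `padded_thread_data`, the helper `alloc_thread_states` and the value `PRNG_BUFSZ 8` are hypothetical, and `struct random_data` is taken from glibc. Each element is padded to two cache lines, so even when the allocation starts in the middle of a line, the state of one thread ends at least one full line before the state of the next begins.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes struct random_data on glibc */
#endif
#include <stdlib.h>

#define CACHE_LINE_SIZE 64       /* assumption: common line size on x86-64 */
#define PRNG_BUFSZ 8             /* assumed value; use the program's own */

/* Per-thread PRNG state padded to two cache lines, so no explicit
   alignment of the allocation is required. */
union padded_thread_data {
    struct {
        struct random_data random_data;
        char statebuf[PRNG_BUFSZ];
    } data;
    char pad[2 * CACHE_LINE_SIZE];
};

/* Allocates zeroed state for nthreads threads with the ordinary
   allocator; the caller releases it with free(). */
union padded_thread_data *alloc_thread_states(size_t nthreads)
{
    return calloc(nthreads, sizeof(union padded_thread_data));
}
```

Thread t would then be initialised with `initstate_r(random(), states[t].data.statebuf, PRNG_BUFSZ, &states[t].data.random_data)`. The cost of this variant is memory: each thread uses 128 bytes instead of 64.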
