Using Threads for Parallelism

zhangyubingcatherine

Category: Computer Systems. Tags: Parallelism, Threads, Concurrent Programs, Parallel Programs, Sequential Programs

Article link: https://blog.csdn.net/zhangyubingcatherine/article/details/17679431

Thus far in our study of concurrency, we have assumed that concurrent threads execute on uniprocessor systems. However, many modern machines have multi-core processors. Concurrent programs often run faster on such machines because the operating system kernel can schedule the concurrent threads in parallel on multiple cores, rather than sequentially on a single core.
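
How many threads can actually run in parallel depends on how many cores the machine reports. As a minimal sketch, assuming a C++11 standard library, a program could query that number and cap it at a fixed limit. The helper name `pick_thread_count` and its `cap` parameter are hypothetical and do not appear in the program discussed in this article.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: choose a thread count between 1 and cap,
// based on the number of hardware threads the system reports.
long pick_thread_count(long cap)
{
    assert(cap >= 1);
    // hardware_concurrency() is only a hint and may return 0 when unknown.
    unsigned hw = std::thread::hardware_concurrency();
    long cores = (hw == 0) ? 1 : static_cast<long>(hw);
    return std::min(cores, cap);
}
```

Calling `pick_thread_count(32)` would give a value that never exceeds a 32-entry thread table, whatever the machine reports.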

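A sequential program is written as a single logical flow. A concurrent program is written as multiple concurrent flows. A parallel program is a concurrent program running on multiple processors. To illustrate some important aspects of parallel programming, consider the simple program below, which sums the sequence of integers 0, ..., n-1 in parallel. It partitions the sequence into t disjoint regions of n/t elements each and assigns one region to each of t peer threads; the main thread then adds up the partial sums. The code assumes that n is a multiple of t.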

#include <iostream>
#include <stdlib.h>
#include <pthread.h>
using namespace std;
const int MAXTHREADS = 32;	// Upper bound on the number of peer threads

void *sum(void *vargp);	// Thread routine

// Global shared variables
long psum[MAXTHREADS];	// Partial sum computed by each thread
long nelems_per_thread;	// Number of elements summed by each thread

int main(int argc, char **argv)
{
		// Get input arguments
		if (argc != 3) {
				cout << "Usage: " << argv[0] 
				     << " <nthreads> <log_nelem>" << endl;
				return 0;
		}	
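		long nthreads = atoi(argv[1]);
		long log_nelems = atoi(argv[2]);
		if (nthreads < 1 || nthreads > MAXTHREADS) {	// psum, tid, and myid hold MAXTHREADS entries
				cout << "Error: nthreads must be in 1.." << MAXTHREADS << endl;
				return 1;
		}
		long nelems = (1L << log_nelems);
		nelems_per_thread = nelems / nthreads;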
		
		// Create peer threads and wait for them to finish
		pthread_t tid[MAXTHREADS];
		int myid[MAXTHREADS];		
		for (long i = 0; i != nthreads; ++i) {
				myid[i] = i;
				pthread_create(&tid[i], NULL, sum, &myid[i]);
		}
		for (long i = 0; i != nthreads; ++i)
				pthread_join(tid[i], NULL);
		
		// Add up the partial sums computed by each thread
		long result = 0;
		for (long i = 0; i != nthreads; ++i)
				result += psum[i];
		
		// Check final answer
		if (result != (nelems*(nelems-1))/2)
				cout << "Error: result=" << result << endl;
		return 0;
}

The code above shows how we might implement this simple parallel sum algorithm. Notice that the main thread passes each peer thread a pointer to a small integer that serves as a unique thread ID. Each peer thread uses its thread ID to determine which portion of the sequence it should work on. Passing a small unique thread ID to each peer thread is a general technique that is used in many parallel applications.
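
The ID-passing technique can be isolated in a small sketch. The names `PeerArg`, `record_id`, and `spawn_with_ids` are hypothetical and chosen only for illustration. The point it tries to show is that each peer receives a pointer to its own array element; passing the address of the loop counter instead would let the main thread change the value before the peer reads it.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Hypothetical argument record: one per peer thread.
struct PeerArg {
    int id;      // unique ID assigned by the main thread
    long seen;   // filled in by the peer with the ID it received
};

// Thread routine: read the ID through the pointer and record it.
static void *record_id(void *vargp)
{
    PeerArg *arg = static_cast<PeerArg *>(vargp);
    arg->seen = arg->id;
    return NULL;
}

// Start nthreads peers and return the ID each one observed.
std::vector<long> spawn_with_ids(int nthreads)
{
    assert(nthreads > 0);
    std::vector<pthread_t> tid(nthreads);
    std::vector<PeerArg> args(nthreads);
    int started = 0;
    for (int i = 0; i < nthreads; ++i) {
        args[i].id = i;
        args[i].seen = -1;
        // Pass &args[i], not &i: the loop counter keeps changing.
        if (pthread_create(&tid[i], NULL, record_id, &args[i]) != 0)
            break;
        ++started;
    }
    // Wait for every peer that was started before reading the results.
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);

    std::vector<long> seen;
    for (int i = 0; i < nthreads; ++i)
        seen.push_back(args[i].seen);
    return seen;
}
```

If every `pthread_create` call succeeds, entry `i` of the returned vector equals `i`; a peer that failed to start leaves its entry at -1.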

The thread function that each peer thread executes is shown below.

void *sum(void *vargp)
{
		int myid = *((int *)vargp);			// Extract the thread ID 
		long start = myid * nelems_per_thread;	// Start element index
		long end = start + nelems_per_thread;		// End element index
		
		long sum = 0;
		for (long i = start; i != end; ++i)
				sum += i;
		psum[myid] = sum;
		return NULL;
}

Notice that we are careful to give each peer thread a unique memory location to update, so access to the psum array does not need to be synchronized with a mutex. The only synchronization required in this particular case is that the main thread must wait for every peer thread to finish (the pthread_join calls) so that it knows each entry in psum is valid.
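
The same pattern, one private slot per thread with joining as the only synchronization, can also be sketched with C++11 `std::thread` in place of Pthreads. This is a swapped-in illustration, not the program above: the function name `parallel_sum` is hypothetical, and it adds one assumption the original does not make, namely that the last thread also sums any remainder when the element count is not a multiple of the thread count.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Sum the integers 0..nelems-1 using nthreads peer threads.
long parallel_sum(long nthreads, long nelems)
{
    assert(nthreads > 0 && nelems >= 0);
    std::vector<long> psum(nthreads, 0);   // one private slot per thread
    std::vector<std::thread> peers;
    long per_thread = nelems / nthreads;

    for (long id = 0; id < nthreads; ++id) {
        long start = id * per_thread;
        // Assumption (not in the original): the last thread takes the remainder.
        long end = (id == nthreads - 1) ? nelems : start + per_thread;
        peers.emplace_back([&psum, id, start, end] {
            long sum = 0;
            for (long i = start; i < end; ++i)
                sum += i;
            psum[id] = sum;   // only this thread writes psum[id]
        });
    }

    // The only synchronization: wait for every peer before reading psum.
    for (std::thread &t : peers)
        t.join();

    long result = 0;
    for (long s : psum)
        result += s;
    return result;
}
```

The result can be checked against the closed form used in the original program, `nelems * (nelems - 1) / 2`.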