OpenMP Tutorial

What is OpenMP

OpenMP makes multithreaded programming easy in C++.

It is a simple C/C++/Fortran compiler extension that allows you to add parallelism to existing source code without significantly rewriting it.

Example

An example that initializes an array:

#include <vector>

int main()
{
    std::vector<int> arr(1000); // creates 1000 elements (reserve() alone would not create them)
    #pragma omp parallel for
    for(int i = 0; i < 1000; ++i)
    {
        arr[i] = i * 2;
    }
    return 0;
}

You can compile it like this:

g++ tmp.cpp -fopenmp

If you remove the #pragma lines, the result is still a valid C++ program that runs and does the expected thing.
If the compiler encounters a #pragma that it does not support, it will ignore it.
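
You can even detect at compile time whether OpenMP was enabled, via the standard _OPENMP macro that conforming compilers define. A minimal sketch:

#include <cstdio>

int main()
{
#ifdef _OPENMP
    printf("compiled with OpenMP support\n");
#else
    printf("compiled without OpenMP; the code still runs serially\n");
#endif
    return 0;
}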

The syntax

The parallel construct

It creates a team of N threads (where N is the number of CPU cores by default). After the statement, the threads join back into one.

#pragma omp parallel
{
    // code inside this region runs in parallel
    printf("Hello\n");
}
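
To see the team in action, you can query the runtime with omp_get_thread_num() and omp_get_num_threads() from <omp.h>. A minimal sketch:

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        // Each thread in the team executes this block once.
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}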

Loop construct: for

The for construct splits the for-loop so that each thread in the current team handles a different portion of it.

#pragma omp for
for(int n=0;n<10;++n)
{
    printf(" %d",n);
}

Note: #pragma omp for only delegates portions of the loop to the different threads in the current team. A team is the group of threads executing the program. At program start, the team consists only of the main thread.
To create a new team of threads, you need to specify the parallel keyword:

#pragma omp parallel
{
    #pragma omp for
    for(int n=0;n<10;n++) printf(" %d",n);
}

Or use the combined construct:

#pragma omp parallel for

You can explicitly specify the number of threads to be created in the team, using the num_threads clause:

#pragma omp parallel for num_threads(3)
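
For example, a minimal sketch that splits 12 iterations across a team of exactly 3 threads:

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel for num_threads(3)
    for(int n = 0; n < 12; ++n)
    {
        // Each of the 3 threads handles roughly 12/3 = 4 iterations.
        printf("iteration %d on thread %d\n", n, omp_get_thread_num());
    }
    return 0;
}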

Scheduling

The scheduling algorithm for the for-loop can be explicitly controlled.

The default is the static schedule, which divides the iterations into roughly equal-sized chunks up front:

#pragma omp for schedule(static)

In the dynamic schedule, each thread asks the OpenMP runtime library for an iteration number, then handles it.
The chunk size can also be specified, to lessen the number of calls to the runtime library:

#pragma omp parallel for schedule(dynamic,3)
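
A dynamic schedule pays off when iterations take unequal time. A minimal sketch, where the inner busy loop merely simulates an uneven workload:

#include <cstdio>
#include <omp.h>

int main()
{
    // Threads grab chunks of 3 iterations from the runtime as they
    // finish, so faster threads naturally pick up more of the work.
    #pragma omp parallel for schedule(dynamic,3)
    for(int n = 0; n < 30; ++n)
    {
        volatile double sum = 0;               // simulate work that grows with n
        for(long i = 0; i < 100000L * n; ++i) sum = sum + i;
        printf("iteration %d on thread %d\n", n, omp_get_thread_num());
    }
    return 0;
}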

The ordered clause

It is possible to force certain events within the loop to happen in a predicted order, using the ordered clause:

#pragma omp parallel for ordered schedule(dynamic)
for(int n=0;n<100;n++)
{
    files[n].compress();
    #pragma omp ordered
    send(files[n]);
}

The collapse clause

When you have nested loops, you can use the collapse clause to parallelize the combined iteration space:

#pragma omp parallel for collapse(2)
for(int i=0;i<10;i++)
{
    for(int j=0;j<10;j++)
    {
        doSth();
    }
}

Sections

Sometimes it is handy to indicate that "this and this can run in parallel". The sections construct is just for that:

#pragma omp parallel sections
{
    #pragma omp section   // the first section directive is optional
    {
        work1();
    }
    #pragma omp section
    {
        work2();
        work3();
    }
    #pragma omp section
    {
        work4();
    }
}

This code indicates that any of the tasks work1, work2+work3, and work4 may run in parallel.

Thread-safety

Atomicity

#pragma omp atomic
counter += value;

The atomic keyword in OpenMP specifies that the denoted action happens atomically.

Atomic read expression

#pragma omp atomic read
var = x;

Atomic write expression

#pragma omp atomic write
x = expr;

Atomic update expression

#pragma omp atomic update
x++; x--; ++x; --x;
x += expr; x -= expr; // and similar binary-operator updates

Atomic capture expression

A capture expression combines the read and update features:

#pragma omp atomic capture
var = x++;

The critical construct

The critical construct restricts the execution of the associated statement/block to a single thread at a time.

#pragma omp critical
{
    doSth();
}

Note: critical section names are global to the entire program, so identically named critical constructs exclude each other even across different functions.
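
For example, a minimal sketch where two differently named critical sections can run concurrently with each other, but each excludes itself (update_queue and write_log are hypothetical placeholders):

#include <cstdio>

static void update_queue() { /* hypothetical work on a shared queue */ }
static void write_log()    { /* hypothetical logging */ }

int main()
{
    #pragma omp parallel
    {
        #pragma omp critical(queue_lock)
        {
            update_queue(); // one thread at a time may touch the queue...
        }
        #pragma omp critical(log_lock)
        {
            write_log();    // ...while another thread may be writing the log
        }
    }
    return 0;
}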

Locks

The OpenMP runtime library provides a lock type, omp_lock_t, in omp.h.
The lock type has five manipulator functions:

  • omp_init_lock : initializes the lock
  • omp_destroy_lock : destroys the lock; the lock must be unset before the call
  • omp_set_lock : acquires the lock, blocking until it becomes available
  • omp_unset_lock : releases the lock
  • omp_test_lock : attempts to set the lock without blocking; returns nonzero if it managed to set the lock, and 0 if the lock was already set by another thread
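
A minimal usage sketch, guarding a shared counter with an explicit lock:

#include <cstdio>
#include <omp.h>

int main()
{
    omp_lock_t lock;
    omp_init_lock(&lock);      // must be initialized before first use
    int counter = 0;

    #pragma omp parallel for
    for(int n = 0; n < 1000; ++n)
    {
        omp_set_lock(&lock);   // blocks until the lock is acquired
        ++counter;
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);   // the lock must be unset before destroying
    printf("counter = %d\n", counter);
    return 0;
}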

The flush directive

Even when variables used by threads are supposed to be shared, the compiler may take liberties and optimize them as register variables. This can skew concurrent observations of the variables. The flush directive can be used to forbid this:

/*first thread*/
b=1;
#pragma omp flush(a,b)
if(a == 0)
{
    /* critical section*/

}
/*second thread*/
a = 1;
#pragma omp flush(a,b)
if(b==0)
{
   /* critical section*/ 
}

Controlling which data to share between threads

int a, b = 0;
#pragma omp parallel for private(a) shared(b)
for(a=0;a<50;++a)
{
    #pragma omp atomic
    b += a;
}

a is private (each thread has its own copy of it) and b is shared (every thread accesses the same variable).

The difference between private and firstprivate

private does not copy in the value the variable had in the surrounding context; the private copy starts out uninitialized (default-constructed, for C++ objects).

#include <string>
#include <iostream>
int main()
{
    std::string a = "x", b = "y";
    int c = 3;
    #pragma omp parallel private(a,c) shared(b) num_threads(2)
    {
        // Inside the region, a is a fresh empty string and c is uninitialized.
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}
This is roughly equivalent to:

    OpenMP_thread_fork(2);
    {                  // Start new scope
        std::string a; // Note: a new local variable, default-constructed.
        int c;         // This too; it starts out uninitialized.
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }                  // End of scope for the local variables
    OpenMP_join();

If you actually need a copy of the original value, use the firstprivate clause instead.

#include <string>
#include <iostream>

int main()
{
    std::string a = "x", b = "y";
    int c = 3;

    #pragma omp parallel firstprivate(a,c) shared(b) num_threads(2)
    {
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}

Execution synchronization

The barrier directive and the nowait clause

The barrier directive causes threads encountering it to wait until all the other threads in the same team have encountered it.

#pragma omp parallel
{
    // all threads execute this
    SomeCode();
    #pragma omp barrier
    // all threads execute this, but not before all threads have finished executing SomeCode()
    SomeMoreCode();
}

Note: there is an implicit barrier at the end of each parallel block, and at the end of each sections, for, and single statement, unless the nowait clause is used.

#pragma omp parallel
{
    #pragma omp for
    for(int n=0; n<10; ++n) Work();

    // This line is not reached before the for-loop is completely finished
    SomeMoreCode();
}

// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();

#pragma omp parallel
{
    #pragma omp for nowait
    for(int n=0; n<10; ++n) Work();

    // This line may be reached while some threads are still executing the for-loop.
    SomeMoreCode();
}

// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();

The single and master constructs

The single construct specifies that the given statement/block is executed by only one thread. It is unspecified which thread. Other threads skip the statement/block and wait at an implicit barrier at the end of the construct.

#pragma omp parallel
{
    Work1();
    #pragma omp single
    {
        Work2();
    }
    Work3();
}

The master construct is similar, except that the statement/block is run by the master thread, and there is no implied barrier; other threads skip the construct without waiting.
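
A minimal sketch of master: only thread 0 executes the block, and the other threads proceed immediately:

#include <cstdio>

int main()
{
    #pragma omp parallel
    {
        #pragma omp master
        {
            // Executed by the master thread (thread 0) only; the other
            // threads skip this block without waiting (no implied barrier).
            printf("progress report from the master thread\n");
        }
        // All threads continue here immediately.
    }
    return 0;
}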

With OpenMP 4.0, the cancel and cancellation point constructs allow a loop to be aborted early, as in the following search example. Note that the runtime honours cancellation only when the OMP_CANCELLATION environment variable is set to true.

#include <cstdio>
#include <omp.h>

static const char* FindAnyNeedle(const char* haystack, size_t size, char needle)
{
    const char* result = haystack + size;
    #pragma omp parallel
    {
        unsigned num_iterations = 0;
        #pragma omp for
        for(size_t p = 0; p < size; ++p)
        {
            ++num_iterations;
            if(haystack[p] == needle)
            {
                #pragma omp atomic write
                result = haystack + p;
                // Signal cancellation.
                #pragma omp cancel for
            }
            // Check for cancellations signalled by other threads:
            #pragma omp cancellation point for
        }
        // All threads reach here eventually; sooner if the cancellation was signalled.
        printf("Thread %d: %u iterations completed\n",
               omp_get_thread_num(), num_iterations);
    }
    return result;
}

Loop nesting

This code will not do the expected thing, because nested parallelism is disabled by default: each outer thread executes the inner parallel for with a team of just one thread.

#pragma omp parallel for
for(int y=0; y<25; ++y)
{
    #pragma omp parallel for
    for(int x=0; x<80; ++x)
    {
        tick(x,y);
    }
}

One solution is to collapse the two loops into a single parallel loop:

#pragma omp parallel for collapse(2)
for(int y=0; y<25; ++y)
{
    for(int x=0; x<80; ++x)
    {
        tick(x,y);
    }
}
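
Alternatively, nested parallel regions can be enabled explicitly through the runtime. A minimal sketch using omp_set_max_active_levels (older code uses the now-deprecated omp_set_nested); the thread counts and the tick() stub are illustrative:

#include <omp.h>

static void tick(int /*x*/, int /*y*/) { /* stands in for real per-cell work */ }

int main()
{
    omp_set_max_active_levels(2); // allow two nested levels of parallelism

    #pragma omp parallel for num_threads(5)
    for(int y = 0; y < 25; ++y)
    {
        // Each outer thread forks its own inner team of 4 threads.
        #pragma omp parallel for num_threads(4)
        for(int x = 0; x < 80; ++x)
        {
            tick(x, y);
        }
    }
    return 0;
}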

Read more

http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://en.wikipedia.org/wiki/OpenMP
