Algorithm 1 CUDA_BFS(Graph G(V, E), Source Vertex S)
{
    Create vertex array Va from all vertices in G(V, E);
    Create edge array Ea from all edges in G(V, E);
    Create frontier array Fa, visited array Xa and cost array Ca of size |V|;
    Initialize Fa, Xa to false and Ca to INFINITE;
    Fa[S]=true;
    Ca[S]=0;
    while Fa not empty do
        for each vertex V in parallel do
            Invoke CUDA_BFS_KERNEL(Va, Ea, Fa, Xa, Ca) on the grid;
        end for
    end while
}
Algorithm 2 CUDA_BFS_KERNEL(Va, Ea, Fa, Xa, Ca)
{
    tid=getThreadID;
    if Fa[tid] then
        Fa[tid]=false; Xa[tid]=true;
        for all neighbors nid of tid do
            if NOT Xa[nid] then
                Ca[nid]=Ca[tid]+1;
                Fa[nid]=true;
            end if
        end for
    end if
}
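This algorithm is level-synchronous: each kernel launch expands one BFS level, with one thread per vertex. The number of launches is bounded by the depth of the BFS tree (the longest shortest-path distance from the source to any reachable vertex), which is at most the graph's diameter, so the algorithm takes O(diameter) parallel steps.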
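To make the level-by-level behaviour concrete, here is a minimal CPU sketch in C++ that emulates the two algorithms, with each loop iteration standing in for one GPU thread. It is not the paper's CUDA code. The names `bfs_kernel_emulated`, `cuda_bfs_emulated`, `BfsResult` and `next` are made up for illustration, and the layout (Va holding |V|+1 offsets into Ea) is an assumption. One deliberate deviation: newly discovered vertices go into a separate `next` array and are only marked visited after the sweep, so two frontier vertices that are neighbours of each other cannot overwrite each other's cost.

```cpp
#include <cassert>
#include <limits>
#include <vector>

constexpr int kInfinite = std::numeric_limits<int>::max();

struct BfsResult {
    std::vector<int> cost;  // Ca: BFS level per vertex, kInfinite if unreachable
    int launches;           // number of emulated kernel launches
};

// Emulates one launch of CUDA_BFS_KERNEL; each loop iteration plays one thread.
// Returns true if at least one new vertex joined the frontier.
bool bfs_kernel_emulated(const std::vector<int>& Va, const std::vector<int>& Ea,
                         std::vector<char>& Fa, std::vector<char>& Xa,
                         std::vector<int>& Ca, std::vector<char>& next) {
    const int n = static_cast<int>(Fa.size());
    for (int tid = 0; tid < n; ++tid) {
        if (!Fa[tid]) continue;          // only frontier vertices do work
        Fa[tid] = 0;
        for (int e = Va[tid]; e < Va[tid + 1]; ++e) {
            const int nid = Ea[e];
            if (!Xa[nid]) {
                Ca[nid] = Ca[tid] + 1;   // neighbour is one level deeper
                next[nid] = 1;           // assumption: separate next-frontier array
            }
        }
    }
    // Promote the next frontier only after every "thread" has finished.
    bool frontier_not_empty = false;
    for (int v = 0; v < n; ++v) {
        if (next[v]) {
            Fa[v] = 1;
            Xa[v] = 1;
            next[v] = 0;
            frontier_not_empty = true;
        }
    }
    return frontier_not_empty;
}

BfsResult cuda_bfs_emulated(const std::vector<int>& Va,
                            const std::vector<int>& Ea, int S) {
    const int n = static_cast<int>(Va.size()) - 1;  // Va has n + 1 offsets
    assert(S >= 0 && S < n);
    std::vector<char> Fa(n, 0), Xa(n, 0), next(n, 0);
    std::vector<int> Ca(n, kInfinite);
    Fa[S] = 1;
    Xa[S] = 1;
    Ca[S] = 0;
    int launches = 1;  // the first launch always happens
    while (bfs_kernel_emulated(Va, Ea, Fa, Xa, Ca, next)) ++launches;
    return {Ca, launches};
}
```

On a path graph 0-1-2-3 with source 0 the costs come out as 0, 1, 2, 3 after four launches, one per level, which is where the O(diameter) bound comes from.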
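There is a lot of room to optimize how the data structures are designed and stored... If the graph is generated dynamically according to some rule, will the dynamic nature of the data structure affect performance? By how much? Ask yourself, get your hands dirty, and think it through~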
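As a starting point for that experiment, here is a small C++ sketch of one way to pack an edge list into the Va/Ea arrays. The names `CompactGraph` and `build_compact_graph` are hypothetical, and the layout (Va holding |V|+1 offsets, so the neighbours of v are Ea[Va[v]] up to Ea[Va[v+1]-1]) is an assumption, not necessarily the exact layout used in the paper.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct CompactGraph {
    std::vector<int> Va;  // n + 1 offsets into Ea
    std::vector<int> Ea;  // destination vertex of every edge, grouped by source
};

// Builds Va/Ea from a list of directed edges (src, dst) over vertices 0..n-1.
CompactGraph build_compact_graph(int n,
                                 const std::vector<std::pair<int, int>>& edges) {
    CompactGraph g;
    g.Va.assign(n + 1, 0);
    for (const auto& e : edges) {
        assert(e.first >= 0 && e.first < n);
        ++g.Va[e.first + 1];                 // count out-degree of each source
    }
    for (int v = 0; v < n; ++v) {
        g.Va[v + 1] += g.Va[v];              // prefix sum turns counts into offsets
    }
    g.Ea.resize(edges.size());
    std::vector<int> cursor(g.Va.begin(), g.Va.end() - 1);  // next free slot per vertex
    for (const auto& e : edges) {
        g.Ea[cursor[e.first]++] = e.second;  // scatter destinations into place
    }
    return g;
}
```

The layout is compact and gives each thread a contiguous run of neighbours, but it is static: inserting a single edge shifts part of Ea and every later offset in Va, which costs O(V + E) in the worst case. That is exactly the kind of cost worth measuring for a dynamically generated graph.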
Learned from: Accelerating large graph algorithms on the GPU using CUDA