2010 英特尔® 线程挑战赛—Hosoya指数
源码下载
问题描述
Hosoya 指数 Z 是分子中非临近化学键集的数量。如果我们把分子抽象成图,那么分子的原子就是整个图的顶点 (V),原子间的临近化学键就是边 (E)。也就是说,图 Z(G(V,E)) 的 Hosoya 指数是图中的匹配总数,而其中匹配是指不共享顶点的边的子集。
计算给定图的 Hosoya 指数可以通过以下方法完成:对所有 k-匹配中的边子集数进行求和,其中 k-匹配是 k (0 <= k <= |E|) 条边子集的集合。例如,对于六顶点无向完整图 (K6) 的 Hosoya 指数,Z(K6) = 76。
K6 = ({A,B,C,D,E,F} {AB,AC,AD,AE,AF,BC,BD,BE,BF,CD,CE,CF,DE,DF,EF})
k 大小 k-匹配
--------------------------------------------
0 1 {} (0-匹配的大小总为 1)
1 15 { {AB}, {AC}, {AD}, {AE}, {AF}, {BC},{BD}, {BE}, {BF}, {CD}, {CE}, {CF},{DE}, {DF}, {EF} }
2 45 { {AB CD}, {AB CE}, {AB CF}, {AB DE}, {AB DF}, {AB EF},
{AC BD}, {AC BE}, {AC BF}, {AC DE}, {AC DF}, {AC EF},
{AD BC}, {AD BE}, {AD BF}, {AD CE}, {AD CF}, {AD EF},
{AE BC}, {AE BD}, {AE BF}, {AE CD}, {AE CF}, {AE DF},
{AF BC}, {AF BD}, {AF BE}, {AF CD}, {AF CE}, {AF DE},
{BC DE}, {BC DF}, {BC EF}, {BD CE}, {BD CF}, {BD EF},
{BE CD}, {BE CF}, {BE DF}, {BF CD}, {BF CE}, {BF DE},
{CD EF}, {CE DF}, {CF DE} }
3 15 { {AB CD EF}, {AB CE DF}, {AB CF DE},
{AC BD EF}, {AC BE DF}, {AC BF DE},
{AD BC EF}, {AD BE CF}, {AD BF CE},
{AE BC DF}, {AE BD CF}, {AE BF CD},
{AF BC DE}, {AF BD CE}, {AF BE CD} }
>=4 0 --
---------------------------------------------------
总计 76
问题:编写一个多线程应用程序,用来通过文件输入图并计算所输入图的 Hosoya 指数。
输入文件说明:输入文件的文件名作为命令行参数提供给应用程序。该文件有若干行,每行包含图中一条要分析的边。图顶点用“A”到“Z”范围中的三个大写字母表示。每个输入行包含六个大写字母,前三个字母代表边的一个顶点,后三个字母代表该边的另一个顶点。文件结束 (EOF) 表示图输入结束。此输入图是连通的无向图;并且每条边的两个顶点不能重复(例如,不允许使用“AABAAB”)。上述示例不符合此问题的输入文件说明。
输出:将结果输出到标准输出,并且必须给出输入图的 Hosoya 指数。
输入文件示例:(省略氢原子的 2-环丙基丁烷分子)
GHAQRC
KLBMND
QRCMND
ABEYZG
CDFABE
CDFYZG
MNDYZG
输出示例:
The Hosoya index of the input graph is 24
串行算法
观察K6完全图的Hosoya指数。
观察K5完全图的Hosoya指数。
K1 1 10 { {AB}, {AC}, {AD}, {AE}, {BC}, {BD}, {BE}, {CD}, {CE}, {DE}}
对K1的每条边E(V1,V2),遍历K1中不含V1和V2的边构成K2
例如 {AB} + {{CD}, {CE}, {DE}}
{AC} + {{BD}, {BE}, {DE}}
{AD} + {{BC}, {BE}, {DE}}
{AE} + {{BC}, {BD}, {CD}}
{BC} + {{AD}, {AE}, {DE}}
{BD} + {{AC}, {AE}, {CE}}
{BE} + {{AC}, {AD}, {CD}}
{CD} + {{AB}, {AE}, {BE}}
{CE} + {{AB}, {AD}, {BD}}
{DE} + {{AB}, {AC}, {BC}}
这样得到10 * 3 = 30对没有交点的边,实际上K2的数量为15, {AB} + {DE} 和 {DE} + {AB}是相同的,控制配对流程,先将边按较小的节点排序,每条边与排在其后的边配对即可。
{AB} + {{CD}, {CE}, {DE}}
{AC} + {{BD}, {BE}, {DE}}
{AD} + {{BC}, {BE}, {DE}}
{AE} + {{BC}, {BD}, {CD}}
{BC} + {{AD}, {AE}, {DE}}
{BD} + {{AC}, {AE}, {CE}}
{BE} + {{AC}, {AD}, {CD}}
{CD} + {{AB}, {AE}, {BE}}
{CE} + {{AB}, {AD}, {BD}}
{DE} + {{AB}, {AC}, {BC}}
另外一种方法:通过记住当前边集内最大点,与含有大于该点的边配对即可。
{AB} + {{CD}, {CE}, {DE}}
{AC} + {{BD}, {BE}, {DE}}
{AD} + {{BC}, {BE}, {DE}}
{AE} + {{BC}, {BD}, {CD}}
{BC} + {{AD}, {AE}, {DE}}
{BD} + {{AC}, {AE}, {CE}}
{BE} + {{AC}, {AD}, {CD}}
{CD} + {{AB}, {AE}, {BE}}
{CE} + {{AB}, {AD}, {BD}}
{DE} + {{AB}, {AC}, {BC}}
递归求解:将边E加入边集{G},检测E的节点是否在边集{G}中,如果E的节点不在边集{G}中,将E加入到{G},递归加入其它边,否则回溯。
__int64 XGetHosya_Recursion(const XGraph* pGraph, uint8* pMask, int nBeg)
{
__int64 nCount = 0;
XEdge* pCur = pGraph->pStart[nBeg];
XEdge* pEnd = pGraph->pEdge + pGraph->nEdge;
for(; pCur < pEnd; ++pCur)
{
if(pMask[pCur->nEnd] == 0)
{
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;
nCount += XGetHosya_Recursion(pGraph, pMask, pCur->nBeg + 1) + 1;
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;
}
}
return nCount;
}
为提高效率展开递归。
// 展开递归
__int64 XGetHosya_Mask_Stack(const XGraph* pGraph, uint8* pMask, int nBeg)
{
__int64 nCount = 0;
XEdge* pCur = pGraph->pStart[nBeg];
XEdge* pEnd = pGraph->pStart[pGraph->nNode];
XEdge** pStart = pGraph->pStart;
XEdge** pCurStack = (XEdge**)scalable_aligned_malloc((pGraph->nNode / 2 + 2) *
sizeof(pCurStack[0]), 16);
int nIndex = 1;
pEnd->nBeg = pEnd->nEnd = pGraph->nNode;
pCurStack[0] = pEnd;
while(nIndex > 0)
{
for( ;pCur < pEnd; ++pCur)
{
if(pMask[pCur->nEnd] == 0)
{
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;
++nCount;
pCurStack[nIndex++] = pCur;
pCur = pStart[ pCur->nBeg + 1] - 1;
}
}
if(pCur == pEnd)
{
pCur = pCurStack[--nIndex];
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;
++pCur;
}
}
scalable_aligned_free(pCurStack);
return nCount;
}
并行算法
使用TBB的Task,进行任务分解,当待处理的节点数较少时停止分解Task。
// HosoyaTask
class CXHosoyaTask: public tbb::task
{
const XGraph* m_pGraph;
uint8* m_pMask;
int m_nBeg;
__int64* 0020m_pCount;
static uint32 ms_nTask;
public:
CXHosoyaTask( const XGraph* pGraph, uint8* pMask, int nBeg, __int64* pCount) :
m_pGraph(pGraph), m_nBeg(nBeg), m_pCount(pCount)
{
m_pMask = (uint8*)scalable_calloc(sizeof(m_pMask[0]), pGraph->nNode + 2);
memcpy(m_pMask, pMask, sizeof(pMask[0]) * (pGraph->nNode + 2));
}
~CXHosoyaTask() { scalable_free(m_pMask); }
tbb::task* execute()
{
tbb::task_list list;
if(m_nBeg + 12 > m_pGraph->nNode)
{
*m_pCount += XGetHosya_Mask_Stack(m_pGraph, m_pMask, m_nBeg);
}
else
{
int nTask = 1;
uint8* pMask = m_pMask;
while(pMask[m_nBeg]) ++m_nBeg;
XEdge* pCur = m_pGraph->pStart[m_nBeg];
XEdge* pEnd = m_pGraph->pEdge + m_pGraph->nEdge;
deque<__int64> deqCnt;
deque<__int64>::iterator i;
for(; pCur < pEnd; ++pCur)
{
if(pMask[pCur->nEnd] == 0)
{
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;
if(pCur->nBeg + 12 < m_pGraph->nNode)
{
deqCnt.push_back(1);
list.push_back( *new( allocate_child() ) CXHosoyaTask(m_pGraph,
pMask, pCur->nBeg + 1, &deqCnt.back()));
++nTask;
}
else
{
*m_pCount += XGetHosya_Mask_Stack(m_pGraph, pMask, pCur->nBeg + 1) + 1;
}
pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;
}
}
set_ref_count(nTask);
spawn_and_wait_for_all(list);
for(i = deqCnt.begin(); i != deqCnt.end(); ++i) *m_pCount += *i;
}
return NULL;
}
};
// 并行版本
__int64 XGetHosya_Mask_Parallel(const XGraph* pGraph)
{
__int64 nCount = 0;
uint8* pMask = (uint8*)scalable_calloc(sizeof(pMask[0]), pGraph->nNode + 2);
CXHosoyaTask& xtask = *new(tbb::task::allocate_root()) CXHosoyaTask(pGraph,
pMask, 0, &nCount);
tbb::task::spawn_root_and_wait(xtask);
scalable_free(pMask);
return nCount;
}
优化工具
Hotspots检测
使用Intel Amplifier的Hotspots检测热点函数,结果如下:
检测结果显示热点函数在XGetHosya_Mask_Stack内部,代码优化的空间较小,必须从算法的角度进行优化。
在文献《The Hosoys Index, Lucas Numbers, and QSPR》中有这样的算法,
Concurrency检测
使用Intel Amplifier的Concurrency检测并行优化的效果,结果如下:
检测结果显示,程序并行度较好,达到了Ideal的水平。