The VF2 Subgraph Isomorphism Algorithm (Java Implementation)
I recently used the VF2 subgraph isomorphism algorithm in a project, and while researching it I could not find a sufficiently detailed blog post on CSDN, so I am writing one here. The content is mainly based on the paper "A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs".
1. What Is the VF2 Algorithm
VF2 is a subgraph isomorphism algorithm. Subgraph isomorphism can be defined as follows:
Suppose we have a graph H = (V_H, E_H) and a graph G = (V, E). A subgraph isomorphism from H to G is an injective function f: V_H → V such that (u, v) ∈ E_H implies (f(u), f(v)) ∈ E. The function f is called a mapping of the subgraph isomorphism.
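As a toy illustration of this definition, the sketch below (the class and method names are mine, not from the paper) checks whether a candidate mapping f preserves every edge of H in G, with both graphs given as boolean adjacency matrices:

```java
public class SubgraphIsoCheck {
    // Returns true iff every edge (u, v) of H maps to an edge (f[u], f[v]) of G,
    // i.e. f is a valid subgraph isomorphism mapping for these adjacency matrices.
    static boolean preservesEdges(boolean[][] h, boolean[][] g, int[] f) {
        for (int u = 0; u < h.length; u++) {
            for (int v = 0; v < h.length; v++) {
                if (h[u][v] && !g[f[u]][f[v]]) {
                    return false; // edge of H has no image in G
                }
            }
        }
        return true;
    }
}
```

For example, if H is the single edge 0 → 1 and G is a directed triangle 0 → 1 → 2 → 0, the mapping f = {0, 1} preserves the edge, while f = {1, 0} does not.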
In VF2 terminology, the pattern is the query graph (queryGraph) and the data graph is the target graph (targetGraph). An intermediate state, state, records how far the current match has progressed. At each intermediate state, VF2 computes the set of candidate node pairs P(s) that could be added to the current state (each pair consists of one query graph node and one target graph node) and tests each pair against the feasibility rules; if a pair passes, it is added to the state and the state is updated. The whole process proceeds recursively.
There are five feasibility rules in total, and together they guarantee the correctness of the matching process. To simplify the implementation, this post condenses them into three, each implemented by one Java method:
- Predecessor and successor rule: for every predecessor and successor of the query node that is already in the mapping, the target node must have a corresponding predecessor or successor, namely the node that the query neighbor is mapped to (connected by an edge with the same label).
- 1-look-ahead rule: the number of neighbors of the query node that lie in the terminal sets (nodes adjacent to already-matched nodes but not yet matched themselves) must be less than or equal to the corresponding count for the target node.
- 2-look-ahead rule: the number of neighbors of the query node that are neither matched nor adjacent to a matched node must likewise be less than or equal to the corresponding count for the target node.
Matching recursively under these rules ultimately decides whether the data graph contains the query graph.
2. Java Implementation of the VF2 Algorithm
The following method recursively decides whether the target graph contains the query graph:
private boolean matchRecursive(State state, Graph targetGraph, Graph queryGraph) {
    if (state.depth == queryGraph.nodes.size()) { // all query nodes mapped: found a match
        state.matched = true;
        return true;
    } else { // extend the state
        ArrayList<Pair<Integer, Integer>> candidatePairs = genCandidatePairs(state, targetGraph, queryGraph);
        for (Pair<Integer, Integer> entry : candidatePairs) {
            if (checkFeasibility(state, entry.getKey(), entry.getValue())) {
                state.extendMatch(entry.getKey(), entry.getValue()); // extend the mapping
                if (matchRecursive(state, targetGraph, queryGraph)) { // found a match
                    return true;
                }
                state.backtrack(entry.getKey(), entry.getValue()); // undo the match added above
            }
        }
    }
    return false;
}
Here the genCandidatePairs method generates all candidate pairs for the current state, and the checkFeasibility method checks whether adding a given match is feasible.
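The post does not show genCandidatePairs. Following the candidate-pair generation described in the VF2 paper (prefer pairs from the out-terminal sets, then the in-terminal sets, and only fall back to unmapped nodes when both are empty), a simplified self-contained sketch might look like this; here the state is passed as plain sets rather than a State object:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class CandidatePairs {
    // Hypothetical sketch: fix one query node (smallest id, to avoid exploring
    // symmetric orderings) and pair it with every eligible target node.
    static List<int[]> genCandidatePairs(Set<Integer> t1out, Set<Integer> t2out,
                                         Set<Integer> t1in, Set<Integer> t2in,
                                         Set<Integer> unmapped1, Set<Integer> unmapped2) {
        List<int[]> pairs = new ArrayList<>();
        if (!t1out.isEmpty() && !t2out.isEmpty()) {
            int q = Collections.min(t2out);               // one query node from T2out
            for (int t : t1out) pairs.add(new int[]{t, q}); // paired with all of T1out
        } else if (!t1in.isEmpty() && !t2in.isEmpty()) {
            int q = Collections.min(t2in);                // one query node from T2in
            for (int t : t1in) pairs.add(new int[]{t, q});
        } else if (!unmapped1.isEmpty() && !unmapped2.isEmpty()) {
            int q = Collections.min(unmapped2);           // fall back to unmapped nodes
            for (int t : unmapped1) pairs.add(new int[]{t, q});
        }
        return pairs;
    }
}
```

Fixing a single query node per level keeps the branching factor down: every query node must be matched eventually, so only the target side needs to be enumerated.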
The following methods test whether the feasibility rules hold. The method that checks rule one:
private boolean checkPredAndSucc(State state, int targetNodeIndex, int queryNodeIndex) {
    Node queryNode = state.queryGraph.nodes.get(queryNodeIndex);
    int[][] targetAdjacency = state.targetGraph.getAdjacencyMatrix();
    // every already-matched predecessor of the query node must correspond to a
    // predecessor of the target node, connected by an edge with the same label
    for (Edge e : queryNode.inEdges) {
        if (state.core_2[e.source.id] > -1) {
            if (targetAdjacency[state.core_2[e.source.id]][targetNodeIndex] == -1) {
                return false; // no such edge in the target graph
            } else if (targetAdjacency[state.core_2[e.source.id]][targetNodeIndex] != e.label) {
                return false; // labels don't match
            }
        }
    }
    // symmetrically for successors
    for (Edge e : queryNode.outEdges) {
        if (state.core_2[e.target.id] > -1) {
            if (targetAdjacency[targetNodeIndex][state.core_2[e.target.id]] == -1) {
                return false; // no such edge in the target graph
            } else if (targetAdjacency[targetNodeIndex][state.core_2[e.target.id]] != e.label) {
                return false; // labels don't match
            }
        }
    }
    return true;
}
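The check above relies on a labeled adjacency-matrix convention: getAdjacencyMatrix() evidently returns the edge label at cell [i][j], or -1 when there is no edge from i to j. A self-contained sketch of building such a matrix (the helper name and edge encoding are mine):

```java
import java.util.Arrays;

public class Adjacency {
    // Builds an n x n matrix where cell [i][j] holds the label of edge i -> j,
    // or -1 when no such edge exists. Each input edge is encoded {src, dst, label}.
    static int[][] buildAdjacency(int n, int[][] labeledEdges) {
        int[][] adj = new int[n][n];
        for (int[] row : adj) Arrays.fill(row, -1); // -1 means "no edge"
        for (int[] e : labeledEdges) adj[e[0]][e[1]] = e[2];
        return adj;
    }
}
```

Note that this convention reserves -1, so it assumes edge labels are non-negative.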
The method that checks rule two:
private boolean checkInAndOut(State state, int targetNodeIndex, int queryNodeIndex) {
    Node targetNode = state.targetGraph.nodes.get(targetNodeIndex);
    Node queryNode = state.queryGraph.nodes.get(queryNodeIndex);
    int targetPredCnt = 0, targetSucCnt = 0;
    int queryPredCnt = 0, querySucCnt = 0;
    // in-rule: the number of predecessors/successors of the target node in T1in
    // must be greater than or equal to the number of predecessors/successors of
    // the query node in T2in
    for (Edge e : targetNode.inEdges) {
        if (state.inT1in(e.source.id)) {
            targetPredCnt++;
        }
    }
    for (Edge e : targetNode.outEdges) {
        if (state.inT1in(e.target.id)) {
            targetSucCnt++;
        }
    }
    for (Edge e : queryNode.inEdges) {
        if (state.inT2in(e.source.id)) {
            queryPredCnt++;
        }
    }
    for (Edge e : queryNode.outEdges) {
        if (state.inT2in(e.target.id)) {
            querySucCnt++;
        }
    }
    if (targetPredCnt < queryPredCnt || targetSucCnt < querySucCnt) {
        return false;
    }
    // out-rule: reset the counters, then compare against T1out/T2out the same way
    targetPredCnt = 0; targetSucCnt = 0;
    queryPredCnt = 0; querySucCnt = 0;
    for (Edge e : targetNode.inEdges) {
        if (state.inT1out(e.source.id)) {
            targetPredCnt++;
        }
    }
    for (Edge e : targetNode.outEdges) {
        if (state.inT1out(e.target.id)) {
            targetSucCnt++;
        }
    }
    for (Edge e : queryNode.inEdges) {
        if (state.inT2out(e.source.id)) {
            queryPredCnt++;
        }
    }
    for (Edge e : queryNode.outEdges) {
        if (state.inT2out(e.target.id)) {
            querySucCnt++;
        }
    }
    if (targetPredCnt < queryPredCnt || targetSucCnt < querySucCnt) {
        return false;
    }
    return true;
}
The method that checks rule three:
private boolean checkNew(State state, int targetNodeIndex, int queryNodeIndex) {
    Node targetNode = state.targetGraph.nodes.get(targetNodeIndex);
    Node queryNode = state.queryGraph.nodes.get(queryNodeIndex);
    int targetPredCnt = 0, targetSucCnt = 0;
    int queryPredCnt = 0, querySucCnt = 0;
    // count neighbors that are neither matched nor adjacent to the partial
    // mapping (the "N-tilde" sets); the target counts must dominate the query counts
    for (Edge e : targetNode.inEdges) {
        if (state.inN1Tilde(e.source.id)) {
            targetPredCnt++;
        }
    }
    for (Edge e : targetNode.outEdges) {
        if (state.inN1Tilde(e.target.id)) {
            targetSucCnt++;
        }
    }
    for (Edge e : queryNode.inEdges) {
        if (state.inN2Tilde(e.source.id)) {
            queryPredCnt++;
        }
    }
    for (Edge e : queryNode.outEdges) {
        if (state.inN2Tilde(e.target.id)) {
            querySucCnt++;
        }
    }
    if (targetPredCnt < queryPredCnt || targetSucCnt < querySucCnt) {
        return false;
    }
    return true;
}
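The checkFeasibility method called by matchRecursive presumably just chains the three rule methods with short-circuit evaluation, cheapest first. A self-contained sketch of that composition, with the rules injected as predicates so it can run standalone (the signature and ordering are my assumption, not shown in the original):

```java
import java.util.function.BiPredicate;

public class Feasibility {
    // Chains the three rule checks over a (targetNode, queryNode) pair.
    // Short-circuiting means the more expensive look-ahead rules only run
    // when the cheap structural rule already holds.
    static boolean checkFeasibility(
            BiPredicate<Integer, Integer> predAndSucc, // rule one
            BiPredicate<Integer, Integer> inAndOut,    // rule two (1-look-ahead)
            BiPredicate<Integer, Integer> checkNew,    // rule three (2-look-ahead)
            int targetNode, int queryNode) {
        return predAndSucc.test(targetNode, queryNode)
            && inAndOut.test(targetNode, queryNode)
            && checkNew.test(targetNode, queryNode);
    }
}
```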
The State class is defined below; the feasibility checks above and the overall flow of the algorithm are built on it:
package wip.VF2.core;

import java.io.PrintWriter;
import java.util.HashSet;

import wip.VF2.graph.Edge;
import wip.VF2.graph.Graph;
import wip.VF2.graph.Node;

public class State {

    public int[] core_1; // for each target graph node, the query graph node it maps to (-1 = unmapped)
    public int[] core_2; // for each query graph node, the target graph node it maps to (-1 = unmapped)
    public int[] in_1;   // for each target graph node, the search-tree depth at which it entered T1in or the mapping (-1 = not in the set)
    public int[] in_2;   // for each query graph node, the search-tree depth at which it entered T2in or the mapping (-1 = not in the set)
    public int[] out_1;  // for each target graph node, the search-tree depth at which it entered T1out or the mapping (-1 = not in the set)
    public int[] out_2;  // for each query graph node, the search-tree depth at which it entered T2out or the mapping (-1 = not in the set)

    public HashSet<Integer> T1in;  // target graph nodes not yet in the partial mapping that are predecessors of mapped nodes
    public HashSet<Integer> T1out; // target graph nodes not yet in the partial mapping that are successors of mapped nodes
    public HashSet<Integer> T2in;  // query graph nodes not yet in the partial mapping that are predecessors of mapped nodes
    public HashSet<Integer> T2out; // query graph nodes not yet in the partial mapping that are successors of mapped nodes

    public HashSet<Integer> unmapped1; // unmapped nodes in the target graph
    public HashSet<Integer> unmapped2; // unmapped nodes in the query graph

    public int depth = 0;          // current depth of the search tree
    public boolean matched = false;

    public Graph targetGraph;
    public Graph queryGraph;
    /**
     * Initialize a State
     * @param targetGraph The big graph
     * @param queryGraph The small graph
     */
    public State(Graph targetGraph, Graph queryGraph) {
        this.targetGraph = targetGraph;
        this.queryGraph = queryGraph;

        int targetSize = targetGraph.nodes.size();
        int querySize = queryGraph.nodes.size();

        T1in = new HashSet<Integer>(targetSize * 2);
        T1out = new HashSet<Integer>(targetSize * 2);
        T2in = new HashSet<Integer>(querySize * 2);
        T2out = new HashSet<Integer>(querySize * 2);

        unmapped1 = new HashSet<Integer>(targetSize * 2);
        unmapped2 = new HashSet<Integer>(querySize * 2);

        core_1 = new int[targetSize];
        core_2 = new int[querySize];
        in_1 = new int[targetSize];
        in_2 = new int[querySize];
        out_1 = new int[targetSize];
        out_2 = new int[querySize];

        // initially, all sets are empty and no node is mapped (-1)
        for (int i = 0; i < targetSize; i++) {
            core_1[i] = -1;
            in_1[i] = -1;
            out_1[i] = -1;
            unmapped1.add(i);
        }
        for (int i = 0; i < querySize; i++) {
            core_2[i] = -1;
            in_2[i] = -1;
            out_2[i] = -1;
            unmapped2.add(i);
        }
    }
    public boolean inM1(int nodeId) {
        return core_1[nodeId] > -1;
    }

    public boolean inM2(int nodeId) {
        return core_2[nodeId] > -1;
    }

    public boolean inT1in(int nodeId) {
        return core_1[nodeId] == -1 && in_1[nodeId] > -1;
    }

    public boolean inT2in(int nodeId) {
        return core_2[nodeId] == -1 && in_2[nodeId] > -1;
    }

    public boolean inT1out(int nodeId) {
        return core_1[nodeId] == -1 && out_1[nodeId] > -1;
    }

    public boolean inT2out(int nodeId) {
        return core_2[nodeId] == -1 && out_2[nodeId] > -1;
    }

    public boolean inT1(int nodeId) {
        return inT1in(nodeId) || inT1out(nodeId);
    }

    public boolean inT2(int nodeId) {
        return inT2in(nodeId) || inT2out(nodeId);
    }

    public boolean inN1Tilde(int nodeId) {
        return core_1[nodeId] == -1 && in_1[nodeId] == -1 && out_1[nodeId] == -1;
    }

    public boolean inN2Tilde(int nodeId) {
        return core_2[nodeId] == -1 && in_2[nodeId] == -1 && out_2[nodeId] == -1;
    }
    /**
     * Add a new match (targetIndex, queryIndex) to the state
     * @param targetIndex Index of the node in the target graph
     * @param queryIndex Index of the node in the query graph
     */
    public void extendMatch(int targetIndex, int queryIndex) {
        core_1[targetIndex] = queryIndex;
        core_2[queryIndex] = targetIndex;
        unmapped1.remove(targetIndex);
        unmapped2.remove(queryIndex);
        T1in.remove(targetIndex);
        T1out.remove(targetIndex);
        T2in.remove(queryIndex);
        T2out.remove(queryIndex);

        depth++; // move down one level in the search tree

        Node targetNode = targetGraph.nodes.get(targetIndex);
        Node queryNode = queryGraph.nodes.get(queryIndex);

        for (Edge e : targetNode.inEdges) {
            if (in_1[e.source.id] == -1) { // the node is not yet in T1in or the mapping
                in_1[e.source.id] = depth;
                if (!inM1(e.source.id)) // if not in M1, add to T1in
                    T1in.add(e.source.id);
            }
        }
        for (Edge e : targetNode.outEdges) {
            if (out_1[e.target.id] == -1) { // the node is not yet in T1out or the mapping
                out_1[e.target.id] = depth;
                if (!inM1(e.target.id)) // if not in M1, add to T1out
                    T1out.add(e.target.id);
            }
        }
        for (Edge e : queryNode.inEdges) {
            if (in_2[e.source.id] == -1) { // the node is not yet in T2in or the mapping
                in_2[e.source.id] = depth;
                if (!inM2(e.source.id)) // if not in M2, add to T2in
                    T2in.add(e.source.id);
            }
        }
        for (Edge e : queryNode.outEdges) {
            if (out_2[e.target.id] == -1) { // the node is not yet in T2out or the mapping
                out_2[e.target.id] = depth;
                if (!inM2(e.target.id)) // if not in M2, add to T2out
                    T2out.add(e.target.id);
            }
        }
    }
    /**
     * Remove the match (targetNodeIndex, queryNodeIndex) when backtracking
     * @param targetNodeIndex Index of the node in the target graph
     * @param queryNodeIndex Index of the node in the query graph
     */
    public void backtrack(int targetNodeIndex, int queryNodeIndex) {
        core_1[targetNodeIndex] = -1;
        core_2[queryNodeIndex] = -1;
        unmapped1.add(targetNodeIndex);
        unmapped2.add(queryNodeIndex);

        // drop every node that entered a terminal set at the current depth
        for (int i = 0; i < core_1.length; i++) {
            if (in_1[i] == depth) {
                in_1[i] = -1;
                T1in.remove(i);
            }
            if (out_1[i] == depth) {
                out_1[i] = -1;
                T1out.remove(i);
            }
        }
        for (int i = 0; i < core_2.length; i++) {
            if (in_2[i] == depth) {
                in_2[i] = -1;
                T2in.remove(i);
            }
            if (out_2[i] == depth) {
                out_2[i] = -1;
                T2out.remove(i);
            }
        }

        // put the two nodes back into the Tin/Tout sets if necessary
        if (inT1in(targetNodeIndex))
            T1in.add(targetNodeIndex);
        if (inT1out(targetNodeIndex))
            T1out.add(targetNodeIndex);
        if (inT2in(queryNodeIndex))
            T2in.add(queryNodeIndex);
        if (inT2out(queryNodeIndex))
            T2out.add(queryNodeIndex);

        depth--;
    }
    /**
     * Print the current mapping
     */
    public void printMapping() {
        for (int i = 0; i < core_2.length; i++) {
            System.out.print("(" + core_2[i] + "-" + i + ") ");
        }
        System.out.println();
    }

    /**
     * Write the current mapping to a file
     */
    public void writeMapping(PrintWriter writer) {
        for (int i = 0; i < core_2.length; i++) {
            writer.write("(" + core_2[i] + "-" + i + ") ");
        }
        writer.write("\n");
    }
}
3. Ideas for Improving the VF2 Algorithm
- Match the edges of the query graph in ascending order of how often each edge occurs in the data graph (improves the filtering power of edges).
- Match query vertices that occur rarely in the data graph and have high degree first (improves the filtering power of vertices).
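The vertex-ordering idea can be sketched as follows; this is a standalone illustration, not part of the original code, where each vertex has a label, the label's frequency in the data graph is passed in, and rarer labels are matched first with ties broken by descending degree:

```java
import java.util.Arrays;
import java.util.Map;

public class MatchOrder {
    // Returns the order in which query vertices should be matched:
    // rarer label first (fewer candidates in the data graph),
    // then higher degree first (stronger structural constraints early).
    static int[] matchOrder(int[] labels, int[] degrees, Map<Integer, Integer> labelFreq) {
        Integer[] order = new Integer[labels.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> {
            int fa = labelFreq.getOrDefault(labels[a], 0);
            int fb = labelFreq.getOrDefault(labels[b], 0);
            if (fa != fb) return Integer.compare(fa, fb);    // rarer label first
            return Integer.compare(degrees[b], degrees[a]);  // higher degree first
        });
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) result[i] = order[i];
        return result;
    }
}
```

Ordering vertices this way prunes the search tree near the root, where failing fast is cheapest.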