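Predicate-style filters. These filters are used together with the Parser; the examples below show how.
1. TagNameFilter
TagNameFilter is the easiest filter to understand: it filters nodes by tag name.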
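import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

// URL holds the location of the page to parse; message() is a print helper not shown in this section.
Parser parser = new Parser(URL);
NodeFilter filter = new TagNameFilter("DIV");
// Collect every node in the page that the filter accepts.
NodeList nodes = parser.extractAllNodesThatMatch(filter);
if (nodes != null) {
    for (int i = 0; i < nodes.size(); i++) {
        Node textnode = nodes.elementAt(i);
        message("getText:" + textnode.getText());
        System.out.println(textnode);
        message("=================================================");
    }
}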
Output:
Tag (294[4,0],313[4,19]): div id="top_main"
Txt (313[4,19],319[5,4]): \n
Tag (319[5,4],339[5,24]): div id="logoindex"
Txt (339[5,24],349[6,8]): \n
Rem (349[6,8],360[6,19]): 这是注释
Txt (360[6,19],391[8,0]): \n 白泽居-www.baizeju.com\n
Tag (391[8,0],424[8,33]): a href="http://www.baizeju.com"
Txt (424[8,33],443[8,52]): 白泽居-www.baizeju.com
End (443[8,52],447[8,56]): /a
Txt (447[8,56],453[9,4]): \n
End (453[9,4],459[9,10]): /div
Txt (459[9,10],486[11,0]): \n 白泽居-www.baizeju.com\n
End (486[11,0],492[11,6]): /div
getText:div id="top_main"
=================================================
Tag (319[5,4],339[5,24]): div id="logoindex"
Txt (339[5,24],349[6,8]): \n
Rem (349[6,8],360[6,19]): 这是注释
Txt (360[6,19],391[8,0]): \n 白泽居-www.baizeju.com\n
Tag (391[8,0],424[8,33]): a href="http://www.baizeju.com"
Txt (424[8,33],443[8,52]): 白泽居-www.baizeju.com
End (443[8,52],447[8,56]): /a
Txt (447[8,56],453[9,4]): \n
End (453[9,4],459[9,10]): /div
getText:div id="logoindex"
=================================================
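The output above shows that both the outer DIV ("top_main") and the DIV nested inside it ("logoindex") are returned, and that the filter name "DIV" matches tags written as lowercase div. The following is a minimal sketch of that behaviour on a toy tree. It does not use the HTMLParser library: ToyTag, TagNameFilterSketch, and samplePage are hypothetical names made up for illustration, and the case-insensitive comparison is an assumption drawn from the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an HTML tag node; NOT the org.htmlparser Node type.
class ToyTag {
    final String name;
    final String id;
    final List<ToyTag> children = new ArrayList<>();

    ToyTag(String name, String id) {
        this.name = name;
        this.id = id;
    }

    // Append a child and return this tag so calls can be chained.
    ToyTag add(ToyTag child) {
        children.add(child);
        return this;
    }
}

class TagNameFilterSketch {
    // Accept a tag when its name equals the wanted name, ignoring case (assumed).
    static boolean accept(ToyTag tag, String wanted) {
        return tag.name.equalsIgnoreCase(wanted);
    }

    // Depth-first walk that collects every matching tag, nested ones included.
    static void collect(ToyTag tag, String wanted, List<String> out) {
        if (accept(tag, wanted)) {
            out.add(tag.id);
        }
        for (ToyTag child : tag.children) {
            collect(child, wanted, out);
        }
    }

    static List<String> matchIds(ToyTag root, String wanted) {
        List<String> out = new ArrayList<>();
        collect(root, wanted, out);
        return out;
    }

    // Hypothetical page shaped like the one in the output above.
    static ToyTag samplePage() {
        ToyTag logo = new ToyTag("div", "logoindex").add(new ToyTag("a", "link"));
        ToyTag top = new ToyTag("div", "top_main").add(logo);
        return new ToyTag("html", "html").add(new ToyTag("body", "body").add(top));
    }

    public static void main(String[] args) {
        System.out.println(matchIds(samplePage(), "DIV")); // prints [top_main, logoindex]
    }
}
```

Because the walk visits every node, a matching tag nested inside another matching tag is reported separately, which is why "logoindex" appears both inside the dump of "top_main" and as a result of its own.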
2. HasChildFilter
Modify the code as follows:
Parser parser = new Parser(URL);
NodeFilter innerFilter = new TagNameFilter("DIV");
NodeFilter filter = new HasChildFilter(innerFilter);
NodeList nodes = parser.extractAllNodesThatMatch(filter);
Output:
getText:body
=================================================
getText:div id="top_main"
=================================================
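As you can see, the output consists of the two tag nodes that have a DIV child tag: body has the child DIV "top_main", and "top_main" has the child DIV "logoindex".
Note that HasChildFilter has another constructor: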
public HasChildFilter (NodeFilter filter, boolean recursive)
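If recursive is false, only the first-level (direct) children are checked. In the earlier example, body and top_main each have a DIV node among their direct children, so both matched. If we instead construct the filter like this: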
NodeFilter filter = new HasChildFilter( innerFilter, true );
Output:
getText:html xmlns="http://www.w3.org/1999/xhtml"
=================================================
getText:body
=================================================
getText:div id="top_main"
=================================================
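As you can see, the output now contains one more entry, html xmlns="http://www.w3.org/1999/xhtml". This is the node for the entire HTML page (the root node). It has no DIV node directly beneath it, but its child node body does, so it is matched as well.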
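The difference between the two settings of recursive can be sketched on a toy tree. This is a minimal illustration of the idea, not the library's implementation: ToyNode, HasChildSketch, and samplePage are hypothetical names, and the tree (html containing head and body, body containing "top_main", which contains "logoindex") is an assumed shape modelled on the outputs above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tree node for illustration; NOT the org.htmlparser Node type.
class ToyNode {
    final String label;
    final String tagName;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String label, String tagName) {
        this.label = label;
        this.tagName = tagName;
    }

    // Append a child and return this node so calls can be chained.
    ToyNode add(ToyNode child) {
        children.add(child);
        return this;
    }
}

class HasChildSketch {
    // True when a direct child is a tag named `wanted`; with recursive set,
    // deeper descendants are searched as well.
    static boolean hasChild(ToyNode node, String wanted, boolean recursive) {
        for (ToyNode child : node.children) {
            if (child.tagName.equalsIgnoreCase(wanted)) {
                return true;
            }
            if (recursive && hasChild(child, wanted, true)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first walk that records every node satisfying hasChild.
    static void collect(ToyNode node, String wanted, boolean recursive, List<String> out) {
        if (hasChild(node, wanted, recursive)) {
            out.add(node.label);
        }
        for (ToyNode child : node.children) {
            collect(child, wanted, recursive, out);
        }
    }

    static List<String> matchLabels(ToyNode root, String wanted, boolean recursive) {
        List<String> out = new ArrayList<>();
        collect(root, wanted, recursive, out);
        return out;
    }

    // Hypothetical page: html > (head, body > div "top_main" > div "logoindex").
    static ToyNode samplePage() {
        ToyNode logo = new ToyNode("logoindex", "div");
        ToyNode top = new ToyNode("top_main", "div").add(logo);
        ToyNode body = new ToyNode("body", "body").add(top);
        return new ToyNode("html", "html").add(new ToyNode("head", "head")).add(body);
    }

    public static void main(String[] args) {
        System.out.println(matchLabels(samplePage(), "DIV", false)); // prints [body, top_main]
        System.out.println(matchLabels(samplePage(), "DIV", true));  // prints [html, body, top_main]
    }
}
```

With recursive set to false the html node is skipped because neither head nor body is a DIV; with recursive set to true the search continues into body and finds "top_main", so html is reported too.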