NekoHTML

Latest recommended article published 2017-08-23 16:34:20

zhou2002


Article tags: string import exception null url class

Reposted from http://jlm0808.blogcn.com/diary,113856063.shtml

package com.sample;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Node;

/**
 * Parses an HTML page with NekoHTML, prints the content of its DOM nodes,
 * and counts the characters held in its #text nodes.
 *
 * @author Jerry Chiang
 * @version 1.0
 */
public class TestHTMLDOM {

    private static final String TEST_URL = "http://www.baidu.com";

    // Accumulates the trimmed content of every #text node.
    // A StringBuilder is used because "+=" on an uninitialized String
    // would prepend the literal "null" and inflate the count by four.
    private static final StringBuilder text = new StringBuilder();

    // main method
    public static void main(String[] argv) throws Exception {
        DOMParser parser = new DOMParser();   // instantiate the parser
        parser.parse(TEST_URL);               // parse the given HTML document
        print(parser.getDocument(), "");      // print the DOM tree
        // report the number of characters found in #text nodes
        System.out.println("Total characters in the page: " + text.length());
    }

    // print method: walks the tree depth-first, indenting one space per level
    private static void print(Node node, String indent) {
        // System.out.println(indent + node.getNodeName());   // print the node name
        String value = node.getNodeValue();
        if (value != null && !value.trim().isEmpty()) {
            String trimmed = value.trim();
            System.out.println(indent + trimmed + node.getNodeName());   // print the node content
            if (node.getNodeType() == Node.TEXT_NODE) {
                text.append(trimmed);   // collect the node content
            }
        }
        Node child = node.getFirstChild();
        while (child != null) {
            print(child, indent + " ");
            child = child.getNextSibling();
        }
    }

}
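
The listing above needs the NekoHTML jar and network access to run. As a rough, self-contained sketch of the same depth-first traversal, the version below swaps in the JDK's built-in XML parser (an assumption made only so the example runs offline; unlike NekoHTML it accepts well-formed markup only). The class name `TextNodeCounter` and the helpers `collectText` and `textOf` are hypothetical names chosen for illustration and are not part of the original post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeCounter {

    // Recursively appends the trimmed value of every non-blank text node under `node`.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String trimmed = node.getNodeValue().trim();
            if (!trimmed.isEmpty()) {
                out.append(trimmed);
            }
        }
        // Same first-child / next-sibling walk as the print method in the post.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    // Parses a well-formed markup string (stand-in for DOMParser.parse)
    // and returns the concatenated text of its text nodes.
    static String textOf(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        StringBuilder out = new StringBuilder();
        collectText(doc, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = textOf("<html><body><p> Hello </p><p>NekoHTML</p></body></html>");
        System.out.println(text + " (" + text.length() + " characters)");
    }
}
```

Passing the StringBuilder as a parameter rather than keeping it in a static field lets the traversal be reused for several documents without the counts leaking from one into the next.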
