Parsing HTML files with the htmlparser JAR, using combinations of its filters.
1. Extract links that open a new page, i.e. tags of the form <a href="xxx" target="_blank">
2. Extract images
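To make "combining filters" concrete before the real htmlparser code, here is a minimal sketch of the idea using only the Java standard library. `SimpleTag`, `tagName`, `hasAttribute` and `extract` are hypothetical stand-ins invented for this illustration. They are not htmlparser classes, and the case-insensitive tag-name comparison is an assumption of the sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Standard-library sketch of AND-combining two filters (not htmlparser code).
class FilterCombinationSketch {
    // Hypothetical, simplified stand-in for a parsed tag.
    static final class SimpleTag {
        final String name;
        final Map<String, String> attributes;

        SimpleTag(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    // Matches tags by name (case-insensitive in this sketch).
    static Predicate<SimpleTag> tagName(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    // Matches tags that carry the given attribute with exactly the given value.
    static Predicate<SimpleTag> hasAttribute(String key, String value) {
        return tag -> value.equals(tag.attributes.get(key));
    }

    // Keeps only the tags accepted by the filter.
    static List<SimpleTag> extract(List<SimpleTag> tags, Predicate<SimpleTag> filter) {
        return tags.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SimpleTag> tags = List.of(
                new SimpleTag("a", Map.of("target", "_blank")),
                new SimpleTag("a", Map.of()),
                new SimpleTag("img", Map.of("target", "_blank")));
        // A tag must pass both filters to be kept.
        Predicate<SimpleTag> both = tagName("a").and(hasAttribute("target", "_blank"));
        System.out.println(extract(tags, both).size()); // prints 1
    }
}
```

Only the first tag passes both conditions: the second lacks the attribute and the third has the wrong name.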
1. Extract links that open a new page, i.e. tags of the form <a href="xxx" target="_blank">
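    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;

    // all is the file's HTML content as a String; "charset" is a placeholder
    // for its encoding and should be replaced with the real one, e.g. "UTF-8".
    Parser parser = Parser.createParser(all, "charset");

    // Collects the plain text of every <a target="_blank"> tag, each followed by "@".
    // Note that this gathers the link text, not the href value.
    public String getlink(Parser parser) {
        StringBuilder link = new StringBuilder();
        try {
            // Both conditions must hold: the tag is <a> and it has target="_blank".
            AndFilter andFilter = new AndFilter(new TagNameFilter("a"), new HasAttributeFilter("target", "_blank"));
            NodeList nodeList = parser.extractAllNodesThatMatch(andFilter);
            for (int i = 0; i < nodeList.size(); i++) {
                Node node = nodeList.elementAt(i);
                String text = node.toPlainTextString();
                if (!text.equals("")) {
                    link.append(text).append("@");
                }
            }
        } catch (Exception e) {
            // On any parsing failure, give up and return an empty result.
            return "";
        }
        return link.toString();
    }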
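Since getlink returns all link texts in a single "@"-separated string, the caller has to split it again. A minimal sketch of that step, assuming the format produced above; `LinkTextSplitter` is a hypothetical helper invented here and is not part of htmlparser:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the "@"-joined string returned by getlink.
class LinkTextSplitter {
    // Splits a string such as "News@Sports@" into its non-empty parts.
    static List<String> split(String joined) {
        List<String> result = new ArrayList<>();
        if (joined == null || joined.isEmpty()) {
            return result;
        }
        for (String part : joined.split("@")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("News@Sports@")); // prints [News, Sports]
    }
}
```

One caveat: a link text that itself contains "@" (an e-mail address, for example) would be cut into several parts. Returning a List<String> from getlink directly would avoid that, at the cost of changing its signature.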
2. Extract images
    import org.htmlparser.Parser;
    import org.htmlparser.Tag;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    // Reads the src, alt and data-src attributes of every <img> tag.
    // num, name and category are not used in this snippet.
    public void pic(Parser parser, String num, String name, String category) {
        String src = "";
        String alt = "";
        String datasrc = "";
        try {
            TagNameFilter tagNameFilter = new TagNameFilter("img");
            NodeList nodeList = parser.extractAllNodesThatMatch(tagNameFilter);
            for (int i = 0; i < nodeList.size(); i++) {
                Tag tagnode = (Tag) nodeList.elementAt(i);
                src = tagnode.getAttribute("src");
                alt = tagnode.getAttribute("alt");
                datasrc = tagnode.getAttribute("data-src");
                // Debug output; replace with whatever should happen to each image.
                // System.out.println("src: " + src + ", alt: " + alt + ", data-src: " + datasrc);
            }
        } catch (ParserException e) {
            e.printStackTrace();
            return;
        }
        // Reset the parser so it can be used for another extraction.
        parser.reset();
    }
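The method above reads both src and data-src but does not show what to do with them. A possible next step, sketched with the standard library only: pick whichever attribute holds the image address, then resolve it against the page URL. `ImageUrlPicker` is a hypothetical helper, the example URLs are made up, and preferring data-src is an assumption based on pages that lazy-load images and keep the real address there.

```java
import java.net.URI;

// Hypothetical helper for the attribute values read in pic().
class ImageUrlPicker {
    // Prefers data-src and falls back to src; returns null if neither is usable.
    static String pick(String src, String dataSrc) {
        if (dataSrc != null && !dataSrc.trim().isEmpty()) {
            return dataSrc.trim();
        }
        if (src != null && !src.trim().isEmpty()) {
            return src.trim();
        }
        return null;
    }

    // Resolves a possibly relative image URL against the URL of the page.
    // URI.create throws IllegalArgumentException for malformed input.
    static String resolve(String pageUrl, String imageUrl) {
        return URI.create(pageUrl).resolve(imageUrl).toString();
    }

    public static void main(String[] args) {
        String chosen = pick("placeholder.gif", "img/a.jpg");
        // prints http://example.com/news/img/a.jpg
        System.out.println(resolve("http://example.com/news/index.html", chosen));
    }
}
```

Image addresses that are already absolute pass through resolve unchanged, so the same call works for both kinds.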