Jsoup, Level One

I. What is jsoup

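        jsoup is an HTML parser for Java. It can parse a page fetched directly from a URL as well as HTML supplied as a string. It provides a convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.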
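
To make the three access styles concrete, here is a minimal sketch that parses an HTML string rather than a live URL. The HTML fragment, the base URI https://example.com/ and the class name ParseStringDemo are assumptions invented for illustration; the jsoup calls used (Jsoup.parse, select, getElementsByTag, attr, text, title) are part of the library's public API, and the jsoup jar must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseStringDemo {
    // Hypothetical HTML fragment and base URI, made up for this sketch
    static final String HTML =
            "<html><head><title>Demo</title></head><body>"
            + "<div id=\"nav\"><a href=\"/login\">Login</a> <a href=\"/help\">Help</a></div>"
            + "<p class=\"note\">Hello jsoup</p>"
            + "</body></html>";
    static final String BASE_URI = "https://example.com/";

    // Parse the string; the base URI lets jsoup resolve relative links
    static Document parse() {
        return Jsoup.parse(HTML, BASE_URI);
    }

    public static void main(String[] args) {
        Document doc = parse();

        // DOM-style access
        System.out.println(doc.title());
        Element firstParagraph = doc.getElementsByTag("p").first();
        System.out.println(firstParagraph.text());

        // CSS selector access: anchors with an href inside the element with id "nav"
        Elements links = doc.select("div#nav a[href]");
        for (Element link : links) {
            // "abs:href" asks for the href resolved against the base URI
            System.out.println(link.text() + " -> " + link.attr("abs:href"));
        }
    }
}
```

Passing a base URI as the second argument to Jsoup.parse matters only when the HTML contains relative links; without it, an abs: lookup on a relative href has nothing to resolve against.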

II. A jsoup example


1. Create a new Java project and add the jsoup-1.8.1.jar file to its build path.
2. Write the code:
package cn.sp;


import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * An example program that lists the links found at a URL.
 * @author pc
 */
public class ListLinks {
	public static void main(String[] args) throws Exception {
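		Validate.isTrue(args.length == 1, "usage: supply url to fetch");
		String url = args[0];
		print("Fetching %s...", url);

		// Fetch the page over HTTP and parse it into a Document
		Document doc = Jsoup.connect(url).get();
		// CSS selectors: <a> elements with an href, any element with a src, <link> elements with an href
		Elements links = doc.select("a[href]");
		Elements medias = doc.select("[src]");
		Elements imports = doc.select("link[href]");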
		print("\nLinks:(%d)",links.size());
		for (Element link : links) {
			print(" * a:<%s>  (%s)", link.attr("abs:href"),trim(link.text(),35));
		}
		print("\nMedias:(%d)",medias.size());
		for (Element src : medias) {
			if(src.tagName().equals("img")){
				print(" * %s:<%s> %sx%s (%s)", src.tagName(),
						src.attr("abs:src"),src.attr("width"),src.attr("height"),trim(src.attr("alt"),20));
			}else{
				print("* %s:<%s>",src.tagName(),src.attr("abs:src"));
			}
		}
		print("\nImports:(%d)",imports.size());
		for (Element link : imports) {
			print("* %s <%s> (%s)", link.tagName(),link.attr("abs:href"),link.attr("rel"));
		}
	}
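	/**
	 * Truncates a string to a maximum length.
	 * @param s the string to truncate
	 * @param i the maximum length
	 * @return s unchanged if it is no longer than i; otherwise its first i-1 characters followed by "."
	 * @author 2YSP
	 * @date 2017-03-08
	 */
	private static String trim(String s, int i) {
		if (s.length() > i) {
			// Mark the cut with a trailing "." so truncated text is recognizable
			return s.substring(0, i - 1) + ".";
		}
		return s;
	}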

	private static void print(String msg, Object... args) {
		System.out.println(String.format(msg, args));
	}
}
3. Set the program argument to https://www.xxinbao.com and run it. The console output is as follows.
Fetching https://www.xxinbao.com...
Links:(90)
 * a:<https://www.xxinbao.com/login>  (登录)
 * a:<https://www.xxinbao.com/registerMobile>  (注册)
 * a:<https://www.xxinbao.com/front/help/index?typeId=21>  (投资攻略)
 * a:<https://www.xxinbao.com/front/home/helpCenter?id=38>  (帮助中心)
 * a:<https://www.xxinbao.com>  ()
 * a:<https://www.xxinbao.com>  ()
 * a:<http://wpa.qq.com/msgrd?v=3&uin=2811978871&site=qq&menu=yes>  ()
 * a:<https://www.xxinbao.com/>  ()
 * a:<https://www.xxinbao.com/>  (首页)
 * a:<https://www.xxinbao.com/front/invest/fundIndex>  (星计划)
 * a:<https://www.xxinbao.com/front/invest/investHome>  (投资理财)
 * a:<https://www.xxinbao.com/front/invest/investNewHome>  (新手专区)
 * a:<https://www.xxinbao.com/front/debt/debtHome>  (债权转让)
 * a:<https://www.xxinbao.com/front/account/home>  (我的账户)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=16>  (关于我们)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=468>  ()
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=339>  ()
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=452>  ()
 * a:<https://www.xxinbao.com/front/home/helpCenter?id=40>  (更多>)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=466>  (银行系统维护公告)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=465>  (【荐者有礼】话费满额公告)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=462>  (2017年2月20日星鑫宝系统升级公告)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=460>  (2017年2月17日星鑫宝系统升级公告)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=459>  (银行系统维护公告)
 * a:<https://www.xxinbao.com/front/invest/fundIndex>  (查看更多 >)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=46>  (商业保理)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=46>  (查看详情)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=47>  (商业保理02)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=47>  (查看详情)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=48>  (商业保理03)
 * a:<https://www.xxinbao.com/front/invest/fundList?fundId=48>  (立即购买)
 * a:<https://www.xxinbao.com/front/invest/investHome>  (查看更多 >)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2043>  (刘先生安心赎楼贷-2017031311期)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2043>  (查看详情)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2042>  (包女士车辆抵押贷-2017031310期)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2042>  (查看详情)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2041>  (万先生车辆抵押贷-2017031309期)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2041>  (查看详情)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2040>  (吴先生车辆抵押贷-2017031308期)
 * a:<https://www.xxinbao.com/front/invest/invest?bidId=2040>  (查看详情)
 * a:<>  (互联网金融研究)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=463>  ()
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=463>  (春暖花开,荐者有礼(话费满额,红包继续))
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=452>  (携手供应链金融,商业保理标的震撼上线)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=432>  (互金开启监管元年 监管思路逐渐明朗)
 * a:<>  (行业资讯)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=461>  ()
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=461>  (三农成网贷下一片蓝海)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=446>  (P2P将迈入下一个十年:细分市场 严控风险集中整治)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=433>  (消费金融向三、四线城市扩张 系补位银行服务)
 * a:<>  (媒体报道)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=468>  ()
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=468>  (星鑫宝2017年2月运营报告)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=457>  (2017,在星河金融遇见更好的自己)
 * a:<https://www.xxinbao.com/front/wealthinfomation/newDetails?id=447>  (星河控股集团28周年华诞暨2017新春联欢晚会圆满落幕)
 * a:<https://www.xxinbao.com/front/homeAction/wealthToolkit?key=4>  (计算器)
 * a:<http://wpa.qq.com/msgrd?v=3&uin=2811978871&site=qq&menu=yes>  (客服)
 * a:<>  (APP)
 * a:<>  (微信)
 * a:<http://wpa.qq.com/msgrd?v=3&uin=2811978871&site=qq&menu=yes>  (在线客服)
 * a:<>  (xxinbao@chngalaxy.com)
 * a:<https://www.xxinbao.com/front/home/helpCenter?id=40>  (网站公告)
 * a:<https://www.xxinbao.com/front/home/helpCenter?id=38>  (理财产品)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=37>  (业绩报告)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=16>  (公司介绍)
 * a:<https://www.xxinbao.com/front/home/helpCenter?id=48>  (活动介绍)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=47>  (社会责任)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=17>  (管理团队)
 * a:<https://www.xxinbao.com/front/homeAction/wealthToolkit?key=4>  (利息计算器)
 * a:<https://www.xxinbao.com/front/home/aboutUs?id=45>  (联系我们)
 * a:<>  (微信公众号)
 * a:<>  (手机客户端)
 * a:<http://www.chngalaxy.com/>  (星河集团)
 * a:<http://www.g-cre.com/>  (星河商业地产)
 * a:<http://tj.chngalaxy.com/>  (星河地产天津公司)
 * a:<http://gz.chngalaxy.com/>  (星河地产广州公司)
 * a:<http://cz.chngalaxy.com/>  (星河地产常州公司)
 * a:<http://hz.chngalaxy.com/>  (星河地产惠州公司)
 * a:<http://sz.chngalaxy.com/>  (星河地产深圳公司)
 * a:<http://www.topliving.cn/>  (星河第三空间)
 * a:<http://www.gpm.net.cn/index.aspx>  (星河智善生活)
 * a:<http://www.sz-ritzcarlton.com/>  (深圳星河丽思卡尔顿酒店)
 * a:<http://www.dspearl.com/>  (深圳市东山珍珠岛实业有限公司)
 * a:<http://www.wdzj.com/>  (网贷之家)
 * a:<http://www.wdzx.com/>  (网贷中心)
 * a:<http://www.wangdaidongfang.com/>  (网贷东方)
 * a:<http://www.anquan.org>  ()
 * a:<https://credit.szfw.org/CX20170118033065201689.html>  ()
 * a:<https://www.xxinbao.com/front/account/rechargePay>  (确定)
 * a:<https://www.xxinbao.com#>  (取消)
Medias:(47)
* script:<https://www.xxinbao.com/public/javascripts/koala.min.1.5.js>
* script:<https://www.xxinbao.com/public/javascripts/jquery-2.0.js>
* script:<https://www.xxinbao.com/public/javascripts/jquery.json-2.4.min.js>
* script:<https://www.xxinbao.com/public/javascripts/xf_front.js>
* script:<https://www.xxinbao.com/public/javascripts/ajaxfileupload.js>
* script:<https://www.xxinbao.com/public/javascripts/jquery.lazyload.min.js>
* script:<https://www.xxinbao.com/public/javascripts/scrpit.js>
* script:<https://www.xxinbao.com/public/javascripts/superslide.2.1.js>
* script:<https://www.xxinbao.com/public/javascripts/common.js>
* script:<https://www.xxinbao.com/public/javascripts/jqueryswtch.js>
* script:<https://www.xxinbao.com/public/javascripts/scroll.js>
 * img:<https://www.xxinbao.com/public/images/5-121204193R5-50.gif> x ()
 * img:<https://www.xxinbao.com/public/images/weixin.jpg> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/xxbaoApp.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ysj.png> x10px ()
 * img:<https://www.xxinbao.com/data/attachments/37d05b31-b55d-4a40-9c87-5515a0a1e38a> x (星河集团)
 * img:<https://www.xxinbao.com/data/attachments/6c12ee20-b54e-49cc-bb50-8a1478c8e237> x (广告)
 * img:<https://www.xxinbao.com/data/attachments/25b962c1-f6cb-4b12-8f69-ab1e5f11dbf0> x (广告)
 * img:<https://www.xxinbao.com/data/attachments/50acd3a8-f6a5-4259-a792-dcfe86711ce4> x (广告)
 * img:<https://www.xxinbao.com/data/attachments/b791b176-5641-4017-8757-f61b0fd3f4f5> x (广告)
 * img:<https://www.xxinbao.com/public/images/newHome/u1.png> 18x ()
 * img:<https://www.xxinbao.com/public/images/newHome/u2.png> 30x ()
 * img:<https://www.xxinbao.com/public/images/newHome/u3.png> 30x ()
 * img:<https://www.xxinbao.com/public/images/newHome/u4.png> 30x ()
 * img:<https://www.xxinbao.com/data/attachments/e69bff7e-0717-4412-9c0b-0620e25b7816> 28x28 ()
 * img:<https://www.xxinbao.com/data/attachments/e69bff7e-0717-4412-9c0b-0620e25b7816> 28x28 ()
 * img:<https://www.xxinbao.com/data/attachments/e69bff7e-0717-4412-9c0b-0620e25b7816> 28x28 ()
 * img:<https://www.xxinbao.com/data/attachments/e69bff7e-0717-4412-9c0b-0620e25b7816> 28x28 ()
 * img:<https://www.xxinbao.com/data/attachments/5720c21f-57dc-473b-92c9-f2e6aadd11b8> 100%x130 (互联网金融研究院)
 * img:<https://www.xxinbao.com/images?uuid=b54fc434-3ea8-4e08-838e-5b82f87804d6> 100%x130 (行业资讯)
 * img:<https://www.xxinbao.com/images?uuid=42629b15-df0d-4761-8995-2b4ab70be489> 100%x130 (行业资讯)
 * img:<https://www.xxinbao.com/public/images/newHome/ur1.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ur2.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ur3.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ur4.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ur5.png> 140pxx ()
 * img:<https://www.xxinbao.com/public/images/newHome/xxbaoApp.png> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ysj.png> x10px ()
 * img:<https://www.xxinbao.com/public/images/weixin.jpg> x ()
 * img:<https://www.xxinbao.com/public/images/newHome/ysj.png> x10px ()
* script:<https://www.xxinbao.com/public/javascripts/tab/tab_index.js>
 * img:<https://www.xxinbao.com/public/images/newHome/u9.png> 30x ()
 * img:<https://www.xxinbao.com/public/images/newHome/yx.png> 18x ()
 * img:<https://www.xxinbao.com/public/images/weixin.jpg> 100x100 ()
 * img:<https://www.xxinbao.com/public/images/newHome/xxbaoApp.png> 100x100 ()
* script:<https://static.anquan.org/static/outer/js/aq_auth.js>
 * img:<https://www.xxinbao.com/public/images/cert.png> x ()
Imports:(5)
* link <https://www.xxinbao.com/public/images/favicon.ico> (shortcut icon)
* link <https://www.xxinbao.com/public/stylesheets/site.css> (stylesheet)
* link <https://www.xxinbao.com/public/stylesheets/style.css> (stylesheet)
* link <https://www.xxinbao.com/public/stylesheets/reset.css> (stylesheet)
* link <https://www.xxinbao.com/public/stylesheets/temp.css> (stylesheet)
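
Several entries in the Links section print as a:<>. The program reads each link with attr("abs:href"), and jsoup documents that an absolute-URL lookup returns an empty string when the attribute cannot be turned into a URL. The output does not show what the underlying href values were. The sketch below illustrates the general idea of resolving an href against a base URI using only java.net.URI. It is not jsoup's implementation; the class name AbsUrlSketch, the method toAbsolute, and the relative href strings are hypothetical.

```java
import java.net.URI;

public class AbsUrlSketch {
    /**
     * Resolves href against baseUri.
     * Returns "" when href is null, is not a syntactically valid URI,
     * or the result is not absolute. This imitates the empty-string
     * convention only; it is not how jsoup resolves URLs internally.
     */
    static String toAbsolute(String baseUri, String href) {
        if (href == null) {
            return "";
        }
        try {
            URI resolved = URI.create(baseUri).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            // URI.create and resolve(String) throw this for malformed input
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "https://www.xxinbao.com/";
        // Hypothetical href values of the kinds a page might contain
        String[] hrefs = { "/login", "front/invest/fundIndex", "http://www.wdzj.com/", "bad href" };
        for (String href : hrefs) {
            System.out.println("[" + href + "] -> <" + toAbsolute(base, href) + ">");
        }
    }
}
```

Note that the base URI in this sketch ends with a slash. java.net.URI resolves a path-relative reference against the base path, so a base with an empty path is best avoided here.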


      