Use the example that ships with jsoup directly:
package cn.mliao.myjsoup;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Example program to list links from a URL.
 */
public class JsoupTest {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        Elements media = doc.select("[src]");
        Elements imports = doc.select("link[href]");

        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }

        print("\nImports: (%d)", imports.size());
        for (Element link : imports) {
            print(" * %s <%s> (%s)", link.tagName(), link.attr("abs:href"), link.attr("rel"));
        }

        print("\nLinks: (%d)", links.size());
        for (Element link : links) {
            print(" * a: <%s> (%s)", link.attr("abs:href"), trim(link.text(), 35));
        }
    }

    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }
}
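The example reads attributes through the abs: prefix (abs:src, abs:href), which asks jsoup for the value resolved against the page's base URL. jsoup returns an empty string when it cannot produce an absolute URL, which is why a link can be printed as <>. The sketch below only imitates that idea with java.net.URI from the standard library. It is not jsoup's implementation and does not reproduce its exact rules; the class name AbsUrlSketch and the method resolve are hypothetical.

```java
import java.net.URI;

public class AbsUrlSketch {
    // Hypothetical helper: resolve an attribute value against a base URL.
    // Returns "" for empty or unparsable input (a simplifying assumption).
    public static String resolve(String base, String relative) {
        if (relative == null || relative.isEmpty()) {
            return "";
        }
        try {
            URI resolved = URI.create(base).resolve(relative);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            // URI.create rejects strings that are not valid URIs
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "http://www.baidu.com/";
        // A site-relative path is joined onto the base
        System.out.println(resolve(base, "/img/bd_logo1.png"));
        // An already absolute URL is returned unchanged
        System.out.println(resolve(base, "http://news.baidu.com"));
    }
}
```

Reading the same attribute without the prefix, for example attr("href"), returns the raw value exactly as written in the HTML.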
I. Running it in Eclipse fails:
Exception in thread "main" java.lang.IllegalArgumentException: usage: supply url to fetch
at org.jsoup.helper.Validate.isTrue(Validate.java:45)
at cn.mliao.myjsoup.JsoupTest.main(JsoupTest.java:17)
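This happens because the program was started without an argument. It can be fixed in Run Configurations by entering the URL under Program arguments on the Arguments tab.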
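The exception in the stack trace is raised by the Validate.isTrue guard on the first line of main, which rejects any argument count other than one. A minimal sketch of an equivalent check using only the standard library follows; the class name ArgCheckSketch and the method urlFromArgs are hypothetical and not part of jsoup.

```java
public class ArgCheckSketch {
    // Hypothetical helper mirroring Validate.isTrue(args.length == 1, ...):
    // exactly one argument is accepted, anything else is rejected.
    public static String urlFromArgs(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: supply url to fetch");
        }
        return args[0];
    }

    public static void main(String[] args) {
        try {
            System.out.println("Would fetch " + urlFromArgs(args));
        } catch (IllegalArgumentException e) {
            // Reached when the program is launched without a URL argument
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Launching this class with no program arguments takes the Rejected branch; supplying one URL takes the other.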
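II. It can also be run from the command line:
1. The classpath must contain both . (the current directory) and the jsoup jar.
2. The class name must include the package name: cn/mliao/myjsoup/JsoupTest (the dotted form cn.mliao.myjsoup.JsoupTest also works).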
D:\eclipse_j2ee\workspace\myJsoup\bin>java -cp .;jsoup-1.8.3.jar cn/mliao/myjsoup/JsoupTest http://www.baidu.com
Fetching http://www.baidu.com...
Media: (3)
* img: <http://www.baidu.com/img/bd_logo1.png> 270x129 ()
* img: <http://www.baidu.com/img/baidu_jgylogo3.gif> x (到百度首页)
* script: <http://s1.bdstatic.com/r/www/cache/static/jquery/jquery-1.10.2.min_f2fb5194.js>
Imports: (11)
* link <http://www.baidu.com/favicon.ico> (shortcut icon)
* link <http://www.baidu.com/content-search.xml> (search)
* link <http://www.baidu.com/img/baidu.svg> (icon)
* link <http://s1.bdstatic.com> (dns-prefetch)
* link <http://t1.baidu.com> (dns-prefetch)
* link <http://t2.baidu.com> (dns-prefetch)
* link <http://t3.baidu.com> (dns-prefetch)
* link <http://t10.baidu.com> (dns-prefetch)
* link <http://t11.baidu.com> (dns-prefetch)
* link <http://t12.baidu.com> (dns-prefetch)
* link <http://b1.bdstatic.com> (dns-prefetch)
Links: (30)
* a: <http://www.baidu.com/> ()
* a: <> (手写)
* a: <> (拼音)
* a: <> (关闭)
* a: <http://www.baidu.com/> (百度首页)
* a: <> (设置)
* a: <https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F> (登录)
* a: <http://news.baidu.com> (新闻)
* a: <http://www.hao123.com> (hao123)
* a: <http://map.baidu.com> (地图)
* a: <http://v.baidu.com> (视频)
* a: <http://tieba.baidu.com> (贴吧)
* a: <https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F> (登录)
* a: <http://www.baidu.com/gaoji/preferences.html> (设置)
* a: <http://www.baidu.com/more/> (更多产品)
* a: <http://news.baidu.com/ns?cl=2&rn=20&tn=news&word=> (新闻)
* a: <http://tieba.baidu.com/f?kw=&fr=wwwt> (贴吧)
* a: <http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt> (知道)
* a: <http://music.baidu.com/search?fr=ps&ie=utf-8&key=> (音乐)
* a: <http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=> (图片)
* a: <http://v.baidu.com/v?ct=301989888&rn=20&pn=0&db=0&s=25&ie=utf-8&word=> (视频)
* a: <http://map.baidu.com/m?word=&fr=ps01000> (地图)
* a: <http://wenku.baidu.com/search?word=&lm=0&od=0&ie=utf-8> (文库)
* a: <http://www.baidu.com/more/> (更多?)
* a: <http://www.baidu.com/> (把百度设为主页)
* a: <http://www.baidu.com/cache/sethelp/help.html> (把百度设为主页)
* a: <http://home.baidu.com> (关于百度)
* a: <http://ir.baidu.com> (About??Baidu)
* a: <http://www.baidu.com/duty/> (使用百度前必读)
* a: <http://jianyi.baidu.com/> (意见反馈)
D:\eclipse_j2ee\workspace\myJsoup\bin>
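The command above separates classpath entries with ; which is the Windows convention; on Linux and macOS the separator is : instead. The sketch below builds both parts of such a command for whatever platform it runs on, using only the standard library. The class name LaunchCommandSketch and its two helper methods are hypothetical.

```java
import java.io.File;

public class LaunchCommandSketch {
    // Join classpath entries with the platform separator (";" on Windows, ":" elsewhere).
    public static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    // Convert a slash-separated class path such as cn/mliao/myjsoup/JsoupTest
    // into the dotted fully qualified class name.
    public static String dottedName(String slashName) {
        return slashName.replace('/', '.');
    }

    public static void main(String[] args) {
        String cp = classpath(".", "jsoup-1.8.3.jar");
        String mainClass = dottedName("cn/mliao/myjsoup/JsoupTest");
        // Print the command line that would launch the example
        System.out.println("java -cp " + cp + " " + mainClass + " http://www.baidu.com");
    }
}
```

On shells where ; has a special meaning, the classpath value usually needs to be quoted.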