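I've recently been building an Android news client and ran into trouble parsing the news content. The content is varied (text, video, and images), and Jsoup turned out to be awkward for it. After searching online I tried loading the page directly in a WebView, but it rendered blank, and most of the page isn't needed anyway; only the news content itself is. So I decided to stick with Jsoup, and this post is devoted to using it.
Here is the HTML for the news list section: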
<div class="newslist">
<div class="dis" id="tpc_01">
<ul>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373778.shtml" target="_blank" title="韩服王者局:EDG下野双排厂长天使凯瑞Faker" class="list3_2">韩服王者局:EDG下野双排厂长天使凯瑞Faker</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373806.shtml" target="_blank" title="王者VS青铜:这个盲僧天音回旋踢你肯定没见过" class="list3_2">王者VS青铜:这个盲僧天音回旋踢你肯定没见过</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373805.shtml" target="_blank" title="不输盲僧 lol反野第一人:千珏野区逃生教学" class="list3_2">不输盲僧 lol反野第一人:千珏野区逃生教学</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/news/bgzt/" target="_blank" title="八卦杂谈" class="list3_1 green">新闻</a><a href="http://lol.tgbus.com/news/bgzt/373807.shtml" target="_blank" title="AHQ战队蟹老板直播放狠话 S5进四强我就裸奔" class="list3_2">AHQ战队蟹老板直播放狠话 S5进四强我就裸奔</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373804.shtml" target="_blank" title="韩服高端排位集锦:大神玩盖伦也会被狠狠虐" class="list3_2">韩服高端排位集锦:大神玩盖伦也会被狠狠虐</a> </li>
<div class="line"></div>
<ul>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373801.shtml" target="_blank" title="世界第一的瘟疫之源:IMP老鼠疯狂偷人击杀集锦" class="list3_2">世界第一的瘟疫之源:IMP老鼠疯狂偷人击杀集锦</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/news/bgzt/" target="_blank" title="八卦杂谈" class="list3_1 green">新闻</a><a href="http://lol.tgbus.com/news/bgzt/373798.shtml" target="_blank" title="2015创联赛英雄联盟惊现挑衅战队:但求一败" class="list3_2">2015创联赛英雄联盟惊现挑衅战队:但求一败</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373797.shtml" target="_blank" title="Fnatic战队ADC Rekkles比赛集锦:S5我们来了" class="list3_2">Fnatic战队ADC Rekkles比赛集锦:S5我们来了</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373794.shtml" target="_blank" title="永猎双子千珏史诗级杀戮TOP10:变态到极致的伤害" class="list3_2">永猎双子千珏史诗级杀戮TOP10:变态到极致的伤害</a> </li>
<li><span class="fr">09-23</span>
<a href="http://lol.tgbus.com/sp/newsp/" target="_blank" title="视频(新)" class="list3_1 yellow">视频</a><a href="http://lol.tgbus.com/sp/newsp/373785.shtml" target="_blank" title="欢乐五加二第33期:AP豹女在丛林中的狩猎游戏" class="list3_2">欢乐五加二第33期:AP豹女在丛林中的狩猎游戏</a> </li>
</ul>
From this HTML we extract a few fields (date, kind, title, and link URL) and create a news class, NewsInfo, to hold them:
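// Simple bean holding the fields extracted from one <li> of the news list.
public class NewsInfo {
    private String title; // headline text of the second <a> (class "list3_2")
    private String url;   // href of the second <a>
    private String date;  // text of <span class="fr">, e.g. "09-23"
    private String kind;  // text of the first <a> (class "list3_1"), e.g. "视频" or "新闻"

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    public String getKind() {
        return kind;
    }

    public void setKind(String kind) {
        this.kind = kind;
    }
}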
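To make the mapping concrete, here is a minimal sketch of how the first <li> in the HTML above would end up in such an object. It does not use Jsoup; the values are copied by hand from the listing. The names NewsInfoDemo and firstItem, and the constructor, are hypothetical and exist only for this illustration, and the nested class is a trimmed-down stand-in for NewsInfo.

```java
// Illustration only: fills a NewsInfo-like object with the values of the
// first <li> in the listing. NewsInfoDemo, firstItem and the constructor
// are hypothetical helpers, not part of the original project.
class NewsInfoDemo {

    // Trimmed-down stand-in for the NewsInfo bean (getters only).
    static class NewsInfo {
        private final String title;
        private final String url;
        private final String date;
        private final String kind;

        NewsInfo(String title, String url, String date, String kind) {
            this.title = title;
            this.url = url;
            this.date = date;
            this.kind = kind;
        }

        String getTitle() { return title; }
        String getUrl() { return url; }
        String getDate() { return date; }
        String getKind() { return kind; }
    }

    // Values copied by hand from the first <li> of the news list.
    static NewsInfo firstItem() {
        return new NewsInfo(
                "韩服王者局:EDG下野双排厂长天使凯瑞Faker",           // text of <a class="list3_2">
                "http://lol.tgbus.com/sp/newsp/373778.shtml",   // href of <a class="list3_2">
                "09-23",                                        // text of <span class="fr">
                "视频");                                         // text of <a class="list3_1 yellow">
    }

    public static void main(String[] args) {
        NewsInfo item = firstItem();
        System.out.println(item.getDate() + " [" + item.getKind() + "] "
                + item.getTitle() + " -> " + item.getUrl());
    }
}
```

Each <li> therefore yields one object: the <span> gives the date, the first link gives the kind, and the second link gives both the title and the URL.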
Now for the main topic, extracting the values of these tags:
<div class="newslist">