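Rome is a nice cross-platform tool for parsing XML feeds. I have tested it against WordPress feeds, NetEase News (网易新闻), and others.
Speed in my tests was good and is dominated by the HTTP fetch. Fetch time depends on bandwidth and on how fast the site's PHP backend generates the feed, and ranged from about 100 ms to 6000 ms. Parsing the XML took 600 to 800 ms on an i3-3217U running OS X 10.9.
Installing the external libraries is simple: all you need is rometools and jdom. I won't go into that here.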
Here is how the data structures map onto the feed:
SyndFeed feed ====> corresponds to the channel
List<SyndEntry> entries = feed.getEntries() ====> corresponds to the items, wrapped in a List
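To make the channel/item mapping concrete, here is a minimal sketch that walks the same structure by hand. It does not use Rome: it relies on the JDK's built-in DOM parser, and the class name RssStructureSketch, the helper names, and the SAMPLE feed are all made up for illustration. The channel element plays the role of SyndFeed, and each item element plays the role of one SyndEntry.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RssStructureSketch {
    // Hypothetical sample feed, not taken from any real site.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Demo feed</title>"
        + "<item><title>First</title><link>http://example.com/1</link></item>"
        + "<item><title>Second</title><link>http://example.com/2</link></item>"
        + "</channel></rss>";

    // Returns the <channel> element: the part Rome exposes as a SyndFeed.
    static Element channel(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("channel").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    // Text of the first direct child with the given tag name, or null if absent.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // One title per <item>: the part Rome exposes as List<SyndEntry>.
    static List<String> itemTitles(Element channel) {
        List<String> titles = new ArrayList<>();
        NodeList items = channel.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            titles.add(childText((Element) items.item(i), "title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        Element ch = channel(SAMPLE);
        System.out.println("channel title: " + childText(ch, "title"));
        System.out.println("item titles: " + itemTitles(ch));
    }
}
```

With Rome you get the same two levels without touching the DOM: feed-level getters on SyndFeed, and entry-level getters on each SyndEntry.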
Here is the code:
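import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.SyndFeedOutput;
import com.rometools.rome.io.XmlReader;

import java.io.File;
import java.net.URL;
import java.util.Date;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        String webUrl = "http://news.163.com/special/00011K6L/rss_gn.xml";
        File file = new File("/Users/leon/Desktop/2.txt");
        try {
            SyndFeedInput input = new SyndFeedInput();

            // Load the locally cached copy if there is one (the first run has none).
            SyndFeed cachedFeed = null;
            if (file.exists()) {
                cachedFeed = input.build(file);
                printData(cachedFeed);
            }

            // Fetch and parse the live feed; the reader is closed when the block exits.
            SyndFeed remoteFeed;
            try (XmlReader reader = new XmlReader(new URL(webUrl).openStream())) {
                remoteFeed = input.build(reader);
            }

            // getPublishedDate() may return null, so compare null-safely.
            Date remoteDate = remoteFeed.getPublishedDate();
            Date cachedDate = cachedFeed == null ? null : cachedFeed.getPublishedDate();
            if (remoteDate != null && remoteDate.equals(cachedDate)) {
                System.out.println("No update, last time: " + remoteDate);
            } else {
                printData(remoteFeed);
                // Overwrite the cache file with the newly fetched feed.
                new SyndFeedOutput().output(remoteFeed, file);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Prints the feed's published date and the first entry, if any.
    static void printData(SyndFeed feed) {
        List<SyndEntry> entries = feed.getEntries();
        System.out.println("Published date: " + feed.getPublishedDate());
        if (entries.isEmpty()) {
            System.out.println("Feed has no entries");
            return;
        }
        SyndEntry first = entries.get(0);
        System.out.println("Title is: " + first.getTitle());
        // An entry is not required to carry a description.
        String des = first.getDescription() == null ? "" : first.getDescription().getValue();
        System.out.println("Des is: " + des);
        System.out.println("Link is: " + first.getLink());
    }
}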
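One thing to watch in the update check: a feed is not required to declare a publication date, in which case getPublishedDate() returns null and calling equals on it throws a NullPointerException. Below is a minimal sketch of a null-safe version of that decision, pulled out into a helper so it can be tested without the network. FeedUpdateCheck and hasUpdate are hypothetical names, and treating a missing remote date as "updated" is my assumption, not something Rome prescribes.

```java
import java.util.Date;

class FeedUpdateCheck {
    // Returns true when the remote feed should be treated as newer than the cache.
    // Assumption: if the remote feed has no date we cannot prove it is unchanged,
    // so we refresh the cache anyway.
    static boolean hasUpdate(Date cachedDate, Date remoteDate) {
        if (remoteDate == null) {
            return true;
        }
        // Date.equals(null) is false, so a missing cache also counts as an update.
        return !remoteDate.equals(cachedDate);
    }

    public static void main(String[] args) {
        Date t1 = new Date(1000L);
        Date t2 = new Date(2000L);
        System.out.println(hasUpdate(t1, new Date(1000L))); // same instant
        System.out.println(hasUpdate(t1, t2));              // remote is different
        System.out.println(hasUpdate(null, t1));            // no cache yet
        System.out.println(hasUpdate(t1, null));            // remote has no date
    }
}
```

In the program above the two arguments would be the cached feed's getPublishedDate() (or null when there is no cache file) and the remote feed's getPublishedDate().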