Writing a Small Program to Grab the Entire Sina Reading Channel

luckystar2008


Original article: http://www.blogjava.net/youxia/archive/2008/11/07/239310.html

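Friends, what can you do while waiting for someone, waiting for a bus, or waiting for a meal? Pulling out your phone and reading an e-book is a good choice. Yesterday I wrote a small program that can grab pretty much everything on the Sina Reading channel's ranking list.
The program uses only these pieces of Java:
1. The URL class, to connect to Sina
2. The BufferedReader class, to read the data
3. The Pattern and Matcher classes, which use regular expressions to extract the text of the novel
The complete code is as follows: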
/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package ebookdownloaderforsinanzt;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author 海边沫沫
 */
public class Main {

    /**
     * @param args the command line arguments: args[0] is the novel's name
     *             as it appears in the URL, args[1] is the last page number
     */
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: Main <novel-name> <last-page-number>");
            return;
        }
        int upbound = Integer.parseInt(args[1]);
        for (int i = 1; i <= upbound; i++) {
            System.out.println(getParagraph("http://book.sina.com.cn/nzt/lit/" + args[0] + "/", i));
            System.out.println();
        }
    }

    private static String getParagraph(String url, int index) {
        int status = 0;
        String paragraph = "";
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(url + index + ".shtml").openStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (status == 0) {
                    // The title has not been encountered yet
                    Pattern pattern = Pattern.compile("(.*)<tr><td class=title14 align=center>(.*)</td></tr>(.*)");
                    Matcher matcher = pattern.matcher(line);
                    if (matcher.matches()) {
                        paragraph += matcher.group(2);
                        paragraph += "\n\n";
                        status = 1;
                    }
                }
                if (status == 1) {
                    // The start of the article body has not been encountered yet.
                    // The HTML markers between the groups are missing from this
                    // copy of the article; restore them from the page source.
                    Pattern pattern = Pattern.compile("(.*)(.*)(.*)");
                    Matcher matcher = pattern.matcher(line);
                    if (matcher.matches()) {
                        paragraph += matcher.group(2);
                        status = 2; // Reached the picture-in-picture block inside the body
                    }
                }
                if (status == 2) {
                    // The HTML marker in this pattern is also missing from this copy.
                    Pattern pattern = Pattern.compile("(.*)(.*)");
                    Matcher matcher = pattern.matcher(line);
                    if (matcher.matches()) {
                        paragraph += matcher.group(2);
                        status = 3;
                    }
                }
            }
            // Replace the leftover HTML tag with a blank line. The tag itself
            // (the first argument) is missing from this copy of the article.
            return paragraph.replaceAll("", "\n\n");
        } catch (Exception e) {
            System.out.println(e.toString());
            return null;
        }
    }
}
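To see what the title step does without touching the network, here is a minimal sketch that runs the same title pattern against a single line held in memory. The class name TitleExtractDemo, the helper extractTitle, and the sample line are all made up for illustration; the sample is not taken from a real Sina page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractDemo {
    // Same title pattern as in the program above, compiled once and reused.
    private static final Pattern TITLE = Pattern.compile(
            "(.*)<tr><td class=title14 align=center>(.*)</td></tr>(.*)");

    /** Returns the text of the title cell, or null if the line has no title row. */
    static String extractTitle(String line) {
        Matcher matcher = TITLE.matcher(line);
        // matches() requires the whole line to fit the pattern, as in the original program.
        return matcher.matches() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, written by hand for this example.
        String sample = "<table><tr><td class=title14 align=center>Chapter 1</td></tr></table>";
        System.out.println(extractTitle(sample));                 // prints "Chapter 1"
        System.out.println(extractTitle("<p>no title here</p>")); // prints "null"
    }
}
```

Compiling the pattern once in a static field, instead of on every line as the program above does, avoids repeated work when a page has many lines.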
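Here are some screenshots:
The Sina Reading channel ranking list:

My little program running:

The downloaded result:

Finally, a look at my IDE. I am using the latest version of NetBeans, and I changed its theme to look like a Mac: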

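One last thing: the books on the Sina Reading channel have differently structured page source depending on their URL, so different regular expressions are needed to extract them. The program above can only extract e-books whose URLs look like http://book.sina.com.cn/nzt/lit/<novel name>/<page number>.shtml. Still, adapting the program to other layouts takes only a small change.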
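One possible way to make that small change is to look up the regular expression by URL prefix instead of hard-coding it. The sketch below is only an illustration of the idea: the class LayoutPatterns, its methods, the second URL prefix, and the `<h1>` markup are all invented for the example, and only the /nzt/lit/ title pattern comes from the program above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LayoutPatterns {
    // Maps a URL prefix to the title pattern used by pages under that prefix.
    private static final Map<String, Pattern> TITLE_PATTERNS = new LinkedHashMap<>();

    static {
        // Title pattern from the program above, for /nzt/lit/ pages.
        TITLE_PATTERNS.put("http://book.sina.com.cn/nzt/lit/",
                Pattern.compile("(.*)<tr><td class=title14 align=center>(.*)</td></tr>(.*)"));
        // Hypothetical second layout: both the prefix and the markup are made up.
        TITLE_PATTERNS.put("http://book.sina.com.cn/example/",
                Pattern.compile("(.*)<h1>(.*)</h1>(.*)"));
    }

    /** Returns the title pattern registered for the URL's prefix, or null if none is known. */
    static Pattern titlePatternFor(String url) {
        for (Map.Entry<String, Pattern> entry : TITLE_PATTERNS.entrySet()) {
            if (url.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** Extracts the title from one line of a page, or returns null if nothing matches. */
    static String extractTitle(String url, String line) {
        Pattern pattern = titlePatternFor(url);
        if (pattern == null) {
            return null;
        }
        Matcher matcher = pattern.matcher(line);
        return matcher.matches() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle(
                "http://book.sina.com.cn/example/some-book/1.shtml", "<h1>Hello</h1>"));
    }
}
```

Supporting another kind of page then means adding one more entry to the map rather than editing the download loop.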
