Using a regular expression to get all the hyperlinks in a web page. If you spot any problems, corrections are welcome.
Regular expression: <(a|A)\s+[a-zA-Z](.*?)>(.*?)</(a|A)>
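To see why the lazy quantifier matters here, the following is a minimal sketch that runs a lazy and a greedy variant of this kind of pattern over a short hard-coded string. The class name LazyVsGreedy, the helper findAll, and the sample HTML are hypothetical and only serve the illustration; the patterns are written with the escaping a Java string literal needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Hypothetical helper: returns every substring of html matched by regex, in order.
    static List<String> findAll(String regex, String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(html);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample input with two links on one line.
        String html = "<a href=\"/one\">One</a> | <A href=\"/two\">Two</A>";

        // Lazy: (.*?) stops at the first position where the rest of the pattern can match.
        String lazy = "<(a|A)\\s+[a-zA-Z](.*?)>(.*?)</(a|A)>";
        // Greedy: (.*) runs as far as it can, so one match swallows both links.
        String greedy = "<(a|A)\\s+[a-zA-Z](.*)>(.*)</(a|A)>";

        System.out.println(findAll(lazy, html).size());   // prints 2
        System.out.println(findAll(greedy, html).size()); // prints 1
    }
}
```

With the greedy variant, the single match runs from the first <a to the last </A>, which is the problem the "?" in (.*?) is there to avoid.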
Example:
import java.net.URL;
import java.io.*;
import java.util.regex.*;

public class TestURL {
    public static void main(String[] args) throws Exception {
        // Find the hyperlinks in a page; take care with greedy matching in the regular expression.
        URL url = new URL("http://www.it315.org");
        InputStream ips = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(ips));
        // StringBuilder avoids copying the whole page each time a line is appended.
        StringBuilder contentAll = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            contentAll.append(line);
        }
        br.close();
        // In (.*?), the "?" makes the quantifier lazy, which avoids greedy matching.
        // \\s+ requires whitespace after the tag name, so tags such as <abbr> are not matched.
        Pattern p = Pattern.compile("<(a|A)\\s+[a-zA-Z](.*?)>(.*?)</(a|A)>");
        Matcher m = p.matcher(contentAll);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
传智播客: 2007/04/01