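Using crawler4j is straightforward: the source tree already includes many examples, and they can be run as-is.
First, build with the pom.xml so that Maven fetches the dependencies, or download the dependency bundle directly: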
https://code.google.com/p/crawler4j/downloads/detail?name=crawler4j-3.5-dependencies.zip&can=2&q=
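Before running any example, it can help to confirm that the crawler4j jar and its dependencies actually ended up on the classpath. The following is only a minimal sketch: the class name `ClasspathCheck` and the helper `isOnClasspath` are made up for illustration, and it assumes the controller class lives at `edu.uci.ics.crawler4j.crawler.CrawlController`, as in the 3.5 sources.

```java
// Hypothetical helper, not part of crawler4j: checks whether a class is loadable.
class ClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified name of the crawler4j controller class.
        String controller = "edu.uci.ics.crawler4j.crawler.CrawlController";
        String status = isOnClasspath(controller) ? "found" : "NOT found";
        System.out.println(controller + ": " + status);
    }
}
```

If the class is reported as not found, the dependency download or the Maven build most likely did not complete, and the examples will fail to compile.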
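1. Running the Example
Open BasicCrawlController under edu.uci.ics.crawler4j.examples.basic. It contains only a main method. Comment out the argument check in its first lines, then change the storage directory and the number of threads: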
/*if (args.length != 2) {
    System.out.println("Needed parameters: ");
    System.out.println("\t rootFolder (it will contain intermediate crawl data)");
    System.out.println("\t numberOfCralwers (number of concurrent threads)");
    return;
}*/
/* Directory where data is stored temporarily during the crawl. */
String crawlStorageFolder = &