1. Distributed crawl commands:
hadoop jar apache-nutch-1.6.job org.apache.nutch.crawl.Crawl url.txt -dir gonghui002 -threads 10 -depth 3 -topN 10
apache-nutch-1.6/runtime/deploy# bin/nutch crawl url.txt -dir gonghui -threads 20 -depth 3 -topN 2
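The two commands above differ only in how the job is launched and in the values passed to the options. As a reading aid, here is a minimal sketch that assembles the second command from named parameters and prints it without launching anything. The helper name build_crawl_cmd is hypothetical and not part of Nutch. The option meanings in the comments follow the usual Nutch 1.x crawl usage: -dir is the output directory, -threads the number of fetcher threads, -depth the number of crawl rounds, and -topN the maximum number of URLs fetched per round.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints a crawl command line.
# Arguments: $1 seed list, $2 crawl dir, $3 threads, $4 depth, $5 topN
build_crawl_cmd() {
  echo "bin/nutch crawl $1 -dir $2 -threads $3 -depth $4 -topN $5"
}

# Dry run: only print the command, using the values from the notes above.
build_crawl_cmd url.txt gonghui 20 3 2
```

To actually start the crawl, run the printed line from apache-nutch-1.6/runtime/deploy. In deploy mode the seed path is normally resolved on HDFS, so this assumes url.txt has already been uploaded there.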