Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
出现的问题原因可能是以下:
(1)一般为crawl-urlfilters.txt中配置问题,比如过滤条件应为
+^http://www.ihooyo.com ,而配置成了 http://www.ihooyo.com 这样的情况就引起如上错误。
(2)此问题是eclipse的java版本设置问题,解决方法:
如原来使用java1.4,需要改为1.6
project-》properties-》java compiler
右 jdk compliance
compiler compliance level:改为6.0
2 运行crawl报错Job failed
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
问题解决:
此多为crawl-urlfilter.txt:MY.DOMAIN.NAME的修改不正确
3 又一个Job failed
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
问题解决:
1、多为crawl-urlfilter.txt的MY.DOMAIN.NAME修改不正确2、中断过正在抓取的程序
3、刚集合了庖丁分词