(1)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
解决方法:在cygwin中输入:export LANG="zh_CN.GBK"
,而后回车
其实就是设置下linux的环境变量
(2)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java
:439)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
问题解决:
此多为crawl-urlfilter.txt:MY.DOMAIN.NAME的修改不正确
(3)eclipse中的nutch Job Failed
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
问题解决:
此问题是eclipse的java版本设置问题,解决方法:
如原来使用java1.4,需要改为1.6
project-》properties-》java compiler
右 jdk compliance
compiler compliance level:改为6.0