What a frustrating error! It had me stuck for several days before I finally solved it, so I think it is well worth sharing separately:
The cause is a bug in the integration of Nutch 1.3 with Hadoop 0.20.203.0, and the corresponding fix was published on the official site.
The fix requires modifying two files. In the diffs below, a leading + marks a line to add and a leading - marks a line to remove.
The first file to modify is src/java/org/apache/nutch/parse/ParseOutputFormat.java (note that this file also needs the org.apache.hadoop.mapred.InvalidJobConfException import shown in the second diff, or it will not compile):
 public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
-    Path out = FileOutputFormat.getOutputPath(job);
-    if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
-      throw new IOException("Segment already parsed!");
+    Path out = FileOutputFormat.getOutputPath(job);
+    if ((out == null) && (job.getNumReduceTasks() != 0)) {
+      throw new InvalidJobConfException(
+          "Output directory not set in JobConf.");
+    }
+    if (fs == null) {
+      fs = out.getFileSystem(job);
+    }
+    if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
+      throw new IOException("Segment already parsed!");
 }
The second file to modify is src/java/org/apache/nutch/fetcher/FetcherOutputFormat.java:
 import org.apache.hadoop.io.SequenceFile.CompressionType;
 import org.apache.hadoop.mapred.FileOutputFormat;
+import org.apache.hadoop.mapred.InvalidJobConfException;
 import org.apache.hadoop.mapred.OutputFormat;
 import org.apache.hadoop.mapred.RecordWriter;
 import org.apache.hadoop.mapred.JobConf;
@@ -46,8 +47,15 @@
 public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
   Path out = FileOutputFormat.getOutputPath(job);
+  if ((out == null) && (job.getNumReduceTasks() != 0)) {
+    throw new InvalidJobConfException(
+        "Output directory not set in JobConf.");
+  }
+  if (fs == null) {
+    fs = out.getFileSystem(job);
+  }
   if (fs.exists(new Path(out, CrawlDatum.FETCH_DIR_NAME)))
-      throw new IOException("Segment already fetched!");
+    throw new IOException("Segment already fetched!");
 }
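Both diffs add the same guard pattern: fail fast with InvalidJobConfException when reducers expect an output directory that was never set in the JobConf, and derive a FileSystem from the output path when none was passed in, before running the original existence check. Here is a minimal stand-alone sketch of that control flow; it has no Hadoop dependency, and the class and method names are hypothetical stand-ins, not the real Nutch code:

```java
// Stand-alone sketch of the guard logic both diffs add. Strings stand
// in for Hadoop's Path and FileSystem types; only the control flow
// mirrors the actual patch.
public class OutputSpecGuard {

    static String check(String out, String fs, int numReduceTasks) {
        // Patch step 1: fail fast when reducers will write output
        // but no output directory was configured in the JobConf.
        if (out == null && numReduceTasks != 0) {
            return "InvalidJobConfException: Output directory not set in JobConf.";
        }
        // Patch step 2: if no FileSystem was handed in, derive one from
        // the output path (out.getFileSystem(job) in the real patch).
        if (fs == null) {
            fs = "filesystem-of(" + out + ")";
        }
        // Patch step 3: the original "segment already parsed/fetched"
        // existence check then runs against a guaranteed-non-null fs.
        return "check " + out + " on " + fs;
    }

    public static void main(String[] args) {
        System.out.println(check(null, null, 4));
        System.out.println(check("/segments/20110101", null, 4));
    }
}
```

The key point is ordering: before the patch, a null fs or null output path reached fs.exists(...) directly and crashed with an unhelpful NullPointerException; the patch converts that into a clear configuration error or recovers a usable FileSystem.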
After modifying these two files, rebuild with ant and the problem is solved.