The problem is described below.
1. Environment
Virtual machine: VMware Workstation 10
OS: CentOS 6.4
Eclipse: (version not recalled)
JDK: 1.7.06
Hadoop: 1.0.4
2. Code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, Text> {

        private Text word = new Text("line:");

        // Bug under discussion: the key parameter is Text, but the class extends
        // Mapper<Object, ...>, so this method overloads map() instead of overriding it.
        public void map(Text key, Text value, Context context
                ) throws IOException, InterruptedException {
            context.write(word, value);
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values,
                Context context
                ) throws IOException, InterruptedException {
            Text result = new Text();
            // String.concat returns a new String and leaves the receiver unchanged,
            // so the values are accumulated in a StringBuilder instead.
            StringBuilder add = new StringBuilder();
            for (Text val : values) {
                add.append(val.toString());
            }
            result.set(add.toString());
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        // Map output types and final output types are all Text in this job.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
3. The error output is as follows:
12/08/27 15:49:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/27 15:49:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/08/27 15:49:41 INFO input.FileInputFormat: Total input paths to process : 4
12/08/27 15:49:41 INFO mapred.JobClient: Running job: job_local_0001
12/08/27 15:49:41 INFO util.ProcessTree: setsid exited with exit code 0
12/08/27 15:49:41 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3249256e
12/08/27 15:49:41 INFO mapred.MapTask: io.sort.mb = 100
12/08/27 15:49:41 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/27 15:49:41 INFO mapred.MapTask: record buffer = 262144/327680
12/08/27 15:49:41 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at SmallFilesToSequenceFileConverter$SequenceFileMapper.map(SmallFilesToSequenceFileConverter.java:38)
at SmallFilesToSequenceFileConverter$SequenceFileMapper.map(SmallFilesToSequenceFileConverter.java:1)
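The message itself says what went wrong: the map task handed over a value whose runtime class was not the class the job had been configured to expect. As a rough illustration (a sketch, not Hadoop's actual source), the check can be pictured like this. `CheckedCollector`, `CollectDemo` and `tryCollect` are names made up for this example, and `String`/`Integer` stand in for `Text`/`IntWritable` so that the snippet runs without Hadoop on the classpath.

```java
import java.io.IOException;

// Hypothetical stand-in for the map-side output collector. It is configured
// with the expected key/value classes and rejects anything else.
class CheckedCollector {
    private final Class<?> keyClass;
    private final Class<?> valueClass;

    CheckedCollector(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    // Compares the runtime class of each record with the configured class.
    void collect(Object key, Object value) throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + keyClass.getName() + ", received " + key.getClass().getName());
        }
        if (value.getClass() != valueClass) {
            throw new IOException("Type mismatch in value from map: expected "
                    + valueClass.getName() + ", received " + value.getClass().getName());
        }
    }
}

class CollectDemo {
    // Returns "ok" when the record is accepted, otherwise the error message.
    static String tryCollect(Object key, Object value) {
        // Configured like a job that expects (Text, IntWritable) map output.
        CheckedCollector collector = new CheckedCollector(String.class, Integer.class);
        try {
            collector.collect(key, value);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCollect("line:", 1));           // accepted
        System.out.println(tryCollect("line:", "some text")); // rejected: value is a String
    }
}
```

The point of the sketch is that the configuration and the objects actually written have to agree exactly; whichever side is wrong, the symptom is the same exception.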
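4. Approaches to a solution.
I found two suggested approaches online.
(1) First check whether your map output and your reduce input correspond, then check whether the types declared in your map and reduce match the following settings (quoted from another post):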
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
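I tried this, but it made no difference.
(2) http://www.360doc.com/content/11/0524/16/7000788_119067361.shtml This article analyzes the problem in depth, but the fix it proposes is too cumbersome, and I could not work out how to actually apply it. Quite possibly my error is not the same problem as the one described there, so I only cite it here for reference.
My method:
Someone online said that, because of differences between Hadoop versions, the map and reduce methods in a MapReduce job need to be overridden. Following that advice, I added @Override in front of the map and reduce methods, and Eclipse then reported an error: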
The method map(Text, Text, Mapper<Object,Text,Text,Text>.Context) of type SortAndUpper.SpliMapper must override or implement a supertype method
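This shows that the map method I wrote was wrong. Looking closely, I saw that its first parameter was Text. The class extends Mapper<Object, Text, Text, Text>, so the method to override is map(Object, Text, Context); the Hadoop API has no map signature for this class in which every parameter is Text. With Text as the first parameter, my method was only an overload, so the framework kept calling the inherited default map, which writes the input key and value through unchanged, and the types no longer matched the job settings. Changing the first parameter to Object fixed it.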
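To make the overload-versus-override point concrete, here is a minimal sketch that runs without Hadoop. `BaseMapper` is a hypothetical stand-in for `org.apache.hadoop.mapreduce.Mapper`, reduced to a default pass-through `map` plus a `run` method that plays the role of the framework; `BrokenMapper`, `FixedMapper` and `OverloadDemo` are likewise names invented for the example, and `String` stands in for `Text`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper: the default map() simply
// passes the input record through.
class BaseMapper<KEYIN, VALUEIN> {
    final List<String> output = new ArrayList<>();

    protected void map(KEYIN key, VALUEIN value) {
        output.add("identity:" + key + "=" + value);
    }

    // Plays the role of the framework, which only knows the base-class
    // signature map(KEYIN, VALUEIN).
    final void run(KEYIN key, VALUEIN value) {
        map(key, value);
    }
}

// KEYIN is Object, but map() takes a String key: this is an overload,
// so run() never calls it. Adding @Override here would not compile.
class BrokenMapper extends BaseMapper<Object, String> {
    protected void map(String key, String value) {
        output.add("custom:" + value);
    }
}

// The key parameter matches KEYIN, so this really overrides map().
class FixedMapper extends BaseMapper<Object, String> {
    @Override
    protected void map(Object key, String value) {
        output.add("custom:" + value);
    }
}

class OverloadDemo {
    static String runBroken() {
        BrokenMapper mapper = new BrokenMapper();
        mapper.run(0L, "hello");
        return mapper.output.get(0);
    }

    static String runFixed() {
        FixedMapper mapper = new FixedMapper();
        mapper.run(0L, "hello");
        return mapper.output.get(0);
    }

    public static void main(String[] args) {
        System.out.println(runBroken()); // prints "identity:0=hello"
        System.out.println(runFixed());  // prints "custom:hello"
    }
}
```

This is also why adding @Override is worth the habit: the compiler refuses the annotation on a method that does not actually override anything, so the mistake shows up in the editor instead of at job run time.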