1. Hadoop Fully Distributed Environment Configuration
I mainly followed the two links below. I tried two setups: a Mac as the master with two Ubuntu machines as slaves, and one Ubuntu machine as the master with two Ubuntu machines as slaves.
Local environment:
Mac
Parallels Desktop running three Ubuntu VMs
If you run into configuration problems, feel free to discuss!
References:
http://blog.csdn.net/wk51920/article/details/51686038
http://www.w2bc.com/Article/19645
Commands used:
javac -classpath /usr/hadoop/hadoop-2.8.0/share/hadoop/common/hadoop-common-2.8.0.jar:/usr/hadoop/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar WordCount.java -d classes
jar -cvf WordCount.jar *
hadoop fs -rm -r /output
hadoop jar WordCount.jar WordCount /input/count.txt /output
2. Running the Word Count Program
Original code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Note: not static -- this is the bug discussed below
    public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Note: not static -- this is the bug discussed below
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Input text:
hadoop mapreduce
hadoop yarn
hadoop hdfs
hadoop mapreduce
hadoop yarn
hadoop hdfs
zqh gkn
lzy zqh
Error output:
17/07/20 04:22:35 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:22:35 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:22:36 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:22:36 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:22:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0001
17/07/20 04:22:37 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0001
17/07/20 04:22:37 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0001/
17/07/20 04:22:37 INFO mapreduce.Job: Running job: job_1500533815972_0001
17/07/20 04:22:46 INFO mapreduce.Job: Job job_1500533815972_0001 running in uber mode : false
17/07/20 04:22:46 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:22:50 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:22:53 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:22:59 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:23:04 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:23:04 INFO mapreduce.Job: Job job_1500533815972_0001 failed with state FAILED due to: Task failed task_1500533815972_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/07/20 04:23:04 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11149
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11149
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=11149
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=11416576
Total megabyte-milliseconds taken by all reduce tasks=0
Cause:
The MapReduce job fails because the Mapper and Reducer inner classes are not declared static. Hadoop instantiates the map and reduce classes via reflection, which requires a no-argument constructor. A non-static inner class has no true no-argument constructor (its constructor takes a hidden reference to the enclosing instance), so Hadoop cannot obtain an instance of the inner class and throws NoSuchMethodException.
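The failure is easy to reproduce in plain Java, without Hadoop. The class names below are illustrative stand-ins: Hadoop's ReflectionUtils ultimately calls Class.getDeclaredConstructor() with no arguments, which succeeds for a static nested class but fails for a non-static inner class:

```java
public class InnerClassReflection {

    // Non-static inner class: its constructor secretly takes the
    // enclosing instance as a parameter, so getDeclaredConstructor()
    // with no arguments cannot find it.
    class NonStaticMapper {}

    // Static nested class: a true no-argument constructor exists.
    static class StaticMapper {}

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor(); // same lookup ReflectionUtils performs
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(NonStaticMapper.class)); // prints false
        System.out.println(hasNoArgCtor(StaticMapper.class));    // prints true
    }
}
```

This is why adding the `static` modifier, as in the corrected code below, is enough to fix the job.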
Fixed code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Normal output:
17/07/20 04:39:15 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:39:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:39:16 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0002
17/07/20 04:39:16 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0002
17/07/20 04:39:16 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0002/
17/07/20 04:39:16 INFO mapreduce.Job: Running job: job_1500533815972_0002
17/07/20 04:39:22 INFO mapreduce.Job: Job job_1500533815972_0002 running in uber mode : false
17/07/20 04:39:22 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:39:28 INFO mapreduce.Job: map 100% reduce 0%
17/07/20 04:39:34 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:39:34 INFO mapreduce.Job: Job job_1500533815972_0002 completed successfully
17/07/20 04:39:34 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=87
FILE: Number of bytes written=272771
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=202
HDFS: Number of bytes written=53
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3192
Total time spent by all reduces in occupied slots (ms)=2943
Total time spent by all map tasks (ms)=3192
Total time spent by all reduce tasks (ms)=2943
Total vcore-milliseconds taken by all map tasks=3192
Total vcore-milliseconds taken by all reduce tasks=2943
Total megabyte-milliseconds taken by all map tasks=3268608
Total megabyte-milliseconds taken by all reduce tasks=3013632
Map-Reduce Framework
Map input records=8
Map output records=16
Map output bytes=162
Map output materialized bytes=87
Input split bytes=99
Combine input records=16
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=87
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=169
CPU time spent (ms)=1120
Physical memory (bytes) snapshot=298622976
Virtual memory (bytes) snapshot=3772493824
Total committed heap usage (bytes)=140972032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=53
Output file part-r-00000:
gkn 1
hadoop 6
hdfs 2
lzy 1
mapreduce 2
yarn 2
zqh 2
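As a sanity check, the same tokenize-then-sum logic can be reproduced in plain Java (no Hadoop) on the input text above; a TreeMap yields the same lexicographic ordering seen in part-r-00000. This is just an illustrative sketch, not part of the job itself:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    public static Map<String, Integer> count(String text) {
        // TreeMap keeps keys sorted, matching the order of part-r-00000
        Map<String, Integer> counts = new TreeMap<>();
        // Same whitespace tokenization the mapper uses
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Same summation the combiner/reducer performs
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                + "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                + "zqh gkn\nlzy zqh\n";
        // Prints the same seven lines as part-r-00000
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```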