1. Hadoop Fully Distributed Environment Configuration
I mainly followed the two links below. I tried two setups: a Mac as the master with two Ubuntu machines as slaves, and one Ubuntu machine as the master with two Ubuntu machines as slaves.
Local environment:
Mac
Parallels Desktop running three Ubuntu VMs
If you run into configuration problems, feel free to discuss!
References:
http://blog.csdn.net/wk51920/article/details/51686038
http://www.w2bc.com/Article/19645
Commands used:
javac -classpath /usr/hadoop/hadoop-2.8.0/share/hadoop/common/hadoop-common-2.8.0.jar:/usr/hadoop/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar WordCount.java -d classes
jar -cvf WordCount.jar *
hadoop fs -rm -r /output
hadoop jar WordCount.jar WordCount /input/count.txt /output
2. Running the Word Count Program
Original code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Note: not static -- this is the bug discussed below
    public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Note: not static -- this is the bug discussed below
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Input text:
hadoop mapreduce
hadoop yarn
hadoop hdfs
hadoop mapreduce
hadoop yarn
hadoop hdfs
zqh gkn
lzy zqh
Error output:
17/07/20 04:22:35 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:22:35 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:22:36 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:22:36 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:22:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0001
17/07/20 04:22:37 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0001
17/07/20 04:22:37 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0001/
17/07/20 04:22:37 INFO mapreduce.Job: Running job: job_1500533815972_0001
17/07/20 04:22:46 INFO mapreduce.Job: Job job_1500533815972_0001 running in uber mode : false
17/07/20 04:22:46 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:22:50 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:22:53 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:22:59 INFO mapreduce.Job: Task Id : attempt_1500533815972_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.NoSuchMethodException: WordCount$TokenizerMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 7 more
17/07/20 04:23:04 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:23:04 INFO mapreduce.Job: Job job_1500533815972_0001 failed with state FAILED due to: Task failed task_1500533815972_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/07/20 04:23:04 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11149
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11149
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=11149
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=11416576
Total megabyte-milliseconds taken by all reduce tasks=0
Cause:
The MapReduce job fails because the Mapper and Reducer inner classes are not declared static. Hadoop instantiates the map and reduce classes via reflection, which requires a no-argument constructor. A non-static inner class has no true no-argument constructor (its constructor takes a hidden reference to the enclosing instance), so Hadoop cannot obtain an instance of the inner class and throws NoSuchMethodException.
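The failure is easy to reproduce in plain Java, without Hadoop. The class names below are illustrative stand-ins: Hadoop's ReflectionUtils ultimately calls Class.getDeclaredConstructor() with no arguments, which succeeds for a static nested class but fails for a non-static inner class:

```java
public class InnerClassReflection {

    // Non-static inner class: its constructor secretly takes the
    // enclosing instance as a parameter, so getDeclaredConstructor()
    // with no arguments cannot find it.
    class NonStaticMapper {}

    // Static nested class: a true no-argument constructor exists.
    static class StaticMapper {}

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor(); // same lookup ReflectionUtils performs
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(NonStaticMapper.class)); // prints false
        System.out.println(hasNoArgCtor(StaticMapper.class));    // prints true
    }
}
```

This is why adding the `static` modifier, as in the corrected code below, is enough to fix the job.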
Fixed code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Normal output:
17/07/20 04:39:15 INFO client.RMProxy: Connecting to ResourceManager at master/10.211.55.5:8032
17/07/20 04:39:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/20 04:39:16 INFO input.FileInputFormat: Total input files to process : 1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/20 04:39:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500533815972_0002
17/07/20 04:39:16 INFO impl.YarnClientImpl: Submitted application application_1500533815972_0002
17/07/20 04:39:16 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500533815972_0002/
17/07/20 04:39:16 INFO mapreduce.Job: Running job: job_1500533815972_0002
17/07/20 04:39:22 INFO mapreduce.Job: Job job_1500533815972_0002 running in uber mode : false
17/07/20 04:39:22 INFO mapreduce.Job: map 0% reduce 0%
17/07/20 04:39:28 INFO mapreduce.Job: map 100% reduce 0%
17/07/20 04:39:34 INFO mapreduce.Job: map 100% reduce 100%
17/07/20 04:39:34 INFO mapreduce.Job: Job job_1500533815972_0002 completed successfully
17/07/20 04:39:34 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=87
FILE: Number of bytes written=272771
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=202
HDFS: Number of bytes written=53
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3192
Total time spent by all reduces in occupied slots (ms)=2943
Total time spent by all map tasks (ms)=3192
Total time spent by all reduce tasks (ms)=2943
Total vcore-milliseconds taken by all map tasks=3192
Total vcore-milliseconds taken by all reduce tasks=2943
Total megabyte-milliseconds taken by all map tasks=3268608
Total megabyte-milliseconds taken by all reduce tasks=3013632
Map-Reduce Framework
Map input records=8
Map output records=16
Map output bytes=162
Map output materialized bytes=87
Input split bytes=99
Combine input records=16
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=87
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=169
CPU time spent (ms)=1120
Physical memory (bytes) snapshot=298622976
Virtual memory (bytes) snapshot=3772493824
Total committed heap usage (bytes)=140972032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=103
File Output Format Counters
Bytes Written=53
Output file part-r-00000:
gkn 1
hadoop 6
hdfs 2
lzy 1
mapreduce 2
yarn 2
zqh 2
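As a sanity check, the same tokenize-then-sum logic can be reproduced in plain Java (no Hadoop) on the input text above; a TreeMap yields the same lexicographic ordering seen in part-r-00000. This is just an illustrative sketch, not part of the job itself:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    public static Map<String, Integer> count(String text) {
        // TreeMap keeps keys sorted, matching the order of part-r-00000
        Map<String, Integer> counts = new TreeMap<>();
        // Same whitespace tokenization the mapper uses
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Same summation the combiner/reducer performs
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                + "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                + "zqh gkn\nlzy zqh\n";
        // Prints the same seven lines as part-r-00000
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```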