MapReduce 初体验

最新推荐文章于 2024-07-11 16:53:39 发布

山.那.边

最新推荐文章于 2024-07-11 16:53:39 发布

阅读量286

点赞数 3

文章标签： mapreduce 大数据

本文链接：https://blog.csdn.net/qq_29312605/article/details/135893822

版权

Docker 搭建 Hadoop集群-CSDN博客

目录创建

打开HDFS控制台

http://127.0.0.1:50070/

新建名称 test 的文件夹

进入到hadoop目录执行 hadoop-xxx/bin/hdfs dfs -chmod -R 777 / 给所有文件赋值最高权限（所有node节点）

创建成功

继续创建如下目录

准备数据

新建txt文件并上传

hadoop is very good

mapreduce is very good

可以看出页面代码识别的主机名来进行api调用，

在宿主机上添加如下映射

vi /etc/hosts

127.0.0.1 hadoop01
127.0.0.1 hadoop02

文件上传成功

运行MapReduce用WordCount进行处理

hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount /test/workcount/input /test/workcount/output

报错：

退出安全模式

hdfs dfsadmin -safemode leave

再次运行

2024-01-27 13:27:20,952 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop01/172.18.0.2:8032
2024-01-27 13:27:21,205 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1706361976462_0002
2024-01-27 13:27:22,049 INFO input.FileInputFormat: Total input files to process : 2
2024-01-27 13:27:22,091 INFO mapreduce.JobSubmitter: number of splits:2
2024-01-27 13:27:22,196 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1706361976462_0002
2024-01-27 13:27:22,196 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-01-27 13:27:22,284 INFO conf.Configuration: resource-types.xml not found
2024-01-27 13:27:22,284 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-01-27 13:27:22,465 INFO impl.YarnClientImpl: Submitted application application_1706361976462_0002
2024-01-27 13:27:22,486 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1706361976462_0002/
2024-01-27 13:27:22,486 INFO mapreduce.Job: Running job: job_1706361976462_0002
2024-01-27 13:27:27,534 INFO mapreduce.Job: Job job_1706361976462_0002 running in uber mode : false
2024-01-27 13:27:27,536 INFO mapreduce.Job:  map 0% reduce 0%
2024-01-27 13:27:32,659 INFO mapreduce.Job:  map 100% reduce 0%
2024-01-27 13:27:36,691 INFO mapreduce.Job:  map 100% reduce 100%
2024-01-27 13:27:36,702 INFO mapreduce.Job: Job job_1706361976462_0002 completed successfully
2024-01-27 13:27:36,770 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=97
		FILE: Number of bytes written=828676
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=274
		HDFS: Number of bytes written=40
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=4960
		Total time spent by all reduces in occupied slots (ms)=1279
		Total time spent by all map tasks (ms)=4960
		Total time spent by all reduce tasks (ms)=1279
		Total vcore-milliseconds taken by all map tasks=4960
		Total vcore-milliseconds taken by all reduce tasks=1279
		Total megabyte-milliseconds taken by all map tasks=5079040
		Total megabyte-milliseconds taken by all reduce tasks=1309696
	Map-Reduce Framework
		Map input records=2
		Map output records=8
		Map output bytes=75
		Map output materialized bytes=103
		Input split bytes=232
		Combine input records=8
		Combine output records=8
		Reduce input groups=5
		Reduce shuffle bytes=103
		Reduce input records=8
		Reduce output records=5
		Spilled Records=16
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=704
		CPU time spent (ms)=2760
		Physical memory (bytes) snapshot=1225478144
		Virtual memory (bytes) snapshot=7589892096
		Total committed heap usage (bytes)=1238892544
		Peak Map Physical memory (bytes)=490315776
		Peak Map Virtual memory (bytes)=2526896128
		Peak Reduce Physical memory (bytes)=251367424
		Peak Reduce Virtual memory (bytes)=2537238528
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=42
	File Output Format Counters
		Bytes Written=40

hdfs dfs -cat /test/workcount/output/part-r-00000

wordcount 源码

hadoop-mapreduce-examples-3.3.6.jar wordcount 源码

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper 
    extends Mapper<Object, Text, Text, IntWritable>{

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                       ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer 
    extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, 
                           Context context
                          ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

山.那.边

关注

3
点赞
踩
7

收藏

觉得还不错? 一键收藏
0
评论
MapReduce 初体验

进入到hadoop目录执行 hadoop-xxx/bin/hdfs dfs -chmod -R 777 / 给所有文件赋值最高权限（所有node节点）hadoop-mapreduce-examples-3.3.6.jar wordcount 源码。运行MapReduce用WordCount进行处理。可以看出页面代码识别的主机名来进行api调用，新建名称 test 的文件夹。在宿主机上添加如下映射。新建txt文件并上传。
复制链接

扫一扫