Notes on running the WordCount example on Hadoop 2.7.4

1. Source code:

package com.mapred.core;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCount {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		//FileSystem fs = FileSystem.get(new URI("hdfs://192.168.70.128:9000"), conf); // conceptually, a client-side connection to the HDFS server
		Job job = Job.getInstance(conf); // new Job(conf) is deprecated in Hadoop 2.x
		// identify the jar that contains the job classes
		job.setJarByClass(WordCount.class);
       
		// input path
		FileInputFormat.setInputPaths(job, new Path(args[0]));

		// configure the mapper and its output types
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);
       
		// configure the reducer and the final output types
		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);
       
		// output path (the directory must not already exist)
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// submit the job and block until it finishes; exit non-zero on failure
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}


class WordCountMapper extends Mapper<LongWritable,Text,Text,LongWritable>{

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
        // one input line per call; split it on single spaces and emit (word, 1) per token
        String val = value.toString();
        String[] words = val.split(" ");
        for(String word : words){
        	context.write(new Text(word), new LongWritable(1));
        }
	}
	
}

class WordCountReducer extends Reducer<Text,LongWritable,Text,LongWritable>{
	@Override
	protected void reduce(Text key, Iterable<LongWritable> values,Context context)
			throws IOException, InterruptedException {
		// sum the 1s emitted for this word by all mappers
		long sum = 0;
		for(LongWritable value : values){
			sum += value.get();
		}
		context.write(key, new LongWritable(sum));
	}
	
}
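
Note: word counting is a sum, which is associative and commutative, so the reducer could also be registered as a combiner to shrink the data shuffled from map to reduce. This is optional and not part of the listing above; it would be a one-line addition in the driver:

	job.setCombinerClass(WordCountReducer.class);

With a combiner in place, the Combine input records / Combine output records counters in the job output below would be non-zero instead of 0.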
2. Steps to run WordCount:
1. Package the project as a jar, e.g. mapredProject.jar.
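
For example, assuming the source file sits under src/com/mapred/core/ and a Hadoop client is installed on the build machine (the paths here are illustrative), the jar can be built with the plain JDK tools:

	mkdir -p classes
	javac -cp `hadoop classpath` -d classes src/com/mapred/core/WordCount.java
	jar cf mapredProject.jar -C classes .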

2. Upload mapredProject.jar to the /soft directory.

3. Create the input data file input.txt in the /soft directory. Its contents, shown with more /soft/input.txt:
lengend
i
am
a
hero
i
am
a
fool
i
am
a
apple
but
you
are
a
bastard

4. Create the input directory in HDFS: hadoop fs -mkdir -p /wocount/in

5. Upload /soft/input.txt into that HDFS directory: hadoop fs -put /soft/input.txt /wocount/in
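
You can confirm the upload with: hadoop fs -ls /wocount/in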

6. Run the WordCount job: hadoop jar /soft/mapredProject.jar /wocount/in /wocount/output
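
Note that no main-class name is passed to hadoop jar here, which works only if the jar's manifest declares one. If it does not, name the driver class explicitly:

	hadoop jar /soft/mapredProject.jar com.mapred.core.WordCount /wocount/in /wocount/output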

7. Wait for the job to finish; the console shows its progress. As an example, one run looked like this:
[hadoop@node1 soft]$ hadoop jar /soft/mapredProject.jar  /wocount/in /wocount/output
18/01/10 21:22:13 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.209.129:8032
18/01/10 21:22:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/01/10 21:22:17 INFO input.FileInputFormat: Total input paths to process : 1
18/01/10 21:22:17 INFO mapreduce.JobSubmitter: number of splits:1
18/01/10 21:22:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515211219380_0005
18/01/10 21:22:21 INFO impl.YarnClientImpl: Submitted application application_1515211219380_0005
18/01/10 21:22:21 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1515211219380_0005/
18/01/10 21:22:21 INFO mapreduce.Job: Running job: job_1515211219380_0005
18/01/10 21:23:18 INFO mapreduce.Job: Job job_1515211219380_0005 running in uber mode : false
18/01/10 21:23:18 INFO mapreduce.Job:  map 0% reduce 0%
18/01/10 21:23:51 INFO mapreduce.Job:  map 100% reduce 0%
18/01/10 21:24:10 INFO mapreduce.Job:  map 100% reduce 100%
18/01/10 21:24:12 INFO mapreduce.Job: Job job_1515211219380_0005 completed successfully
18/01/10 21:24:13 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=264
		FILE: Number of bytes written=242009
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=171
		HDFS: Number of bytes written=76
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=30472
		Total time spent by all reduces in occupied slots (ms)=15714
		Total time spent by all map tasks (ms)=30472
		Total time spent by all reduce tasks (ms)=15714
		Total vcore-milliseconds taken by all map tasks=30472
		Total vcore-milliseconds taken by all reduce tasks=15714
		Total megabyte-milliseconds taken by all map tasks=31203328
		Total megabyte-milliseconds taken by all reduce tasks=16091136
	Map-Reduce Framework
		Map input records=19
		Map output records=19
		Map output bytes=220
		Map output materialized bytes=264
		Input split bytes=103
		Combine input records=0
		Combine output records=0
		Reduce input groups=12
		Reduce shuffle bytes=264
		Reduce input records=19
		Reduce output records=12
		Spilled Records=38
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=249
		CPU time spent (ms)=2860
		Physical memory (bytes) snapshot=288591872
		Virtual memory (bytes) snapshot=4164571136
		Total committed heap usage (bytes)=141230080
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=68
	File Output Format Counters 
		Bytes Written=76
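
A few of these counters are worth reading together: Map input records=19 is the number of input lines (the 18 words listed in step 3 plus, evidently, one blank line); Map output records=19 because each line yields exactly one token; Reduce input groups=12 is the number of distinct keys, matching the 12 output records; and Combine input records=0 confirms that no combiner was configured.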


8. List the output directory: hadoop fs -ls /wocount/output. The result looks like:
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-01-10 21:24 /wocount/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         76 2018-01-10 21:24 /wocount/output/part-r-00000
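
_SUCCESS is an empty marker file written when the job completes successfully; part-r-00000 holds the output of reducer 0 (one part file is produced per reduce task).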


9. View the output file listed in the previous step: hadoop fs -cat /wocount/output/part-r-00000. The counted words and their frequencies are:

	1
a	4
am	3
apple	1
are	1
bastard	1
but	1
fool	1
hero	1
i	3
lengend	1
you	1
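
Note the first output line: an empty key with count 1. A blank line in the input becomes a single empty token when split on a space, and the mapper counts it like any other word. If that is unwanted, the mapper loop can skip empty tokens; a minimal guard (not in the original code):

	for (String word : words) {
		if (word.isEmpty()) continue; // skip the empty token produced by a blank line
		context.write(new Text(word), new LongWritable(1));
	}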






