Data statistics
1. Methods a custom Writable must implement:
write(DataOutput out) and readFields(DataInput in)
Word.java
package test;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class Word implements Writable {
    private String name;   // the word itself
    private int num;       // count parsed from a single input line
    private int count;     // total count produced by the reducer

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getNum() {
        return num;
    }
    public void setNum(int num) {
        this.num = num;
    }
    public int getCount() {
        return count;
    }
    public void setCount(int count) {
        this.count = count;
    }

    // Serialize the fields; the order here must match readFields().
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(num);
        out.writeInt(count);
    }

    // Deserialize in exactly the same order as write().
    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        num = in.readInt();
        count = in.readInt();
    }

    @Override
    public String toString() {
        return name + " " + count;
    }
}
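The only contract here is that readFields() consumes bytes in exactly the order write() produced them. A minimal, JDK-only sketch of that round trip (plain DataOutputStream/DataInputStream, no Hadoop dependency; the class name and sample values are illustrative):

```java
import java.io.*;

public class RoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize in the same order Word.write() uses: UTF name, then two ints.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("hello");
        out.writeInt(3);   // num
        out.writeInt(0);   // count
        out.flush();

        // Deserialize in exactly the same order, as readFields() must.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        String name = in.readUTF();
        int num = in.readInt();
        int count = in.readInt();
        System.out.println(name + " " + num + " " + count); // hello 3 0
    }
}
```

If the read order ever drifts from the write order, deserialization fails (or silently misreads fields), which is the most common bug in hand-written Writables.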
2. Main class (Mapper and Reducer)
package test;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(Main.class);
        // Mapper output: word text -> Word record
        job.setMapperClass(WordMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Word.class);
        // Reducer output: Word record as the key, no value
        job.setReducerClass(WordReduce.class);
        job.setOutputKeyClass(Word.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class WordMapper extends Mapper<LongWritable, Text, Text, Word> {
        @Override
        public void map(LongWritable key, Text text, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "name num"; split on whitespace.
            String[] arr = text.toString().split("\\s+");
            Word word = new Word();
            word.setName(arr[0]);
            word.setNum(Integer.parseInt(arr[1]));
            context.write(new Text(arr[0]), word);
        }
    }

    public static class WordReduce extends Reducer<Text, Word, Word, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<Word> words, Context context)
                throws IOException, InterruptedException {
            // Sum the per-line counts for one word.
            int sum = 0;
            String name = null;
            for (Word word : words) {
                name = word.getName();
                sum += word.getNum();
            }
            Word word = new Word();
            word.setName(name);
            word.setCount(sum);
            context.write(word, NullWritable.get());
        }
    }
}
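The map/reduce logic above can be checked locally with plain Java before running on the cluster — a sketch with no Hadoop dependency, using made-up sample lines in the same "name num" layout the mapper expects:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalCheck {
    public static void main(String[] args) {
        // Made-up sample input in the "name num" layout the mapper parses.
        String[] lines = {"hello 1", "world 2", "hello 3"};

        // The shuffle phase groups map outputs by key; a map of running sums
        // mimics what the reducer then computes per word.
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] arr = line.split("\\s+");              // mapper's parse step
            sums.merge(arr[0], Integer.parseInt(arr[1]), Integer::sum);
        }

        // Each output line matches Word.toString(): "name count".
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the three sample lines this prints "hello 4" and "world 2", which is the shape of output the real job writes to HDFS.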
3. Upload the jar to the cluster
4. Run the command:
hadoop jar wc.jar /word /out01
5. If output like the following appears, the job succeeded:
[root@wpy apps]# hadoop jar wc.jar /word.txt /hh03
19/07/10 17:31:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/07/10 17:31:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/07/10 17:31:06 INFO input.FileInputFormat: Total input paths to process : 1
19/07/10 17:31:06 INFO mapreduce.JobSubmitter: number of splits:1
19/07/10 17:31:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562749459938_0003
19/07/10 17:31:07 INFO impl.YarnClientImpl: Submitted application application_1562749459938_0003
19/07/10 17:31:07 INFO mapreduce.Job: The url to track the job: http://wpy:8088/proxy/application_1562749459938_0003/
19/07/10 17:31:07 INFO mapreduce.Job: Running job: job_1562749459938_0003
19/07/10 17:31:15 INFO mapreduce.Job: Job job_1562749459938_0003 running in uber mode : false
19/07/10 17:31:15 INFO mapreduce.Job: map 0% reduce 0%
19/07/10 17:31:22 INFO mapreduce.Job: map 100% reduce 0%
19/07/10 17:31:31 INFO mapreduce.Job: map 100% reduce 100%
19/07/10 17:31:32 INFO mapreduce.Job: Job job_1562749459938_0003 completed successfully
19/07/10 17:31:32 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=175
		FILE: Number of bytes written=213491
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=165
		HDFS: Number of bytes written=41
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4808
		Total time spent by all reduces in occupied slots (ms)=7508
		Total time spent by all map tasks (ms)=4808
		Total time spent by all reduce tasks (ms)=7508
		Total vcore-milliseconds taken by all map tasks=4808
		Total vcore-milliseconds taken by all reduce tasks=7508
		Total megabyte-milliseconds taken by all map tasks=4923392
		Total megabyte-milliseconds taken by all reduce tasks=7688192
	Map-Reduce Framework
		Map input records=7
		Map output records=7
		Map output bytes=155
		Map output materialized bytes=175
		Input split bytes=92
		Combine input records=0
		Combine output records=0
		Reduce input groups=4
		Reduce shuffle bytes=175
		Reduce input records=7
		Reduce output records=4
		Spilled Records=14
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=120
		CPU time spent (ms)=1010
		Physical memory (bytes) snapshot=323358720
		Virtual memory (bytes) snapshot=1685254144
		Total committed heap usage (bytes)=136056832
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=73
	File Output Format Counters
		Bytes Written=41
6. View the results:
Command: hdfs dfs -cat /out01/*
7. The results are as follows: