MapReduce简单实现（补）

最新推荐文章于 2022-06-29 22:19:00 发布

少羽baby

最新推荐文章于 2022-06-29 22:19:00 发布

阅读量229

点赞数

分类专栏： hadoop database 文章标签： mapreduce hadoop

本文链接：https://blog.csdn.net/qq_34233510/article/details/89329875

版权

database 同时被 2 个专栏收录

13 篇文章 2 订阅

订阅专栏

hadoop

4 篇文章 0 订阅

订阅专栏

目标是计算数据中手机号的上行总流量、下行总流量以及总流量。

1.数据

字段如下图- 要用到的是1，8，9三个字段。

2.jar包

1）可选择Maven，比较方便：
在pom.xml 文件中添加如下内容

<dependencies>
		<dependency>
		  <groupId>junit</groupId>
		  <artifactId>junit</artifactId>
		  <version>4.8.2</version>
		  <scope>test</scope>
		</dependency>
		<!-- 注意以下三个的版本一致 -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>3.1.2</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>3.1.2</version>
		</dependency>
		
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-core</artifactId>
			<version>3.1.2</version>
		</dependency>
	</dependencies>

2）手动导：
1.$HADOOP_HOME/share/hadoop/common/下的包；
2.$HADOOP_HOME/share/hadoop/common/lib/下的包；
2.$HADOOP_HOME/share/hadoop/hdfs/下的包；
2.$HADOOP_HOME/share/hadoop/mapreduce/下的包；

3.代码

3.1 接收类，需可序列化

package hadoop.mr.dc;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class DataBean implements Writable{
	private String tel;
	private long upPayLoad;
	private long downPayLoad;
	private long totalPayLoad;
	public DataBean(){}
	
	public DataBean(String tel, long upPayLoad, long downPayLoad) {
		super();
		this.tel = tel;
		this.upPayLoad = upPayLoad;
		this.downPayLoad = downPayLoad;
		this.totalPayLoad = upPayLoad + downPayLoad;
	}
	@Override
	public String toString() {
		return this.upPayLoad + "\t" + this.downPayLoad + "\t" + this.totalPayLoad;
	}

	// 以下两个必不可少，是实现可序列化的关键
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(tel);
		out.writeLong(upPayLoad);
		out.writeLong(downPayLoad);
		out.writeLong(totalPayLoad);
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		this.tel = in.readUTF();
		this.upPayLoad = in.readLong();
		this.downPayLoad = in.readLong();
		this.totalPayLoad = in.readLong();
		
	}

	public String getTel() {
		return tel;
	}
	public void setTel(String tel) {
		this.tel = tel;
	}
	public long getUpPayLoad() {
		return upPayLoad;
	}
	public void setUpPayLoad(long upPayLoad) {
		this.upPayLoad = upPayLoad;
	}
	public long getDownPayLoad() {
		return downPayLoad;
	}
	public void setDownPayLoad(long downPayLoad) {
		this.downPayLoad = downPayLoad;
	}
	public long getTotalPayLoad() {
		return totalPayLoad;
	}
	public void setTotalPayLoad(long totalPayLoad) {
		this.totalPayLoad = totalPayLoad;
	}
}

3.2 DataCount(包含Mapper和Reducer)

package hadoop.mr.dc;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataCount {
	public static class DCMapper extends Mapper<LongWritable, Text, Text, DataBean>{
		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			//accept 
			String line = value.toString();
			//split
			String[] fields = line.split("\t");
			String tel = fields[1];
			long up = Long.parseLong(fields[8]);
			long down = Long.parseLong(fields[9]);	
			DataBean bean = new DataBean(tel, up, down);
			//send
			context.write(new Text(tel), bean);
		}
	}
	
	public static class DCReducer extends Reducer<Text, DataBean, Text, DataBean>{
		@Override
		protected void reduce(Text key, Iterable<DataBean> values, Context context)
				throws IOException, InterruptedException {
			long up_sum = 0;
			long down_sum = 0;
			for(DataBean bean : values){
				up_sum += bean.getUpPayLoad();
				down_sum += bean.getDownPayLoad();
			}
			DataBean bean = new DataBean("", up_sum, down_sum);
			context.write(key, bean);
		}
	}
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);
		
		job.setJarByClass(DataCount.class);
		
		job.setMapperClass(DCMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataBean.class);
		FileInputFormat.setInputPaths(job, new Path(args[0]));//输入路径
		
		job.setReducerClass(DCReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataBean.class);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));//输出路径
		
		job.waitForCompletion(true);
	}
}

打成jar包，运行：

hadoop jar [jar包路径] [Main方法所在类(包含完整包名)] [输入路径] [输出路径]

Over！

少羽baby

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录