Reduce-side join
The join itself happens on the reduce side.
A join stitches together rows from two tables that share the same id, and the stitching is done in the reducer, e.g. a:1 zs and b:1 45.
To stitch these rows on the reduce side, we must guarantee that records with the same join key (a:1 zs and b:1 45) land in the same group, i.e. all data sharing a join key is sent to the same reduce call.
Map-side key: the join key shared by the two tables. The reduce side then receives the records of both tables that share one join key.
Map side: reads the input files.
key: the join key of the two tables
value: the other fields needed from each table, prefixed with a tag that marks which table the record came from (a different tag per table)
table a: "a" + name
table b: "b" + age
Reduce side: receives, for one join key, the records of both tables, and stitches the two tables' data together.
Example: a:1 zs, b:1 45, b:1 38
The values the reducer receives must identify their source table, e.g. <a zs, b 45, b 38>.
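The tag-then-group flow above can be sketched in plain Java, without the Hadoop runtime, to show what the map output, the shuffle's grouping, and the reducer's join each do. The tab-separated tagging format and the class name are choices of this sketch, not part of the original notes; the sample rows (a:1 zs, b:1 45, b:1 38) come from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {
    public static void main(String[] args) {
        // Map phase: emit (join key, tagged value); the tag records the source table.
        List<Map.Entry<String, String>> mapOutput = new ArrayList<>();
        mapOutput.add(Map.entry("1", "a\tzs"));  // table a: id=1, name=zs
        mapOutput.add(Map.entry("1", "b\t45"));  // table b: id=1, age=45
        mapOutput.add(Map.entry("1", "b\t38"));  // table b: id=1, age=38

        // Shuffle: group values by join key (what the framework does between map and reduce).
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        // Reduce phase: split each group's values by tag, then cross-join the two sides.
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            List<String> aSide = new ArrayList<>();
            List<String> bSide = new ArrayList<>();
            for (String v : group.getValue()) {
                String[] parts = v.split("\t", 2);
                if (parts[0].equals("a")) aSide.add(parts[1]);
                else bSide.add(parts[1]);
            }
            for (String name : aSide) {
                for (String age : bSide) {
                    System.out.println(group.getKey() + "\t" + name + "\t" + age);
                }
            }
        }
    }
}
```

Run against the sample rows, this prints the two joined records 1 zs 45 and 1 zs 38, matching the expected reducer output above.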
Case study:
movies.dat, record format: 2::Jumanji (1995)::Adventure|Children's|Fantasy
fields: MovieID BigInt, Title String, Genres String (movie ID, movie title, movie genres)
ratings.dat, record format: 1::1193::5::978300760
fields: UserID BigInt, MovieID BigInt, Rating Double, Timestamped String (user ID, movie ID, rating, rating timestamp)
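For this dataset the join key is MovieID. A minimal sketch of what the two map functions would emit per record, assuming tags "M" and "R" (names of my choosing) mark the source file:

```java
public class MovieJoinMapSketch {
    // Map logic for one movies.dat record: key = MovieID, value = "M" + title + genres.
    static String[] mapMovie(String line) {
        String[] f = line.split("::");  // MovieID::Title::Genres
        return new String[] { f[0], "M\t" + f[1] + "\t" + f[2] };
    }

    // Map logic for one ratings.dat record: key = MovieID, value = "R" + user + rating.
    static String[] mapRating(String line) {
        String[] f = line.split("::");  // UserID::MovieID::Rating::Timestamp
        return new String[] { f[1], "R\t" + f[0] + "\t" + f[2] };
    }

    public static void main(String[] args) {
        String[] m = mapMovie("2::Jumanji (1995)::Adventure|Children's|Fantasy");
        String[] r = mapRating("1::1193::5::978300760");
        System.out.println(m[0] + " -> " + m[1]);
        System.out.println(r[0] + " -> " + r[1]);
    }
}
```

Because both functions key on MovieID, the reducer for key 1193 would receive one "M"-tagged value (the movie) and all "R"-tagged values (its ratings), and could join them as in the generic flow above.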
Mapper source code
package mapper;
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.MapContext;
import org.apache.hadoop.mapreduce.task.MapContextImpl;
/**
* Maps input key/value pairs to a set of intermediate key/value pairs.
*
 * <p>Maps are the individual tasks which transform input records into
 * intermediate records. The transformed intermediate records need not be of
* the same type as the input records. A given input pair may map to zero or
* many output pairs.</p>
*
* <p>The Hadoop Map-Reduce framework spawns one map task for each
* {@link InputSplit} generated by the {@link InputFormat} for the job.
* <code>Mapper</code> implementations can access the {@link Configuration} for
* the job via the {@link JobContext#getConfiguration()}.
*
* <p>The framework first calls
* {@link #setup(org