https://www.cnblogs.com/hanganglin/p/4558510.html
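Map output. Logic: each line read becomes one map input. The map emits u_id as the key and the rest of the line as the value, prefixed with u# for a user record or l# for a login-log record. During shuffle, records with the same key are sent to the same reducer, so the reducer can then join them by matching on the key.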
<1, u#Lixiaolong> <1, l#2015-06-07 192.168.137.101>
<2, u#JetLi> <1,l#2015-06-07 192.168.137.103>
<3, u#Zhangsan>
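
The map step above can be sketched in plain Python (not Hadoop code). This is a minimal sketch under assumptions: the tab-separated input layout, the function name map_line, and the source argument that says which input a line came from are all hypothetical and not taken from the original notes.

```python
def map_line(line, source):
    """Emit (u_id, tagged_value) for one input line.

    Assumption: fields are tab-separated and u_id is the first field.
    Assumption: the caller passes `source` ("user" or "login") to say
    which input the line belongs to.
    """
    u_id, rest = line.split("\t", 1)
    prefix = "u#" if source == "user" else "l#"
    return u_id, prefix + rest


# Sample data matching the records in these notes.
user_lines = ["1\tLixiaolong", "2\tJetLi", "3\tZhangsan"]
login_lines = [
    "1\t2015-06-07 192.168.137.101",
    "1\t2015-06-07 192.168.137.103",
]

map_output = [map_line(line, "user") for line in user_lines]
map_output += [map_line(line, "login") for line in login_lines]

for key, value in map_output:
    print(key, value)
```

The prefix is what lets the reducer tell the two record types apart later, since after shuffle all values for a key arrive mixed together.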
Input to reduce after shuffle:
<1, u#Lixiaolong>
<1, l#2015-06-07 192.168.137.101>
<1, l#2015-06-07 192.168.137.103>
<1, <u#Lixiaolong, l#2015-06-07 192.168.137.101, l#2015-06-07 192.168.137.103>>
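
The grouping that shuffle performs can be imitated in plain Python by collecting values per key. This is only a sketch of the idea, assuming the tagged pairs shown in these notes; the function name shuffle is hypothetical, and real Hadoop shuffle also partitions and sorts across machines.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key, so each key maps to the list of its values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort by key so the result is deterministic.
    return dict(sorted(groups.items()))


pairs = [
    ("1", "u#Lixiaolong"),
    ("2", "u#JetLi"),
    ("3", "u#Zhangsan"),
    ("1", "l#2015-06-07 192.168.137.101"),
    ("1", "l#2015-06-07 192.168.137.103"),
]

grouped = shuffle(pairs)
for key, values in grouped.items():
    print(key, values)
```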
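Reduce logic: use the value prefix (u# or l#) to tell whether each value is a user record or a login-log record, then put the values into two separate collections: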
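linkU: [Lixiaolong, Zhangsan]
linkL: [2015-06-07 192.168.137.101, 2015-06-07 192.168.137.103]
(With the sample data above, key 1 would hold only Lixiaolong; Zhangsan is added here so the product has more than one user.)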
Each u is paired with each l (the Cartesian product of u and l).
This gives:
(key is 1)
<1, Lixiaolong, 2015-06-07 192.168.137.101>
<1, Lixiaolong, 2015-06-07 192.168.137.103>
<1, Zhangsan, 2015-06-07 192.168.137.103>
<1, Zhangsan, 2015-06-07 192.168.137.101>
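
The reduce step can be sketched in plain Python as follows. This is a minimal sketch, not Hadoop code: the function name reduce_join and the variable names link_u and link_l are hypothetical, and it assumes every value carries a two-character u# or l# prefix.

```python
from itertools import product


def reduce_join(key, values):
    """Split values by prefix, then pair every user with every login record."""
    link_u = [v[2:] for v in values if v.startswith("u#")]
    link_l = [v[2:] for v in values if v.startswith("l#")]
    # Cartesian product: one output row per (user, login) pair.
    return [(key, u, l) for u, l in product(link_u, link_l)]


rows = reduce_join(
    "1",
    [
        "u#Lixiaolong",
        "l#2015-06-07 192.168.137.101",
        "l#2015-06-07 192.168.137.103",
    ],
)
for row in rows:
    print(row)

# A key with a user but no login records produces no rows (inner join).
no_rows = reduce_join("3", ["u#Zhangsan"])
```

Because the product of a list with an empty list is empty, a key that appears on only one side is dropped, which makes this an inner join.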
Summary: