Problem description:
There are two files: file 1 stores company names and city IDs, and file 2 stores city IDs and city names.
Table 1:
factoryname,addressed
Beijing Red Star,1
Shenzhen Thunder,3
Guangzhou Honda,2
Beijing Rising,1
Guangzhou Development Bank,2
Tencent,3
Back of Beijing,1
Table 2:
1,Beijing
2,Guangzhou
3,Shenzhen
4,Xian
The task is to output each company name together with its city name. For example:
Beijing Red Star Beijing
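This resembles a multi-table join in a database, and the overall approach is much the same as for a single-table join. The reduce phase is again used to group records by city ID: in the map phase every record is emitted with key = city ID and value = flag + "+" + city name or company name. The reducer then parses the flag to decide whether the rest of the value is a city name or a company name, collects the two kinds into separate lists, and finally outputs their Cartesian product.
The code: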
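To make the tagging idea concrete before involving Hadoop, here is a minimal plain-Java sketch of the same three steps (tag, group by city ID, Cartesian product). The class name `ReduceSideJoinSketch` and its methods are hypothetical names for illustration only. It assumes comma-separated lines and reuses the same "one-character first field means city ID" heuristic as the mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the reduce-side join; not Hadoop code.
class ReduceSideJoinSketch {

    // "Map" step: turn one line into {joinKey, taggedValue}; null means "skip this line".
    static String[] tag(String line) {
        int comma = line.lastIndexOf(',');
        if (comma < 0) {
            return null; // malformed line without a separator
        }
        String left = line.substring(0, comma);
        String right = line.substring(comma + 1);
        if (left.equals("factoryname")) {
            return null; // header row of table 1
        }
        if (left.length() == 1) {
            // Table 2 row (cityId,cityName): tag the city name with flag 1.
            return new String[] {left, "1+" + right};
        }
        // Table 1 row (companyName,cityId): tag the company name with flag 2.
        return new String[] {right, "2+" + left};
    }

    static List<String> join(List<String> lines) {
        // "Shuffle" step: group tagged values by city ID.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = tag(line);
            if (kv != null) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        // "Reduce" step: split each group by flag, then emit the Cartesian product.
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> cities = new ArrayList<>();
            List<String> companies = new ArrayList<>();
            for (String v : values) {
                String payload = v.substring(2); // drop the "1+" or "2+" prefix
                if (v.charAt(0) == '1') {
                    cities.add(payload);
                } else {
                    companies.add(payload);
                }
            }
            for (String company : companies) {
                for (String city : cities) {
                    out.add(company + "\t" + city);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "factoryname,addressed",
                "Beijing Red Star,1", "Shenzhen Thunder,3", "Guangzhou Honda,2",
                "1,Beijing", "2,Guangzhou", "3,Shenzhen", "4,Xian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

City 4 (Xian) has no matching company, so its group produces nothing, which mirrors the behavior of an inner join.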
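import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String line = ivalue.toString();
        StringTokenizer st = new StringTokenizer(line, ",");
        String value0 = st.nextToken();
        String value1 = st.nextToken();
        // Skip the header row of table 1.
        if (!value0.equals("factoryname")) {
            if (value0.length() == 1) {
                // Table 2 row (cityID,cityName): a one-character first field is
                // treated as a city ID, which only works for single-digit IDs.
                context.write(new Text(value0), new Text("1" + "+" + value1));
            } else {
                // Table 1 row (companyName,cityID).
                context.write(new Text(value1), new Text("2" + "+" + value0));
            }
        }
    }
}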
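import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // All values here share one city ID; separate them by flag.
        ArrayList<String> address = new ArrayList<String>();
        ArrayList<String> factory = new ArrayList<String>();
        for (Text val : values) {
            String line = val.toString();
            StringTokenizer st = new StringTokenizer(line, "+");
            int flag = Integer.parseInt(st.nextToken());
            if (flag == 1) {
                // Flag 1: the payload is a city name.
                String addressname = st.nextToken();
                address.add(addressname);
            } else if (flag == 2) {
                // Flag 2: the payload is a company name.
                String factoryname = st.nextToken();
                factory.add(factoryname);
            }
        }
        // Emit the Cartesian product only when both sides are present (inner join).
        if (address.size() != 0 && factory.size() != 0) {
            for (int i = 0; i < address.size(); i++) {
                for (int j = 0; j < factory.size(); j++) {
                    // Company name first, then city name, to match the required output.
                    context.write(new Text(factory.get(j)), new Text(address.get(i)));
                }
            }
        }
    }
}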
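One fragile spot in the reducer is the value parsing: `StringTokenizer` splits on every "+", so a company or city name that itself contains "+" would be cut off at that character. A possible alternative, sketched here under the assumption that the flag never contains "+", is to split only at the first separator. `TaggedValue` is a hypothetical helper and not part of the original code.

```java
// Hypothetical helper: parses "flag+name" by splitting at the FIRST '+' only.
class TaggedValue {
    final int flag;
    final String name;

    TaggedValue(int flag, String name) {
        this.flag = flag;
        this.name = name;
    }

    static TaggedValue parse(String raw) {
        int sep = raw.indexOf('+');
        if (sep < 0) {
            throw new IllegalArgumentException("missing '+' separator: " + raw);
        }
        // Everything before the first '+' is the flag; everything after is the name.
        int flag = Integer.parseInt(raw.substring(0, sep));
        return new TaggedValue(flag, raw.substring(sep + 1));
    }
}
```

With this helper, `TaggedValue.parse("2+A+B Corp")` keeps the full name `A+B Corp`, whereas the second `nextToken()` call in the reducer would return only `A`.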