Requirement: join the two files below on the city ID so that every factory is listed with its city.
address.txt:
1 Beijing
2 Guangzhou
3 Shenzhen
4 Xian
factory.txt:
Beijing Red Star 1
Shenzhen Thunder 3
Guangzhou Honda 2
Beijing Rising 1
Guangzhou Development Bank 2
Tencent 3
Back of Beijing 1
Expected result:
factory city
Beijing Red Star Beijing
Shenzhen Thunder Shenzhen
Guangzhou Honda Guangzhou
Beijing Rising Beijing
Guangzhou Development Bank Guangzhou
Tencent Shenzhen
Back of Beijing Beijing
Analysis:
Map output <k2, v2> for city ID 1: <1, "1,Beijing">, <1, "0,Beijing Red Star">, <1, "0,Beijing Rising">, <1, "0,Back of Beijing">
Reduce input <k2, list(v2)>: <1, ["1,Beijing", "0,Beijing Red Star", "0,Beijing Rising", "0,Back of Beijing"]>
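A join needs a flag: each value is prefixed with a flag (1 = address.txt, 0 = factory.txt) so the reducer can tell which table a value came from and pair the values accordingly.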
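To make the analysis concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of the tagged map output for the sample data and of how the shuffle groups it by key before reduce() sees it. The class and method names (TaggedShuffleSketch, sampleMapOutput, shuffle) are hypothetical. A TreeMap stands in for Hadoop's sorted keys; keeping arrival order within a key is a simplification, since a real cluster does not guarantee the order of values for a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedShuffleSketch {
    // Sample map output as {key, taggedValue} pairs: address.txt records carry
    // flag "1", factory.txt records carry flag "0"; the key is the city ID.
    public static List<String[]> sampleMapOutput() {
        return Arrays.asList(
                new String[]{"1", "1,Beijing"},
                new String[]{"2", "1,Guangzhou"},
                new String[]{"3", "1,Shenzhen"},
                new String[]{"4", "1,Xian"},
                new String[]{"1", "0,Beijing Red Star"},
                new String[]{"3", "0,Shenzhen Thunder"},
                new String[]{"2", "0,Guangzhou Honda"},
                new String[]{"1", "0,Beijing Rising"},
                new String[]{"2", "0,Guangzhou Development Bank"},
                new String[]{"3", "0,Tencent"},
                new String[]{"1", "0,Back of Beijing"});
    }

    // Groups values by key, the way the shuffle hands them to reduce().
    // TreeMap keeps the keys sorted; within a key this sketch keeps arrival order.
    public static Map<String, List<String>> shuffle(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : shuffle(sampleMapOutput()).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Key 1 ends up with one city value and three factory values, which is exactly the reduce input shown above.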
1. Mapper class
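import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Find out which input file this record comes from
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        String fileName = inputSplit.getPath().toString();
        // Skip header lines (lines that contain the file name)
        if (line.contains("address.txt") || line.contains("factory.txt")) {
            return;
        }
        String[] fields = line.split("\t"); // fields are tab-separated
        if (fields.length < 2) {
            return; // skip blank or malformed lines
        }
        if (fileName.endsWith("address.txt")) {
            // address.txt: cityId \t cityName -> key = cityId, value = "1," + cityName
            context.write(new Text(fields[0]), new Text("1," + fields[1]));
        } else {
            // factory.txt: factoryName \t cityId -> key = cityId, value = "0," + factoryName
            context.write(new Text(fields[1]), new Text("0," + fields[0]));
        }
    }
}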
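Because the tagging rule is the heart of the mapper, it can help to test it without a cluster. The following is a minimal sketch that strips out the Hadoop types; MapperTagSketch and tag() are hypothetical names, and it assumes tab-separated fields and that header lines contain the file name, as the mapper above does.

```java
public class MapperTagSketch {
    // Plain-Java stand-in for JoinMapper.map(): returns {joinKey, taggedValue},
    // or null when the line should be skipped.
    public static String[] tag(String fileName, String line) {
        if (line.contains("address.txt") || line.contains("factory.txt")) {
            return null; // header line
        }
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return null; // blank or malformed line
        }
        if (fileName.endsWith("address.txt")) {
            // cityId \t cityName -> key = cityId, value = "1,cityName"
            return new String[]{fields[0], "1," + fields[1]};
        }
        // factoryName \t cityId -> key = cityId, value = "0,factoryName"
        return new String[]{fields[1], "0," + fields[0]};
    }

    public static void main(String[] args) {
        String[] a = tag("file:/data/address.txt", "1\tBeijing");
        String[] f = tag("file:/data/factory.txt", "Back of Beijing\t1");
        System.out.println(a[0] + " -> " + a[1]);
        System.out.println(f[0] + " -> " + f[1]);
    }
}
```

Both records come out under key "1", which is what lets the shuffle bring them to the same reduce() call.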
2. Reducer class
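import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Runs once per reduce task: write the header row
        context.write(new Text("factory"), new Text("city"));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> left = new ArrayList<String>();  // city names (flag 1)
        List<String> right = new ArrayList<String>(); // factory names (flag 0)
        for (Text v : values) {
            // Split into flag and name; limit 2 keeps any comma inside the name
            String[] parts = v.toString().split(",", 2);
            if (parts[0].equals("1")) {
                left.add(parts[1]);
            } else {
                right.add(parts[1]);
            }
        }
        // Pair every city with every factory for this key: one "factory \t city" row each
        for (int i = 0; i < left.size(); i++) {
            for (int j = 0; j < right.size(); j++) {
                context.write(new Text(right.get(j)), new Text(left.get(i)));
            }
        }
    }
}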
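The reducer's pairing logic can likewise be checked on its own. This minimal sketch (ReducerJoinSketch and join() are hypothetical names) classifies each value by the flag before the first comma, so a factory name that happens to contain the digit 1, such as the made-up "Plant 1" below, is not mistaken for a city.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReducerJoinSketch {
    // Plain-Java stand-in for JoinReduce.reduce(): splits the tagged values of one
    // key into cities (flag "1") and factories (flag "0") and pairs them up.
    public static List<String> join(List<String> taggedValues) {
        List<String> cities = new ArrayList<>();
        List<String> factories = new ArrayList<>();
        for (String v : taggedValues) {
            String[] parts = v.split(",", 2); // flag, then the full name
            if (parts.length < 2) {
                continue; // not a tagged value; ignore it
            }
            if (parts[0].equals("1")) {
                cities.add(parts[1]);
            } else if (parts[0].equals("0")) {
                factories.add(parts[1]);
            }
        }
        List<String> rows = new ArrayList<>();
        for (String city : cities) {
            for (String factory : factories) {
                rows.add(factory + "\t" + city); // "factory \t city", like context.write()
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = join(Arrays.asList("0,Plant 1", "1,Beijing", "0,Beijing Rising"));
        rows.forEach(System.out::println);
    }
}
```

A key with a city but no factory (Xian in the sample) yields no rows, which is inner-join behavior.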
3. Driver class
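import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
    public static void main(String[] args) throws IllegalArgumentException, IOException,
            ClassNotFoundException, InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        // Submit to the "order" queue (old property name; newer releases use mapreduce.job.queuename)
        conf.set("mapred.job.queue.name", "order");
        Path outfile = new Path("file:///D:/输出结果/joinout");
        // Delete an existing output directory, otherwise the job refuses to start
        FileSystem fs = outfile.getFileSystem(conf);
        if (fs.exists(outfile)) {
            fs.delete(outfile, true);
        }
        Job job = Job.getInstance(conf);
        job.setJarByClass(JoinDriver.class);
        job.setJobName("Reduce-side Join");
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // The input directory holds both address.txt and factory.txt
        FileInputFormat.addInputPath(job, new Path("file:///D:/测试数据/join连接/"));
        FileOutputFormat.setOutputPath(job, outfile);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}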
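The driver wires map, shuffle and reduce together on a cluster. To see the whole data flow without Hadoop, here is a minimal in-memory simulation of the same job on the sample input. JoinJobSimulation and its methods are hypothetical names; it assumes tab-separated input, uses a TreeMap to mimic Hadoop's sorted keys, keeps arrival order within a key (a real cluster does not guarantee this), and does not simulate the header row written by setup().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JoinJobSimulation {
    // Map step: tag each record with its source table (1 = address, 0 = factory).
    static String[] map(String fileName, String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return null;
        }
        if (fileName.endsWith("address.txt")) {
            return new String[]{fields[0], "1," + fields[1]};
        }
        return new String[]{fields[1], "0," + fields[0]};
    }

    // Reduce step: pair every factory with every city that shares the key.
    static List<String> reduce(List<String> values) {
        List<String> cities = new ArrayList<>();
        List<String> factories = new ArrayList<>();
        for (String v : values) {
            String[] parts = v.split(",", 2);
            if (parts[0].equals("1")) {
                cities.add(parts[1]);
            } else {
                factories.add(parts[1]);
            }
        }
        List<String> rows = new ArrayList<>();
        for (String city : cities) {
            for (String factory : factories) {
                rows.add(factory + "\t" + city);
            }
        }
        return rows;
    }

    // Runs map, a sorted shuffle (TreeMap, like Hadoop's key sort), then reduce.
    public static List<String> run(Map<String, List<String>> inputFiles) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : inputFiles.entrySet()) {
            for (String line : file.getValue()) {
                String[] kv = map(file.getKey(), line);
                if (kv != null) {
                    shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
                }
            }
        }
        List<String> output = new ArrayList<>();
        for (List<String> values : shuffled.values()) {
            output.addAll(reduce(values));
        }
        return output;
    }

    // The sample files from the top of this post, assuming tab-separated fields.
    public static Map<String, List<String>> sampleInput() {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("address.txt", Arrays.asList("1\tBeijing", "2\tGuangzhou", "3\tShenzhen", "4\tXian"));
        files.put("factory.txt", Arrays.asList("Beijing Red Star\t1", "Shenzhen Thunder\t3",
                "Guangzhou Honda\t2", "Beijing Rising\t1", "Guangzhou Development Bank\t2",
                "Tencent\t3", "Back of Beijing\t1"));
        return files;
    }

    public static void main(String[] args) {
        run(sampleInput()).forEach(System.out::println);
    }
}
```

The simulated output contains the same seven factory/city pairs as the expected result, grouped by city ID because the keys are sorted.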
4. Run results
Running the job on the address.txt and factory.txt inputs listed above produces:
factory city
Beijing Red Star Beijing
Shenzhen Thunder Shenzhen
Guangzhou Honda Guangzhou
Beijing Rising Beijing
Guangzhou Development Bank Guangzhou
Tencent Shenzhen
Back of Beijing Beijing
Note that Hadoop sorts the reduce keys, so with the default single reducer the rows in the output file actually appear grouped by city ID (Beijing, then Guangzhou, then Shenzhen) rather than in the order shown here. Xian has no factory, so it produces no row.
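Summary: when solving a table-join query with MapReduce, the key is to settle on the flag. The join column (here the city ID) is normally used as the key; in the reducer, the flag tells which table each value came from, the values are sorted into their groups and paired up, and the results are written out with context.write().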