How Hadoop Handles Join Computations

Suppose there are two files on HDFS, one holding customer records and the other order records, with customerID as the field that joins them. How do we perform the join so that the customer name gets added to each row of the order list?
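For concreteness, suppose both files are tab-separated, laid out as below (the sample rows are made up for illustration; the column order matches what the code later in this article expects):

customer.txt (customerID, customerName):
    1001    Alice
    1002    Bob

order.txt (orderID, customerID, amount):
    5001    1001    120.00
    5002    1001    85.50
    5003    1002    42.00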

The usual approach is: take the two source files as input. In the Map phase, handle each record according to the file it came from: if it is an order record, tag its foreign key with "O" to form a combined key; if it is a customer record, tag it with "C". After Map, the data is partitioned by key (customerID), then grouped and sorted by the combined key. Finally, the Reduce phase merges each group and writes the joined result.
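To see why the ordering matters, trace the hypothetical sample rows above through the shuffle. Text compares lexicographically and "C" sorts before "O", so within each customerID the customer record always arrives first, which the reduce code below depends on:

    (1001, "C") -> Alice
    (1001, "O") -> 5001    120.00
    (1001, "O") -> 5002    85.50
    (1002, "C") -> Bob
    (1002, "O") -> 5003    42.00

The grouping comparator compares only customerID, so these five records form just two reduce groups, and the reducer emits joined rows such as "1001  5001  120.00  Alice".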

Implementation code:

// Imports required by the classes below (assumes they are members of an outer class J):
import java.io.*;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public static class JMapper extends Mapper<LongWritable, Text, TextPair, Text> {
    // Mark every row with "O" or "C" according to the source file name.
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String pathName = ((FileSplit) context.getInputSplit()).getPath().toString();
        if (pathName.contains("order.txt")) { // identify orders by file name
            String[] values = value.toString().split("\t");
            TextPair tp = new TextPair(new Text(values[1]), new Text("O")); // mark with "O"
            context.write(tp, new Text(values[0] + "\t" + values[2]));
        } else if (pathName.contains("customer.txt")) { // identify customers by file name
            String[] values = value.toString().split("\t");
            TextPair tp = new TextPair(new Text(values[0]), new Text("C")); // mark with "C"
            context.write(tp, new Text(values[1]));
        }
    }
}
public static class JPartitioner extends Partitioner<TextPair, Text> {
    // Partition by the first half of the key (customerID) only, so records
    // for the same customer always reach the same reducer.
    @Override
    public int getPartition(TextPair key, Text value, int numPartition) {
        // Mask the sign bit rather than use Math.abs(), which stays negative
        // for Integer.MIN_VALUE and would yield an invalid partition number.
        return (key.getFirst().hashCode() * 127 & Integer.MAX_VALUE) % numPartition;
    }
}
public static class JComparator extends WritableComparator {
    // Group reduce input by customerID only: the "C"/"O" tag is ignored here,
    // so a customer record and all of its orders land in one reduce() call.
    public JComparator() {
        super(TextPair.class, true);
    }
    @SuppressWarnings("unchecked")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        TextPair t1 = (TextPair) a;
        TextPair t2 = (TextPair) b;
        return t1.getFirst().compareTo(t2.getFirst());
    }
}
public static class JReduce extends Reducer<TextPair, Text, Text, Text> {
    // Merge each group and output the joined rows. Because "C" sorts before "O",
    // the first value in every group is the customer name.
    @Override
    protected void reduce(TextPair key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Text pid = key.getFirst();
        Iterator<Text> it = values.iterator(); // reuse one iterator; calling values.iterator() repeatedly does not restart it
        String desc = it.next().toString();    // the customer name
        while (it.hasNext()) {
            context.write(pid, new Text(it.next().toString() + "\t" + desc));
        }
    }
}
public class TextPair implements WritableComparable<TextPair> {
    // Composite key: first = customerID, second = the "C"/"O" tag.
    private Text first;
    private Text second;
    public TextPair() {
        set(new Text(), new Text());
    }
    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }
    public TextPair(Text first, Text second) {
        set(first, second);
    }
    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }
    public Text getFirst() {
        return first;
    }
    public Text getSecond() {
        return second;
    }
    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }
    @Override
    public int compareTo(TextPair tp) {
        // Sort by customerID first, then by tag, so "C" precedes "O" within a group.
        int cmp = first.compareTo(tp.first);
        if (cmp != 0) {
            return cmp;
        }
        return second.compareTo(tp.second);
    }
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    // Job entrance.
    Configuration conf = new Configuration();
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    String[] otherArgs = parser.getRemainingArgs();
    if (otherArgs.length < 3) {
        System.err.println("Usage: J <in_path_one> <in_path_two> <output>");
        System.exit(2);
    }
    Job job = new Job(conf, "J");
    job.setJarByClass(J.class);                        // join class
    job.setMapperClass(JMapper.class);                 // map class
    job.setMapOutputKeyClass(TextPair.class);          // map output key class
    job.setMapOutputValueClass(Text.class);            // map output value class
    job.setPartitionerClass(JPartitioner.class);       // partition class
    job.setGroupingComparatorClass(JComparator.class); // grouping comparator applied after partitioning
    job.setReducerClass(JReduce.class);                // reduce class
    job.setOutputKeyClass(Text.class);                 // reduce output key class
    job.setOutputValueClass(Text.class);               // reduce output value class
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // one source file
    FileInputFormat.addInputPath(job, new Path(otherArgs[1]));   // the other source file
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[2])); // output path
    System.exit(job.waitForCompletion(true) ? 0 : 1); // run until the job ends
}
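Assuming the classes above are packaged into a jar (the jar name and paths here are only placeholders), the job can be launched with the standard hadoop jar command:

    hadoop jar join-example.jar J /input/customer.txt /input/order.txt /output/joined

The joined result, one order per line with the customer name appended, ends up in the part files under /output/joined.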

You cannot work with the raw data directly; you have to write a pile of code to handle the tags, sidestep MapReduce's native architecture, and design and compute the relationship between the data sets from the bottom up. And this is the simplest possible join: for multi-table joins or joins with more complex logic, the complexity of a MapReduce implementation grows geometrically.
