Hadoop HelloWorld Examples - Finding the k Nearest Points (+ custom types + passing parameters)

  Once you understand how Map-Reduce works, it is easy to map many common problems onto Hadoop by analogy. This time I try the classic k-nearest-points problem. To dig a little deeper into Hadoop's features, besides mapping the k-nearest-points algorithm onto map-reduce, I also try (1) passing parameters to the map or reduce tasks through a Configuration, (2) using a custom data type as a key/value, and (3) overriding FileInputFormat and RecordReader to read my own data format.
  Problem statement: given a center point cpoint and a set of other points points, compute the distance from every point in points to cpoint and output the points sorted by that distance in ascending order (a single pass over the sorted result would then give the k nearest neighbors; that step is skipped here).
  Map-Reduce algorithm: each map call receives one point from points and computes the Euclidean distance between that point and the center point cpoint (passed in through the Configuration). The distance is emitted as the map output key, and the point's coordinates as the map output value. After the shuffle & sort phase, the keys (i.e. each point's distance to cpoint) arrive at the reducer already sorted, so the reducer simply writes them out.
  Input data (the coordinates of each point):
  2 2
  3 4
  9 8
  20 10
  20 10
  15 14
  89 15

  Output data (the first column is the distance to the specified center point cpoint, the second column is the point's own coordinates; cpoint is set and passed in through the Configuration in the main function):
  2.828427      2.0,2.0
  5.0           3.0,4.0
  12.0415945    9.0,8.0
  20.518284     15.0,14.0
  22.36068      20.0,10.0
  22.36068      20.0,10.0
  90.255196     89.0,15.0
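
  As a quick sanity check: the first input point (2 2) with cpoint = (0 0) gives sqrt(2² + 2²) = sqrt(8) ≈ 2.828427, which matches the first output line, and (3 4) gives exactly 5.0.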

  Concretely, the Map input key is the byte offset of the current line in the input file, and the input value is the point's 2D coordinates.
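
  To make this concrete: with the sample input above and Unix line endings, the record reader would emit roughly the pairs (0, (2.0,2.0)), (4, (3.0,4.0)), (8, (9.0,8.0)), (12, (20.0,10.0)), (18, (20.0,10.0)), (24, (15.0,14.0)), (30, (89.0,15.0)), where each key is the byte offset at which the line starts (the exact offsets depend on the file's line endings).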

  Code:

  First, the custom class Point2D. For Map and Reduce to accept a custom class as a key/value, the class must implement the corresponding interface, WritableComparable (for keys) or Writable (for values), because map-reduce has its own serialization and deserialization mechanism.

  One more note: many of the materials I found claim that as long as a custom class implements the appropriate interface, map and reduce will recognize it and accept it as a key/value. That alone did not work for me: Eclipse complained that, because I was using the default TextInputFormat, the map input value had to be Text rather than my own Point2D. So I ended up implementing a custom FileInputFormat and RecordReader to parse my own data type Point2D (an alternative that keeps the default input format and parses inside the mapper is sketched after the RecordReader code below).

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Point2D implements Writable{

    public float x;
    public float y;
    
    public Point2D(float x, float y)
    {
        this.x = x;
        this.y = y;
    }
    
    public Point2D()
    {
        this.x = 0;
        this.y = 0;
    }
    
    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize the point: read x then y, matching the order used in write().
        x = in.readFloat();
        y = in.readFloat();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the point: write x then y.
        out.writeFloat(x);
        out.writeFloat(y);
    }
    
    @Override
    public String toString()
    {
        return Float.toString(x) + "," + Float.toString(y);
    }

}
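
  Point2D only needs Writable here because it is used as a value. If it were used as a key instead (for example, to sort by point rather than by distance), it would also have to implement WritableComparable so the framework can sort it during shuffle & sort. A minimal sketch of what that might look like; the ordering by x then y is just an assumption for illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Hypothetical variant of Point2D that can also be used as a key.
public class ComparablePoint2D implements WritableComparable<ComparablePoint2D> {

    public float x;
    public float y;

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
    }

    // Keys must define a total order; here we compare x first, then y.
    @Override
    public int compareTo(ComparablePoint2D other) {
        int cmp = Float.compare(x, other.x);
        return cmp != 0 ? cmp : Float.compare(y, other.y);
    }

    // Key types should also override hashCode()/equals() consistently, since the
    // default HashPartitioner uses hashCode() to pick a reducer.
    @Override
    public int hashCode() {
        return Float.floatToIntBits(x) * 31 + Float.floatToIntBits(y);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComparablePoint2D)) return false;
        ComparablePoint2D p = (ComparablePoint2D) o;
        return Float.compare(x, p.x) == 0 && Float.compare(y, p.y) == 0;
    }

    @Override
    public String toString() {
        return Float.toString(x) + "," + Float.toString(y);
    }
}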

  The custom InputFormat:

import java.io.IOException;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.lib.input.*;

public class KNPointInputFormat extends FileInputFormat<LongWritable, Point2D>{

	@Override
	public RecordReader<LongWritable, Point2D> createRecordReader(
			InputSplit split, TaskAttemptContext context) throws IOException,
			InterruptedException {
		// Each split gets a PointRecordReader, which parses text lines into Point2D values.
		return new PointRecordReader();
	}
	
}

  Note the PointRecordReader referenced above. This is the RecordReader that parses our custom data type; its code is as follows:

import java.io.IOException;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.io.*;

public class PointRecordReader extends RecordReader<LongWritable, Point2D> {

	 private LineRecordReader lineRecordReader;
	 private LongWritable key;
	 private Point2D value;
	 
	 public PointRecordReader()
	 {
		 lineRecordReader = new LineRecordReader();
	 }
	 
	 
	@Override
	public void close() throws IOException {
		lineRecordReader.close();
	}

	@Override
	public LongWritable getCurrentKey() throws IOException,
			InterruptedException {
		// The key is the byte offset of the current line, as produced by LineRecordReader.
		return key;
	}

	@Override
	public Point2D getCurrentValue() throws IOException, InterruptedException {
		// The value is the Point2D parsed from the current line in nextKeyValue().
		return value;
	}

	@Override
	public float getProgress() throws IOException, InterruptedException {
		// Delegate progress reporting to the wrapped LineRecordReader.
		return lineRecordReader.getProgress();
	}

	@Override
	public void initialize(InputSplit split, TaskAttemptContext context)
			throws IOException, InterruptedException {
		lineRecordReader.initialize(split, context);
	}

	@Override
	public boolean nextKeyValue() throws IOException, InterruptedException {
		// Advance the underlying line reader; stop when the split is exhausted.
		if (!lineRecordReader.nextKeyValue()) {
			return false;
		}

		// Reuse the byte-offset key and parse the "x y" text line into a Point2D value.
		key = lineRecordReader.getCurrentKey();
		if (value == null) {
			value = new Point2D();
		}
		String lineValue = lineRecordReader.getCurrentValue().toString();
		String[] xy = lineValue.split(" ");
		value.x = Float.parseFloat(xy[0]);
		value.y = Float.parseFloat(xy[1]);

		return true;
	}

}
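
  For reference, the custom input format above is not strictly required: the same result can be obtained by keeping the default TextInputFormat and parsing the Text line inside the mapper. A minimal sketch of that alternative mapper (not the approach used in this post; the class name TextKNPointMapper is just an illustrative choice):

import java.io.IOException;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that works with the default TextInputFormat:
// the value is the raw text line "x y", which we parse ourselves.
public class TextKNPointMapper extends Mapper<LongWritable, Text, FloatWritable, Point2D> {

	@Override
	public void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// The center point still arrives through the Configuration, as in the main example.
		String[] cxy = context.getConfiguration().get("cPoint").split(" ");
		float cx = Float.parseFloat(cxy[0]);
		float cy = Float.parseFloat(cxy[1]);

		// Parse the "x y" line into a Point2D value.
		String[] xy = value.toString().split(" ");
		Point2D point = new Point2D(Float.parseFloat(xy[0]), Float.parseFloat(xy[1]));

		float dis = (float) Math.sqrt((point.x - cx) * (point.x - cx)
				+ (point.y - cy) * (point.y - cy));
		context.write(new FloatWritable(dis), point);
	}
}

  With this variant, job.setInputFormatClass(...) can simply be left at its TextInputFormat default; the custom Point2D value still needs to implement Writable.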


  Finally, the main Map-Reduce class:

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;

public class KNPoints {
	
	public static class KNPointMapper extends Mapper<LongWritable, Point2D, FloatWritable, Point2D>
	{
		@Override
		public void map(LongWritable key, Point2D value, Context context) throws IOException, InterruptedException
		{
			Configuration conf = context.getConfiguration();
			
			// The center point that all other points' distances are measured from,
			// passed in from the driver through the Configuration.
			String cPos = conf.get("cPoint");
			String[] cxy = cPos.split(" ");
			Point2D cpoint = new Point2D();
			cpoint.x = Float.parseFloat(cxy[0]);
			cpoint.y = Float.parseFloat(cxy[1]);
			
			// The Euclidean distance becomes the output key, so shuffle & sort
			// delivers the points to the reducer ordered by distance.
			float dis = (float)Math.sqrt( Math.pow((value.x - cpoint.x), 2.0) + 
					 Math.pow((value.y - cpoint.y), 2.0));
			
			context.write(new FloatWritable(dis), value);
		}
	}
	
	public static class KNPointReducer extends Reducer<FloatWritable, Point2D, FloatWritable, Point2D>
	{
		@Override
		public void reduce(FloatWritable key, Iterable<Point2D> values, Context context) throws IOException, InterruptedException
		{
			// Keys (distances) arrive at the reducer already sorted, so we simply
			// write out every point associated with each distance.
			for(Point2D val : values)
			{
				context.write(key, val);
			}
		}
	}
	
	public static void main(String[] args) throws Exception
	{
		Configuration conf = new Configuration();
		conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
		conf.set("cPoint", "0 0");//Define the center point. We calculate the others' distance with the center point.
		
		Job job = new Job(conf);
		
		job.setInputFormatClass(KNPointInputFormat.class);//My own input format
		job.setOutputFormatClass(TextOutputFormat.class);
		
		job.setJarByClass(KNPoints.class);
		
		job.setMapperClass(KNPointMapper.class);
		job.setReducerClass(KNPointReducer.class);
		
		job.setOutputKeyClass(FloatWritable.class);
		job.setOutputValueClass(Point2D.class);
		
		
		String in = "hdfs://localhost:9000/user/hadoop/input/data";
		String out = "hdfs://localhost:9000/user/hadoop/output";
		
		
		FileSystem fs = FileSystem.get(conf);
		// Copy the local data file into HDFS before the job runs.
		fs.copyFromLocalFile(new Path("/home/hadoop/CodeSpace/KNPoints/data"), 
				new Path("hdfs://localhost:9000/user/hadoop/input/"));
		
		FileInputFormat.addInputPath(job, new Path(in));
		FileOutputFormat.setOutputPath(job, new Path(out));
		
		// Hadoop refuses to start if the output directory already exists, so remove it first.
		if(fs.exists(new Path(out)))
		{
			fs.delete(new Path(out), true);
		}
		
		job.waitForCompletion(true);
	}

}
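
  Since passing parameters through the Configuration was one of the goals here, one small variation is worth noting: instead of hard-coding conf.set("cPoint", "0 0") in the driver, the center point and the paths could be taken from the command line. A minimal sketch, assuming the job is launched with the center point, input path, and output path as program arguments (the class name KNPointsDriver and the argument layout are illustrative, not part of the original code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class KNPointsDriver {

	public static void main(String[] args) throws Exception
	{
		// Hypothetical usage: KNPointsDriver "<cx cy>" <input path> <output path>
		Configuration conf = new Configuration();
		conf.set("cPoint", args.length > 0 ? args[0] : "0 0");

		Job job = new Job(conf);
		job.setJarByClass(KNPoints.class);

		job.setInputFormatClass(KNPointInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);

		// Reuse the mapper and reducer defined in KNPoints above.
		job.setMapperClass(KNPoints.KNPointMapper.class);
		job.setReducerClass(KNPoints.KNPointReducer.class);

		job.setOutputKeyClass(FloatWritable.class);
		job.setOutputValueClass(Point2D.class);

		FileInputFormat.addInputPath(job, new Path(args[1]));
		FileOutputFormat.setOutputPath(job, new Path(args[2]));

		job.waitForCompletion(true);
	}
}

  The mapper code does not change at all: it still reads the point with conf.get("cPoint"), which is exactly the appeal of passing parameters through the Configuration.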

