HFileInputFormat: the nuclear weapon for fast full scans of HBase

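This post is a repost. The approach and its implementation are someone else's work, and my thanks go to that expert.

Original post: http://blog.csdn.net/kirayuan/article/details/7794402

I changed one small thing in the implementation, because the original did not compile on my machine.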

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

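/**
 * Reads HFiles directly, emitting one KeyValue per record, keyed by row.
 */
public class HFileInputFormat extends
		FileInputFormat<ImmutableBytesWritable, KeyValue> {

	// Static nested class: the reader never touches the enclosing instance.
	private static class HFileRecordReader extends
			RecordReader<ImmutableBytesWritable, KeyValue> {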

		private HFile.Reader reader;
		private final HFileScanner scanner;
		private int entryNumber = 0;

		public HFileRecordReader(FileSplit split, Configuration conf)
				throws IOException {
			final Path path = split.getPath();
			reader = HFile.createReader(FileSystem.get(conf), path, new CacheConfig(conf));
			scanner = reader.getScanner(false, false);
			// Required: without loading the file info first, seekTo() throws an NPE.
			reader.loadFileInfo();
			// Required: position the scanner on the first KeyValue, otherwise
			// scanner.next() throws an error.
			scanner.seekTo();
		}

		@Override
		public void close() throws IOException {
			if (reader != null) {
				reader.close();
			}
		}

		/*
		 * @Override public boolean next(ImmutableBytesWritable key, KeyValue
		 * value) throws IOException { entryNumber++; return scanner.next(); }
		 */

		@Override
		public ImmutableBytesWritable getCurrentKey() throws IOException,
				InterruptedException {
			// The key is the row of the KeyValue the scanner currently points at.
			return new ImmutableBytesWritable(scanner.getKeyValue().getRow());
		}

		@Override
		public KeyValue getCurrentValue() throws IOException,
				InterruptedException {
			return scanner.getKeyValue();
		}

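		@Override
		public boolean nextKeyValue() throws IOException, InterruptedException {
			// The constructor's seekTo() already placed the scanner on the first
			// KeyValue, so the first call must not advance past it.
			boolean hasNext = (entryNumber == 0) ? reader.getEntries() > 0
					: scanner.next();
			if (hasNext) {
				entryNumber++;
			}
			return hasNext;
		}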

		@Override
		public float getProgress() throws IOException, InterruptedException {
			if (reader != null && reader.getEntries() > 0) {
				// Cast first: int / long is integer division and would stay at 0.
				return (float) entryNumber / reader.getEntries();
			}
			return 1;
		}

		@Override
		public void initialize(InputSplit arg0, TaskAttemptContext arg1)
				throws IOException, InterruptedException {
			// Nothing to do: the reader and scanner are set up in the constructor.
		}

	}

	@Override
	protected boolean isSplitable(JobContext context, Path filename) {
		return false;
	}

	@Override
	public RecordReader<ImmutableBytesWritable, KeyValue> createRecordReader(
			InputSplit split, TaskAttemptContext context) throws IOException,
			InterruptedException {
		return new HFileRecordReader((FileSplit) split, context
				.getConfiguration());
	}

}
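
A note on getProgress: in Java, dividing an int by a long is integer division, so a ratio of entries read to total entries has to be computed in floating point to be useful. Below is a minimal standalone sketch of that calculation. The class and method names (ProgressRatio, progress) are made up for illustration and are not part of HBase or Hadoop.

```java
public class ProgressRatio {

    /** Fraction of entries read so far, clamped to the range [0, 1]. */
    public static float progress(int entriesRead, long totalEntries) {
        if (totalEntries <= 0) {
            return 1.0f; // nothing to read, so report completion
        }
        // Casting one operand to float forces floating-point division.
        return Math.min(1.0f, (float) entriesRead / totalEntries);
    }

    public static void main(String[] args) {
        int read = 250;
        long total = 1000L;
        System.out.println(read / total);          // integer division: prints 0
        System.out.println(progress(read, total)); // prints 0.25
    }
}
```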

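Important: the tedious part of writing the MR job is the PathFilter, because the /hbase/ directory contains many files whose names start with ".". A later post will show a suitable filter.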
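
As a rough idea of what such a filter could check, here is a minimal sketch of the name test only, written in plain Java so it runs without a cluster. It assumes that every file or directory whose name starts with "." should be skipped. The class name DotFileNameCheck and the sample paths are hypothetical, not taken from the original post.

```java
import java.util.Arrays;

/**
 * Sketch of the name check a PathFilter for the HBase root directory could use.
 * Assumption: anything whose name starts with "." is not a data file.
 */
public class DotFileNameCheck {

    /** Returns true if no component of the slash-separated path starts with ".". */
    public static boolean accept(String path) {
        return Arrays.stream(path.split("/"))
                .filter(part -> !part.isEmpty())
                .noneMatch(part -> part.startsWith("."));
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        System.out.println(accept("/hbase/mytable/region1/cf/hfile1"));   // prints true
        System.out.println(accept("/hbase/.logs/server1/log1"));          // prints false
        System.out.println(accept("/hbase/mytable/region1/.regioninfo")); // prints false
    }
}
```

In a real job, the same test would probably live in a class implementing org.apache.hadoop.fs.PathFilter, with accept(Path) delegating to this check on path.toString(), and be registered with FileInputFormat.setInputPathFilter.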


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值