HBase: Using and Analyzing the Official HBase MapReduce Summary to HBase Example

编程写手

Category: Apache HBase

Article link: https://blog.csdn.net/weixin_45492007/article/details/106853105

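1. Statement

This post is mainly for my own study and review. It covers the official HBase summary demo, which reads data from one table and writes an aggregated summary into another table.

Add one more row to the table used earlier (the source table, test-filter).
Create the filter-user-count table with the column family cf.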

2. Using and testing the official code

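import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

/**
 * @description Summarizes one table into another
 * @author hy
 * @date 2020-06-19
 */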
public class SummaryTableExample {
	private static String sourceTable = "test-filter";
	private static String targetTable = "filter-user-count";

	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "192.168.1.101:2181");
		Job job = Job.getInstance(config, "ExampleSummary"); // the Job constructor is deprecated
		job.setJarByClass(SummaryTableExample.class); // class that contains mapper and reducer
		Scan scan = new Scan();
		scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
		scan.setCacheBlocks(false); // don't set to true for MR jobs
		// set other scan attrs
		TableMapReduceUtil.initTableMapperJob(sourceTable, // input table
				scan, // Scan instance to control CF and attribute selection
				MyMapper.class, // mapper class
				Text.class, // mapper output key
				IntWritable.class, // mapper output value
				job);
		TableMapReduceUtil.initTableReducerJob(targetTable, // output table
				MyTableReducer.class, // reducer class
				job);
		job.setNumReduceTasks(1); // at least one, adjust as required
		boolean b = job.waitForCompletion(true);
		if (!b) {
			throw new IOException("error with job!");
		}
	}

	// Mapper: turns each row read from the source table into a (name, 1) pair
	public static class MyMapper extends TableMapper<Text, IntWritable> {

		public static final byte[] CF = Bytes.toBytes("cf");
		public static final byte[] NAME = Bytes.toBytes("name");
		private final IntWritable ONE = new IntWritable(1);
		private Text text = new Text();

		@Override
		public void map(ImmutableBytesWritable row, Result value, Context context)
				throws IOException, InterruptedException {
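			byte[] nameBytes = value.getValue(CF, NAME); // read the cf:name cell
			if (nameBytes == null) {
				return; // skip rows without a cf:name cell instead of failing with a NullPointerException
			}
			text.set(Bytes.toString(nameBytes)); // we can only emit Writables...
			context.write(text, ONE); // emit (name, 1)
			System.out.println("map executed...........");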
		}

	}

	// Reducer: sums the counts for each name and writes the total to the target table
	public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
		public static final byte[] CF = Bytes.toBytes("cf");
		public static final byte[] COUNT = Bytes.toBytes("count");

		@Override
		protected void reduce(Text key, Iterable<IntWritable> values, Context context)
				throws IOException, InterruptedException {
			int i = 0;
			for (IntWritable val : values) {
				i += val.get();
			}
			Put put = new Put(Bytes.toBytes(key.toString())); // use the name as the row key
			put.addColumn(CF, COUNT, Bytes.toBytes(i)); // stored as a 4-byte big-endian int
			context.write(null, put);
			System.out.println("reduce executed...........");
		}
	}
}

Another error appeared: Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z. It showed up after the computer was restarted today, and the fix in the end was to copy hadoop.dll into the System32 directory.
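When this error comes back, it can help to check whether hadoop.dll is actually visible to the JVM. The following is only a minimal sketch under my own assumptions: it looks through the directories listed in java.library.path (on Windows this normally includes the system directories, which would explain why copying the file into System32 works). The class name NativeLibCheck and the helper findOnPath are hypothetical and not part of Hadoop.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class NativeLibCheck {

    // Returns the first entry of pathList (separated by the platform path separator)
    // that contains a regular file named fileName, or empty if none does.
    static Optional<Path> findOnPath(String pathList, String fileName) {
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // ignore empty entries such as a leading separator
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // skip entries that are not valid paths on this platform
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        Optional<Path> dll = findOnPath(libraryPath, "hadoop.dll");
        System.out.println(dll.map(p -> "found: " + p)
                .orElse("hadoop.dll not found on java.library.path"));
    }
}
```

If the file is reported as missing, putting it into one of the listed directories (System32 in my case) is one way to get the job running again.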

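Execution result:

map ran four times (four rows were read) and reduce ran three times (three rows were written).

Looking at the contents of the filter-user-count table afterwards, the result is a count of how many rows carry each name.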
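One thing to keep in mind when reading the count column: the reducer stores it with Bytes.toBytes(i), which writes the int as 4 big-endian bytes rather than as text, so the cell may show up as escaped bytes instead of a readable number. The sketch below reproduces that encoding with plain java.nio so it runs without HBase; the class CountEncodingSketch and its methods are hypothetical names of mine.

```java
import java.nio.ByteBuffer;

public class CountEncodingSketch {

    // Encodes an int as 4 big-endian bytes (big-endian is ByteBuffer's default order).
    static byte[] encode(int count) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(count).array();
    }

    // Decodes 4 big-endian bytes back into an int.
    static int decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    // Renders every byte as an escaped hex pair such as \x02.
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("\\x%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = encode(2);
        System.out.println(toHex(raw));  // prints \x00\x00\x00\x02
        System.out.println(decode(raw)); // prints 2
    }
}
```

To read the count back in Java code, Bytes.toInt on the cell value is the matching call.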

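So the code above reads the cf:name value of every row in the test-filter table, uses reduce to total the pairs emitted by map, and writes the totals to the target table.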
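To make the flow easier to follow, here is a small sketch that imitates the three steps in plain Java without Hadoop or HBase: map emits (name, 1) for each row, the framework groups the pairs by name, and reduce sums each group. The names in the sample list are made up for illustration (four rows, three distinct names), and SummaryFlowSketch is a hypothetical class, not part of the official example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SummaryFlowSketch {

    // Imitates map -> group by key -> reduce for a list of cf:name values.
    static Map<String, Integer> summarize(List<String> names) {
        // "map" + "shuffle": emit a 1 for every row and group the 1s by name
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String name : names) {
            grouped.computeIfAbsent(name, k -> new ArrayList<>()).add(1);
        }
        // "reduce": called once per distinct name, sums the grouped values
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) {
                sum += one;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // hypothetical cf:name values of four rows
        List<String> names = Arrays.asList("admin", "guest", "admin", "user");
        System.out.println(summarize(names)); // prints {admin=2, guest=1, user=1}
    }
}
```

With four input rows and three distinct names, the "map" step runs four times and the "reduce" step three times, which is the same pattern as in the run above.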

3. Analysis of the code

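Note that the value passed to job.setNumReduceTasks is not tied to how many times initTableReducerJob is called: initTableReducerJob configures the reducer class and the output table of the job, while setNumReduceTasks sets how many reduce tasks run that reducer in parallel. One task is enough for this small table.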
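As a rough illustration of what the number of reduce tasks affects: Hadoop's default HashPartitioner assigns each map output key to a reduce task with (hashCode & Integer.MAX_VALUE) % numReduceTasks. The sketch below applies the same formula in plain Java. It uses String.hashCode() as a stand-in for Text.hashCode(), so the concrete partition numbers are an assumption and will differ from a real job; ReducePartitionSketch and the sample names are hypothetical.

```java
public class ReducePartitionSketch {

    // Same formula as the default hash partitioner: a non-negative hash modulo the task count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] names = {"admin", "guest", "user"}; // hypothetical keys
        for (String name : names) {
            // with one reduce task every key lands in partition 0
            System.out.println(name + " -> 1 task: " + partitionFor(name, 1)
                    + ", 3 tasks: " + partitionFor(name, 3));
        }
    }
}
```

Whatever the task count, all values for one key go to the same reduce task, so the per-name totals stay correct.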

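The scan passed to initTableMapperJob can restrict which columns are read from the source table. Text.class is the mapper's output key type (it matches the first generic parameter of MyTableReducer), and IntWritable.class is the mapper's output value type (it matches the second generic parameter of MyTableReducer).

So my inference is: the table mapper job reads the selected data from the source table, the map method emits it as pairs with Key=Text and Value=IntWritable, the framework collects those pairs, and the reduce method then processes them and writes the result into the target table.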

Running scan 'filter-user-count',{RAW=>true,VERSIONS=>1000} shows that no data was overwritten.

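From this I conclude: reduce works on the output of map, so the writes to the target table are deferred until the reduce phase, and job.waitForCompletion(true) blocks the calling thread until the job, including those writes, has finished.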

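4. Summary

1. A job can be given both a mapper and a reducer. The reducer processes the mapper's output and writes the final result. In the previous demo there was only a mapper and the reducer could be set to null, which shows that the mapper and the reducer are both just processing stages of a job and that a job does not have to use a reducer.

2. job.setNumReduceTasks(1) sets how many reduce tasks run. When a reducer is used it must be at least one (zero would turn the job into a map-only job); beyond that it can be adjusted as required.

The above is purely my personal understanding. If you find any problems, please contact me!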
