Running MapReduce Jobs with HBase

千里草竹

Article link: https://blog.csdn.net/u012848709/article/details/83744699

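Preface

These are notes from learning HBase. The environment is CentOS 6.9, with Hadoop and the related components taken from CDH 5.3.6.

The Hadoop and HBase environment variables are already configured, so yarn can be invoked directly. Everything below assumes this setup.

1.1 Listing the JARs HBase needs to run MapReduce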

[grq@hadoop hbase0986]$ bin/hbase mapredcp
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hbase0986/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop250/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-28 14:32:07,428 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/opt/module/hbase0986/lib/hbase-client-0.98.6-cdh5.3.6.jar:/opt/module/hbase0986/lib/hbase-server-0.98.6-cdh5.3.6.jar:/opt/module/hbase0986/lib/htrace-core-2.04.jar:/opt/module/hbase0986/lib/netty-3.6.6.Final.jar:/opt/module/hbase0986/lib/hbase-common-0.98.6-cdh5.3.6.jar:/opt/module/hbase0986/lib/high-scale-lib-1.1.1.jar:/opt/module/hbase0986/lib/zookeeper-3.4.5-cdh5.3.6.jar:/opt/module/hbase0986/lib/guava-12.0.1.jar:/opt/module/hbase0986/lib/protobuf-java-2.5.0.jar:/opt/module/hbase0986/lib/hbase-protocol-0.98.6-cdh5.3.6.jar:/opt/module/hbase0986/lib/hbase-hadoop-compat-0.98.6-cdh5.3.6.jar

1.2 Setting up the environment

[grq@hadoop hbase0986]$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hbase0986/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop250/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-28 14:34:47,798 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[grq@hadoop hbase0986]$

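HADOOP_HOME and HBASE_HOME are already set as system-wide environment variables here. If they are not, set them with export for the current session first.

1.3 Running a bundled MapReduce job

Run the bundled rowcounter job, which counts the rows in the person table: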

[grq@hadoop hbase0986]$  yarn jar lib/hbase-server-0.98.6-cdh5.3.6.jar rowcounter person

1.4 Importing data into HBase with MapReduce

1.4.1 Preparing the data

Prepare the data and upload it to HDFS. The file is in TSV format (fields separated by \t).
[grq@hadoop hbase0986]$ more …/data/fruit.txt

001	Apple	Red
002	Pig	blue
003	Pear	yelllow

1.4.2 Creating the HBase table

create 'fruits','info'

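1.4.3 Creating an HDFS directory and uploading the file

[Screenshot: creating the HDFS input directory and uploading fruit.txt]

1.4.4 Running the MapReduce import into the HBase table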

[grq@hadoop hbase0986]$ yarn jar lib/hbase-server-0.98.6-cdh5.3.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruits hdfs://hadoop:9000/input/
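
To make the -Dimporttsv.columns option easier to read, here is a minimal Java sketch of the positional mapping it describes: the first field of each line becomes the row key (HBASE_ROW_KEY), the second goes to info:name and the third to info:color. The class name ImportTsvMappingSketch and the method parseLine are hypothetical and exist only for illustration. This is not how importtsv is implemented internally, and the sketch simply rejects any line whose field count does not match the column spec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics how the -Dimporttsv.columns spec lines up with
// the fields of a TSV line. This is NOT the HBase importtsv implementation.
class ImportTsvMappingSketch {

    // Pairs each tab-separated field with the column name at the same position.
    static Map<String, String> parseLine(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (columns.length != fields.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields but got " + fields.length);
        }
        // LinkedHashMap keeps the columns in the order given by the spec.
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String spec = "HBASE_ROW_KEY,info:name,info:color";
        Map<String, String> row = parseLine(spec, "001\tApple\tRed");
        System.out.println(row);
    }
}
```

Applied to the first line of fruit.txt, the sketch pairs 001 with HBASE_ROW_KEY, Apple with info:name and Red with info:color.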

Check the result:

hbase(main):001:0> scan 'fruits'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hbase0986/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop250/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-28 14:57:08,685 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ROW                        COLUMN+CELL                                                                
 001                       column=info:color, timestamp=1540707102480, value=Red                      
 001                       column=info:name, timestamp=1540707102480, value=Apple                     
 002                       column=info:color, timestamp=1540707102480, value=blue                     
 002                       column=info:name, timestamp=1540707102480, value=Pig                       
 003                       column=info:color, timestamp=1540707102480, value=yelllow                  
 003                       column=info:name, timestamp=1540707102480, value=Pear                      
3 row(s) in 0.8320 seconds

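2 Custom MapReduce jobs with HBase

Migrating data between HBase tables

In plain Hadoop, the classes of an MR job extend Mapper and Reducer. With HBase, they extend TableMapper and TableReducer instead.

The goal here is to copy the data in the fruits table into the fruits_mr table with an MR job. The fruits_mr table must already exist with an info column family before the job runs, because the job does not create it (for example, run create 'fruits_mr','info' in the HBase shell).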
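
The relationship between Mapper and TableMapper can be sketched in plain Java. In HBase, TableMapper<KEYOUT, VALUEOUT> is declared as a subclass of Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT>: the input types are fixed to "row key" and "row contents", so a subclass only chooses its output types. The classes below are stand-ins written for this illustration (String replaces the row key, List<String> replaces the row contents, and CellCountMapper is a hypothetical example mapper). They are not the real Hadoop or HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; none of these are real Hadoop/HBase classes.
class TableMapperShapeSketch {

    // Plays the role of Hadoop's Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
    static abstract class Mapper<KI, VI, KO, VO> {
        // Collected output, standing in for what context.write would emit.
        final List<String> output = new ArrayList<>();

        abstract void map(KI key, VI value);

        void write(KO key, VO value) {
            output.add(key + "=" + value);
        }
    }

    // Plays the role of TableMapper<KEYOUT, VALUEOUT>: the input types are
    // fixed, so subclasses only pick the output key and value types.
    static abstract class TableMapper<KO, VO> extends Mapper<String, List<String>, KO, VO> {
    }

    // A concrete mapper: emits each row key with the number of cells in that row.
    static class CellCountMapper extends TableMapper<String, Integer> {
        @Override
        void map(String rowKey, List<String> cells) {
            write(rowKey, cells.size());
        }
    }

    public static void main(String[] args) {
        CellCountMapper mapper = new CellCountMapper();
        mapper.map("001", Arrays.asList("info:name=Apple", "info:color=Red"));
        System.out.println(mapper.output);
    }
}
```

This is why a mapper over an HBase table declares only two type parameters, while its map method still receives a row key and a row result as input.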

2.1 Building the ReadFruitsMapper class

This class reads the rows of the fruits table.

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @Title: ReadFruitsMapper.java
 * @Package cn.zhuzi.hbase.mr
 * @Description: HBase MR data migration (map side)
 * @author grq
 * @version Created 2018-11-05 11:07:09
 *
 */
public class ReadFruitsMapper extends TableMapper<ImmutableBytesWritable, Put> {

	@Override
	protected void map(ImmutableBytesWritable key, Result value, Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, Put>.Context context) throws IOException, InterruptedException {

		// Copy one row of fruits into a Put that has the same row key
		Put put = new Put(key.get());
		// Walk through every cell of the row
		for (Cell cell : value.rawCells()) {
			// Only cells in the info column family are copied
			if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {
				String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
				// Only the name and color columns are copied
				if ("name".equals(qualifier) || "color".equals(qualifier)) {
					put.add(cell);
				}
			}
		}
		// Emit the row as map output; skip rows with no matching cells,
		// since an empty Put cannot be written to the target table
		if (!put.isEmpty()) {
			context.write(key, put);
		}
	}
}

2.2 Building the WriteFruitsMRreducer class

This class writes the rows that were read into the target table.


import java.io.IOException;

import org.apache.commons.lang.builder.ToStringBuilder;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

/**
 * @Title: WriteFruitsMRreducer.java
 * @Package cn.zhuzi.hbase.mr
 * @Description: HBase MR data migration (reduce side)
 * @author grq
 * @version Created 2018-11-05 11:19:48
 *
 */
public class WriteFruitsMRreducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {

	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		// Write every row that was read to fruits_mr
		for (Put put : values) {
			context.write(NullWritable.get(), put);
			// Debug output: dump the Put and the output key
			System.out.println(ToStringBuilder.reflectionToString(put));
			System.out.println(ToStringBuilder.reflectionToString(NullWritable.get()));
			System.out.println();
		}
	}

}

2.3 Building FruitsMRJob

This class assembles and runs the job.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @Title: FruitsMRJob.java
 * @Package cn.zhuzi.hbase.mr
 * @Description: Assembles and runs the fruits to fruits_mr migration job
 * @author grq
 * @version Created 2018-11-05 12:06:04
 *          https://blog.csdn.net/ys_230014/article/details/83714141
 */
public class FruitsMRJob extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		Configuration conf = this.getConf();

		// Assemble the job
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(FruitsMRJob.class);

		// Configure the scan
		Scan scan = new Scan();
		scan.setCacheBlocks(false); // do not fill the block cache during a full-table MR scan
		scan.setCaching(500); // rows fetched per RPC

		// Set up the mapper; TableMapReduceUtil must come from the mapreduce package
		TableMapReduceUtil.initTableMapperJob("fruits", // source table name
				scan, // scan that controls what is read
				ReadFruitsMapper.class,// mapper class
				ImmutableBytesWritable.class,// mapper output key type
				Put.class,// mapper output value type
				job// job to configure
				);

		// Set up the reducer
		TableMapReduceUtil.initTableReducerJob("fruits_mr", WriteFruitsMRreducer.class, job);
		// Number of reduce tasks; one is enough here
		job.setNumReduceTasks(1);
		boolean completion = job.waitForCompletion(true);
		if (!completion) {
			throw new IOException("Job failed");
		}
		return completion ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = HBaseConfiguration.create();
		conf.set("hbase.zookeeper.quorum", "hadoop");// ZooKeeper address (single node)
		conf.set("hbase.zookeeper.property.clientPort", "2181");// ZooKeeper port
		int run = ToolRunner.run(conf, new FruitsMRJob(), args);
		System.exit(run);

	}
}

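Packaging and running the program

Note

I ran the job directly from Eclipse on Windows, with HBase running in a virtual machine. During the run, the security software (安全卫士) asked whether to allow a Hadoop tool on the local Windows machine to execute.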
