Integrating HBase with MapReduce

牧码文

Article link: https://blog.csdn.net/weixin_46429290/article/details/119567148

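Through HBase's Java API we can run MapReduce jobs that read from or write to HBase. For example, we can use MapReduce to import data from the local file system into an HBase table, or read raw data from HBase and analyze it with MapReduce.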

Official HBase-MapReduce

1. Print the classpath that HBase MapReduce jobs need

$ bin/hbase mapredcp

2. Import the environment variables

(1) Temporary (current shell session only): run the following on the command line

$ export HBASE_HOME=/opt/module/hbase 
$ export HADOOP_HOME=/opt/module/hadoop3
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`

(2) Permanent: add the following to /etc/profile

export HBASE_HOME=/opt/module/hbase 
export HADOOP_HOME=/opt/module/hadoop3

Then add the following to hadoop-env.sh (note: place it after the for loop):

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase/lib/*

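3. Run the official MapReduce jobs

– Example 1: count the rows in the student table (run from the HBase installation directory, since the jar path is relative)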

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar rowcounter student

– Example 2: use MapReduce to import local data into HBase

1) Create a local TSV file named fruit.tsv (fields are separated by a single tab character)

1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow

2) Create the HBase table

hbase(main):001:0> create 'fruit','info'

3) Create the input_fruit directory in HDFS and upload fruit.tsv

$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -mkdir /input_fruit/
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -put fruit.tsv /input_fruit/

4) Run the importtsv MapReduce job to load the file into the fruit table

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop102:9000/input_fruit
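The -Dimporttsv.columns value is matched to the fields of each line by position. The following plain-Java sketch illustrates that pairing without needing a cluster. It is not the ImportTsv implementation; the class name ColumnSpecSketch and the method mapLine are hypothetical names made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how ImportTsv is implemented internally.
final class ColumnSpecSketch {

    /**
     * Pairs each tab-separated field of a line with the entry at the same
     * position in the comma-separated column spec.
     */
    static Map<String, String> mapLine(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        // The -1 limit keeps trailing empty fields so the count check is accurate.
        String[] fields = line.split("\t", -1);
        if (columns.length != fields.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields but got " + fields.length);
        }
        // LinkedHashMap preserves the order of the column spec.
        Map<String, String> mapped = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            mapped.put(columns[i], fields[i]);
        }
        return mapped;
    }
}
```

For the first line of fruit.tsv, mapLine("HBASE_ROW_KEY,info:name,info:color", "1001\tApple\tRed") pairs HBASE_ROW_KEY with 1001, info:name with Apple, and info:color with Red. A line with a different number of fields is rejected in this sketch.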

5) Use the scan command to check the imported data

hbase(main):001:0> scan 'fruit'

Custom HBase-MapReduce 1

Goal: use MR to copy part of the data in the fruit table into the fruit_mr table.

Steps:

1. Build the ReadFruitMapper class, which reads data from the fruit table

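import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadFruitMapper extends TableMapper<ImmutableBytesWritable, Put> {

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Copy the name and color columns of each fruit row into a Put object.
        Put put = new Put(key.get());
        // Iterate over the cells of the row
        for (Cell cell : value.rawCells()) {
            // Only consider cells in the info column family
            if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                // Keep only the name and color columns
                if ("name".equals(qualifier) || "color".equals(qualifier)) {
                    put.add(cell);
                }
            }
        }
        // Emit the row as the map output. Rows with no matching cells are
        // skipped, because an empty Put cannot be written to a table.
        if (!put.isEmpty()) {
            context.write(key, put);
        }
    }
}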
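The selection rule used by the mapper (keep only info:name and info:color) can be tried without a cluster. Below is a minimal plain-Java sketch of that rule. SimpleCell is a hypothetical stand-in for HBase's Cell, and CellFilterSketch and keepNameAndColor are likewise names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an HBase Cell: just family, qualifier, and value.
final class SimpleCell {
    final String family;
    final String qualifier;
    final String value;

    SimpleCell(String family, String qualifier, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Hypothetical helper mirroring the mapper's column selection.
final class CellFilterSketch {
    private static final Set<String> WANTED =
        new HashSet<>(Arrays.asList("name", "color"));

    /** Returns the cells in family "info" whose qualifier is name or color, in input order. */
    static List<SimpleCell> keepNameAndColor(List<SimpleCell> cells) {
        List<SimpleCell> kept = new ArrayList<>();
        for (SimpleCell cell : cells) {
            if ("info".equals(cell.family) && WANTED.contains(cell.qualifier)) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

Keeping the wanted qualifiers in a set makes it easy to extend the list of migrated columns without adding another else-if branch.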

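2. Build the WriteFruitMRReducer class, which writes the data read from fruit into the fruit_mr table

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Write every row that was read into the fruit_mr table
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}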

3. Build Fruit2FruitMRRunner extends Configured implements Tool to assemble and run the Job

// Assemble the Job (a method of Fruit2FruitMRRunner)
@Override
public int run(String[] args) throws Exception {
    // Get the Configuration
    Configuration conf = this.getConf();
    // Create the Job
    Job job = Job.getInstance(conf, this.getClass().getSimpleName());
    job.setJarByClass(Fruit2FruitMRRunner.class);

    // Configure the Scan used to read the source table
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    scan.setCaching(500);

    // Set the Mapper. Note: import TableMapReduceUtil from the mapreduce
    // package, not the mapred package (the latter is the old API).
    TableMapReduceUtil.initTableMapperJob(
        "fruit",                      // source table name
        scan,                         // Scan that controls the read
        ReadFruitMapper.class,        // Mapper class
        ImmutableBytesWritable.class, // Mapper output key type
        Put.class,                    // Mapper output value type
        job                           // Job to configure
    );
    // Set the Reducer
    TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRReducer.class, job);
    // Set the number of reduce tasks, at least 1
    job.setNumReduceTasks(1);

    boolean isSuccess = job.waitForCompletion(true);
    if (!isSuccess) {
        throw new IOException("Job running with error");
    }
    return 0;
}

4. Run the Job from the main function

public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    int status = ToolRunner.run(conf, new Fruit2FruitMRRunner(), args);
    System.exit(status);
}

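5. Package and run the job. The target table fruit_mr must already exist (for example, run create 'fruit_mr','info' in the HBase shell), because the job does not create it.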

$ /opt/module/hadoop-2.7.2/bin/yarn jar ~/softwares/jars/Hbase-0.0.1-SNAPSHOT.jar com.z.Hbase.mr1.Fruit2FruitMRRunner

Custom HBase-MapReduce 2

Goal: write data stored in HDFS into an HBase table.

Steps:

1. Build ReadFruitFromHDFSMapper, which reads the file data from HDFS

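import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReadFruitFromHDFSMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One line of the file read from HDFS
        String lineValue = value.toString();
        // Split the line on tabs into a String array
        String[] values = lineValue.split("\t");

        // Pick the values by position
        String rowKey = values[0];
        String name = values[1];
        String color = values[2];

        // Initialize the row key
        ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));

        // Initialize the Put object
        Put put = new Put(Bytes.toBytes(rowKey));

        // Arguments: column family, column qualifier, value.
        // addColumn replaces the deprecated Put.add(byte[], byte[], byte[]).
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(color));

        context.write(rowKeyWritable, put);
    }
}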
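The mapper above assumes that every line has exactly three tab-separated fields, so a blank or short line would throw ArrayIndexOutOfBoundsException and fail the task. Here is a minimal plain-Java sketch of a guard that could run before the Put is built. FruitLineParser and parse are hypothetical names, and returning null for malformed lines is only one possible policy.

```java
// Hypothetical helper; not part of the original job.
final class FruitLineParser {

    /**
     * Splits a line into {rowKey, name, color}, trimming surrounding whitespace.
     * Returns null when the line does not have exactly three non-empty fields.
     */
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        // The -1 limit keeps trailing empty fields so they are counted.
        String[] parts = line.split("\t", -1);
        if (parts.length != 3) {
            return null;
        }
        String[] fields = new String[3];
        for (int i = 0; i < parts.length; i++) {
            fields[i] = parts[i].trim();
            if (fields[i].isEmpty()) {
                return null;
            }
        }
        return fields;
    }
}
```

Inside map, one could call FruitLineParser.parse(value.toString()) and return early when the result is null, so that a single bad line does not abort the whole import.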

2. Build the WriteFruitMRFromTxtReducer class

public class WriteFruitMRFromTxtReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Write every row that was read to the output table configured in the runner
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}

3. Create Txt2FruitRunner to assemble the Job

public int run(String[] args) throws Exception {
    // Get the Configuration
    Configuration conf = this.getConf();

    // Create the Job
    Job job = Job.getInstance(conf, this.getClass().getSimpleName());
    job.setJarByClass(Txt2FruitRunner.class);
    Path inPath = new Path("hdfs://hadoop102:9000/input_fruit/fruit.tsv");
    FileInputFormat.addInputPath(job, inPath);

    // Set the Mapper
    job.setMapperClass(ReadFruitFromHDFSMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    // Set the Reducer
    TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRFromTxtReducer.class, job);

    // Set the number of reduce tasks, at least 1
    job.setNumReduceTasks(1);

    boolean isSuccess = job.waitForCompletion(true);
    if (!isSuccess) {
        throw new IOException("Job running with error");
    }

    return 0;
}

4. Run the Job

public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    int status = ToolRunner.run(conf, new Txt2FruitRunner(), args);
    System.exit(status);
}

5. Package and run

$ /opt/module/hadoop-2.7.2/bin/yarn jar hbase-0.0.1-SNAPSHOT.jar com.gis.hbase.mr2.Txt2FruitRunner
