HBase MapReduce examples

SAN_YUN

Article link: https://blog.csdn.net/SAN_YUN/article/details/84515078

References:

http://hbase.apache.org/book/mapreduce.html

http://genius-bai.iteye.com/blog/641927

Examples shipped with HBase

hbase-0.20.3\src\test

Counting the total number of rows in a table (org.apache.hadoop.hbase.mapreduce.RowCounter)

bin/hadoop jar /home/iic/hbase-0.20.3/hbase-0.20.3.jar rowcounter scores grade

Result:

10/04/12 17:08:05 INFO mapred.JobClient: ROWS=2

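Building a Lucene index over an HBase column (examples.TestTableIndex)

This example indexes the contents column of the table mrtest. It uses lucene-core-2.2.0.jar, which must be on the classpath: add lucene-core-2.2.0.jar to the lib directory inside examples.zip. The code must also call job.setJarByClass(TestTableIndex.class); otherwise the Lucene classes are not found.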

bin/hadoop fs -rmr testindex

bin/hadoop jar examples.zip examples.TestTableIndex

Generating HBase-compatible HFiles from input files first, then loading them into HBase, to speed up the import

examples.TestHFileOutputFormat

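The input data is generated automatically by the example. Each key is a ten-digit number left-padded with zeros, such as "0000000001".

Output data directory: /user/iic/hbase-hfile-test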
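The zero-padded key format can be illustrated with a small plain-Java sketch. It is not taken from TestHFileOutputFormat; the class name PaddedKeyDemo and the helper toKey are hypothetical. HBase orders row keys lexicographically by their bytes, so the padding is presumably there to make the lexicographic order of the keys match their numeric order.

```java
import java.util.Locale;

// Hypothetical helper, not part of HBase or of the shipped example.
class PaddedKeyDemo {

    // Formats a non-negative row number as a ten-digit, zero-padded key.
    static String toKey(long n) {
        return String.format(Locale.ROOT, "%010d", n);
    }

    public static void main(String[] args) {
        System.out.println(toKey(1)); // 0000000001

        // Padded keys compare in the same order as the numbers they encode.
        System.out.println(toKey(9).compareTo(toKey(10)) < 0); // true

        // Without padding, "9" sorts after "10" in lexicographic order.
        System.out.println("9".compareTo("10") < 0); // false
    }
}
```

Ten digits cover row numbers up to 9999999999; larger values would produce longer keys and break the ordering guarantee.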

bin/hadoop fs -rmr hbase-hfile-test

bin/hadoop jar examples.zip examples.TestHFileOutputFormat

Loading the generated data into HBase (JRuby must be installed first)

export PATH=$PATH:/home/iic/jruby-1.4.0/bin/

echo $PATH

vi bin/loadtable.rb

require '/home/iic/hbase-0.20.3/hbase-0.20.3.jar'
require '/home/iic/hadoop-0.20.2/hadoop-0.20.2-core.jar'
require '/home/iic/hadoop-0.20.2/lib/log4j-1.2.15.jar'
require '/home/iic/hadoop-0.20.2/lib/commons-logging-1.0.4.jar'
require '/home/iic/hbase-0.20.3/lib/zookeeper-3.3.0.jar'
require '/home/iic/hbase-0.20.3/lib/commons-cli-2.0-SNAPSHOT.jar'

$CLASSPATH <<'/home/iic/hbase-0.20.3/conf';

Delete the table "hbase-test" if it already exists, then run:

jruby bin/loadtable.rb hbase-test /user/iic/hbase-hfile-test

To see the script's usage, run it through the JRuby bundled with HBase:

bin/hbase org.jruby.Main bin/loadtable.rb

Usage: loadtable.rb TABLENAME HFILEOUTPUTFORMAT_OUTPUT_DIR

Note: this approach requires solving a few problems.

1. Your MapReduce job must ensure a total ordering among all keys. By default, MapReduce distributes keys among reducers using a Partitioner that hashes the map task output key: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks

When MapReduce assigns keys to reducers with the default hash Partitioner, and the keys are 0~4 with 2 reduce tasks, reducer 0 would get keys 0, 2 and 4, whereas reducer 1 would get keys 1 and 3 (in order). The start keys and end keys of the generated blocks would then be out of order.

// Prints the non-negative hash that the default partitioner computes for the key "0".
System.out.println((new ImmutableBytesWritable("0".getBytes())
        .hashCode() & Integer.MAX_VALUE));
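The interleaving described above can be reproduced without Hadoop. The following is a minimal plain-Java sketch that applies the same arithmetic, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, to Integer keys 0 to 4 with 2 reduce tasks. The class HashPartitionDemo and its methods are hypothetical; Integer keys are used because Integer.hashCode() is the value itself, which matches the 0~4 example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration; no Hadoop classes are involved.
class HashPartitionDemo {

    // Same arithmetic as the default hash partitioner formula.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups the keys 0..maxKey by the reducer they would be sent to.
    static List<List<Integer>> assign(int maxKey, int numReduceTasks) {
        List<List<Integer>> reducers = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            reducers.add(new ArrayList<>());
        }
        for (int key = 0; key <= maxKey; key++) {
            // The int key is boxed to an Integer, whose hashCode() is its value.
            reducers.get(partition(key, numReduceTasks)).add(key);
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<List<Integer>> reducers = assign(4, 2);
        for (int r = 0; r < reducers.size(); r++) {
            System.out.println("reducer " + r + " -> " + reducers.get(r));
        }
        // reducer 0 -> [0, 2, 4]
        // reducer 1 -> [1, 3]
    }
}
```

Each reducer sees its own keys in sorted order, but the key ranges of the two reducers overlap, which is why the resulting files cannot be laid side by side as ordered regions.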

So you need to implement your own Partitioner that keeps the keys ordered across reducers, so that reducer 0 gets keys 0-2 and reducer 1 gets keys 3-4 (see TotalOrderPartitioner in Hadoop for more on what this means).

Verifying the number of imported rows
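As a rough illustration of what such an ordered partitioner does, here is a plain-Java sketch of range partitioning by split points. It is not Hadoop's TotalOrderPartitioner and does not implement the Hadoop Partitioner interface; the class RangePartitionDemo, the int keys, and the single split point 3 are all assumptions chosen to match the 0-2 / 3-4 example.

```java
import java.util.Arrays;

// Hypothetical sketch; a real job would implement Hadoop's Partitioner instead.
class RangePartitionDemo {

    // splitPoints must be sorted ascending; splitPoints[i] is the first key
    // that belongs to reducer i + 1. Keys below splitPoints[0] go to reducer 0.
    static int partition(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        if (pos >= 0) {
            // The key equals a split point, so it opens the next reducer's range.
            return pos + 1;
        }
        // Not found: binarySearch returned -(insertionPoint) - 1, and the
        // insertion point is the number of split points smaller than the key.
        return -(pos + 1);
    }

    public static void main(String[] args) {
        int[] splitPoints = {3}; // two reducers: keys < 3 and keys >= 3
        for (int key = 0; key <= 4; key++) {
            System.out.println("key " + key + " -> reducer "
                    + partition(key, splitPoints));
        }
    }
}
```

With these split points, keys 0, 1 and 2 go to reducer 0 and keys 3 and 4 go to reducer 1, so the output of reducer 0 sorts entirely before the output of reducer 1.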

bin/hadoop jar /home/iic/hbase-0.20.3/hbase-0.20.3.jar rowcounter hbase-test info