Hbase学习笔记

最新推荐文章于 2024-07-11 10:47:04 发布

雨落

最新推荐文章于 2024-07-11 10:47:04 发布

阅读量1k

点赞数

分类专栏： Hbase NoSql 数据库文章标签： hbase mapreduce table byte class 集群

本文链接：https://blog.csdn.net/anbo724/article/details/6982251

版权

数据库同时被 3 个专栏收录

9 篇文章 0 订阅

订阅专栏

Hbase

8 篇文章 0 订阅

订阅专栏

NoSql

8 篇文章 0 订阅

订阅专栏

当 MapReduce job的HBase table 使用TableInputFormat为数据源格式的时候,他的splitter会给这个table的每个region一个map。因此，如果一个table有100个region，就有100个map-tasks，不论需要scan多少个column families 。

要想使HBase作为MapReduce的source,Job需要使用TableMapReduceUtil来配置，如下所示...

Job job = ...;	
Scan scan = new Scan();
scan.setCaching(500);  // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  
// Now set other scan attrs
...
  
TableMapReduceUtil.initTableMapperJob(
  tableName,   		// input HBase table name
  scan, 			// Scan instance to control CF and attribute selection
  MyMapper.class,		// mapper
  Text.class,		// reducer key 
  LongWritable.class,	// reducer value
  job			// job instance
  );

...mapper需要继承于 TableMapper ...

public class MyMapper extends TableMapper<Text, LongWritable> {
public void map(ImmutableBytesWritable row, Result value, Context context) 
throws InterruptedException, IOException {
// process data for the row from the Result instance.

尽管现有的框架允许一个HBase table作为一个MapReduce job的输入，其他的Hbase table可以同时作为普通的表被访问。例如在一个MapReduce的job中，可以在Mapper的setup方法中创建HTable实例。

public class MyMapper extends TableMapper<Text, LongWritable> {
  private HTable myOtherTable;

  @Override
  public void setup(Context context) {
    myOtherTable = new HTable("myOtherTable");
  }

下面的Get操作会只获得最新的一个版本。

 Get get = new Get(Bytes.toBytes("row1"));
        Result r = htable.get(get);
        byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value

下面的Get操作会获得最近的3个版本。

 Get get = new Get(Bytes.toBytes("row1"));
        get.setMaxVersions(3);  // will return last 3 versions of row
        Result r = htable.get(get);
        byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
        List<KeyValue> kv = r.getColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns all versions of this column

下面的Put操作不指明版本，所以Hbase会用当前时间作为版本。

 Put put = new Put(Bytes.toBytes(row));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), Bytes.toBytes( data));
          htable.put(put);

下面的Put操作，指明了版本。

 Put put = new Put( Bytes.toBytes(row ));
          long explicitTimeInMs = 555;  // just an example
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), explicitTimeInMs, Bytes.toBytes(data));
          htable.put(put);

Hbase客户端的 HTable类负责寻找相应的RegionServers来处理行。他是先查询 .META. 和 -ROOT 目录表。然后再确定region的位置。定位到所需要的区域后，客户端会直接去访问相应的region(不经过master)，发起读写请求。这些信息会缓存在客户端，这样就不用每发起一个请求就去查一下。如果一个region已经废弃(原因可能是master load balance或者RegionServer死了)，客户端就会重新进行这个步骤，决定要去访问的新的地址。

管理集群操作是经由HBaseAdmin发起的

关于连接的配置信息，参见Section 3.7, “连接Hbase集群的客户端配置和依赖”.

HTable不是线程安全的。建议使用同一个HBaseConfiguration实例来创建HTable实例。这样可以共享ZooKeeper和socket实例。例如，最好这样做：

HBaseConfiguration conf = HBaseConfiguration.create();
HTable table1 = new HTable(conf, "myTable");
HTable table2 = new HTable(conf, "myTable");

而不是这样：

HBaseConfiguration conf1 = HBaseConfiguration.create();
HTable table1 = new HTable(conf1, "myTable");
HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");

如果你想知道的更多的关于Hbase客户端connection的知识，可以参照： HConnectionManager.

雨落

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Hbase学习笔记

当 MapReduce job的HBase table 使用TableInputFormat为数据源格式的时候,他的splitter会给这个table的每个region一个map。因此，如果一个table有100个region，就有100个map-tasks，不论需要scan多少个column families 。要想使HBase作为MapReduce的source,Job需要使用Tabl
复制链接

扫一扫