Source Code Reading (2): hbase-examples BufferedMutator Example

Source Code Reading (1): HBase client source code - http://aperise.iteye.com/blog/2372350
Source Code Reading (2): hbase-examples BufferedMutator Example - http://aperise.iteye.com/blog/2372505
Source Code Reading (3): hbase-examples MultiThreadedClientExample - http://aperise.iteye.com/blog/2372534

1. Working directly with the BufferedMutator that HTable holds internally, instead of HTable itself, is entirely viable

    In the earlier analysis of the HBase client source code, we created the client like this:

//The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation  
Connection connection = ConnectionFactory.createConnection(configuration);      
//The default Table implementation is org.apache.hadoop.hbase.client.HTable  
Table table = connection.getTable(TableName.valueOf("tableName")); 
  1. By default we get the Connection implementation org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation. The important detail is that setupRegistry() installs org.apache.hadoop.hbase.client.ZookeeperRegistry, the class through which all later interaction with ZooKeeper is done.
  2. From that connection we obtain the Table implementation org.apache.hadoop.hbase.client.HTable.
  3. In the end, org.apache.hadoop.hbase.client.HTable essentially just holds a BufferedMutatorImpl field named mutator, and all subsequent operations go through that mutator.

    So we can in fact skip the HTable object altogether, build a BufferedMutator directly and use it to talk to HBase. As expected, the hbase-examples module in the HBase source tree demonstrates exactly this usage; the key code looks like this:

Configuration configuration = HBaseConfiguration.create();      
configuration.set("hbase.zookeeper.property.clientPort", "2181");      
configuration.set("hbase.client.write.buffer", "2097152");      
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"));

//3177 is not a number I made up; it is derived from 2 * hbase.client.write.buffer / put.heapSize()   
int bestBatchPutSize = 3177;   

//This uses the JDK 7 try-with-resources feature: try (objects that implement java.io.Closeable) {} catch (Exception e) {}
//It behaves like a finally block and calls close() on each declared resource, i.e. conn.close() and mutator.close()
try(
  //The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation 
  Connection conn = ConnectionFactory.createConnection(configuration);
  //The default BufferedMutator implementation is org.apache.hadoop.hbase.client.BufferedMutatorImpl
  BufferedMutator mutator = conn.getBufferedMutator(params);
){         
  List<Put> putLists = new ArrayList<Put>();    
  for(int count=0;count<100000;count++){    
    String rowkey = "rowkey" + count;//the original snippet assumes a rowkey variable exists; this is an illustrative value    
    Put put = new Put(rowkey.getBytes());    
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());    
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());    
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());    
    put.setDurability(Durability.SKIP_WAL);  
    putLists.add(put);    
        
    if(putLists.size()==bestBatchPutSize){    
      //the batch has reached the optimal size, flush it right away    
        mutator.mutate(putLists);   
        mutator.flush();
        putLists.clear();
    }    
  }    
  //flush whatever has not been submitted yet in one final pass       
  mutator.mutate(putLists);   
  mutator.flush();
}catch(IOException e) {
  LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
} 

 

2. BufferedMutatorParams

    BufferedMutatorParams collects the parameters used to construct a BufferedMutator: the HBase table name, the client-side write buffer size, the maximum size allowed for a single KeyValue, the thread pool, and the callback listener that is notified when an HBase operation fails (for example, a failed write).
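For illustration, here is a minimal sketch (not from the original post) of supplying those parameters through the fluent setters of BufferedMutatorParams; the pool size and buffer sizes are made-up values:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

ExecutorService pool = Executors.newFixedThreadPool(4);//hypothetical pool size
BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"))
    .writeBufferSize(4 * 1024 * 1024L)  //overrides hbase.client.write.buffer for this mutator only
    .maxKeyValueSize(10 * 1024 * 1024)  //largest single KeyValue this mutator will accept
    .pool(pool)                         //thread pool handed to the AsyncProcess
    .listener(new BufferedMutator.ExceptionListener() {//invoked when buffered writes ultimately fail
      @Override
      public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator)
          throws RetriesExhaustedWithDetailsException {
        throw e;//rethrow, which is what the default listener does
      }
    });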

package org.apache.hadoop.hbase.client;

import java.util.concurrent.ExecutorService;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;

/**
 * BufferedMutatorParams: the parameter object used to construct a BufferedMutator
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class BufferedMutatorParams {

  static final int UNSET = -1;

  private final TableName tableName;//HBase table name
  private long writeBufferSize = UNSET;//client-side write buffer size
  private int maxKeyValueSize = UNSET;//maximum size allowed for a single KeyValue
  private ExecutorService pool = null;//thread pool
  private BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {//callback listener notified when an HBase operation fails, e.g. a failed write
    @Override
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator bufferedMutator)
        throws RetriesExhaustedWithDetailsException {
      throw exception;
    }
  };

  public BufferedMutatorParams(TableName tableName) {//constructor
    this.tableName = tableName;
  }

  public TableName getTableName() {//get the table name
    return tableName;
  }

  public long getWriteBufferSize() {//get the write buffer size
    return writeBufferSize;
  }

  /**
   * Fluent setter for the write buffer size
   */
  public BufferedMutatorParams writeBufferSize(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
    return this;
  }

  public int getMaxKeyValueSize() {//get the maximum KeyValue size
    return maxKeyValueSize;
  }

  /**
   * Fluent setter for the maximum KeyValue size
   */
  public BufferedMutatorParams maxKeyValueSize(int maxKeyValueSize) {
    this.maxKeyValueSize = maxKeyValueSize;
    return this;
  }

  public ExecutorService getPool() {//get the thread pool
    return pool;
  }
  
  public BufferedMutatorParams pool(ExecutorService pool) {//fluent setter for the thread pool
    this.pool = pool;
    return this;
  }

  public BufferedMutator.ExceptionListener getListener() {//get the exception listener
    return listener;
  }
  
  public BufferedMutatorParams listener(BufferedMutator.ExceptionListener listener) {//fluent setter for the exception listener
    this.listener = listener;
    return this;
  }
}

3. BufferedMutator

    BufferedMutator is an interface; it mainly defines the following abstract methods:

public interface BufferedMutator extends Closeable {
  TableName getName();//get the table name
  Configuration getConfiguration();//get the Hadoop Configuration object
  void mutate(Mutation mutation) throws IOException;//add a single mutation to the buffer
  void mutate(List<? extends Mutation> mutations) throws IOException;//add a batch of mutations to the buffer
  @Override
  void close() throws IOException;//implements Closeable, so JDK 7 try-with-resources can close it without an explicit finally block
  void flush() throws IOException;//submit the buffered data to the HBase server
  long getWriteBufferSize();//get the write buffer size
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  interface ExceptionListener {//exception listener
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator mutator) throws RetriesExhaustedWithDetailsException;
  }
}
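To make the ExceptionListener contract concrete, below is a minimal sketch (not part of the HBase sources) of a listener that logs the rows that exhausted their retries instead of rethrowing, so that one failed batch does not abort the writer:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class LoggingExceptionListener implements BufferedMutator.ExceptionListener {
  private static final Log LOG = LogFactory.getLog(LoggingExceptionListener.class);

  @Override
  public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator)
      throws RetriesExhaustedWithDetailsException {
    //the exception carries one entry per mutation that exhausted its retries
    for (int i = 0; i < e.getNumExceptions(); i++) {
      LOG.warn("Write to " + mutator.getName() + " failed for row " + e.getRow(i), e.getCause(i));
    }
    //not rethrowing keeps the remaining writes going; rethrow here if you prefer fail-fast behaviour
  }
}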

4. BufferedMutatorImpl

package org.apache.hadoop.hbase.client;

import com.google.common.annotations.VisibleForTesting;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.ipc.RpcControllerFactory;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * BufferedMutatorImpl was introduced in HBase 1.0.0.
 * It is mainly intended for writing to the same table from multiple threads.
 * Note that when one BufferedMutator is shared across threads, an error triggered by one
 * thread is also surfaced to the other threads.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class BufferedMutatorImpl implements BufferedMutator {

  private static final Log LOG = LogFactory.getLog(BufferedMutatorImpl.class);
  
  private final ExceptionListener listener;//callback invoked when a client operation fails

  protected ClusterConnection connection; //the underlying connection
  private final TableName tableName;//the HBase table
  private volatile Configuration conf;//the Hadoop Configuration
  @VisibleForTesting
  final ConcurrentLinkedQueue<Mutation> writeAsyncBuffer = new ConcurrentLinkedQueue<Mutation>();//client-side write buffer queue
  @VisibleForTesting
  AtomicLong currentWriteBufferSize = new AtomicLong(0);//thread-safe counter of the heap size currently held in the buffer

  private long writeBufferSize;//client-side write buffer size
  private final int maxKeyValueSize;//maximum size allowed for a single KeyValue
  private boolean closed = false;//whether this mutator has been closed
  private final ExecutorService pool;//thread pool used by the client

  @VisibleForTesting
  protected AsyncProcess ap; //asynchronous request processor

  BufferedMutatorImpl(ClusterConnection conn, RpcRetryingCallerFactory rpcCallerFactory,
      RpcControllerFactory rpcFactory, BufferedMutatorParams params) {
    if (conn == null || conn.isClosed()) {
      throw new IllegalArgumentException("Connection is null or closed.");
    }

    this.tableName = params.getTableName();
    this.connection = conn;
    this.conf = connection.getConfiguration();
    this.pool = params.getPool();
    this.listener = params.getListener();

    //Build a ConnectionConfiguration from the supplied conf; anything the client did not set falls back to its default value
    ConnectionConfiguration tableConf = new ConnectionConfiguration(conf);
    //set the write buffer size
    this.writeBufferSize = params.getWriteBufferSize() != BufferedMutatorParams.UNSET ? params.getWriteBufferSize() : tableConf.getWriteBufferSize();
    //set the maximum KeyValue size
    this.maxKeyValueSize = params.getMaxKeyValueSize() != BufferedMutatorParams.UNSET ? params.getMaxKeyValueSize() : tableConf.getMaxKeyValueSize();

    //create the asynchronous request processor
    ap = new AsyncProcess(connection, conf, pool, rpcCallerFactory, true, rpcFactory);
  }

  @Override
  public TableName getName() {//get the table name
    return tableName;
  }

  @Override
  public Configuration getConfiguration() {//get the Hadoop Configuration, i.e. the conf passed in by the client
    return conf;
  }

  @Override
  public void mutate(Mutation m) throws InterruptedIOException,
      RetriesExhaustedWithDetailsException {//add a single mutation to the buffer
    mutate(Arrays.asList(m));
  }

  @Override
  public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {  
    //if the BufferedMutator has already been closed, fail immediately  
    if (closed) {  
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");  
    }  
  
    //first loop over the submitted mutations and accumulate their heap size into toAddSize  
    long toAddSize = 0;  
    for (Mutation m : ms) {  
      if (m instanceof Put) {  
        validatePut((Put) m);  
      }  
      toAddSize += m.heapSize();  
    }  
  
    // This behavior is highly non-intuitive... it does not protect us against  
    // 94-incompatible behavior, which is a timing issue because hasError, the below code  
    // and setter of hasError are not synchronized. Perhaps it should be removed.  
    if (ap.hasError()) {  
      //add toAddSize to the heap size this BufferedMutatorImpl currently tracks for its buffer  
      currentWriteBufferSize.addAndGet(toAddSize);  
      //append the submitted mutations to writeAsyncBuffer; they are only removed once flushed  
      writeAsyncBuffer.addAll(ms);  
      //an earlier error was detected, so do a synchronous flush before continuing  
      backgroundFlushCommits(true);  
    } else {  
      //add toAddSize to the heap size this BufferedMutatorImpl currently tracks for its buffer  
      currentWriteBufferSize.addAndGet(toAddSize);  
      //append the submitted mutations to writeAsyncBuffer; they are only removed once flushed  
      writeAsyncBuffer.addAll(ms);  
    }  
  
    // Now try and queue what needs to be queued.  
    // If the buffered data now exceeds hbase.client.write.buffer (2 MB by default), call backgroundFlushCommits right away;  
    // if it is still below that threshold, simply return without doing anything  
    while (currentWriteBufferSize.get() > writeBufferSize) {  
      backgroundFlushCommits(false);  
    }  
  }  

  // validate a Put
  public void validatePut(final Put put) throws IllegalArgumentException {
    HTable.validatePut(put, maxKeyValueSize);
  }

  @Override
  public synchronized void close() throws IOException {
    try {
      if (this.closed) {//already closed, nothing to do
        return;
      }
      
      //flush one last time before closing
      backgroundFlushCommits(true);
      this.pool.shutdown();//shut down the thread pool
      boolean terminated;
      int loopCnt = 0;
      do {
        // wait until the pool has terminated
        terminated = this.pool.awaitTermination(60, TimeUnit.SECONDS);
        loopCnt += 1;
        if (loopCnt >= 10) {
          LOG.warn("close() failed to terminate pool after 10 minutes. Abandoning pool.");
          break;
        }
      } while (!terminated);

    } catch (InterruptedException e) {
      LOG.warn("waitForTermination interrupted");

    } finally {
      this.closed = true;
    }
  }

  @Override
  public synchronized void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //an explicit flush submits the buffered data to the HBase server
    backgroundFlushCommits(true);
  }

  private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {  
    LinkedList<Mutation> buffer = new LinkedList<>();  
    // Keep track of the size so that this thread doesn't spin forever  
    long dequeuedSize = 0;  
  
    try {  
      //drain the submitted mutations (Put is an implementation of Mutation)  
      Mutation m;  
      //While (writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous) and writeAsyncBuffer still holds mutations,  
      //keep dequeuing: add each mutation's heap size to dequeuedSize  
      //and decrement currentWriteBufferSize by the same amount  
      while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous) && (m = writeAsyncBuffer.poll()) != null) {  
        buffer.add(m);  
        long size = m.heapSize();  
        dequeuedSize += size;  
        currentWriteBufferSize.addAndGet(-size);  
      }  
  
      //for an asynchronous flush (synchronous == false) with nothing dequeued there is nothing to do, so return  
      if (!synchronous && dequeuedSize == 0) {  
        return;  
      }  
  
      //for backgroundFlushCommits(false) this branch is taken and we do not wait for the result  
      if (!synchronous) {  
        //submit without waiting for the result  
        ap.submit(tableName, buffer, true, null, false);  
        if (ap.hasError()) {  
          LOG.debug(tableName + ": One or more of the operations have failed -"  
              + " waiting for all operation in progress to finish (successfully or not)");  
        }  
      }  
      //for backgroundFlushCommits(true), or when an error has occurred, submit and wait for the result  
      if (synchronous || ap.hasError()) {  
        while (!buffer.isEmpty()) {  
          ap.submit(tableName, buffer, true, null, false);  
        }  
        //wait for all outstanding operations to finish  
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);  
        if (error != null) {  
          if (listener == null) {  
            throw error;  
          } else {  
            this.listener.onException(error, this);  
          }  
        }  
      }  
    } finally {  
      //anything that was not submitted is put back into the buffer so it can be flushed later  
      for (Mutation mut : buffer) {  
        long size = mut.heapSize();  
        currentWriteBufferSize.addAndGet(size);  
        dequeuedSize -= size;  
        writeAsyncBuffer.add(mut);  
      }  
    }  
  } 

  /**
   * Set the client-side write buffer size
   */
  @Deprecated
  public void setWriteBufferSize(long writeBufferSize) throws RetriesExhaustedWithDetailsException,
      InterruptedIOException {
    this.writeBufferSize = writeBufferSize;
    if (currentWriteBufferSize.get() > writeBufferSize) {
      flush();
    }
  }

  /**
   * Get the write buffer size
   */
  @Override
  public long getWriteBufferSize() {
    return this.writeBufferSize;
  }


  @Deprecated
  public List<Row> getWriteBuffer() {
    return Arrays.asList(writeAsyncBuffer.toArray(new Row[0]));
  }
}
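As mutate() above shows, buffered data is only pushed to the server when currentWriteBufferSize exceeds writeBufferSize, or when flush()/close() is called. The following minimal sketch, using a deliberately tiny, made-up buffer size of 64 KB, illustrates how mutate() triggers flushes on its own without any explicit flush() calls:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TinyBufferExample {
  public static void main(String[] args) throws IOException {
    //64 KB is an artificially small value chosen so that flushes happen early and often
    BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"))
        .writeBufferSize(64 * 1024L);
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         BufferedMutator mutator = conn.getBufferedMutator(params)) {
      for (int i = 0; i < 10000; i++) {
        Put p = new Put(Bytes.toBytes("row" + i));
        p.addColumn(Bytes.toBytes("columnFamily1"), Bytes.toBytes("columnName1"), Bytes.toBytes("value" + i));
        //once the tracked heap size passes 64 KB, this call flushes internally via backgroundFlushCommits(false)
        mutator.mutate(p);
      }
      //close() performs one final synchronous flush of whatever is still buffered
    }
  }
}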

  

5. BufferedMutatorExample

    The hbase-examples module of the HBase source tree provides an example of using the HBase client, the Java class BufferedMutatorExample, which demonstrates another way of driving the HBase client. Its code is as follows:

 

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * An example of using the {@link BufferedMutator} interface.
 */
public class BufferedMutatorExample extends Configured implements Tool {

	private static final Log LOG = LogFactory.getLog(BufferedMutatorExample.class);

	private static final int POOL_SIZE = 10;// thread pool size
	private static final int TASK_COUNT = 100;// number of tasks
	private static final TableName TABLE = TableName.valueOf("foo");// HBase table foo
	private static final byte[] FAMILY = Bytes.toBytes("f");// column family f of table foo

	/**
	 * Overrides Tool.run(String[] args); the String[] args passed to main are forwarded here
	 */
	@Override
    public int run(String[] args) throws InterruptedException, ExecutionException, TimeoutException {

        /** An asynchronous callback listener, triggered when an HBase write fails. */
        final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
            @Override
            public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
                for (int i = 0; i < e.getNumExceptions(); i++) {
                    LOG.info("Failed to sent put " + e.getRow(i) + ".");
                }
            }
        };
        /** 
         * BufferedMutatorParams, the parameter object used to construct the BufferedMutator.
         * Its parameters are:
         *   TableName tableName
         *   long writeBufferSize
         *   int maxKeyValueSize
         *   ExecutorService pool
         *   BufferedMutator.ExceptionListener listener
         * Only tableName and listener are set here.
         * */
        BufferedMutatorParams params = new BufferedMutatorParams(TABLE).listener(listener);
        
        /**
         * step 1: create a Connection and a BufferedMutator, shared by all threads in the worker pool.
         *              This uses the JDK 7 try-with-resources syntax: try (objects implementing java.io.Closeable) {} catch (Exception e) {}.
         *              When the block exits, close() is called automatically on each resource,
         *              which is equivalent to writing
         *              finally {
         *                  conn.close();
         *                  mutator.close();
         *              }
         */
        try (
                final Connection conn = ConnectionFactory.createConnection(getConf());
                final BufferedMutator mutator = conn.getBufferedMutator(params)
        ) {
            /** worker pool of size 10 whose tasks all write through the shared BufferedMutator */
            final ExecutorService workerPool = Executors.newFixedThreadPool(POOL_SIZE);
            List<Future<Void>> futures = new ArrayList<>(TASK_COUNT);

            /** keep creating tasks and submitting them to the pool, 100 tasks in total */
            for (int i = 0; i < TASK_COUNT; i++) {
                futures.add(workerPool.submit(new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        /** 
                         * step 2: every task writes into the same BufferedMutator buffer;
                         *              all tasks share the BufferedMutator write buffer (hbase.client.write.buffer),
                         *              the callback listener and the thread pool
                         *  */

                        /** 
                         * build the Put object
                         *  */
                        Put p = new Put(Bytes.toBytes("someRow"));
                        p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
                        /** 
                         * Add the data to the BufferedMutator write buffer (hbase.client.write.buffer).
                         * This does not send the data to the HBase server right away; the data is only
                         * pushed to the server once the buffered size exceeds hbase.client.write.buffer
                         *  */
                        mutator.mutate(p);
                        
                        /** 
                         * TODO
                         * you can also call mutator.flush() here yourself before the task returns
                         * to push the buffered data to the HBase server:
                         * mutator.flush();
                         *  */
                        return null;
                    }
                }));
            }

            /**
             * step 3: iterate over each task's Future; wait up to 5 minutes for any task that has not finished
             */
            for (Future<Void> f : futures) {
                f.get(5, TimeUnit.MINUTES);
            }
            /**
             * finally, shut down the worker pool
             */
            workerPool.shutdown();
        } catch (IOException e) {
            // exception while creating/destroying Connection or BufferedMutator
            LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
        }
        /**
         * There is no finally block here because the JDK 7 try-with-resources syntax above already
         * calls close() on every declared resource when the block exits, i.e. conn.close() and mutator.close()
         */
        return 0;
    }

	public static void main(String[] args) throws Exception {
		//ToolRunner runs the run() method of BufferedMutatorExample (which implements Tool), passing String[] args through to it
		ToolRunner.run(new BufferedMutatorExample(), args);
	}
}

 

6. Takeaways from the source code

  •     BufferedMutator on its own is a perfectly valid way to drive the HBase client;
  •     A single BufferedMutator can be shared by multiple threads.