HBase: comparing puts with and without the client-side write buffer

Latest recommended article published on 2023-05-06 16:46:45

qqpy789

Article link: https://blog.csdn.net/qqpy789/article/details/78343179

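In HBase, each put is effectively one RPC. If you issue puts one at a time in a loop with many iterations, most of the elapsed time goes to the network round trip of each RPC, and only a small share goes to actually transferring data.

If you enable the client-side write buffer and submit the data in batches, throughput improves considerably.

Enough talk; here is the code.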

package com.hit.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Created by zh on 2017/10/25.
 */
public class TestPut {

    // Column family and the two column qualifiers written by both tests.
    private byte[] f = Bytes.toBytes("mycf");
    private byte[] q = Bytes.toBytes("q1");
    private byte[] q2 = Bytes.toBytes("q2");


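    public static void main(String[] args) throws Exception {
        TestPut t = new TestPut();
        // HbaseUtils is a helper class that is not shown in this post.
        HTableInterface mytable = HbaseUtils.getTable("sku:mytable");
        t.put1000(mytable);      // auto-flush on: every put is sent immediately
        t.put100Cache(mytable);  // auto-flush off: puts go through the write buffer
    }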

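    // Unbuffered test: auto-flush keeps its default (on), so every put() is sent right away.
    // Despite the method name, the loop writes 100,000 rows.
    public void put1000(HTableInterface mytable) throws Exception {
        long start = System.currentTimeMillis();

        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("row1_" + i));
            put.add(f, q, Bytes.toBytes("row1_q1_value1_" + i));
            put.add(f, q2, Bytes.toBytes("row1_q2_value1_" + i));
            mytable.put(put);
        }

        System.out.println("put1000 elapsed time: " + (System.currentTimeMillis() - start) + "ms");
    }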

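    // Buffered test: with auto-flush off, put() only adds to the client-side write buffer,
    // and the client sends the buffered Puts in batches whenever the buffer fills up.
    public void put100Cache(HTableInterface mytable) throws Exception {
        long start = System.currentTimeMillis();
        mytable.setAutoFlush(false);
        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("row1_" + i));
            put.add(f, q, Bytes.toBytes("row1_q1_value1_" + i));
            put.add(f, q2, Bytes.toBytes("row1_q2_value1_" + i));
            mytable.put(put);
        }
        // The buffer is not flushed here, so Puts still sitting in it have not
        // been sent yet and are not covered by the measured time.
        System.out.println("put100Cache elapsed time: " + (System.currentTimeMillis() - start) + "ms");

    }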
}

Here are the test results:

put1000 elapsed time: 164039ms
put100Cache elapsed time: 1895ms

The gap is striking.
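To put the two timings in perspective, the short sketch below works out the average cost per put and the overall ratio from the reported numbers. The class name `PutTimingSummary` is only an illustration and is not part of the original test. Because the buffered run never flushes its last partial buffer, its figure may leave out some work, so the ratio is best read as a rough indication.

```java
// Hypothetical helper: derives per-put cost and speedup from the timings reported above.
final class PutTimingSummary {
    static final long ROWS = 100_000L;           // loop count used by both test methods
    static final long UNBUFFERED_MS = 164_039L;  // reported for put1000
    static final long BUFFERED_MS = 1_895L;      // reported for put100Cache

    // Average milliseconds spent per put for a given total.
    static double msPerPut(long totalMs) {
        return (double) totalMs / ROWS;
    }

    // How many times faster the buffered run finished.
    static double speedup() {
        return (double) UNBUFFERED_MS / BUFFERED_MS;
    }

    public static void main(String[] args) {
        System.out.printf("unbuffered: %.3f ms per put%n", msPerPut(UNBUFFERED_MS));
        System.out.printf("buffered:   %.5f ms per put%n", msPerPut(BUFFERED_MS));
        System.out.printf("speedup:    %.1fx%n", speedup());
    }
}
```

In other words, the unbuffered loop averages roughly 1.6 ms per put, which is consistent with each put paying for its own round trip.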

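You can force the buffer to be written manually after every batch, but that is not recommended as the routine approach. It is better to set the size of the write buffer and let the HBase client submit the data automatically.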
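The effect of the buffer size can be sketched with a small stand-alone model. This is not HBase code: it assumes every put occupies the same number of bytes (200 here, an invented figure) and that each flush costs exactly one round trip, whereas a real client groups the buffered puts by region server. In the old client API used in this post, the size would be set with `setWriteBufferSize(long)` on the table or through the `hbase.client.write.buffer` setting, which as far as I know defaults to 2 MB.

```java
// Simplified model of a size-based client write buffer (an illustration, not the HBase client).
final class WriteBufferSizing {

    // With auto-flush on, every put is its own round trip.
    static long roundTripsUnbuffered(long puts) {
        return puts;
    }

    // With auto-flush off, puts accumulate until the buffered bytes exceed
    // bufferBytes; whatever is left at the end needs one explicit flush.
    static long roundTripsBuffered(long puts, long bytesPerPut, long bufferBytes) {
        long buffered = 0;
        long trips = 0;
        for (long i = 0; i < puts; i++) {
            buffered += bytesPerPut;
            if (buffered > bufferBytes) {   // buffer full: send the batch
                trips++;
                buffered = 0;
            }
        }
        if (buffered > 0) {                 // final explicit flush of the remainder
            trips++;
        }
        return trips;
    }

    public static void main(String[] args) {
        long puts = 100_000L;
        long bytesPerPut = 200L;            // assumed size of one Put
        System.out.println("unbuffered:  " + roundTripsUnbuffered(puts));
        System.out.println("2 MB buffer: " + roundTripsBuffered(puts, bytesPerPut, 2L * 1024 * 1024));
        System.out.println("4 MB buffer: " + roundTripsBuffered(puts, bytesPerPut, 4L * 1024 * 1024));
    }
}
```

Under these assumptions a larger buffer means fewer round trips, at the cost of more client memory and more data at risk if the process dies before a flush.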

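One caveat, which the code above does not handle either: after the last put you must explicitly flush the buffer. Otherwise the remaining data is only written to HBase the next time the buffer fills up, and if the program exits before that happens, the data is lost.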
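The following stand-alone model shows why the final flush matters. It is a sketch, not the HBase client: `BufferedTableModel` and its count-based capacity are invented for illustration (the real buffer is measured in bytes). In the old client API used in this post, the corresponding call would be `flushCommits()` on the table, ideally placed in a `finally` block so that it also runs when the loop throws.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a table with a client-side write buffer.
final class BufferedTableModel {
    private final int capacity;                             // puts held before an automatic flush
    private final List<String> buffer = new ArrayList<>();  // not yet sent
    private final List<String> stored = new ArrayList<>();  // stands in for data on the server

    BufferedTableModel(int capacity) {
        this.capacity = capacity;
    }

    void put(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= capacity) {
            flush();                                        // automatic flush when the buffer is full
        }
    }

    void flush() {
        stored.addAll(buffer);
        buffer.clear();
    }

    // Writes the given number of rows and returns {stored, still pending}.
    static int[] write(int rows, int capacity, boolean flushAtEnd) {
        BufferedTableModel table = new BufferedTableModel(capacity);
        try {
            for (int i = 0; i < rows; i++) {
                table.put("row1_" + i);
            }
        } finally {
            if (flushAtEnd) {
                table.flush();                              // the step missing from the test code
            }
        }
        return new int[] {table.stored.size(), table.buffer.size()};
    }

    public static void main(String[] args) {
        int[] noFlush = write(100_000, 3_000, false);
        int[] withFlush = write(100_000, 3_000, true);
        System.out.println("no final flush:   stored=" + noFlush[0] + " pending=" + noFlush[1]);
        System.out.println("with final flush: stored=" + withFlush[0] + " pending=" + withFlush[1]);
    }
}
```

With a capacity that does not divide the row count evenly, the run without the final flush leaves the tail of the data in the buffer, which is exactly the situation described above.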