HBase BulkLoad: Bulk Data Import

This post walks through importing data into HBase with bulk load, using Hive to generate the HFiles.

Steps

HBase Version: 2.2.7
Hive Version: 3.1.0

  1. Add the HBase jars in the Hive shell

    add jar /opt/apache/hive/lib/hive-hbase-handler-3.1.0.jar;
    add jar /opt/apache/hive/lib/hbase-client-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-common-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-common-2.0.0-alpha4-tests.jar;
    add jar /opt/apache/hive/lib/hbase-hadoop2-compat-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-hadoop2-compat-2.0.0-alpha4-tests.jar;
    add jar /opt/apache/hive/lib/hbase-hadoop-compat-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-http-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-mapreduce-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-metrics-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-metrics-api-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-prefix-tree-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-procedure-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-protocol-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-protocol-shaded-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-replication-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-server-2.0.0-alpha4.jar;
    add jar /opt/apache/hive/lib/hbase-shaded-miscellaneous-1.0.1.jar;
    add jar /opt/apache/hive/lib/hbase-shaded-netty-1.0.1.jar;
    add jar /opt/apache/hive/lib/hbase-shaded-protobuf-1.0.1.jar;
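
Typing twenty `add jar` statements by hand is error-prone. As a rough sketch, the statements could instead be generated from the contents of the lib directory. The function name `gen_add_jar` is hypothetical, and the default directory is simply the one used in the list above; adjust both to your environment.

```shell
# Hypothetical helper: print one "add jar" statement for each HBase-related
# jar found in the given directory (defaults to the path used in this post).
gen_add_jar() {
  lib_dir="${1:-/opt/apache/hive/lib}"
  for jar in "$lib_dir"/hive-hbase-handler-*.jar "$lib_dir"/hbase-*.jar; do
    # Skip a pattern that matched nothing (the glob is then left unexpanded).
    [ -e "$jar" ] || continue
    printf 'add jar %s;\n' "$jar"
  done
}
```

One way to use it is to redirect the output to a file, for example `gen_add_jar > /tmp/add_hbase_jars.hql`, and then run `source /tmp/add_hbase_jars.hql;` in the Hive CLI. The file path here is only an illustration.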
    
  2. Create a Hive table and insert test data

    use zxl_test;
    
    drop table if exists hive_table;
    create table hive_table(key int, name string, age int, create_time string)
    stored as orc;
    
    insert into hive_table (key, name, age, create_time) values
    (1, 'a', 18, from_utc_timestamp(CURRENT_TIMESTAMP,'GMT+8')),
    (2, 'b', 19, from_utc_timestamp(CURRENT_TIMESTAMP,'GMT+8')),
    (3, 'c', 20, from_utc_timestamp(CURRENT_TIMESTAMP,'GMT+8'));
    
  3. Create the Hive table that generates the HFiles

    drop table if exists hive_hfile_table;
    create table hive_hfile_table(key int, name string, age int, create_time string)
    stored as
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
    TBLPROPERTIES ('hfile.family.path' = '/tmp/hive_hfile_table/cf');
    
  4. Insert data into the hive_hfile_table table

    insert overwrite table hive_hfile_table select * from hive_table;
    
  5. Check that the HFiles were generated

    # List the generated HFiles
    hdfs dfs -ls /tmp/hive_hfile_table/cf
    
    # Inspect the details of a specific HFile (xxx stands for the actual file name)
    hbase hfile -v -p -m -f hdfs://bigbigworld/tmp/hive_hfile_table/cf/xxx
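
Before moving on to the load, it may help to fail fast if the insert produced no HFiles. The following is a minimal sketch, assuming the `hfile.family.path` set in step 3. The function name `check_hfiles` is hypothetical; it relies only on `hdfs dfs -test -d` and `hdfs dfs -ls`.

```shell
# Hypothetical helper: succeed only if the HFile directory exists and
# contains at least one regular file.
check_hfiles() {
  hfile_dir="${1:-/tmp/hive_hfile_table/cf}"
  # -test -d returns 0 only if the path exists and is a directory.
  if ! hdfs dfs -test -d "$hfile_dir"; then
    echo "missing directory: $hfile_dir" >&2
    return 1
  fi
  # In "hdfs dfs -ls" output, entries for regular files start with "-".
  n=$(hdfs dfs -ls "$hfile_dir" | grep -c '^-')
  if [ "$n" -eq 0 ]; then
    echo "no HFiles in $hfile_dir" >&2
    return 1
  fi
  echo "$n HFile(s) found in $hfile_dir"
}
```

Calling `check_hfiles && echo ready` before step 7 keeps an empty directory from reaching the bulk load.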
    
  6. Create the HBase table

    hbase shell
    create 'hbase_table', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
    
  7. Bulk load the HFiles into HBase

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://bigbigworld/tmp/hive_hfile_table hbase_table
    
  8. Verify the data in HBase

    scan 'hbase_table'
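
The `scan` above is typed into an interactive `hbase shell` session. For scripting, the same check could be run non-interactively. This is only a sketch: the helper names `hbase_cmd` and `verify_load` are hypothetical, and the raw shell output is printed rather than parsed, since its exact format can differ between versions.

```shell
# Hypothetical helper: run a single HBase shell command non-interactively.
hbase_cmd() {
  # -n puts the HBase shell in non-interactive mode; commands are read from stdin.
  echo "$1" | hbase shell -n
}

# Count the rows in the target table (defaults to the table created in step 6).
verify_load() {
  hbase_cmd "count '${1:-hbase_table}'"
}
```

With the three test rows inserted in step 2, `verify_load` would be expected to report three rows once the bulk load has completed.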
    