HBase HFile BulkLoad

Latest recommended article published 2021-05-23 10:00:28

wdier


Original: http://shitouer.cn/2013/02/hbase-hfile-bulk-load/

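I. This approach has many advantages:

1. Loading a huge volume of data into HBase in one go through the normal write path is slow, and it also ties up a lot of Region resources. A more efficient and convenient alternative is "Bulk Loading", which relies on the HFileOutputFormat class that HBase provides.

2. It works because HBase stores its data in HDFS in a specific file format. You generate files in that storage format directly and then move them to the right location, which completes the fast load of a huge data set. Done with MapReduce, it is efficient and convenient, and it neither occupies region resources nor adds load to them.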

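II. This approach also has significant limitations:

1. It is only suitable for an initial import, that is, the table is empty, or the table holds no data each time a load is run.

2. The HBase cluster and the Hadoop cluster must be the same cluster. In other words, the HDFS that HBase runs on must be the HDFS of the cluster that runs the MR job generating the HFiles.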

III. Next, a demo that briefly walks through the whole process.

1. Generating the HFiles

 
    package zl.hbase.mr;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
    import org.apache.hadoop.hbase.mapreduce.SimpleTotalOrderPartitioner;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    import zl.hbase.util.ConnectionUtil;

    public class HFileGenerator {

        public static class HFileMapper extends
                Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Each input line: rowkey,family,qualifier,value
                String line = value.toString();
                String[] items = line.split(",", -1);
                // Bytes.toBytes encodes as UTF-8, unlike the platform-dependent String.getBytes()
                ImmutableBytesWritable rowkey =
                        new ImmutableBytesWritable(Bytes.toBytes(items[0]));
                KeyValue kv = new KeyValue(Bytes.toBytes(items[0]),
                        Bytes.toBytes(items[1]), Bytes.toBytes(items[2]),
                        System.currentTimeMillis(), Bytes.toBytes(items[3]));
                context.write(rowkey, kv);
            }
        }

        public static void main(String[] args) throws IOException,
                InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            String[] dfsArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

            Job job = new Job(conf, "HFile bulk load test");
            job.setJarByClass(HFileGenerator.class);
            job.setMapperClass(HFileMapper.class);
            job.setReducerClass(KeyValueSortReducer.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            // The mapper emits KeyValue, so the map output value class must be KeyValue, not Text
            job.setMapOutputValueClass(KeyValue.class);
            job.setPartitionerClass(SimpleTotalOrderPartitioner.class);

            FileInputFormat.addInputPath(job, new Path(dfsArgs[0]));   // input text files
            FileOutputFormat.setOutputPath(job, new Path(dfsArgs[1])); // HFile output directory

            // ConnectionUtil is the original author's own helper class (not shown here)
            HFileOutputFormat.configureIncrementalLoad(job, ConnectionUtil.getTable());

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
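The mapper above assumes that every input line has four comma-separated fields (row key, column family, qualifier, value). As a rough illustration of why it calls split with a limit of -1, here is a JDK-only sketch; the class name LineParsingSketch and the parse helper are made up for this example and are not part of the original program.

```java
import java.nio.charset.StandardCharsets;

/** JDK-only sketch of the line parsing done in the mapper; all names are illustrative. */
final class LineParsingSketch {

    /** Splits one input line into fields, or returns null if it has fewer than 4. */
    static String[] parse(String line) {
        // A limit of -1 keeps trailing empty strings, so "r1,cf,q," still yields 4 fields
        String[] items = line.split(",", -1);
        return items.length >= 4 ? items : null;
    }

    public static void main(String[] args) {
        // Without the limit, the trailing empty value is dropped
        System.out.println("r1,cf,q,".split(",").length);
        // With the limit, the empty value is kept as a fourth field
        System.out.println("r1,cf,q,".split(",", -1).length);

        String[] items = parse("row1,info,name,alice");
        // Encode the row key explicitly as UTF-8 rather than with the platform charset
        byte[] rowKey = items[0].getBytes(StandardCharsets.UTF_8);
        System.out.println(rowKey.length);
    }
}
```

With the limit, a line whose last value is empty still produces items[3], so the mapper does not fail with an ArrayIndexOutOfBoundsException on such lines. Lines with fewer than four fields would still need to be skipped or reported.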

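Notes on the HFile generation program:

①. For the final output, whether it comes from the map or the reduce, the key and value types must be <ImmutableBytesWritable, KeyValue> or <ImmutableBytesWritable, Put>.

②. In the final output, the value type is KeyValue or Put, and the corresponding sorter is KeyValueSortReducer or PutSortReducer respectively.

③. The job's output format in the MR example is job.setOutputFormatClass(HFileOutputFormat.class). HFileOutputFormat is only suited to organizing a single column family into HFiles at a time.

④. In the MR example, HFileOutputFormat.configureIncrementalLoad(job, table) configures the job automatically. SimpleTotalOrderPartitioner requires the keys to be totally ordered first and then divides them among the reducers, which guarantees that the minimum-to-maximum key ranges of the reducers never overlap. This matters because, once loaded into HBase, the keys within a Region as a whole are strictly ordered.

⑤. The HFiles that the MR example finally generates are stored on HDFS, and the subdirectories under the output path are the column families. Loading the HFiles into HBase amounts to moving them into HBase's Regions, after which the column family subdirectories no longer contain any files.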
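To make note ④ more concrete, the following is a minimal JDK-only sketch of range partitioning over sorted split points. It is not the code of SimpleTotalOrderPartitioner or of any HBase class; RangePartitionSketch, its methods, and the split keys "g" and "p" are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

/** JDK-only sketch of total-order (range) partitioning; not HBase's actual partitioner. */
final class RangePartitionSketch {

    /** Lexicographic comparison of byte arrays, treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /**
     * Returns the partition for a key, given sorted split points.
     * Partition 0 holds keys below splits[0]; partition i holds keys in [splits[i-1], splits[i]).
     */
    static int partitionFor(byte[] key, byte[][] splits) {
        int lo = 0, hi = splits.length;
        // Binary search for the number of split points that are <= key
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splits[mid], key) <= 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] splits = { b("g"), b("p") };   // hypothetical split keys, giving 3 partitions
        for (String k : new String[] {"apple", "g", "kiwi", "p", "zebra"}) {
            System.out.println(k + " -> reducer " + partitionFor(b(k), splits));
        }
    }
}
```

Because every key goes to exactly one half-open range, the key ranges seen by different reducers cannot overlap, which is the property note ④ describes. Each reducer's sorted output can then sit in a Region as one contiguous, ordered run of keys.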

2. Loading the HFiles into HBase

 
    package zl.hbase.bulkload;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.util.GenericOptionsParser;

    import zl.hbase.util.ConnectionUtil;

    public class HFileLoader {

        public static void main(String[] args) throws Exception {
            String[] dfsArgs = new GenericOptionsParser(
                    ConnectionUtil.getConfiguration(), args).getRemainingArgs();
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(
                    ConnectionUtil.getConfiguration());
            // dfsArgs[0] is the HFile output directory written by the generator job
            loader.doBulkLoad(new Path(dfsArgs[0]), ConnectionUtil.getTable());
        }
    }

The generated HFiles are loaded into HBase with the doBulkLoad method of HBase's LoadIncrementalHFiles.
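As described in the notes above, the load step behaves like a move: the files end up under the table's Regions and the column family subdirectories of the job output are left empty. The sketch below is only a local-filesystem analogy of that behavior using java.nio. It does not touch HDFS or HBase, and MoveAnalogySketch, the directory names, and the file name hfile-0 are all invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;   // java.nio Path, not Hadoop's org.apache.hadoop.fs.Path
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem analogy for the "move" step of a bulk load; illustrative only. */
final class MoveAnalogySketch {

    /** Lists the direct children of a directory. */
    static List<Path> children(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    /** Moves every file under outputDir/<family>/ into regionDir/<family>/ and returns the count. */
    static int moveAll(Path outputDir, Path regionDir) throws IOException {
        int moved = 0;
        for (Path family : children(outputDir)) {
            if (!Files.isDirectory(family)) continue;
            Path target = Files.createDirectories(regionDir.resolve(family.getFileName()));
            for (Path file : children(family)) {
                // A move relocates the file instead of rewriting its contents
                Files.move(file, target.resolve(file.getFileName()));
                moved++;
            }
        }
        return moved;
    }

    /** Builds a tiny fake output tree, moves it, and checks that the source family dir is empty. */
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("mr-out");
            Path region = Files.createTempDirectory("region");
            Path cf = Files.createDirectories(out.resolve("cf"));
            Files.write(cf.resolve("hfile-0"), new byte[] {1, 2, 3});
            int moved = moveAll(out, region);
            return moved == 1
                    && children(cf).isEmpty()
                    && Files.exists(region.resolve("cf").resolve("hfile-0"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The practical consequence is the same as in the real load: after it finishes, the generated files are no longer in the job's output directory, so the same output cannot simply be loaded a second time.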

I modified it slightly, as follows:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class HFileGenerator {

        public static class HFileMapper extends
                Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

            // Reused across map() calls instead of allocating a new key per record
            private final ImmutableBytesWritable tableKey = new ImmutableBytesWritable();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Each input line: rowkey,family,qualifier,value
                String line = value.toString();
                String[] items = line.split(",", -1);
                tableKey.set(Bytes.toBytes(items[0]));
                KeyValue kv = new KeyValue(Bytes.toBytes(items[0]),
                        Bytes.toBytes(items[1]), Bytes.toBytes(items[2]),
                        System.currentTimeMillis(), Bytes.toBytes(items[3]));
                context.write(tableKey, kv);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args)
                    .getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: " + HFileGenerator.class.getName()
                        + " <in> <out>");
                System.exit(2);
            }
            Job job = new Job(conf, "HFile bulk load test");
            job.setJarByClass(HFileGenerator.class);
            job.setMapperClass(HFileMapper.class);
            // job.setReducerClass(KeyValueSortReducer.class);
            // job.setOutputKeyClass(ImmutableBytesWritable.class);
            // job.setOutputValueClass(Text.class);
            // job.setPartitionerClass(SimpleTotalOrderPartitioner.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            Configuration hbaseconfig = HBaseConfiguration.create();
            HTable table = new HTable(hbaseconfig, "member3");
            // Configures the reducer, output classes, partitioner and output format.
            // No partitioner is set after this call, since that would replace the one
            // configureIncrementalLoad installs for the table's regions.
            HFileOutputFormat.configureIncrementalLoad(job, table);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

These lines:

    // job.setReducerClass(KeyValueSortReducer.class);
    // job.setOutputKeyClass(ImmutableBytesWritable.class);
    // job.setOutputValueClass(Text.class);
    // job.setPartitionerClass(SimpleTotalOrderPartitioner.class);

do not need to be written, because HFileOutputFormat.configureIncrementalLoad(job, table) sets them itself.
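One of the things configured this way is the sorting reducer, which has to emit cells in order before they are written to an HFile. The following JDK-only sketch shows the general idea with a sorted set. The Cell class is a made-up stand-in for KeyValue, and the ordering (row, family, qualifier ascending, newer timestamps first) is assumed to mirror HBase's, with String comparison standing in for HBase's raw byte comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** JDK-only sketch of sorting cells before writing them; Cell is a stand-in for KeyValue. */
final class CellSortSketch {

    static final class Cell {
        final String row, family, qualifier;
        final long timestamp;

        Cell(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp;
        }
    }

    /** Row, family and qualifier ascending; for equal coordinates, newer timestamps first. */
    static final Comparator<Cell> ORDER = (a, b) -> {
        int d = a.row.compareTo(b.row);
        if (d == 0) d = a.family.compareTo(b.family);
        if (d == 0) d = a.qualifier.compareTo(b.qualifier);
        if (d == 0) d = Long.compare(b.timestamp, a.timestamp);
        return d;
    };

    /** Collects the cells into a sorted set and returns them in write order. */
    static List<Cell> sorted(List<Cell> cells) {
        TreeSet<Cell> set = new TreeSet<>(ORDER);
        set.addAll(cells);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Cell> in = new ArrayList<>();
        in.add(new Cell("r1", "cf", "b", 1L));
        in.add(new Cell("r1", "cf", "a", 1L));
        in.add(new Cell("r1", "cf", "a", 5L));
        System.out.println(sorted(in));
    }
}
```

In the real job the map output key is the row key, so each reduce call only has to order the cells of a single row. The sketch sorts across rows as well simply to keep the example short.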

2. Changes to HFileLoader

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class HFileLoader {

        public static void main(String[] args) throws Exception {
            Configuration hbaseconfig = HBaseConfiguration.create();
            HTable table = new HTable(hbaseconfig, "member3");

            LoadIncrementalHFiles lf = new LoadIncrementalHFiles(hbaseconfig);
            // Output directory of the HFileGenerator job, given as a full HDFS path
            lf.doBulkLoad(new Path("hdfs://master24:9000/user/hadoop/hbasemapred/out"), table);
        }
    }

For the Path, use the output directory of the MapReduce job from the previous step, written as a full path.
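A small sanity check before starting the load can catch a path that is not written out in full. The sketch below uses only java.net.URI and treats "full" as "has both a scheme and a host". That definition, the class name QualifiedPathSketch and the helper isFullyQualified are assumptions made for this example.

```java
import java.net.URI;

/** JDK-only sketch: checks that a path string carries a scheme and a host. */
final class QualifiedPathSketch {

    /** Returns true for paths like hdfs://host:port/dir, false for bare paths like /dir. */
    static boolean isFullyQualified(String path) {
        URI uri = URI.create(path);
        return uri.getScheme() != null && uri.getHost() != null;
    }

    public static void main(String[] args) {
        // The full path used in the loader above
        System.out.println(isFullyQualified("hdfs://master24:9000/user/hadoop/hbasemapred/out"));
        // The same directory without scheme and host
        System.out.println(isFullyQualified("/user/hadoop/hbasemapred/out"));
    }
}
```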
