Working with HBase through spring-hadoop

happyAliceYu

Article link: https://blog.csdn.net/happyAliceYu/article/details/62218257

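Spring needs no introduction for anyone doing Java web development, and jdbcTemplate is even more familiar. Let's get to know a newer member of the Spring family: the Spring-data-hadoop project. Spring-hadoop is part of the Spring Data project (Spring Data also covers the integration of Spring with JDBC, REST, and the mainstream NoSQL stores). So what happens when Spring meets Hadoop? In essence, the configuration of Hadoop components, job deployment, and similar concerns are all brought under Spring's bean management.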

1. Add the Maven dependencies to pom.xml

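<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>1.0.1.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <!-- no version is given in the original; use the one that matches your cluster -->
</dependency>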

2. Configure the Spring-hbase.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <hdp:configuration resources="classpath:/hbase-site.xml" />
    <hdp:hbase-configuration configuration-ref="hadoopConfiguration" />

    <bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
        <property name="configuration" ref="hbaseConfiguration" />
        <property name="encoding" value="UTF-8" />
    </bean>
</beans>

hbaseConfiguration is simply the configuration declared by <hdp:hbase-configuration/> (it is that element's default bean id).

4. Copy the hbase-site.xml configuration file into the src directory. Sample contents:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://nameservice1/hbase</value>
  </property>
  <property>
    <name>hbase.client.write.buffer</name>
    <value>62914560</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <value>62914560</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>zookeeper.znode.rootserver</name>
    <value>root-region-server</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>xinhong-hadoop-56,xinhong-hadoop-52,xinhong-hadoop-53</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
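The last two properties work together: the client finds the cluster through ZooKeeper, pairing each host listed in hbase.zookeeper.quorum with the port in hbase.zookeeper.property.clientPort. The snippet below is only a rough pure-Java sketch of that pairing, not HBase's own code; the names QuorumSketch and connectString are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how a quorum list and a client port combine into
// a ZooKeeper-style connect string. Not part of the HBase API.
class QuorumSketch {
    static String connectString(String quorum, int clientPort) {
        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                // Every quorum member is contacted on the same client port
                servers.add(trimmed + ":" + clientPort);
            }
        }
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        System.out.println(connectString(
                "xinhong-hadoop-56,xinhong-hadoop-52,xinhong-hadoop-53", 2181));
    }
}
```

If the client cannot connect, checking that each of the resulting host:port pairs is reachable from the application machine is a reasonable first step.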

5. Example

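import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.RowMapper;

// The class name is illustrative; the original shows only the main method
public class HbaseTemplateDemo {
    public static void main(String[] args) {
        // The file name must match your Spring configuration file (step 2 above)
        ApplicationContext context = new ClassPathXmlApplicationContext("spring-beans-hbase.xml");
        HbaseTemplate htemplate = (HbaseTemplate) context.getBean("htemplate");

        // Read row "10461" of table "wcm" and print every qualifier/value pair
        htemplate.get("wcm", "10461", new RowMapper<String>() {
            @Override
            public String mapRow(Result result, int rowNum) throws Exception {
                for (KeyValue kv : result.raw()) {
                    String key = Bytes.toString(kv.getQualifier());
                    String value = Bytes.toString(kv.getValue());
                    System.out.println(key + "= " + value);
                }
                return null;
            }
        });
    }
}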

To check the data in the HBase shell: get 'wcm', 'rowkey' returns one row.

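HBase query summary:

HBase only provides a row-level index (the row key), so there are just two ways to run a conditional query:

(1) Design a suitable row key, so that the row key leads straight to where the data lives;

(2) Query with a Scan. A Scan can be given a start row and a stop row, which confines the search to one range of the table;

A Scan can also carry one or more Filters that filter on row key, column family, and column, which is how conditional queries are achieved.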
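To make the range-plus-filter idea concrete, here is a small pure-Java model, not HBase code. It assumes a TreeMap as a stand-in for a table whose rows are sorted by row key, treats the start row as inclusive and the stop row as exclusive (as an HBase Scan does), and uses a plain Predicate in place of a Filter. ScanSketch and scan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Illustrative model only: a TreeMap stands in for an HBase table whose rows
// are kept sorted by row key. None of these names belong to the HBase API.
class ScanSketch {
    // Returns the row keys in [startRow, stopRow) whose value passes the filter
    static List<String> scan(TreeMap<String, String> table, String startRow,
                             String stopRow, Predicate<String> valueFilter) {
        List<String> hits = new ArrayList<>();
        // subMap restricts the search to the key range before any filtering
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            if (valueFilter.test(row.getValue())) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row-01", "beijing");
        table.put("row-02", "shanghai");
        table.put("row-03", "beijing");
        table.put("row-04", "beijing");
        // Prints [row-01, row-03]: row-04 matches the filter but lies outside the range
        System.out.println(scan(table, "row-01", "row-04", "beijing"::equals));
    }
}
```

The model also shows why row key design matters: the range restriction is cheap because the keys are sorted, while the filter has to look at every row inside the range.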

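Like inserting data with Put in the previous section, fetching data with Get comes in several forms.

1. Single-row fetch: get(Get get)

A single-row fetch sends the data of only one Get object per RPC request. Because a Get must be initialized with a row key, one Get object can be thought of as representing one row. Within that row it can cover several column families or several columns.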

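import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public void get(String tableName, String rowKey, String family, String qualifier) {
    // init() is a helper (not shown in this article) that returns the HBase Configuration
    Configuration conf = init();
    try {
        // Use the admin API to check that the table exists
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(Bytes.toBytes(tableName))) {
            System.err.println("the table " + tableName + " does not exist");
            admin.close();
            System.exit(1);
        }
        admin.close();
        // Open a connection to the table
        HTable table = new HTable(conf, TableName.valueOf(tableName));
        // Create a Get for the requested row
        Get get = new Get(Bytes.toBytes(rowKey));
        // Narrow the Get according to the arguments passed in
        if (family != null && qualifier != null) {
            get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        } else if (family != null) {
            get.addFamily(Bytes.toBytes(family));
        }
        // Fetch the data and print every KeyValue of the row
        Result result = table.get(get);
        for (KeyValue kv : result.raw()) {
            System.out.println(Bytes.toString(kv.getRow()));
            System.out.println(Bytes.toString(kv.getFamily()));
            System.out.println(Bytes.toString(kv.getQualifier()));
            System.out.println(Bytes.toString(kv.getValue()));
        }
        table.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}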

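As the code above shows, a Get instance uses the two methods addColumn / addFamily to narrow what is fetched. If neither is called, the whole row is returned. If a column family is added, all columns in that family are returned; if a column is specified, only that column is returned.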
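The three cases can be modelled without a cluster. The sketch below is a pure-Java illustration under the assumption that a row is a map of family to (qualifier to value); it mirrors the selection rules described above and is not how HBase implements them. GetSelectionSketch and select are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: a row is a map of family -> (qualifier -> value).
// The class and method names are hypothetical, not HBase API.
class GetSelectionSketch {
    static Map<String, Map<String, String>> select(
            Map<String, Map<String, String>> row, String family, String qualifier) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        if (family == null) {
            // Nothing was added to the Get: the whole row comes back
            out.putAll(row);
        } else if (row.containsKey(family)) {
            Map<String, String> columns = row.get(family);
            if (qualifier == null) {
                // addFamily case: every column of that family
                out.put(family, columns);
            } else if (columns.containsKey(qualifier)) {
                // addColumn case: only the named column
                Map<String, String> one = new TreeMap<>();
                one.put(qualifier, columns.get(qualifier));
                out.put(family, one);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> info = new TreeMap<>();
        info.put("name", "alice");
        info.put("city", "beijing");
        Map<String, String> stats = new TreeMap<>();
        stats.put("visits", "3");
        Map<String, Map<String, String>> row = new TreeMap<>();
        row.put("info", info);
        row.put("stats", stats);

        System.out.println(select(row, null, null));     // whole row
        System.out.println(select(row, "info", null));   // one family
        System.out.println(select(row, "info", "name")); // one column
    }
}
```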

2. Fetch multiple rows: get(List<Get> list)

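A multi-row fetch essentially has the client work through the List<Get> instance, so it can involve several data requests (that is, several RPC round trips, each consisting of one request and one data transfer). The HBase client groups the Gets by the region server that hosts each row, so the number of RPCs may be smaller than the number of Gets.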

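import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// rows, families and qualifiers are parallel arrays and must have the same length
public void getList(String tableName, String[] rows, String[] families, String[] qualifiers) {
    // init() is a helper (not shown in this article) that returns the HBase Configuration
    Configuration conf = init();
    try {
        // Check that the table exists
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(Bytes.toBytes(tableName))) {
            System.err.println("the table " + tableName + " does not exist");
            admin.close();
            System.exit(1);
        }
        admin.close();
        // Open a connection to the table
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        // Build one Get per requested row
        List<Get> gets = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            Get get = new Get(Bytes.toBytes(rows[i]));
            get.addColumn(Bytes.toBytes(families[i]), Bytes.toBytes(qualifiers[i]));
            gets.add(get);
        }
        // Iterate over the results and print them
        Result[] results = table.get(gets);
        for (Result result : results) {
            for (KeyValue kv : result.raw()) {
                System.out.println(Bytes.toString(kv.getRow()));
                System.out.println(Bytes.toString(kv.getFamily()));
                System.out.println(Bytes.toString(kv.getQualifier()));
                System.out.println(Bytes.toString(kv.getValue()));
            }
        }
        table.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}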

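3. Fetch a row or the one before it: getRowOrBefore()

This method is provided by the HTable class. Its behaviour is as follows: if the row passed as argument exists, the specified column family of that row is returned; if it does not exist, the data of the closest preceding row that does exist in the table is returned.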

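import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public void getRowOrBefore(String tableName, String row, String family, String qualifier) {
    // init() is a helper (not shown in this article) that returns the HBase Configuration
    Configuration conf = init();
    try {
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(tableName)) {
            System.err.println("the table " + tableName + " does not exist");
            admin.close();
            System.exit(1);
        }
        admin.close();
        // Open a connection to the table
        HTable table = new HTable(conf, tableName);
        // Fetch the given row, or the closest row before it
        Result result = table.getRowOrBefore(Bytes.toBytes(row), Bytes.toBytes(family));
        // Print every KeyValue of the returned row
        for (KeyValue kv : result.raw()) {
            System.out.println(Bytes.toString(kv.getRow()));
            System.out.println(Bytes.toString(kv.getFamily()));
            System.out.println(Bytes.toString(kv.getQualifier()));
            System.out.println(Bytes.toString(kv.getValue()));
            // The full key contains binary fields, so print it in escaped form
            System.out.println(Bytes.toStringBinary(kv.getKey()));
        }
        table.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}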

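Note: every row key in HBase is stored as a byte array with no specific type, and row keys are compared byte by byte. At the fifth position 1 is less than 9, so HBase treats row-10 as less than row-9. To get the intended order, give all row keys the same number of digits: row-9 should be changed to row-09, which guarantees that row-09 sorts before row-10.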
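The ordering pitfall is easy to reproduce in plain Java. The sketch below compares keys by their unsigned bytes, as the note describes, and uses TreeSet.floor as a stand-in for the "this row or the one before it" lookup. It is a model, not HBase code, and the names RowKeyOrderSketch, BYTE_ORDER, and paddedKey are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeSet;

// Illustrative only: compares row keys byte by byte and uses TreeSet.floor
// as a stand-in for "this row or the one before". Not the HBase API.
class RowKeyOrderSketch {
    // Unsigned lexicographic comparison of the UTF-8 bytes of two keys
    static final Comparator<String> BYTE_ORDER = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    };

    // Zero-pads the numeric suffix so that byte order matches numeric order
    static String paddedKey(int n) {
        return String.format(Locale.ROOT, "row-%02d", n);
    }

    public static void main(String[] args) {
        // Unpadded keys: "row-10" sorts before "row-9"
        System.out.println(BYTE_ORDER.compare("row-10", "row-9") < 0);           // true
        // Padded keys: "row-09" sorts before "row-10"
        System.out.println(BYTE_ORDER.compare(paddedKey(9), paddedKey(10)) < 0); // true

        TreeSet<String> rows = new TreeSet<>(BYTE_ORDER);
        rows.add(paddedKey(8));
        rows.add(paddedKey(10));
        // "row-09" is absent, so the closest earlier key, row-08, is returned
        System.out.println(rows.floor(paddedKey(9)));
    }
}
```

Two digits are enough only while the suffix stays below 100; the padding width has to cover the largest value the key will ever hold.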

4. Displaying results: the Result, KeyValue, and Cell objects

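(1) Result object: in a query result, each row of data is stored in its own Result instance, so to read a row you obtain the Result object that holds it. Internally the object wraps an array of KeyValue objects. In versions before 0.98.4, the Result class provides the raw() method for retrieving the whole KeyValue array. From 0.98.4 on, a new method is provided, rawCells(), which returns the same entries typed as KeyValue's parent type, Cell.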

(2) KeyValue object: this object has already been introduced, so only its usage is shown here.

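import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public void KVObject(String tableName, String row, String family, String qualifier) {
    // init() is a helper (not shown in this article) that returns the HBase Configuration
    Configuration conf = init();
    try {
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(Bytes.toBytes(tableName))) {
            System.err.println("the table " + tableName + " does not exist");
            admin.close();
            System.exit(1);
        }
        admin.close();
        // Open a connection to the table
        HTable table = new HTable(conf, tableName);
        // Fetch one row; the answer comes back as a Result object
        Get get = new Get(Bytes.toBytes(row));
        Result result = table.get(get);
        // Walk through the KeyValue array wrapped by the Result
        for (KeyValue kv : result.raw()) {
            System.out.println(Bytes.toString(kv.getRow()));
            System.out.println(Bytes.toString(kv.getFamily()));
            System.out.println(Bytes.toString(kv.getQualifier()));
            System.out.println(Bytes.toString(kv.getValue()));
            // The full key contains binary fields, so print it in escaped form
            System.out.println(Bytes.toStringBinary(kv.getKey()));
        }
        table.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}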

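(3) Cell object: Cell is the parent type of KeyValue (an interface that KeyValue implements), and every method declared in Cell is implemented by KeyValue. We can therefore use the Cell API to work with KeyValue objects.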

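import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public void CellObject(String tableName, String row, String family, String qualifier) {
    // init() is a helper (not shown in this article) that returns the HBase Configuration
    Configuration conf = init();
    try {
        // Check that the table exists
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(tableName)) {
            System.err.println("the table " + tableName + " does not exist");
            admin.close();
            System.exit(1);
        }
        admin.close();
        // Open a connection to the table
        HTable table = new HTable(conf, tableName);
        Get get = new Get(Bytes.toBytes(row));
        get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        Result result = table.get(get);
        // CellUtil.cloneXxx copies each part of the cell into its own byte array;
        // it replaces the deprecated cell.getRow()/getFamily()/getQualifier()/getValue()
        for (Cell cell : result.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
        }
        table.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}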

Query the data of a column within a column family:

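import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.data.hadoop.hbase.RowMapper;

// hbaseTemplate is the HbaseTemplate bean configured earlier, held in a field
public List<String> find(String tableName, String family, String column) {
    // Scan the given column and map every matching row to its row key
    List<String> rows = hbaseTemplate.find(tableName, family, column, new RowMapper<String>() {
        public String mapRow(Result result, int rowNum) throws Exception {
            return Bytes.toString(result.getRow());
        }
    });
    return rows;
}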

Query one column of the row with a given row key:

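import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.data.hadoop.hbase.RowMapper;

// hbaseTemplate is the HbaseTemplate bean configured earlier, held in a field
public String get(String tableName, String family, String column, String rowKey) {
    // Use the rowKey argument (the original hardcoded "NCEP_p_wa_2014032212_tp_006.nc" here)
    String context = hbaseTemplate.get(tableName, rowKey, family, column, new RowMapper<String>() {
        public String mapRow(Result result, int rowNum) throws Exception {
            // value() returns the value of the first column in the Result
            return Bytes.toString(result.value());
        }
    });
    return context;
}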

6. Basic commands

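List the tables: list

View all data in a table: scan 'table name'

View one row: get 'wcm', 'rowkey'

Delete one cell: delete 'table name', 'row key', 'column family:column'

Delete a whole row: deleteall 'table name', 'row key'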

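HBase characteristics:

>> In the shell, only one cell can be updated at a time;

>> In a program, calling HTable.setAutoFlush(false) turns off automatic flushing in the HTable write client. Data can then be written to HBase in batches instead of one update per put: a write request is sent to the HBase server only once the puts have filled the client-side write buffer. Auto flush is on by default.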
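The effect of the write buffer can be illustrated with a small pure-Java model that counts how many write requests would leave the client. This is a sketch under simplifying assumptions (the buffer is flushed as soon as the buffered bytes reach the limit), not HBase's implementation; WriteBufferSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: counts the "write requests" a client would send
// with and without a client-side write buffer. Not the HBase API.
class WriteBufferSketch {
    private final boolean autoFlush;
    private final long bufferSizeBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int requestsSent = 0;

    WriteBufferSketch(boolean autoFlush, long bufferSizeBytes) {
        this.autoFlush = autoFlush;
        this.bufferSizeBytes = bufferSizeBytes;
    }

    void put(byte[] payload) {
        buffer.add(payload);
        bufferedBytes += payload.length;
        // With auto flush on, every put goes out at once; otherwise only
        // when the buffer has filled up
        if (autoFlush || bufferedBytes >= bufferSizeBytes) {
            flush();
        }
    }

    // Sends everything that is currently buffered as a single request
    void flush() {
        if (!buffer.isEmpty()) {
            requestsSent++;
            buffer.clear();
            bufferedBytes = 0;
        }
    }

    int requestsSent() {
        return requestsSent;
    }

    public static void main(String[] args) {
        WriteBufferSketch eager = new WriteBufferSketch(true, 1024);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 1024);
        for (int i = 0; i < 10; i++) {
            eager.put(new byte[256]);
            buffered.put(new byte[256]);
        }
        // Push out whatever is still buffered, as an explicit flush would
        buffered.flush();
        System.out.println(eager.requestsSent());    // 10
        System.out.println(buffered.requestsSent()); // 3
    }
}
```

The final flush matters in real code too: with auto flush off, puts still sitting in the buffer are not sent until the buffer fills or it is flushed explicitly, which in this API generation is done with HTable.flushCommits() or by closing the table.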