HBase API Operations

Michael-DM

Article link: https://blog.csdn.net/weixin_46235157/article/details/105172868

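First, we need to obtain a connection.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Admin;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.ConnectionFactory;

        // Create a Configuration to hold the connection parameters.
        // HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // Set the storage location and the ZooKeeper address; both values can be looked up in hbase-site.xml
        conf.set("hbase.rootdir", "hdfs://192.168.0.133:8020/hbase");
        conf.set("hbase.zookeeper.quorum", "192.168.0.133:2181");
        // Use the connection factory to create a connection from the Configuration above
        Connection connection = ConnectionFactory.createConnection(conf);
        // Obtain an Admin from the connection
        Admin admin = connection.getAdmin();

Connection: the connection to HBase
Admin: used for administrative operations such as creating and deleting tables

That completes the initialization.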

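1. Creating a table

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

public void createTable() throws Exception {
        // Name of the table to create
        String tableName = "test";
        // Build a TableName from the table name string
        TableName table = TableName.valueOf(tableName);
        // Check whether the table already exists
        if (admin.tableExists(table)) {
            // Report that it already exists
            System.err.println("table already exists");
        } else {
            // Create a table descriptor (HBase 1.x API; 2.x deprecates it in favor of TableDescriptorBuilder)
            HTableDescriptor hTableDescriptor = new HTableDescriptor(table);
            // Add a column family to the table descriptor
            hTableDescriptor.addFamily(new HColumnDescriptor("cf"));
            // Create the table with admin.createTable
            admin.createTable(hTableDescriptor);
        }
    }

After a successful run, the new table is visible in HBase.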

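2. Deleting a table

public void deleteTable() throws Exception {
        // Name of the table to delete
        String tableName = "test";
        // Build a TableName from the table name string
        TableName table = TableName.valueOf(tableName);
        // Only act if the table exists
        if (admin.tableExists(table)) {
            // A table has to be disabled first
            admin.disableTable(table);
            // and can then be deleted
            admin.deleteTable(table);
        }
    }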

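3. Listing all tables and their column families

public void queryInfos() throws Exception {
        // admin.listTables() returns an array with the descriptors of all tables
        HTableDescriptor[] tables = admin.listTables();
        // Iterate over the array
        for (HTableDescriptor table : tables) {
            // Print the table name
            System.out.println(table.getNameAsString());
            // table.getColumnFamilies() returns the table's column family descriptors
            HColumnDescriptor[] cfs = table.getColumnFamilies();
            // Iterate over the column family descriptors
            for (HColumnDescriptor cf : cfs) {
                // Print the column family name; the descriptor holds many other settings you can try out yourself
                System.out.println("\t" + cf.getNameAsString());
            }
        }
    }

After a successful run, the console lists each table name followed by its column families.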

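4. Adding and updating records (Put)

In HBase, adding and updating a record are the same operation. A Put does not check beforehand whether the record exists: it writes a new version of each cell, so a missing row is created and an existing value is superseded.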
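
To make the "add and update are the same call" point concrete, here is a small plain-Java model. It is not HBase client code: it only mimics the idea that every write stores a new timestamped version of a cell and a read returns the newest one. The class name `VersionedCellModel` and its methods are hypothetical, chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a single cell: each put adds a version, a read returns the newest. */
final class VersionedCellModel {
    // timestamp -> value, sorted so that lastEntry() is the newest version
    private final TreeMap<Long, String> versions = new TreeMap<>();

    /** The same call serves as "insert" and "update": it only ever adds a version. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Returns the newest value, or null if nothing was ever written. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    /** Number of versions this model currently holds. */
    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellModel age = new VersionedCellModel();
        age.put(1L, "28"); // first write: behaves like an insert
        age.put(2L, "29"); // second write: behaves like an update
        System.out.println(age.get() + " (versions in the model: " + age.versionCount() + ")");
    }
}
```

The model keeps every version forever; how many old versions a real table retains depends on the column family's settings.
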

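        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.util.Bytes;

        TableName tableName = TableName.valueOf("test");
        // connection.getTable returns a handle to the table; it takes a TableName
        Table table = connection.getTable(tableName);
        // Create a Put; the argument is the row key as a byte array
        Put put = new Put(Bytes.toBytes("dom"));
        // put.addColumn takes three byte-array arguments:
        // 1. the column family
        // 2. the column qualifier
        // 3. the value
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("dom"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("age"), Bytes.toBytes("28"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("add"), Bytes.toBytes("bj"));
        // Execute the Put through the table handle created above
        table.put(put);

Checking HBase afterwards shows the new row.
To write several row keys in one call, pass a List<Put> to table.put instead of a single Put.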
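
Every argument passed to Put and addColumn is a byte array, so the choice of encoding matters when the value is read back. The JDK-only sketch below contrasts storing an age as text with storing it as a 4-byte integer. It assumes that `Bytes.toBytes(String)` encodes UTF-8 and that `Bytes.toBytes(int)` writes a big-endian 4-byte value; the class `ValueEncodingDemo` and its methods are hypothetical stand-ins, not part of HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ValueEncodingDemo {
    /** Text encoding: what Bytes.toBytes("28") is assumed to produce. */
    static byte[] asText(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Binary encoding: 4 bytes, big-endian (ByteBuffer's default byte order). */
    static byte[] asInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Decoding has to mirror the encoding that was used when writing. */
    static int intFrom(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        System.out.println(asText("28").length); // 2 bytes: '2' and '8'
        System.out.println(asInt(28).length);    // 4 bytes
        System.out.println(intFrom(asInt(28)));  // round-trips to 28
    }
}
```

A value written as text, like the age in this article, has to be read back as text (Bytes.toString) rather than as an int.
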

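5. Getting values by row key (Get)

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

public void get01() throws Exception {
        TableName tableName = TableName.valueOf("test");
        // connection.getTable returns a handle to the table; it takes a TableName
        Table table = connection.getTable(tableName);
        // Create a Get; the argument is the row key as a byte array
        Get get = new Get(Bytes.toBytes("dom"));
        // table.get returns the Result for that row
        Result result = table.get(get);
        // Iterate over the cells in the result
        for (Cell cell : result.rawCells()) {
            // CellUtil.cloneRow, cloneFamily, cloneQualifier and cloneValue each take the cell and
            // return the row key, column family, column qualifier and value as byte arrays
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell))
                    + "\t" + Bytes.toString(CellUtil.cloneFamily(cell))
                    + "\t" + Bytes.toString(CellUtil.cloneQualifier(cell))
                    + "\t" + Bytes.toString(CellUtil.cloneValue(cell))
            );
        }
    }

The console shows one line per cell of the row.

To fetch only specific content under the row key, add it to the Get:
Argument 1: the column family name as a byte array
Argument 2: the column qualifier as a byte array

get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("age"));

Without get.addColumn, the contents of all column families are returned.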

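6. Reading rows with Scan

Full-table scan:

import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public void scan() throws Exception {
        TableName tableName = TableName.valueOf("test");
        // connection.getTable returns a handle to the table; it takes a TableName
        Table table = connection.getTable(tableName);
        // Create a Scan; with no arguments it covers the whole table
        Scan scan = new Scan();
        // table.getScanner returns a ResultScanner to iterate over
        ResultScanner scanner = table.getScanner(scan);
        // Iterate as in the Get example, one Result per row
        for (Result result : scanner) {
            for (Cell cell : result.rawCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneRow(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneFamily(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneQualifier(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneValue(cell))
                );
            }
        }
        // Release the scanner when done
        scanner.close();
    }

The console output contains every row and every column family.
To query a single row key, construct the Scan from a Get; the Get takes the row key as a byte array:

Scan scan = new Scan(new Get(Bytes.toBytes("dom")));

A range can also be selected by passing two byte-array row keys when creating the Scan: the start row key and the stop row key. The stop row key itself is excluded from the output (the range is closed on the left and open on the right).

Scan scan = new Scan(Bytes.toBytes("air"),Bytes.toBytes("ssc"));

The start and stop row keys cannot be swapped. If only one byte-array row key is passed, it is the start row key and there is no stop row key.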
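
The left-closed, right-open range can be tried out without a cluster. The sketch below is a plain-Java model, not HBase client code: it keeps row keys in a `TreeMap` ordered by unsigned byte comparison, which is how HBase is understood to order row keys, and returns the keys in [start, stop). The class `ScanRangeModel`, its methods and the sample row keys are hypothetical. It needs Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class ScanRangeModel {
    // Lexicographic order over unsigned bytes (assumed to match HBase row key order).
    private static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;
    private final TreeMap<byte[], String> rows = new TreeMap<>(ROW_ORDER);

    void put(String rowKey, String value) {
        rows.put(rowKey.getBytes(StandardCharsets.UTF_8), value);
    }

    /** Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    List<String> scan(String startRow, String stopRow) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        List<String> keys = new ArrayList<>();
        if (ROW_ORDER.compare(start, stop) >= 0) {
            return keys; // swapped or empty range: nothing matches in this model
        }
        for (byte[] key : rows.subMap(start, true, stop, false).keySet()) {
            keys.add(new String(key, StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        ScanRangeModel table = new ScanRangeModel();
        for (String key : new String[] {"tom", "ssc", "dom", "air", "sam"}) {
            table.put(key, "value-of-" + key);
        }
        System.out.println(table.scan("air", "ssc")); // [air, dom, sam]; "ssc" is excluded
        System.out.println(table.scan("ssc", "air")); // []; swapped bounds match nothing here
    }
}
```
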
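To return only a specific column, add it to the Scan; the two arguments are the column family and the column qualifier as byte arrays:

scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"));

The console output then contains only that column.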

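7. Filtered queries

The Scan code above is reused here, with a filter added.
1. RowFilter: matching on the row key

import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;

public void scan() throws Exception {
        TableName tableName = TableName.valueOf("test");
        // connection.getTable returns a handle to the table; it takes a TableName
        Table table = connection.getTable(tableName);
        // Regular expression meaning "ends with m"
        String reg = ".*m$";
        // Create the Filter: the first argument means "equal", the second is a RegexStringComparator wrapping the regex above
        Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(reg));
        // Create a Scan over the whole table
        Scan scan = new Scan();
        // Attach the Filter to the Scan
        scan.setFilter(filter);
        // table.getScanner returns a ResultScanner to iterate over
        ResultScanner scanner = table.getScanner(scan);
        // Iterate as in the Get example, one Result per row
        for (Result result : scanner) {
            for (Cell cell : result.rawCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneRow(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneFamily(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneQualifier(cell))
                        + "\t" + Bytes.toString(CellUtil.cloneValue(cell))
                );
            }
        }
        scanner.close();
    }

The console output contains only the rows whose row key ends with m.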
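
A pattern meaning "row key ends with m" can be checked outside HBase before it goes into a filter. The sketch below runs `java.util.regex` against a few sample row keys. It assumes the comparator searches for the pattern anywhere in the row key (find-style matching rather than whole-string matching), in which case the trailing `$` is what pins the match to the end. The sample row keys and the class `RowKeyRegexCheck` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RowKeyRegexCheck {
    /** Keeps the row keys in which the pattern is found (unanchored search). */
    static List<String> matching(String regex, List<String> rowKeys) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (pattern.matcher(key).find()) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("dom", "tom", "mike", "air");
        System.out.println(matching(".*m$", keys)); // [dom, tom]: row keys ending in m
        System.out.println(matching("m", keys));    // [dom, tom, mike]: m anywhere in the key
    }
}
```

Writing the pattern as `.*m$` gives the same result under whole-string matching too, which makes it the safer spelling if the matching mode is in doubt.
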
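2. PrefixFilter: matching on a row key prefix

// PrefixFilter takes the prefix as a byte array and keeps the rows whose row key starts with it
Filter filter = new PrefixFilter(Bytes.toBytes("s"));

The console output contains only the rows whose row key starts with s.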
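
A prefix match can also be expressed as a row key range: start at the prefix and stop at the smallest key that no longer carries it. The JDK-only sketch below computes that stop key by dropping trailing 0xFF bytes and incrementing the last remaining byte. The helper `stopRowForPrefix` is hypothetical and illustrates the idea only; it is not taken from the HBase client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixRange {
    /**
     * Returns the exclusive stop row for a prefix range, or an empty array
     * (read here as "no upper bound") when the prefix is empty or all 0xFF bytes.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so step back past them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // smallest key that sorts after every key with the prefix
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("s".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints t
    }
}
```

With the prefix "s", the equivalent range is therefore start row "s" and stop row "t", using the same left-closed, right-open convention as a range Scan.
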
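3. Combining multiple filters
To apply several filters at once, create a FilterList:

FilterList filterList = new FilterList();

Add any number of filters to it with filterList.addFilter(...), then pass the list to scan.setFilter like a single filter.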
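
How several filters combine is easiest to see in a small model. The sketch below uses plain `java.util.function.Predicate` objects in place of HBase filters: requiring every predicate to pass corresponds to the `MUST_PASS_ALL` operator of FilterList (its default), and requiring at least one corresponds to `MUST_PASS_ONE`. The class `FilterListModel`, its methods and the sample row keys are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FilterListModel {
    /** AND semantics: a row key is kept only if every filter accepts it. */
    static Predicate<String> mustPassAll(List<Predicate<String>> filters) {
        return key -> filters.stream().allMatch(f -> f.test(key));
    }

    /** OR semantics: a row key is kept if at least one filter accepts it. */
    static Predicate<String> mustPassOne(List<Predicate<String>> filters) {
        return key -> filters.stream().anyMatch(f -> f.test(key));
    }

    /** Applies a combined filter to a list of row keys, preserving their order. */
    static List<String> apply(Predicate<String> filter, List<String> rowKeys) {
        return rowKeys.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<String> startsWithS = key -> key.startsWith("s");
        Predicate<String> endsWithM = key -> key.endsWith("m");
        List<Predicate<String>> filters = List.of(startsWithS, endsWithM);
        List<String> keys = List.of("sam", "ssc", "dom", "air");

        System.out.println(apply(mustPassAll(filters), keys)); // [sam]
        System.out.println(apply(mustPassOne(filters), keys)); // [sam, ssc, dom]
    }
}
```
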

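8. Saving an RDD to HBase

The RDD must first have the type RDD[(ImmutableBytesWritable, Put)].

// Set the target table; conf.set takes two arguments, the second being the table name as a String
conf.set(TableOutputFormat.OUTPUT_TABLE, tableName)
// df here is the RDD[(ImmutableBytesWritable, Put)] described above
df.saveAsNewAPIHadoopFile(
      // First argument: the output path
      s"hdfs://hadoop000:8020/acc/out/$day",
      // Second argument: the type of the RDD's first element (the key)
      classOf[ImmutableBytesWritable],
      // Third argument: the type of the RDD's second element (the value)
      classOf[Put],
      // Fourth argument: the output format class
      classOf[TableOutputFormat[ImmutableBytesWritable]],
      // Fifth argument: the conf prepared above
      conf
    )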

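9. Reading HBase into an RDD

The resulting RDD has the type RDD[(ImmutableBytesWritable, Result)].

// Set the table to read; conf.set takes two arguments, the second being the table name as a String
conf.set(TableInputFormat.INPUT_TABLE, tableName)
// Create a Scan
val scan = new Scan
// Select the columns to read (everything is read if none are given); the two arguments are the column family and column qualifier as byte arrays
scan.addColumn(Bytes.toBytes("o"), Bytes.toBytes("country"))
scan.addColumn(Bytes.toBytes("o"), Bytes.toBytes("province"))
// The Scan has to be serialized into the conf
conf.set(TableInputFormat.SCAN, Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray))
// Read with sparkContext.newAPIHadoopRDD, which takes four arguments; the ImmutableBytesWritable is the row key
val rdd: RDD[(ImmutableBytesWritable, Result)] = spark.sparkContext.newAPIHadoopRDD(
      // First argument: the conf prepared above
      conf,
      // Second argument: the input format class
      classOf[TableInputFormat],
      // Third argument: the type of the RDD's first element (the key)
      classOf[ImmutableBytesWritable],
      // Fourth argument: the type of the RDD's second element (the value)
      classOf[Result])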
