Basic HBase operations with Java and Scala

I. Overview of the ways to connect to HBase

The main approaches are:

  • Reading and writing HBase with the plain Java API;
  • Reading and writing HBase from Spark;
  • Reading and writing HBase from Flink;
  • Reading and writing HBase through Phoenix.

  The first is the fairly low-level but efficient access path that HBase itself provides; the second and third are the Spark and Flink integrations; the last one goes through the third-party Phoenix layer over JDBC. The Phoenix JDBC route can also be called from Spark or Flink; a minimal sketch is shown below.
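  For completeness, here is a minimal, hypothetical sketch of the Phoenix JDBC route. It assumes the phoenix-client jar is on the classpath, that the ZooKeeper quorum "node01,node02,node03:2181" matches your cluster, and that a Phoenix table named USER_INFO exists (both are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Phoenix JDBC URL format: jdbc:phoenix:<zookeeper quorum>:<port>
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:node01,node02,node03:2181");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM USER_INFO LIMIT 10");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // print the first column of each returned row
                System.out.println(rs.getString(1));
            }
        }
    }
}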

II. Java

1. Older HBase client API:
(1) Creating a table:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableTest {
    public static void main(String[] args) throws IOException  {
        //configure the HBase connection parameters
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum",  "192.168.8.71");  // ZooKeeper quorum address
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        String tableName = "emp";
        String[] family = { "basicinfo","deptinfo"};
        HBaseAdmin hbaseAdmin = new HBaseAdmin(conf);
        //build the table descriptor
        HTableDescriptor hbaseTableDesc = new HTableDescriptor(TableName.valueOf(tableName));
        for(int i = 0; i < family.length; i++) {
            //add the column families
            hbaseTableDesc.addFamily(new HColumnDescriptor(family[i]));
        }
        //if the table already exists print a message, otherwise create it
        if(hbaseAdmin.tableExists(TableName.valueOf(tableName))) {
            System.out.println("TableExists!");
            /**
             * System.exit(status) terminates the running JVM: status 0 means a normal exit,
             * any non-zero status means an abnormal exit. Unlike return, which only leaves the
             * current method, System.exit() shuts down the whole application and its JVM.
             */
            System.exit(0);
        } else{
            hbaseAdmin.createTable(hbaseTableDesc);
            System.out.println("Create table Success!");
        }
    }
}
(2) Dropping a table:
import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class DeleteMyTable {
 
	public static void main(String[] args) throws IOException {
		String tableName = "mytb";
		delete(tableName);
	}
	
	public static Configuration getConfiguration() {
		Configuration conf = HBaseConfiguration.create();
		conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
		conf.set("hbase.zookeeper.quorum", "192.168.8.71");
		return conf;
	}
	
	public static void delete(String tableName) throws IOException {
		HBaseAdmin hAdmin = new HBaseAdmin(getConfiguration());
		if (hAdmin.tableExists(tableName)) {
			try {
				hAdmin.disableTable(tableName);
				hAdmin.deleteTable(tableName);
				System.err.println("Delete table Success");
			} catch (IOException e) {
				System.err.println("Delete table Failed ");
			}
		} else {
			System.err.println("table not exists");
		}
	}
}
(3) Writing data:

  Suppose an e-commerce site keeps a buyer table in its backend; every time a new user registers, the site produces a log record and writes it into HBase.

  Each record contains: user ID (buyer_id), registration date (reg_date), registration IP (reg_ip), and buyer status (buyer_status, 0 = frozen, 1 = normal), separated by commas:

user ID   registration date   registration IP   buyer status
20385,2010-05-04,124.64.242.30,1
20386,2010-05-05,117.136.0.172,1
20387,2010-05-06,114.94.44.230,1
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutData {
    public static void main(String[] args) throws MasterNotRunningException,
            ZooKeeperConnectionException, IOException {
        String tableName = "mytb";
        String columnFamily = "mycf";

        put(tableName, "20385", columnFamily, "2010-05-04:reg_ip", "124.64.242.30");
        put(tableName, "20385", columnFamily, "2010-05-04:buyer_status", "1");

        put(tableName, "20386", columnFamily, "2010-05-05:reg_ip", "117.136.0.172");
        put(tableName, "20386", columnFamily, "2010-05-05:buyer_status", "1");

        put(tableName, "20387", columnFamily, "2010-05-06:reg_ip", "114.94.44.230");
        put(tableName, "20387", columnFamily, "2010-05-06:buyer_status", "1");

    }

    public static Configuration getConfiguration() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "192.168.8.71");
        return conf;
    }

    public static void put(String tableName, String row, String columnFamily,
                           String column, String data) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        Put put = new Put(Bytes.toBytes(row));
        put.add(Bytes.toBytes(columnFamily),
                Bytes.toBytes(column),
                Bytes.toBytes(data));
        table.put(put);
        System.err.println("SUCCESS");
    }
}

Note: constructing an HTable by hand is deprecated. Instantiate tables through a Connection instead, via Connection.getTable(TableName); a minimal sketch using the newer API follows.
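  For example, a minimal sketch of the same put written against the Connection-based API might look like this (it assumes the table mytb with column family mycf already exists and reuses the ZooKeeper address from the examples above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutDataWithConnection {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.8.71");
        // a Connection is heavyweight and thread-safe: create it once and reuse it
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytb"))) {
            Put put = new Put(Bytes.toBytes("20385"));
            put.addColumn(Bytes.toBytes("mycf"), Bytes.toBytes("2010-05-04:reg_ip"), Bytes.toBytes("124.64.242.30"));
            table.put(put);
        }
    }
}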

(4) Reading data:
import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
 
public class GetData {
	public static void main(String[] args) throws IOException {
		String tableName = "mytb";
		get(tableName, "20386");
 
	}
 
	public static Configuration getConfiguration() {
		Configuration conf = HBaseConfiguration.create();
		conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
		conf.set("hbase.zookeeper.quorum", "192.168.8.71");
		return conf;
	}
 
	public static void get(String tableName, String rowkey) throws IOException {
		HTable table = new HTable(getConfiguration(), tableName);
		Get get = new Get(Bytes.toBytes(rowkey));
		Result result = table.get(get);
		byte[] value1 = result.getValue("mycf".getBytes(), "2010-05-05:reg_ip".getBytes());
		byte[] value2 = result.getValue("mycf".getBytes(), "2010-05-05:buyer_status".getBytes());
		System.err.println("line1:SUCCESS");
		System.err.println("line2:" 
		        + new String(value1) + "\t"
				+ new String(value2));
	}
}

  All of the examples above are compiled and run like this:

[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/javac GetData.java
[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/java GetData 
(5) A round-up of common operations through the Java API:
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
 
public class HBaseTest2 {
 
    // static configuration
    static Configuration conf = null;
    static {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.205.153");
    }
 
    /*
     * Create a table
     * @tableName table name
     * @family    list of column families
     */
    public static void creatTable(String tableName, String[] family) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor(tableName);
        for (int i = 0; i < family.length; i++) {
            desc.addFamily(new HColumnDescriptor(family[i]));
        }
        if (admin.tableExists(tableName)) {
            System.out.println("table Exists!");
            System.exit(0);
        } else {
            admin.createTable(desc);
            System.out.println("create table Success!");
        }
    }
 
    /*
     * Add data to a table (for a fixed table whose column families are known in advance)
     * @rowKey    rowKey
     * @tableName table name
     * @column1   columns of the first column family
     * @value1    values for the first column family
     * @column2   columns of the second column family
     * @value2    values for the second column family
     */
    public static void addData(String rowKey, String tableName,
            String[] column1, String[] value1, String[] column2, String[] value2)
            throws IOException {
        Put put = new Put(Bytes.toBytes(rowKey));// set the rowkey
        HTable table = new HTable(conf, tableName);// get the table
        HColumnDescriptor[] columnFamilies = table.getTableDescriptor() // get all column families
                .getColumnFamilies();

        for (int i = 0; i < columnFamilies.length; i++) {
            String familyName = columnFamilies[i].getNameAsString(); // column family name
            if (familyName.equals("article")) { // put data into the article family
                for (int j = 0; j < column1.length; j++) {
                    put.add(Bytes.toBytes(familyName),
                            Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));
                }
            }
            if (familyName.equals("author")) { // put data into the author family
                for (int j = 0; j < column2.length; j++) {
                    put.add(Bytes.toBytes(familyName),
                            Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));
                }
            }
        }
        table.put(put);
        System.out.println("add data Success!");
    }
 
    /*
     * Query by rowKey
     * @rowKey    rowKey
     * @tableName table name
     */
    public static Result getResult(String tableName, String rowKey) throws IOException {
        Get get = new Get(Bytes.toBytes(rowKey));
        HTable table = new HTable(conf, tableName);// 获取表
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out
                    .println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
        return result;
    }
 
    /*
     * Scan the whole table
     * @tableName table name
     */
    public static void getResultScann(String tableName) throws IOException {
        Scan scan = new Scan();
        ResultScanner rs = null;
        HTable table = new HTable(conf, tableName);
        try {
            rs = table.getScanner(scan);
            for (Result r : rs) {
                for (KeyValue kv : r.list()) {
                    System.out.println("family:"
                            + Bytes.toString(kv.getFamily()));
                    System.out.println("qualifier:"
                            + Bytes.toString(kv.getQualifier()));
                    System.out
                            .println("value:" + Bytes.toString(kv.getValue()));
                    System.out.println("timestamp:" + kv.getTimestamp());
                    System.out
                            .println("-------------------------------------------");
                }
            }
        } finally {
            rs.close();
        }
    }
 
    /*
     * Query a single column of a row
     * @tableName table name
     * @rowKey    rowKey
     */
    public static void getResultByColumn(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
    	HTable table = new HTable(conf, tableName);
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // fetch only the given column family and qualifier
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out
                    .println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
    }
 
    /*
     * Update a single column
     * @tableName  table name
     * @rowKey     rowKey
     * @familyName column family
     * @columnName column name
     * @value      new value
     */
    public static void updateTable(String tableName, String rowKey,
            String familyName, String columnName, String value)
            throws IOException {
    	HTable table = new HTable(conf, tableName);
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),
                Bytes.toBytes(value));
        table.put(put);
        System.out.println("update table Success!");
    }
 
    /*
     * Query multiple versions of a column
     * @tableName  table name
     * @rowKey     rowKey
     * @familyName column family
     * @columnName column name
     */
    public static void getResultByVersion(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
    	HTable table = new HTable(conf, tableName);
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));
        get.setMaxVersions(5);
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out
                    .println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
        List<?> results = table.get(get).list();
        Iterator<?> it = results.iterator();
        while (it.hasNext()) {
            System.out.println(it.next().toString());
        }
    }
 
    /*
     * Delete a specific column
     * @tableName  table name
     * @rowKey     rowKey
     * @familyName column family
     * @columnName column name
     */
    public static void deleteColumn(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
        HTable table = new HTable(conf, tableName);
        Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));
        deleteColumn.deleteColumns(Bytes.toBytes(familyName),
                Bytes.toBytes(columnName));
        table.delete(deleteColumn);
        System.out.println(familyName + ":" + columnName + " is deleted!");
    }
 
    /*
     * Delete an entire row
     * @tableName table name
     * @rowKey    rowKey
     */
    public static void deleteAllColumn(String tableName, String rowKey)
            throws IOException {
    	HTable table = new HTable(conf, tableName);
        Delete deleteAll = new Delete(Bytes.toBytes(rowKey));
        table.delete(deleteAll);
        System.out.println("all columns are deleted!");
    }
 
    /*
     * 删除表
     * @tableName 表名
     */
    public static void deleteTable(String tableName) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.disableTable(tableName);
        admin.deleteTable(tableName);
        System.out.println(tableName + " is deleted!");
    }
 
    public static void main(String[] args) throws Exception {
 
        // create a table
//        String tableName = "blog2"; String[] family = { "article","author" };
//        creatTable(tableName,family);

        // add data to the table
//        String[] column1 = { "title", "content", "tag" }; String[] value1 = {"Head First HBase",
//        "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data."
//        , "Hadoop,HBase,NoSQL" }; String[] column2 = { "name", "nickname" };
//        String[] value2 = { "nicholas", "lee" }; addData("rowkey1", "blog2",
//        column1, value1, column2, value2);

        // delete one column
//         deleteColumn("blog2", "rowkey1", "author", "nickname");

        // delete the whole row
//        deleteAllColumn("blog2", "rowkey1");

        // drop the table
//        deleteTable("blog2");

        // get by rowkey
//         getResult("blog2", "rowkey1");

        // read and update a single column
//         getResultByColumn("blog2", "rowkey1", "author", "name");
//         updateTable("blog2", "rowkey1", "author", "name","bin");
//         getResultByColumn("blog2", "rowkey1", "author", "name");

        // scan the whole table
//         getResultScann("blog2");

        // read multiple versions of a column
         getResultByVersion("blog2", "rowkey1", "author", "name");
    }
}

Note: constructing an HTable by hand is deprecated; instantiate tables through a Connection via Connection.getTable(TableName) instead, as sketched earlier.
 

2. HBase 2.x:

Reference: "Several ways to read and write HBase (Part 1): the Java API"

(1) Connecting to HBase:

  I am using HBase 2.1.2 here. The connection is set up in a static block; unlike releases before 2.1.2 there is no need to build an HBase thread pool yourself, because the 2.1.2 client already wraps that up, so we just create the connection and use it:

/**
  * static configuration
  */
static Configuration conf = null;
static Connection conn = null;
static {
       conf = HBaseConfiguration.create();
       conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
       conf.set("hbase.zookeeper.property.clientPort", "2181");
       try{
           conn = ConnectionFactory.createConnection(conf);
       }catch (Exception e){
           e.printStackTrace();
       }
}
(2) Creating a table:

  Tables are created through Admin; the table and its column families are built with TableDescriptorBuilder and ColumnFamilyDescriptorBuilder respectively:

/**
 * Create a table with a single column family
 * @throws Exception
 */
public static void createTable() throws Exception{
    Admin admin = conn.getAdmin();
    if (!admin.tableExists(TableName.valueOf("test"))){
        TableName tableName = TableName.valueOf("test");
        //table descriptor builder
        TableDescriptorBuilder tdb = TableDescriptorBuilder.newBuilder(tableName);
        //column family descriptor builder
        ColumnFamilyDescriptorBuilder cdb = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("user"));
        //build the column family descriptor
        ColumnFamilyDescriptor cfd = cdb.build();
        //add the column family
        tdb.setColumnFamily(cfd);
        //build the table descriptor
        TableDescriptor td = tdb.build();
        //create the table
        admin.createTable(td);
    }else {
        System.out.println("table already exists");
    }
    //close the connection
    conn.close();
}
(3) Inserting data:

  Rows are added through the Put API:

/**
 * Insert several rows (multiple rowKeys / columns)
 * @throws Exception
 */
public static void insertMany() throws Exception{
    Table table = conn.getTable(TableName.valueOf("test"));
    List<Put> puts = new ArrayList<Put>();
    Put put1 = new Put(Bytes.toBytes("rowKey1"));
    put1.addColumn(Bytes.toBytes("user"), Bytes.toBytes("name"), Bytes.toBytes("wd"));

    Put put2 = new Put(Bytes.toBytes("rowKey2"));
    put2.addColumn(Bytes.toBytes("user"), Bytes.toBytes("age"), Bytes.toBytes("25"));

    Put put3 = new Put(Bytes.toBytes("rowKey3"));
    put3.addColumn(Bytes.toBytes("user"), Bytes.toBytes("weight"), Bytes.toBytes("60kg"));

    Put put4 = new Put(Bytes.toBytes("rowKey4"));
    put4.addColumn(Bytes.toBytes("user"), Bytes.toBytes("sex"), Bytes.toBytes("男"));

    puts.add(put1);
    puts.add(put2);
    puts.add(put3);
    puts.add(put4);
    table.put(puts);
    table.close();
}
(4) Deleting a row, a column family, or a single column:
/**
 * Delete a whole row by rowKey, or only one column family of that row, or a single column of that family
 * @param tableName
 * @param rowKey
 * @param columnFamily
 * @param columnName
 * @throws Exception
 */
public static void deleteData(TableName tableName, String rowKey, String columnFamily, String columnName) throws Exception{
    Table table = conn.getTable(tableName);
    Delete delete = new Delete(Bytes.toBytes(rowKey));

    //② narrow the delete to one column family of the row
    delete.addFamily(Bytes.toBytes(columnFamily));

    //③ narrow the delete to a single column of that column family
    delete.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName));

    //① with no addFamily/addColumn calls, the whole row is deleted
    table.delete(delete);
    table.close();
}
(5) Updating a column:

  Simply overwrite the old value with another Put:

/**
 * Update a value by rowKey, column family and column name
 * @param tableName
 * @param rowKey
 * @param columnFamily
 * @param columnName
 * @param columnValue
 * @throws Exception
 */
public static void updateData(TableName tableName, String rowKey, String columnFamily, String columnName, String columnValue) throws Exception{
    Table table = conn.getTable(tableName);
    Put put1 = new Put(Bytes.toBytes(rowKey));
    put1.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName), Bytes.toBytes(columnValue));
    table.put(put1);
    table.close();
}
(6) Querying HBase:

  Queries fall into three groups: get, scan, and scan combined with filters. Commonly used filters include RowFilter (rowKey filter), SingleColumnValueFilter (column value filter), and ColumnPrefixFilter (column name prefix filter).

/**
 * Get a row by rowKey
 * @param tableName
 * @param rowKey
 * @throws Exception
 */
public static void getResult(TableName tableName, String rowKey) throws Exception{
    Table table = conn.getTable(tableName);
    //fetch a single row
    Get get = new Get(Bytes.toBytes(rowKey));
    Result set = table.get(get);
    Cell[] cells = set.rawCells();
    for (Cell cell: cells){
        System.out.println(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::" +
        Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
    }
    table.close();
}

//Filter comparison operators: LESS <   LESS_OR_EQUAL <=   EQUAL =   NOT_EQUAL <>   GREATER_OR_EQUAL >=   GREATER >   NO_OP (matches nothing)

/**
 * Scan a table with various filters
 * @param tableName
 * @throws Exception
 */
public static void scanTable(TableName tableName) throws Exception{
    Table table = conn.getTable(tableName);
    
    //① full table scan
    Scan scan1 = new Scan();
    ResultScanner rscan1 = table.getScanner(scan1);
    
    //② rowKey filter
    Scan scan2 = new Scan();
    //"str$" matches the end of the key, like '%str' in SQL; "^str" matches the beginning, like 'str%' in SQL
    RowFilter filter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("Key1$"));
    scan2.setFilter(filter);
    ResultScanner rscan2 = table.getScanner(scan2);
    
    //③ column value filter
    Scan scan3 = new Scan();
    //arguments: column family, column name, comparison operator, value
    SingleColumnValueFilter filter3 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
               CompareOperator.EQUAL, Bytes.toBytes("spark"));
    scan3.setFilter(filter3);
    ResultScanner rscan3 = table.getScanner(scan3);
    
    //④ column name prefix filter
    Scan scan4 = new Scan();
    ColumnPrefixFilter filter4 = new ColumnPrefixFilter(Bytes.toBytes("name"));
    scan4.setFilter(filter4);
    ResultScanner rscan4 = table.getScanner(scan4);
    
    //⑤ filter list (combine several filters)
    Scan scan5 = new Scan();
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    SingleColumnValueFilter filter51 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
              CompareOperator.EQUAL, Bytes.toBytes("spark"));
    ColumnPrefixFilter filter52 = new ColumnPrefixFilter(Bytes.toBytes("name"));
    list.addFilter(filter51);
    list.addFilter(filter52);
    scan5.setFilter(list);
    ResultScanner rscan5 = table.getScanner(scan5);
    
    //iterate whichever scanner you need; rscan5 is used here as an example
    for (Result rs : rscan5){
        String rowKey = Bytes.toString(rs.getRow());
        System.out.println("row key :" + rowKey);
        Cell[] cells = rs.rawCells();
        for (Cell cell: cells){
            System.out.println(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()) + "::"
                    + Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::"
                    + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        System.out.println("-------------------------------------------");
    }
}
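  For reference, the snippets in (1) through (6) can be exercised together. The following main method is a hypothetical sketch: it assumes all the methods above live in one class together with the static conf/conn block from step (1), and that the conn.close() call at the end of createTable() is removed so the shared connection stays open for the later calls:

public static void main(String[] args) throws Exception {
    TableName tableName = TableName.valueOf("test");

    createTable();                                            // (2) create table "test" with column family "user"
    insertMany();                                             // (3) insert a few rows
    updateData(tableName, "rowKey1", "user", "name", "wd2");  // (5) overwrite a value
    getResult(tableName, "rowKey1");                          // (6) read one row back
    scanTable(tableName);                                     // (6) scan with the filters shown above
    deleteData(tableName, "rowKey1", "user", "name");         // (4) delete

    conn.close();                                             // release the shared connection at the very end
}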
(7) A quick connectivity test (list all table names):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class HBaseTableLister {

    public static void main(String[] args) throws IOException {
        // create the configuration object
        Configuration config = HBaseConfiguration.create();

        // connection info of the HBase cluster
        config.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        // create the HBase connection
        Connection connection = ConnectionFactory.createConnection(config);

        // get the Admin object
        Admin admin = connection.getAdmin();

        // list all tables in HBase
        TableName[] tableNames = admin.listTableNames();

        // print the table names
        for (TableName tableName : tableNames) {
            System.out.println(Bytes.toString(tableName.getName()));
        }

        // close resources
        admin.close();
        connection.close();
    }
}
(8) A utility class in practice:
import com.alibaba.fastjson.JSONObject;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.StopWatch;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Tuple2;
import scala.Tuple3;

import java.io.IOException;
import java.util.*;
import java.util.concurrent.TimeUnit;

import static com.xiaoqiang.utils.DateUtil.dateToTimestamp;

/**
 * @author: xiaoqiang
 * @version: 1.0
 * @description: com.xiaoqiang.utils
 * @date:2023/8/30
 */
public class HBaseUtil {

    public static List<Put> puts = new ArrayList<Put>();
    public static List<Delete> deletes = new ArrayList<>();

    /**
     * @Description: get an HBase client connection
     * @param properties
     */
    public static Connection getHBaseConnect(Properties properties) {
        try {
            // connect to HBase
            Configuration hbConf = HBaseConfiguration.create();
            hbConf.set("fs.defaultFS", properties.getProperty("fs.defaultFS"));
            hbConf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            hbConf.set("hbase.zookeeper.quorum", properties.getProperty("hbase.zookeeper.quorum"));
            hbConf.set("hbase.zookeeper.property.clientPort", properties.getProperty("hbase.zookeeper.property.clientPort"));

            return ConnectionFactory.createConnection(hbConf);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * @Description: list all HBase table names
     * @param properties
     */
    public static void getAllTablesName(Properties properties) throws IOException {
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        // get the Admin object
        Admin admin = connection.getAdmin();
        // list all tables in HBase
        TableName[] tableNames = admin.listTableNames();

        // print the table names
        for (TableName tableName : tableNames) {
            System.out.println(Bytes.toString(tableName.getName()));
        }

        // close resources
        admin.close();
        connection.close();
    }

    /**
     * @Description: count the rows of a given table
     * @param properties
     * @param tablename
     */
    public static void rowCountByScanFilter(Properties properties, String tablename) throws IOException {
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        // row counter
        long rowCount = 0;
        try {
            // stopwatch for timing
            StopWatch stopWatch = new StopWatch();
            stopWatch.start();

            TableName name=TableName.valueOf(tablename);
            // get the table from the connection created above
            Table table = connection.getTable(name);
            Scan scan = new Scan();

            // startRowKey and endRowKey
            scan.withStartRow("student-1_1694307600".getBytes());
            scan.withStopRow("student-1_1694311200".getBytes());

            FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            Filter filter1 = new FirstKeyOnlyFilter();
            Filter filter2 = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("^tudent-12"));
            list.addFilter(filter1);
//            list.addFilter(filter2);

            // FirstKeyOnlyFilter only returns the first KeyValue of each row, which speeds up the count
            scan.setFilter(list);

            ResultScanner rs = table.getScanner(scan);
            for (Result result : rs) {
                rowCount += result.size();
//                break;
                if (rowCount == 1000) {
                    System.out.println("rowCount-->"+rowCount);
                } else if (rowCount == 10000) {
                    System.out.println("rowCount-->"+rowCount);
                } else if (rowCount == 50000) {
                    System.out.println("rowCount-->"+rowCount);
                } else if (rowCount == 100000) {
                    System.out.println("rowCount-->"+rowCount);
                } else if (rowCount == 1000000) {
                    System.out.println("rowCount-->"+rowCount);
                }
            }

            stopWatch.stop();
            System.out.println("RowCount: " + rowCount);
            System.out.println("统计耗时:" + stopWatch.now(TimeUnit.SECONDS));
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    /**
     * @Description: query a row by rowKey
     * @param properties
     * @param tableName
     * @param rowKey
     */
    public static void getResultByRowKey(Properties properties, String tableName, String rowKey) throws Exception{
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name=TableName.valueOf(tableName);
        Table table = connection.getTable(name);

        // fetch a single row
        Get get = new Get(Bytes.toBytes(rowKey));
        Result set = table.get(get);
        Cell[] cells = set.rawCells();
        for (Cell cell: cells){
            System.out.println(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::" +
                    Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        table.close();
    }

    /**
     * @Description: query by column value (SingleColumnValueFilter) or by rowKey prefix/suffix
     * @param properties
     * @param tableName
     */
    public static void scanTableByFilter(Properties properties, String tableName) throws Exception{
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name=TableName.valueOf(tableName);
        Table table = connection.getTable(name);

        // rowKey filter
        Scan scan1 = new Scan();
        // "str$" matches the end of the key, like '%str' in SQL; "^str" matches the beginning, like 'str%' in SQL
        RowFilter filter1 = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("^tudent-58"));
        scan1.setFilter(filter1);
        ResultScanner rscan1 = table.getScanner(scan1);

        // column value filter
        Scan scan2 = new Scan();
        // arguments: column family, column name, comparison operator, value
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("DATA"), Bytes.toBytes("work_status"),
                CompareOperator.EQUAL, Bytes.toBytes("1"));
        scan2.setFilter(filter2);
        ResultScanner rscan2 = table.getScanner(scan2);

        FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        list.addFilter(filter1);
        list.addFilter(filter2);
        Scan scan3 = new Scan();
        scan3.setFilter(list);
        ResultScanner rscan3 = table.getScanner(scan3);

        int i=0;
        for (Result rs : rscan3){
            String rowKey = Bytes.toString(rs.getRow());
            System.out.println("row key :" + rowKey);
            Cell[] cells = rs.rawCells();

            // cap the number of rows printed
            i++;
            if (i == 100) {
                break;
            }
            for (Cell cell: cells){
                System.out.println(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()) + "::"
                        + Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::"
                        + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
            }
            System.out.println("-------------------------------------------");
        }
    }

    /**
     * @Description: scan a table by name (only for tables with little data; be careful with large tables!)
     * @param properties
     * @param tableName
     */
    public static void getResultByTableName(Properties properties, String tableName) throws Exception{
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name=TableName.valueOf(tableName);
        Table table = connection.getTable(name);
        // full table scan
        Scan scan = new Scan();

        // "str$" matches the end of the key, like '%str' in SQL; "^str" matches the beginning, like 'str%' in SQL
        RowFilter filter1 = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("^tudent-1"));
//        scan.setFilter(filter1);

        // startRowKey and endRowKey
        scan.withStartRow("student-46_1694016000".getBytes());
        scan.withStopRow("student-46_1694102400".getBytes());

        ResultScanner rscan = table.getScanner(scan);
        int num = 0;
        for (Result rs : rscan){
            String rowKey = Bytes.toString(rs.getRow());
            System.out.println("row key :" + rowKey);
            Cell[] cells = rs.rawCells();
            num++;
            for (Cell cell: cells){
//                if (fieldName.equals("Date")) {
                    System.out.println(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()) + "::"
                            + Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::"
                            + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
//                }
            }
            System.out.println("-------------------------------------------");
            if (num == 10) {
                break;
            }
        }
        table.close();
    }

    /**
     * @Description: insert data (multiple rowKeys / columns)
     * @param properties
     * @param tableName
     */
    public static void insertMany(Properties properties, String tableName) {
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name=TableName.valueOf(tableName);
        Table table = null;
        try {
            table = connection.getTable(name);
        } catch (IOException e) {
            e.printStackTrace();
        }

        List<Put> puts = new ArrayList<Put>();
//        Put put1 = new Put(Bytes.toBytes("student-1_1694072961_10"));
//        put1.addColumn(Bytes.toBytes("DATA"), Bytes.toBytes("work_status"), Bytes.toBytes("0"));

        Put put2 = new Put(Bytes.toBytes("student-58_1694074491"));
        put2.addColumn(Bytes.toBytes("DATA"), Bytes.toBytes("work_status"), Bytes.toBytes("0"));

//        puts.add(put1);
        puts.add(put2);
        try {
            table.put(puts);
            System.out.println("添加数据成功-->");
            table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @Description: write a Spark Dataset into HBase (multiple rowKeys / columns)
     * @param dataDataset
     * @param properties
     * @param tableName
     */
    public static void insertDataset(String workType, Dataset<Row> dataDataset, Properties properties, String tableName) {
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name = TableName.valueOf(tableName);
        Table table = null;
        try {
            table = connection.getTable(name);
        } catch (IOException e) {
            e.printStackTrace();
        }

        JavaRDD<Row> dataRDD = dataDataset.toJavaRDD();
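        // NOTE: foreachPartition runs on the executors while the static `puts` list lives on the driver,
        // so in cluster mode the driver-side table.put(puts) below may see an empty list; this pattern
        // appears reliable only in local mode.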
        dataRDD.foreachPartition((VoidFunction<Iterator<Row>>) rowIterator -> {
            while (rowIterator.hasNext()) {
                Row next = rowIterator.next();
                String rowKey = next.getAs("work_id");
                String workTime = next.getAs("work_start_time");
                System.out.println("rowKey: " + rowKey);
                if (rowKey == null || rowKey.isEmpty()) {
//                    System.out.println("rowKey is empty");
                } else {
                    Put put = new Put(Bytes.toBytes(rowKey));
                    put.addColumn(Bytes.toBytes("DATA"), Bytes.toBytes("WORK_TIME"), Bytes.toBytes(workTime));
                    put.addColumn(Bytes.toBytes("DATA"), Bytes.toBytes("DATA_SOURCE"), Bytes.toBytes("offline"));
                    puts.add(put);
                }
            }
        });

        try {
            table.put(puts);
            table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @Description: delete a whole row by rowKey, or one column family of the row, or a single column of that family
     * @param properties
     * @param tableName
     * @param rowKey
     * @param columnFamily
     * @param columnName
     */
    public static void deleteDataOne(Properties properties, String tableName, String rowKey, String columnFamily, String columnName) throws Exception{
        // create the HBase connection
        Connection connection = getHBaseConnect(properties);
        TableName name=TableName.valueOf(tableName);
        Table table = connection.getTable(name);

        Delete delete = new Delete(Bytes.toBytes(rowKey));

        // delete one column family of the row
//        delete.addFamily(Bytes.toBytes(columnFamily));

        // delete a single column of that column family
//        delete.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName));

        // delete the whole row by rowKey
        table.delete(delete);
        table.close();
    }

    /**
     * @Description: read HBase data into a Spark Dataset
     * @param properties
     * @param spark
     * @param tableName
     * @param colsName
     * @return Dataset<Row>
     * @throws IOException
     */
    public static Dataset<Row> getHbaseDatasetData(Properties properties, SparkSession spark
            , String tableName, List<String> colsName, String RegexString, String startRowKey, String endRowKey) {

        Scan scan = new Scan();

        // limit the number of rows returned
        Filter filter = new PageFilter(100000);
        // rowKey filter ("str$" matches the end of the key, like '%str' in SQL; "^str" matches the beginning, like 'str%' in SQL)
        RowFilter rowFilter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator(RegexString));

        FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
//        list.addFilter(filter);
        list.addFilter(rowFilter);
        scan.setFilter(list);

        // startRowKey and endRowKey
        scan.withStartRow(startRowKey.getBytes());
        scan.withStopRow(endRowKey.getBytes());

        Base64.Encoder base64Encoder = Base64.getEncoder();
        String scanToString = null;
        try {
            scanToString = base64Encoder.encodeToString(ProtobufUtil.toScan(scan).toByteArray());
        } catch (IOException e) {
            e.printStackTrace();
        }

        Configuration hbConf = HBaseConfiguration.create();
        hbConf.set("fs.defaultFS", properties.getProperty("fs.defaultFS"));
        hbConf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        hbConf.set("hbase.zookeeper.quorum", properties.getProperty("hbase.zookeeper.quorum"));
        hbConf.set("hbase.zookeeper.property.clientPort", properties.getProperty("hbase.zookeeper.property.clientPort"));
        // the input table
        hbConf.set(TableInputFormat.INPUT_TABLE, tableName);
        // the Base64-encoded Scan
        hbConf.set(TableInputFormat.SCAN, scanToString);

        // get the JavaSparkContext
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        // newAPIHadoopRDD reads data stored in Hadoop through a new-API input format plus extra configuration
        // the conf is broadcast and must not be modified afterwards, so it is best to use one conf per RDD
        JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD = jsc.newAPIHadoopRDD(hbConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
        // map turns the JavaPairRDD into a JavaRDD (mapToPair goes the other way)
        JavaRDD<Row> dataRDD = hbaseRDD.map(new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
            // serialVersionUID, used for class versioning
            private static final long serialVersionUID = 1L;

            // call() wraps each Result into a Row; pack the data into the Object[] according to your own needs
            @Override
            public Row call(Tuple2<ImmutableBytesWritable, Result> tuple2) {
                Result result = tuple2._2;
                String[] values = new String[colsName.size() + 1];
                values[0] = Bytes.toString(result.getRow());
                for (int i = 0; i < colsName.size(); i++) {
                    values[i + 1] = Bytes.toString(result.getValue("DATA".getBytes(), colsName.get(i).getBytes()));
                }
                // RowFactory.create(Object... values)
                return RowFactory.create((Object[]) values);
            }
        });
        List<StructField> structFields = new ArrayList<>();
        // one StructField per column; these must line up with the Object[] built in call() above
        structFields.add(DataTypes.createStructField("rowKey", DataTypes.StringType, true));
        for (String col : colsName) {
            structFields.add(DataTypes.createStructField(col, DataTypes.StringType, true));
        }
        // build the schema (the structure the rows are organized by)
        StructType schema = DataTypes.createStructType(structFields);
        // create the DataFrame by combining the row data with the schema
        return spark.createDataFrame(dataRDD, schema);
    }

    /**
     * @Description: When Spark output has to land in HBase, writing through the plain Put API can be very slow.
     * Advantages of bulk loading:
     * BulkLoad does not write the WAL and does not trigger flushes or splits.
     * Heavy use of the Put API can also cause a lot of GC, which hurts performance and can even affect the
     * stability of the HBase nodes; bulk loading avoids this, and there is no per-call RPC overhead.
     * Pros: 1) fast, 2) almost never fails, 3) almost no impact on the HBase service.
     * <p>
     *
     * @param joinAllDs
     * @param properties
     * @param hbaseTableName
     * @param path
     */
    public static void saveToHbaseByBulk(Dataset<Row> joinAllDs, Properties properties, String hbaseTableName, String path) {
        // HBase cluster configuration
        Configuration hconf = HBaseConfiguration.create();
        String fsAddress = properties.getProperty("fs.defaultFS");
        hconf.set("fs.defaultFS", fsAddress);
        hconf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        hconf.set("hbase.zookeeper.quorum", properties.getProperty("hbase.zookeeper.quorum"));
        hconf.set("hbase.zookeeper.property.clientPort", properties.getProperty("hbase.zookeeper.property.clientPort"));
        hconf.set("hbase.mapreduce.hfileoutputformat.table.name", hbaseTableName);
        // max number of HFiles per region per family
        hconf.set("hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily", "3200");
        // max HFile (region) size
        hconf.set("hbase.hregion.max.filesize", "10737418240");
        Connection connection = null;
        Table table = null;
        Admin admin = null;
        try {
            // create the HBase connection and get the target table
            connection = ConnectionFactory.createConnection(hconf);
            TableName tableName = TableName.valueOf(hbaseTableName);
            table = connection.getTable(tableName);
            admin = connection.getAdmin();
            // region layout of the table
            RegionLocator regionLocator = connection.getRegionLocator(tableName);
            // the loader that moves the HFiles into HBase
            LoadIncrementalHFiles bulkLoader = new LoadIncrementalHFiles(hconf);
            String[] columns = joinAllDs.columns();
            JavaPairRDD<ImmutableBytesWritable, KeyValue> hfileSortRowKeyAndCmRDD = joinAllDs
//                    .dropDuplicates("Time")
                    .javaRDD()
                    .mapToPair(row -> {
                        // build the rowKey; sortByKey below sorts all records by rowKey
                        String studentId = row.getAs("student_id");
                        String workId = row.getAs("work_id");
                        String rowKey = studentId + workId;
                        // HBase expects column names in lexicographic order, hence the TreeMap
                        TreeMap<String, String> treeMap = new TreeMap<>();
                        // copy each non-null column value into the sorted map
                        for (String cm : columns) {
                            Object getVal = row.getAs(cm);
                            if (getVal != null) {
                                String replace = getVal.toString().replace("WrappedArray(", "[").replace(")", "]");
                                treeMap.put(cm, replace);
                            }
                        }
                        return new Tuple2<String, TreeMap>(rowKey, treeMap);
                    }).sortByKey(true)
                    .coalesce(6)
                    .flatMapToPair(tuple2 -> {
                        List<Tuple2<ImmutableBytesWritable, KeyValue>> list = new ArrayList<>();
                        byte[] rowKey = tuple2._1.getBytes();  // the sorted rowKey
                        byte[] cmf = "DATA".getBytes(); // column family
                        TreeMap<String, String> data = tuple2._2; // column name -> value
                        for (Map.Entry<String, String> map : data.entrySet()) {  // iterate the sorted map: column name -> value
                            // Caused by: java.io.IOException: Added a key not lexically larger than previous.
                            KeyValue keyValue = new KeyValue(rowKey, cmf, map.getKey().getBytes(), System.currentTimeMillis(), map.getValue().getBytes());
                            list.add(new Tuple2<>(new ImmutableBytesWritable(rowKey), keyValue));
                        }
                        return list.iterator();
                    });

            // HDFS directory for the HFiles; it must not already exist, otherwise the job fails
            FileSystem fs = FileSystem.get(hconf);
            if (fs.exists(new Path(path))) {
                fs.delete(new Path(path), true);
                // save the RDD to HDFS in HFile format
                hfileSortRowKeyAndCmRDD.coalesce(6).saveAsNewAPIHadoopFile(path,
                        ImmutableBytesWritable.class,
                        KeyValue.class,
                        HFileOutputFormat2.class, hconf);
                // move the HFiles into the regions of the HBase table
                bulkLoader.doBulkLoad(new Path(path), admin, table, regionLocator);
            } else {
                hfileSortRowKeyAndCmRDD.coalesce(6).saveAsNewAPIHadoopFile(path,
                        ImmutableBytesWritable.class,
                        KeyValue.class,
                        HFileOutputFormat2.class, hconf);
                bulkLoader.doBulkLoad(new Path(path), admin, table, regionLocator);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeResource(connection, table, admin);
        }
    }

    /**
     * @Description: close and release resources
     * @param connection
     * @param table
     * @param admin
     */
    private static void closeResource(Connection connection, Table table, Admin admin) {
        if (admin != null) {
            try {
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (table != null) {
            try {
                table.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Logger.getLogger("org.apache.spark").setLevel(Level.WARN);

        // runMode: pro, dev, or test
        String runMode = "test";

        // load the configuration file
        String configFile = String.format("config-%s.properties", runMode);
        Properties properties = PropertiesUtil.getProperties(configFile);

        // list all table names
//        getAllTablesName(properties);
        // get a single row by rowKey
//        getResultByRowKey(properties, "heheda", "heheawe_1694073276_10");
        // count the rows of a given table
        rowCountByScanFilter(properties, "heheda");
        // scan a table by name
//        getResultByTableName(properties, "heheda");
//        System.out.println(getResultByFmeterId(properties, "student-1"));
        // insert data
//        insertMany(properties, "heheda");
        // delete a row, a column family, or a single column by rowKey
//        deleteDataOne(properties, "heheda", "fawef_1694066860", "DATA", "student_name");
        // query by rowKey or by column value
//        scanTableByFilter(properties, "heheda");
    }
}
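  The utility class above reads its cluster addresses from a properties file named config-<runMode>.properties (loaded through PropertiesUtil). A minimal, hypothetical config-test.properties containing just the keys the code actually reads might look like this (the host names are placeholders):

fs.defaultFS=hdfs://node01:8020
hbase.zookeeper.quorum=node01,node02,node03
hbase.zookeeper.property.clientPort=2181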

III. Scala

1. Reading and writing HBase from Spark, and querying the table with Spark SQL:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableOutputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * @Author: huiq
 * @Date: 2021/7/23
 * @Description: HBase connection test
 */
object OperateHbaseTest {

  def main(args: Array[String]): Unit = {

    // initialize Spark
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName(this.getClass.getSimpleName)
    val spark: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

    // initialize the HBase configuration with the ZooKeeper parameters
    val config: Configuration = HBaseConfiguration.create()
    config.set("hbase.zookeeper.quorum", "node01,node02,node03") // ZooKeeper address of the HBase cluster (any node)
    config.set("hbase.zookeeper.property.clientPort", "2181") // ZooKeeper client port
    config.set("zookeeper.znode.parent", "/hbase-unsecure")

    val sc: SparkContext = spark.sparkContext


    // table to read
    config.set(TableInputFormat.INPUT_TABLE,"test_schema1:t2")

    // read the whole table from HBase into an RDD
    val hbaseRDD: RDD[(ImmutableBytesWritable, Result)] = sc.newAPIHadoopRDD(config,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])

    val count = hbaseRDD.count()
    println("Students RDD Count--->" + count)

    // iterate and print
    hbaseRDD.foreach({ case (_,result) =>
      val key = Bytes.toString(result.getRow)
      val a = Bytes.toString(result.getValue("F".getBytes,"a".getBytes))
      val b = Bytes.toString(result.getValue("F".getBytes,"b".getBytes))
      println("Row key:"+key+" a:"+a+" b:"+b)
    })


    // write to HBase
    val tablename = "test_schema1:t2"
    config.set(TableOutputFormat.OUTPUT_TABLE, "test_schema1:t2")

    val job = new Job(config)
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setOutputValueClass(classOf[Result])
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    val indataRDD = sc.makeRDD(Array("3,26,M","4,27,M")) // build two records
    val rdd = indataRDD.map(_.split(',')).map{arr=>{
      val put = new Put(Bytes.toBytes(arr(0))) // rowkey value
      put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)))
      put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("b"),Bytes.toBytes(arr(2)))
//      put.add(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)) // some posts online use put.add(...), but it fails for me here, probably an API version difference
      (new ImmutableBytesWritable, put)
    }}
    rdd.saveAsNewAPIHadoopDataset(job.getConfiguration())


    // build an RDD of Row
    val rowRDD = hbaseRDD.map(p => {
      val name = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("a")))
      val age = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("b")))
      Row(name,age)
    })
    // schema (metadata) for the DataFrame
    val schema = StructType(List(
      StructField("a",StringType,true),
      StructField("b",StringType,true)
    ))

    // create the DataFrame
    val dataFrame = spark.createDataFrame(rowRDD,schema)

    // register a temporary view so it can be queried with SQL
    dataFrame.createTempView("t2")
    val result: DataFrame = spark.sql("select * from t2")
    result.show()
  }
}

Note: I am on Ambari 2.7.4 + HDP 3.1.4. After a proper integration the three config.set lines above are not strictly needed to connect to HBase, but at first the program failed with: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
Cause: hbase-site.xml contains:

    <property>
      <name>zookeeper.znode.parent</name>
      <value>/hbase-unsecure</value>
    </property>

Option 1: change the value to <value>/hbase</value> and restart HBase.
Note: zookeeper.znode.parent is the znode path created for HBase in ZooKeeper.
Option 2: add config.set("zookeeper.znode.parent", "/hbase-unsecure") in the code.

References:
"Getting Started with Spark: Reading and Writing HBase Data"
"[HBase Basics] The HBaseConfiguration class and its parameters"

One more note: after creating the HBase-mapped table in the Hive ODS layer, generating the corresponding DWD table with a CREATE TABLE AS SELECT ... statement failed with an error.
Fix 1: pass the parameter at startup: beeline -hiveconf zookeeper.znode.parent=/hbase-unsecure or hive -hiveconf zookeeper.znode.parent=/hbase-unsecure

Fix 2: on my Ambari HDP 3.1.4 cluster, adding the corresponding property to the Hive configuration in Ambari and then simply running beeline also works.

2. Spark Streaming: reading from Kafka and writing into HBase:
import java.util

import com.rongrong.bigdata.utils.{KafkaZkUtils, UMSUtils}
import kafka.utils.ZkUtils
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j.Logger
import org.apache.spark.streaming.{Durations, StreamingContext}

import scala.util.Try

object StandardOnlie {

  private val logger: Logger = Logger.getLogger(this.getClass)

  def main(args: Array[String]): Unit = {
    val spark = InitializeSpark.createSparkSession("StandardOnlie", "local")
    val streamingContext = new StreamingContext(spark.sparkContext, Durations.seconds(30))
    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "node01:6667,node02:6667,node03:6667",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG -> "group-02",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean)
    )

    val topic: String = "djt_db.test_schema1.result"
    val zkUrl = "node01:2181,node02:2181,node03:2181"
    val sessionTimeout = 1000
    val connectionTimeout = 1000

    val zkClient = ZkUtils.createZkClient(zkUrl, sessionTimeout, connectionTimeout)

    val kafkaStream = KafkaZkUtils.createDirectStream(zkClient, streamingContext, kafkaParams, topic)

    // process each batch of messages
    kafkaStream.foreachRDD(rdd => {
      // handle the data pulled from Kafka
      logger.info("=============== Total " + rdd.count() + " events in this    batch ..")

      rdd.foreach(x => {
        val configuration = HBaseConfiguration.create()
        configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
        val connection = ConnectionFactory.createConnection(configuration)

        // the actual Kafka payload
          var usmString = x.value()
          val flag: Boolean = UMSUtils.isHeartbeatUms(usmString)
          if (!flag) { // skip heartbeat messages
            val usmActiontype = UMSUtils.getActionType(usmString)
            println(s"action type of this record ---> ${usmActiontype}")
            println("Kafka record payload: " + x.value())
            val data: util.Map[String, String] = UMSUtils.getDataFromUms(usmString)
            // get the table
            val table = connection.getTable(TableName.valueOf("test_schema1:t2"))
            val rowkey: String = 123456 + "_" + data.get("a")
            val put = new Put(Bytes.toBytes(rowkey))
            put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_active_"), Bytes.toBytes(data.get("ums_active_")))
            put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_id_"), Bytes.toBytes(data.get("ums_id_")))
            put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_ts_"), Bytes.toBytes(data.get("ums_ts_")))
            put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("a"), Bytes.toBytes(data.get("a")))
            put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("b"), Bytes.toBytes(data.get("b")))
            // write the Put to HBase and release the table
            table.put(put)
            table.close()
            println(s"parsed data ---> ${data}")
          }
          connection.close()
        })

        // 更新offset到zookeeper中
        KafkaZkUtils.saveOffsets(zkClient, topic, KafkaZkUtils.getZkPath(kafkaParams, topic), rdd)
      })
    })
    streamingContext.start()
    streamingContext.awaitTermination()
    streamingContext.stop()
  }
}
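
The InitializeSpark helper called at the top of main is not shown in this article. The following is only a sketch of what such a helper might look like, inferred from the call InitializeSpark.createSparkSession("StandardOnlie", "local"); it is an assumption, not the author's actual code:

import org.apache.spark.sql.SparkSession

object InitializeSpark {
  // hypothetical helper matching the call site InitializeSpark.createSparkSession("StandardOnlie", "local")
  def createSparkSession(appName: String, master: String): SparkSession = {
    SparkSession.builder()
      .appName(appName)
      .master(master)
      .getOrCreate()
  }
}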
The KafkaZkUtils helper used by the streaming job keeps the consumer offsets in ZooKeeper:

import kafka.utils.{ZKGroupTopicDirs, ZkUtils}
import org.I0Itec.zkclient.ZkClient
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.TopicPartition
import org.apache.log4j.Logger
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils}

object KafkaZkUtils {
  private val logger: Logger = Logger.getLogger(this.getClass)

  /**
   * Get the consumer's offset path in ZooKeeper
   * @param kafkaParams
   * @param topic
   * @return
   */
  def getZkPath(kafkaParams: Map[String, Object], topic: String): String = {
    // note: apply() is used instead of get() so the group id is not rendered as Some(...)
    val topicDirs = new ZKGroupTopicDirs(kafkaParams(ConsumerConfig.GROUP_ID_CONFIG).toString, topic)
    s"${topicDirs.consumerOffsetDir}"
  }

  /**
   * Create a DirectStream, resuming from offsets stored in ZooKeeper when they exist
   * @param zkClient
   * @param streamingContext
   * @param kafkaParams
   * @param topic
   * @return
   */
  def createDirectStream(zkClient: ZkClient, streamingContext: StreamingContext, kafkaParams: Map[String, Object], topic: String): InputDStream[ConsumerRecord[String, String]] = {


    val zkPath = getZkPath(kafkaParams,topic)

    // read the previously stored offsets for this topic
    val storedOffsets = readOffsets(zkClient, topic, zkPath)

    val kafkaStream: InputDStream[ConsumerRecord[String, String]] = storedOffsets match {
      // no offsets were saved by a previous run
      case None =>
        KafkaUtils.createDirectStream[String, String](
          streamingContext,
          PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Array(topic), kafkaParams)
        )
      case Some(fromOffsets) => {
        KafkaUtils.createDirectStream[String, String](
          streamingContext,
          PreferConsistent,
          // Assign pins the consumer to fixed partitions and cannot pick up partition changes dynamically
          //          ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
          ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams, fromOffsets)
        )
      }
    }
    kafkaStream
  }

  /**
   * Save the batch's offsets to ZooKeeper
   * @param zkClient
   * @param topic
   * @param zkPath
   * @param rdd
   */
  def saveOffsets(zkClient: ZkClient,topic: String, zkPath: String, rdd: RDD[_]): Unit = {

    logger.info("Saving offsets to zookeeper")

    val offsetsRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

    offsetsRanges.foreach(offsetRange => logger.debug(s"Using ${offsetRange}"))

    val offsetsRangesStr = offsetsRanges.map(offsetRange => s"${offsetRange.partition}:${offsetRange.untilOffset}").mkString(",")

    logger.info(s"Writing offsets to Zookeeper: ${offsetsRangesStr}")

    ZkUtils(zkClient, false).updatePersistentPath(zkPath, offsetsRangesStr)
  }

  /**
   * Read offsets from ZooKeeper
   * @param zkClient
   * @param topic
   * @param zkPath
   * @return
   */
  def readOffsets(zkClient: ZkClient, topic: String, zkPath: String): Option[Map[TopicPartition, Long]] = {
    logger.info("Reading offsets from zookeeper")

    val (offsetsRangesStrOpt, _) = ZkUtils(zkClient, false).readDataMaybeNull(zkPath)
    offsetsRangesStrOpt match {
      case Some(offsetsRangesStr) => {
        logger.debug(s"Read offset ranges: ${
          offsetsRangesStr
        }")
        val offsets: Map[TopicPartition, Long] = offsetsRangesStr.split(",").map(s => s.split(":"))
          .map({
            case Array(partitionStr, offsetStr) =>
              (new TopicPartition(topic, partitionStr.toInt) -> offsetStr.toLong)
                // a fixed starting offset can be forced here instead; note that this also requires
                // enabling the ConsumerStrategies.Assign line in createDirectStream above
//              (new TopicPartition(topic, partitionStr.toInt) -> "20229".toLong)
          }).toMap
        Some(offsets)
      }
      case None =>
        logger.info("No offsets found in Zookeeper")
        None
    }
  }
}
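
The offsets are persisted as a single string of partition:untilOffset pairs joined by commas. The standalone snippet below only re-plays that encode/decode round trip outside Spark, which can be handy for sanity-checking what ends up in ZooKeeper; the topic name is reused from above and the offset values are made up:

import org.apache.kafka.common.TopicPartition

object OffsetFormatCheck {
  def main(args: Array[String]): Unit = {
    val topic = "djt_db.test_schema1.result"
    // what saveOffsets writes: "partition:untilOffset" pairs joined by ","
    val encoded = Seq(0 -> 120L, 1 -> 98L, 2 -> 131L)
      .map { case (p, o) => s"$p:$o" }
      .mkString(",")
    // what readOffsets reconstructs from that string
    val decoded: Map[TopicPartition, Long] = encoded.split(",")
      .map(_.split(":"))
      .map { case Array(p, o) => new TopicPartition(topic, p.toInt) -> o.toLong }
      .toMap
    println(encoded) // 0:120,1:98,2:131
    println(decoded)
  }
}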

  I originally wanted to use foreachPartition, but could not get it to work. See the article "sparkstreaming写入hbase": there, creating the connection inside foreachPartition works, and the author notes that "getting the HBase connection inside the partition gives one connection per partition; a partition does not span nodes, so nothing needs to be serialized". Written that way here, however, it throws an error:
(error screenshot omitted)
The code that produced the error:

      rdd.foreachPartition(partitionRecords => {
        val configuration = HBaseConfiguration.create()
        configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
        // one HBase connection per partition; a partition does not span nodes, so no serialization should be needed
        val connection = ConnectionFactory.createConnection(configuration)

        partitionRecords.foreach(x => {

          // extract the actual payload from the Kafka record
          val usmString = x.value()
          // ... (the rest is the same as the per-record version above)

  I looked through some material: "Scala: writing to HBase through Spark: Task not serializable", "ZooKeeper error: org.I0Itec.zkclient.exception.ZkMarshallingError: java.io.EOFException", "A complete guide to Spark serialization problems", and "HBase connection pools", but have not found a solution yet; anyone who knows is welcome to discuss. A sketch of the pattern those articles recommend follows below.
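
For reference, the partition-level pattern usually described looks like this sketch: the connection, table, and every other non-serializable object are created inside foreachPartition, so only plain strings cross the driver/executor boundary. The object name, row key, and single raw column below are placeholders rather than the original UMS schema, and this is only a sketch under those assumptions; whether it fixes the exact Task-not-serializable error above depends on which driver-side object (a Logger or the ZkClient are frequent culprits) the failing closure was capturing:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.rdd.RDD

object HBasePartitionWriter {
  // Write one batch to HBase with a single connection per partition.
  // Only the two String parameters are captured by the closure, so nothing non-serializable is shipped.
  def writePartitioned(rdd: RDD[ConsumerRecord[String, String]],
                       znodeParent: String = "/hbase-unsecure",
                       tableName: String = "test_schema1:t2"): Unit = {
    rdd.foreachPartition { partitionRecords =>
      val configuration = HBaseConfiguration.create()
      configuration.set("zookeeper.znode.parent", znodeParent)
      val connection = ConnectionFactory.createConnection(configuration)
      val table = connection.getTable(TableName.valueOf(tableName))
      try {
        partitionRecords.foreach { record =>
          // placeholder row key and column; the real job derives them from the parsed UMS payload
          val put = new Put(Bytes.toBytes(s"${record.partition()}_${record.offset()}"))
          put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("raw"), Bytes.toBytes(record.value()))
          table.put(put)
        }
      } finally {
        table.close()
        connection.close()
      }
    }
  }
}

Inside foreachRDD this would be called as HBasePartitionWriter.writePartitioned(rdd) before KafkaZkUtils.saveOffsets, which still runs on the driver.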

