HBase Study Notes (4): Case Study

1 Momo Message Storage Case: Basic Information

1.1 Case Introduction

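Momo has a huge volume of user messages, all of which must be stored. The workload is write-heavy: messages are written constantly but read relatively rarely. HBase is well suited to storing this kind of massive, write-heavy data.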

This case covers: 1) HBase table design, pre-splitting and rowkey design; 2) HBase tuning; 3) Apache Phoenix; 4) paginated queries; 5) data interface development.

1.2 Data Structure

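| Field | Description |
|---|---|
| msg_time | Message time |
| sender_nickyname | Sender nickname |
| sender_account | Sender account |
| sender_sex | Sender gender |
| sender_ip | Sender IP |
| sender_os | Sender operating system |
| sender_phone_type | Sender phone model |
| sender_network | Sender network type |
| sender_gps | Sender GPS |
| receiver_nickyname | Receiver nickname |
| receiver_ip | Receiver IP |
| receiver_account | Receiver account |
| receiver_os | Receiver operating system |
| receiver_phone_type | Receiver phone model |
| receiver_network | Receiver network type |
| receiver_gps | Receiver GPS |
| receiver_sex | Receiver gender |
| msg_type | Message type |
| distance | Distance between the two users |
| message | Message content |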

2 Preparation

2.1 Create an IDEA Maven Project

Project directory structure:

2.2 Create the Script File

In the hbase_shell folder shown above, create a script file named create_ns_table.rb.

3 Table Structure Design

3.1 Namespace

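Purpose: a cluster stores many tables, and grouping them into namespaces by business area makes them easier to manage. A namespace is similar to a database in Hive: different databases hold different kinds of tables. HBase's default namespace is default. HBase also has a namespace called hbase, which holds the built-in system tables (namespace and meta).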

Create, list, describe, and drop namespaces

Syntax:

## Create a namespace
create_namespace 'MOMO_CHAT'
# List all namespaces
list_namespace
## Describe a specific namespace
describe_namespace 'MOMO_CHAT'
## Drop a namespace
drop_namespace 'MOMO_CHAT'

Create a table in a namespace

create 'MOMO_CHAT:MSG','C1'
A namespace that still contains tables cannot be dropped.
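HBase addresses a table as namespace:qualifier, and a name without a colon lands in the default namespace. The plain-Java sketch below only illustrates that naming rule; `TableNameSketch` and `split` are hypothetical names (this is not HBase's `TableName` class), and it does not validate which characters are legal.

```java
// Hypothetical helper for illustration only, not HBase's TableName class
public class TableNameSketch {
    static final String DEFAULT_NAMESPACE = "default";

    /** Returns {namespace, qualifier}; names without ':' go to the default namespace. */
    static String[] split(String fullName) {
        int idx = fullName.indexOf(':');
        if (idx < 0) {
            return new String[] {DEFAULT_NAMESPACE, fullName};
        }
        return new String[] {fullName.substring(0, idx), fullName.substring(idx + 1)};
    }

    public static void main(String[] args) {
        String[] a = split("MOMO_CHAT:MSG");
        System.out.println(a[0] + " / " + a[1]); // MOMO_CHAT / MSG
        String[] b = split("t1");
        System.out.println(b[0] + " / " + b[1]); // default / t1
    }
}
```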

3.2 Column Family Design

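Use as few column families as possible; HBase does not perform well with two or more. Each column family maps to one Store and therefore one MemStore. When a MemStore reaches the configured threshold it is flushed, and with several column families the other families in the region get flushed at the same time, causing unnecessary I/O. This case therefore uses a single column family: C1.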

3.3 Version Design

Historical chat records are never updated once they are stored in HBase, so multiple versions are not needed.

Keeping a single version is enough for this project and saves space.

HBase creates column families with VERSIONS => 1 by default, so the default is kept.

3.4 View the Table

hbase(main):013:0> describe 'MOMO_CHAT:MSG'

Table MOMO_CHAT:MSG is ENABLED

MOMO_CHAT:MSG

COLUMN FAMILIES DESCRIPTION

{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

1 row(s)

3.5 Data Compression

HBase supports several compression codecs, including:

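| Algorithm | Compressed size (% of original) | Compression speed | Decompression speed |
|---|---|---|---|
| GZIP | 13.4% | 21 MB/s | 118 MB/s |
| LZO | 20.5% | 135 MB/s | 410 MB/s |
| Zippy/Snappy | 22.2% | 172 MB/s | 409 MB/s |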

Compression applies only to data on disk; data in memory and in network transfer is not compressed. This case uses GZ.
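To get a feel for what GZ compression buys, the stdlib-only sketch below compresses some repetitive, chat-log-like text with `java.util.zip.GZIPOutputStream` and reports the ratio. The sample records are made up for illustration, and the ratio on real data (HBase compresses per HFile block) will differ from both this run and the table above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipRatioDemo {
    /** Compresses the input with GZIP and returns the compressed bytes. */
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer; only then is bos complete
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up, repetitive records in the spirit of the Momo message columns
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("2023-03-20 10:15:").append(String.format("%02d", i % 60))
              .append(",13029397618,18874086861,Android 10,4G,hello\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.1f%% of original)%n",
                raw.length, packed.length, 100.0 * packed.length / raw.length);
    }
}
```

Highly repetitive columns (timestamps, OS names, network types) are exactly where GZ pays off; the trade-off, as the table shows, is slower compression than LZO or Snappy.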

In the describe output of the previous subsection, COMPRESSION => 'NONE' means no compression is used.

Setting compression

# Set compression when creating a new table
create  'MOMO_CHAT:MSG',{NAME=>'C1',COMPRESSION=>'GZ'}
# Change the compression of an existing table
alter 'MOMO_CHAT:MSG',{NAME=>'C1',COMPRESSION=>'GZ'}

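If the table already contains data, run alter with care: the new codec applies to newly written store files, and existing files are only rewritten during compaction.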

3.6 Rowkey Design Principles

3.6.1 HBase's Official Design Principles

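Avoid monotonically increasing rowkeys and time-series keys: with increasing keys, a burst of writes all lands on one machine. Write load should be spread evenly across the RegionServers.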

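Avoid overly long rowkeys and column names: the rowkey is stored many times over (once with every cell), so a long rowkey causes a lot of redundant storage.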

Column family and column names are also stored with every cell, so keep them short as well.

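Storing a value as a long is more compact than storing it as a String: a long always takes 8 bytes, whereas a decimal string takes one byte per digit (13 bytes for a millisecond timestamp), and in UTF-8 each Chinese character takes 3 bytes.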
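A quick JDK-only way to see the size difference: the sketch below encodes one millisecond timestamp both ways. `ByteBuffer.putLong` writes the same 8-byte big-endian layout as HBase's `Bytes.toBytes(long)`; the timestamp value is an arbitrary example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeySizeDemo {
    /** 8-byte big-endian encoding of a long. */
    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** UTF-8 bytes of the decimal string form of a long. */
    static byte[] stringBytes(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long ts = 1679299200000L; // an example millisecond timestamp (13 digits)
        System.out.println("as long:   " + longBytes(ts).length + " bytes");   // 8
        System.out.println("as String: " + stringBytes(ts).length + " bytes"); // 13
        // U+964C is the Chinese character for "Mo"; UTF-8 needs 3 bytes for it
        System.out.println("U+964C in UTF-8: "
                + "\u964C".getBytes(StandardCharsets.UTF_8).length + " bytes"); // 3
    }
}
```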

Rowkeys must be unique: writing a duplicate rowkey overwrites the earlier data.

3.6.2 Avoiding Data Hotspots

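A hotspot occurs when a large number of client requests (reads or writes) hit one or a few nodes of the cluster. The excess load can exceed what that server can handle and degrade performance.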

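Pre-splitting: by default a new HBase table has only one region, so all writes go to a single RegionServer until that region splits. Creating several regions in advance spreads the load from the start.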

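Each region has two important attributes, startKey and endKey. In a new table both are empty, meaning the single region is unbounded and holds all the data. Within a region, rows are stored in lexicographic order of their rowkeys. As the data grows, the region splits at a middle key into two regions.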
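Lexicographic order means rowkeys are compared byte by byte, with each byte treated as unsigned, so `row10` sorts before `row9`. The sketch below re-implements that comparison in plain Java for illustration (HBase ships its own in `Bytes.compareTo`); the keys are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyOrderDemo {
    /** Lexicographic comparison treating each byte as unsigned (0..255). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // when one key is a prefix of the other, the shorter sorts first
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList("row9", "row10", "row100", "row2"));
        keys.sort((p, q) -> compare(utf8(p), utf8(q)));
        System.out.println(keys); // [row10, row100, row2, row9]
    }
}
```

If numeric order matters for scans, zero-padding the numeric part (row002, row009, row010) makes lexicographic and numeric order agree.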

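Choose the number of pre-split regions as a multiple of the number of nodes. By default a region splits once it reaches about 10 GB.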

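Rowkey design to avoid hotspots: rows are stored in rowkey order, so the goal is to keep writes from converging on a single node. The following strategies help: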

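  1. Reversal: for keys such as timestamps or phone numbers, whose leading digits are largely the same while the trailing digits vary, reverse the key so that rows spread across regions. Reversal randomizes the distribution but sacrifices rowkey order, which hurts scans, since a scan reads rows sequentially.

  2. Salting: prepend a fixed-length random prefix to the rowkey; the randomness balances the load. Drawback: at query time you cannot know which random prefix was used, so you have to scan.

  3. Hashing: compute a hash of the rowkey and use it as a prefix. The same rowkey always produces the same hash, so a point lookup can recompute the prefix. Common hash functions are MD5, SHA-1, SHA-256 and SHA-512. Drawback: hashing also breaks range scans.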
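The three strategies above can be sketched in plain Java as follows. This is an illustration under simple assumptions: salting uses a two-digit bucket number (so it assumes at most 100 buckets), and hashing keeps the first `len` hex characters of the MD5 from `java.security.MessageDigest`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ThreadLocalRandom;

public class HotspotStrategies {
    /** 1. Reversal: the varying trailing digits move to the front. */
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    /** 2. Salting: random bucket in [0, buckets), two digits wide (assumes buckets <= 100). */
    static String salt(String key, int buckets) {
        int s = ThreadLocalRandom.current().nextInt(buckets);
        return String.format("%02d_%s", s, key);
    }

    /** 3. Hashing: first len hex chars of MD5(key), deterministic, so reads can recompute it. */
    static String hashPrefix(String key, int len) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, len) + "_" + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String phone = "13029397618";
        System.out.println(reverse(phone));         // 81679392031
        System.out.println(salt(phone, 6));         // random, e.g. 03_13029397618
        System.out.println(hashPrefix("hello", 8)); // 5d41402a_hello
    }
}
```

Note the asymmetry: `salt` returns a different key on each call, so a reader must try every bucket, while `hashPrefix` always returns the same key and supports point lookups.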

3.6.3 Pre-splitting the Momo Greeting Messages

Pre-splitting examples

Method 1: specify the split keys explicitly (each split key is the endKey of one region and the startKey of the next):

create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
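With `SPLITS => ['10', '20', '30', '40']` the table starts with five regions: one from the empty start key up to 10, then [10, 20), [20, 30), [30, 40), and one from 40 with no end key. The sketch below models how a rowkey is routed: pick the region with the largest startKey that is less than or equal to the rowkey. It is a simplified in-memory illustration (`RegionLocatorSketch` is a hypothetical name), not the HBase client's actual region lookup; `String.compareTo` matches HBase's byte order only for ASCII keys like these.

```java
import java.util.TreeMap;

public class RegionLocatorSketch {
    // startKey -> region index; "" is the start key of the first region
    private final TreeMap<String, Integer> startKeys = new TreeMap<>();

    /** n split keys produce n + 1 regions. */
    RegionLocatorSketch(String... splits) {
        startKeys.put("", 0);
        for (int i = 0; i < splits.length; i++) {
            startKeys.put(splits[i], i + 1);
        }
    }

    /** Index of the region whose [startKey, endKey) contains rowKey. */
    int locate(String rowKey) {
        return startKeys.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch t1 = new RegionLocatorSketch("10", "20", "30", "40");
        for (String key : new String[] {"05", "10", "15", "25", "4", "99"}) {
            System.out.println(key + " -> region " + t1.locate(key));
        }
    }
}
```

The key "4" lands in region 3 (start key "30"), not in the last region, because "4" sorts between "30" and "40" lexicographically.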

Method 2: specify the number of regions and a split algorithm

# 15 regions
# HexStringSplit: rowkeys prefixed with a hexadecimal string
# DecimalStringSplit: rowkeys prefixed with a decimal-digit string
# UniformSplit: rowkeys whose prefix is completely random
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
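HexStringSplit divides a hexadecimal key space into equal slices. The sketch below reproduces that idea for an 8-hex-digit space (00000000 to ffffffff), which matches the 8-character MD5 prefix used for this case's rowkeys. Treat it as an approximation of `RegionSplitter.HexStringSplit`, not its exact implementation, and compare it with the real region boundaries shown in the HBase web UI.

```java
import java.util.ArrayList;
import java.util.List;

public class HexSplitSketch {
    /**
     * Splits 00000000..ffffffff into n equal parts and returns the n - 1
     * boundaries as 8-char lowercase hex strings (the last region absorbs the remainder).
     */
    static List<String> splitPoints(int n) {
        long max = 0xFFFFFFFFL;
        long step = max / n;
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            splits.add(String.format("%08x", step * i));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 6 regions, as used for MOMO_CHAT:MSG
        System.out.println(splitPoints(6)); // [2aaaaaaa, 55555554, 7ffffffe, aaaaaaa8, d5555552]
    }
}
```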

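Estimate the number of regions from the expected data volume. This environment has only three virtual machines, so 6 regions are used.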

Rowkey design

To distribute data evenly, an MD5 hash is used as the prefix.

ROWKEY = MD5Hash_senderAccount_receiverAccount_timestamp
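A JDK-only sketch of this rowkey, assuming (as the code in section 4.4 does) that MD5Hash is the first 8 lowercase hex characters of the MD5 of senderAccount_receiverAccount_timestamp. It uses `java.security.MessageDigest` in place of HBase's `MD5Hash.getMD5AsHex`; for ASCII account numbers the two should yield the same hex string. Because the prefix is uniformly distributed over 00000000 to ffffffff, rows spread across the HexStringSplit regions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MomoRowKeySketch {
    /** Lowercase hex MD5 of a UTF-8 string. */
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** rowkey = md5(base)[0..8) + "_" + base, where base = sender_receiver_timestamp */
    static String rowKey(String sender, String receiver, long timestampMillis) {
        String base = sender + "_" + receiver + "_" + timestampMillis;
        return md5Hex(base).substring(0, 8) + "_" + base;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("13029397618", "18874086861", 1679299200000L));
    }
}
```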

Pre-split script for this case

## Drop the table created earlier
disable 'MOMO_CHAT:MSG'
drop 'MOMO_CHAT:MSG'
## Create the table with pre-split regions
create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}

How the regions appear in HDFS:

4 Project

4.1 Project Initialization

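Import the dependencies. The code below also uses Lombok (entity class), Hutool (date handling) and JUnit 4 (test), which are not in this list and must be added as well.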

<repositories><!-- repositories -->
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
            <updatePolicy>never</updatePolicy>
        </snapshots>
    </repository>
</repositories>

<dependencies>
    <!-- HBase client -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- XML handling -->
    <dependency>
        <groupId>com.github.cloudecho</groupId>
        <artifactId>xmlbean</artifactId>
        <version>1.5.5</version>
    </dependency>
    <!-- Office (Excel) library -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- Office (Excel) library -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- Office (Excel) library -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- JSON handling -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
    <!-- phoenix core -->
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>
    <!-- Phoenix client -->
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-queryserver-client</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>
</dependencies>


<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <target>1.8</target>
                <source>1.8</source>
            </configuration>
        </plugin>
    </plugins>
</build>

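4.2 Package Layout

cn.btks.momo_chat.service

Data service interfaces, e.g. the query API

cn.btks.momo_chat.service.impl

Implementations of the data service interfaces, e.g. the query API implementation

cn.btks.momo_chat.tool

Utility classes

cn.btks.momo_chat.entity

Entity classes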

4.3 Import the Excel Utility Class

package cn.btks.momo_chat.tool;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.*;

public class ExcelReader {
    // One shared Random instead of a new instance per call
    private static final Random RANDOM = new Random();

    public static void main(String[] args) {
        String xlxsPath = "E:\\002学习\\大数据\\HBase\\4.资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
        Map<String, List<String>> mapData = readXlsx(xlxsPath, "陌陌数据");

        for (int i = 0; i < 10000; ++i) {
            System.out.println(randomColumn(mapData, "sender_nickyname"));
        }
    }

    /**
     * Returns a random value from one column.
     * @param resultMap data returned by readXlsx
     * @param columnName column name
     * @return a random value of that column
     */
    public static String randomColumn(Map<String, List<String>> resultMap, String columnName) {
        List<String> valList = resultMap.get(columnName);
        if (valList == null || valList.isEmpty()) {
            throw new RuntimeException("No data was read for column " + columnName + "!");
        }
        return valList.get(RANDOM.nextInt(valList.size()));
    }

    /**
     * Reads an Excel sheet into a Map of <column_name, list of values>.
     * column_name comes from row 4; data rows start at row 5.
     * @param path path of the Excel file (.xlsx, Excel 2007 format or later)
     * @param sheetName sheet name
     * @return the Map
     */
    public static Map<String, List<String>> readXlsx(String path, String sheetName) {
        Map<String, List<String>> resultMap = new HashMap<>();
        // Column index -> column name, taken from the header row
        Map<Integer, String> columnNames = new LinkedHashMap<>();

        // try-with-resources closes the file and the workbook
        try (InputStream in = new FileInputStream(path);
             XSSFWorkbook excel = new XSSFWorkbook(in)) {
            XSSFSheet sheet = excel.getSheet(sheetName);

            // Load the column names from row 4 (0-based index 3)
            XSSFRow columnRow = sheet.getRow(3);
            if (columnRow == null) {
                throw new RuntimeException("Failed to read the data file! Make sure row 4 holds the English column names.");
            }
            for (Cell cell : columnRow) {
                columnNames.put(cell.getColumnIndex(), cell.getStringCellValue());
            }
            System.out.println("Read " + columnNames.size() + " columns");
            System.out.println(columnNames.values());

            // Initialize resultMap with an empty list per column
            for (String colName : columnNames.values()) {
                resultMap.put(colName, new ArrayList<>());
            }

            // Iterate over the sheet, skipping the first 4 rows
            for (Row row : sheet) {
                if (row.getRowNum() <= 3) {
                    continue;
                }
                for (Cell cell : row) {
                    // Use the cell's own column index so blank cells cannot shift values into the wrong column
                    String colName = columnNames.get(cell.getColumnIndex());
                    if (colName == null) {
                        continue;
                    }
                    // Read the value according to the cell type
                    String value = cell.getCellType() == CellType.NUMERIC
                            ? Double.toString(cell.getNumericCellValue())
                            : cell.getStringCellValue();
                    resultMap.get(colName).add(value);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return resultMap;
    }
}

Sample data:

4.4 Coding

Create the entity class

package cn.btks.momo_chat.entity;

import lombok.Data;

@Data
public class Msg {
    private String msgTime;//message time
    private String senderNickyname;//sender nickname
    private String senderAccount;//sender account
    private String senderSex;//sender gender
    private String senderIp;//sender IP
    private String senderOs;//sender operating system
    private String senderPhoneType;//sender phone model
    private String senderNetwork;//sender network type
    private String senderGps;//sender GPS
    private String receiverNickyname;//receiver nickname
    private String receiverIp;//receiver IP
    private String receiverAccount;//receiver account
    private String receiverOs;//receiver operating system
    private String receiverPhoneType;//receiver phone model
    private String receiverNetwork;//receiver network type
    private String receiverGps;//receiver GPS
    private String receiverSex;//receiver gender
    private String msgType;//message type
    private String distance;//distance between the two users
    private String message;//message content
}

Randomly generate one message (note: this requires the Hutool library)

package cn.btks.momo_chat.tool;

import cn.btks.momo_chat.entity.Msg;
import cn.hutool.core.date.DateUtil;

import java.util.Date;
import java.util.List;
import java.util.Map;

/**
 * Momo random message generator
 * 1. Use ExcelReader to pick random values from the Excel file and build a Msg object
 * 2. Design the rowkey to avoid hotspots, so rows spread evenly across the regions (6 regions were created)
 * 3. Put the Msg object into HBase
 * 4. Generate 100,000 test messages
 */
public class MoMoMsgGen {
    public static void main(String[] args) {
        // Read the data from the Excel file
        String xlxsPath = "E:\\002学习\\大数据\\HBase\\4.资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
        Map<String, List<String>> resultMap = ExcelReader.readXlsx(xlxsPath, "陌陌数据");
        System.out.println(getOneMessage(resultMap));

    }

    /**
     * Randomly builds one Msg object from the data read from the Excel sheet.
     * @param resultMap the data read from Excel
     * @return one message, i.e. one Msg object
     */
    public static Msg getOneMessage(Map<String, List<String>> resultMap){
        // 1. Build the Msg entity object
        Msg msg = new Msg();
        // 2. Use the current system time as the message time, formatted as yyyy-MM-dd HH:mm:ss
        Date date = DateUtil.date();
        msg.setMsgTime(DateUtil.format(date, "yyyy-MM-dd HH:mm:ss"));
        // 3. Call ExcelReader.randomColumn to pick a random value for each remaining column
        msg.setSenderNickyname(ExcelReader.randomColumn(resultMap,"sender_nickyname"));
        msg.setSenderAccount(ExcelReader.randomColumn(resultMap,"sender_account"));
        msg.setSenderSex(ExcelReader.randomColumn(resultMap,"sender_sex"));
        msg.setSenderIp(ExcelReader.randomColumn(resultMap,"sender_ip"));
        msg.setSenderOs(ExcelReader.randomColumn(resultMap,"sender_os"));
        msg.setSenderPhoneType(ExcelReader.randomColumn(resultMap,"sender_phone_type"));
        msg.setSenderNetwork(ExcelReader.randomColumn(resultMap,"sender_network"));
        msg.setSenderGps(ExcelReader.randomColumn(resultMap,"sender_gps"));
        msg.setReceiverNickyname(ExcelReader.randomColumn(resultMap,"receiver_nickyname"));
        msg.setReceiverIp(ExcelReader.randomColumn(resultMap,"receiver_ip"));
        msg.setReceiverAccount(ExcelReader.randomColumn(resultMap,"receiver_account"));
        msg.setReceiverOs(ExcelReader.randomColumn(resultMap,"receiver_os"));
        msg.setReceiverPhoneType(ExcelReader.randomColumn(resultMap,"receiver_phone_type"));
        msg.setReceiverNetwork(ExcelReader.randomColumn(resultMap,"receiver_network"));
        msg.setReceiverGps(ExcelReader.randomColumn(resultMap,"receiver_gps"));
        msg.setReceiverSex(ExcelReader.randomColumn(resultMap,"receiver_sex"));
        msg.setMsgType(ExcelReader.randomColumn(resultMap,"msg_type"));
        msg.setDistance(ExcelReader.randomColumn(resultMap,"distance"));
        msg.setMessage(ExcelReader.randomColumn(resultMap,"message"));

        return msg;
    }
}

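Build the rowkey

rowkey = MD5Hash_senderAccount_receiverAccount_messageTimestamp

  1. Join the sender account, receiver account and message timestamp with underscores: senderAccount_receiverAccount_messageTimestamp. This string is the MD5 input.

  2. Compute its MD5 with MD5Hash.getMD5AsHex.

  3. Keep the first 8 characters of the MD5 hex string.

  4. Join the 8-character prefix and the string from step 1 with an underscore.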

    /**
     * getRowKey: builds a byte[] rowkey from a Msg entity.
     * rowkey = MD5Hash_senderAccount_receiverAccount_messageTimestamp
     * Requires in MoMoMsgGen: import org.apache.hadoop.hbase.util.Bytes;
     *                         import org.apache.hadoop.hbase.util.MD5Hash;
     * @param msg message object
     * @return the rowkey as a byte array
     */
    public static byte[] getRowKey(Msg msg){
        // 1. Join sender account, receiver account and message timestamp with "_"
        StringBuilder builder = new StringBuilder();
        String msgTime = msg.getMsgTime();
        long time = DateUtil.parse(msgTime, "yyyy-MM-dd HH:mm:ss").getTime();
        builder.append(msg.getSenderAccount())
                .append("_")
                .append(msg.getReceiverAccount())
                .append("_")
                .append(time);
        // 2. Convert the joined string to a byte[] with Bytes.toBytes (UTF-8)
        byte[] baseBytes = Bytes.toBytes(builder.toString());
        // 3. Compute the MD5 with MD5Hash.getMD5AsHex and keep its first 8 hex characters
        String md5AsHex8Bit = MD5Hash.getMD5AsHex(baseBytes).substring(0, 8);
        // 4. Prepend the MD5 prefix with "_" and convert the full rowkey to bytes
        String rowKeyString = md5AsHex8Bit + "_" + builder.toString();
        return Bytes.toBytes(rowKeyString);
    }

Write the data into HBase (this code goes in the main method of MoMoMsgGen, after resultMap has been read)

        // Requires imports: org.apache.hadoop.conf.Configuration, org.apache.hadoop.hbase.HBaseConfiguration,
        // org.apache.hadoop.hbase.TableName, org.apache.hadoop.hbase.client.*, org.apache.hadoop.hbase.util.Bytes;
        // main must declare "throws IOException"
        // 1. Get an HBase connection
        Configuration config = HBaseConfiguration.create();
        // 2. Get the HBase table MOMO_CHAT:MSG; try-with-resources closes table and connection
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"))) {
            // 3. The column family shared by all columns
            byte[] cf = Bytes.toBytes("C1");
            final int MAX = 100000;
            for (int i = 1; i <= MAX; i++) {
                Msg msg = getOneMessage(resultMap);
                // 4. Build the Put request keyed by the designed rowkey
                Put put = new Put(getRowKey(msg));
                // 5. Add every column of the Momo message
                put.addColumn(cf, Bytes.toBytes("msg_time"), Bytes.toBytes(msg.getMsgTime()));
                put.addColumn(cf, Bytes.toBytes("sender_nickyname"), Bytes.toBytes(msg.getSenderNickyname()));
                put.addColumn(cf, Bytes.toBytes("sender_account"), Bytes.toBytes(msg.getSenderAccount()));
                put.addColumn(cf, Bytes.toBytes("sender_sex"), Bytes.toBytes(msg.getSenderSex()));
                put.addColumn(cf, Bytes.toBytes("sender_ip"), Bytes.toBytes(msg.getSenderIp()));
                put.addColumn(cf, Bytes.toBytes("sender_os"), Bytes.toBytes(msg.getSenderOs()));
                put.addColumn(cf, Bytes.toBytes("sender_phone_type"), Bytes.toBytes(msg.getSenderPhoneType()));
                put.addColumn(cf, Bytes.toBytes("sender_network"), Bytes.toBytes(msg.getSenderNetwork()));
                put.addColumn(cf, Bytes.toBytes("sender_gps"), Bytes.toBytes(msg.getSenderGps()));
                put.addColumn(cf, Bytes.toBytes("receiver_nickyname"), Bytes.toBytes(msg.getReceiverNickyname()));
                put.addColumn(cf, Bytes.toBytes("receiver_ip"), Bytes.toBytes(msg.getReceiverIp()));
                put.addColumn(cf, Bytes.toBytes("receiver_account"), Bytes.toBytes(msg.getReceiverAccount()));
                put.addColumn(cf, Bytes.toBytes("receiver_os"), Bytes.toBytes(msg.getReceiverOs()));
                put.addColumn(cf, Bytes.toBytes("receiver_phone_type"), Bytes.toBytes(msg.getReceiverPhoneType()));
                put.addColumn(cf, Bytes.toBytes("receiver_network"), Bytes.toBytes(msg.getReceiverNetwork()));
                put.addColumn(cf, Bytes.toBytes("receiver_gps"), Bytes.toBytes(msg.getReceiverGps()));
                put.addColumn(cf, Bytes.toBytes("receiver_sex"), Bytes.toBytes(msg.getReceiverSex()));
                put.addColumn(cf, Bytes.toBytes("msg_type"), Bytes.toBytes(msg.getMsgType()));
                put.addColumn(cf, Bytes.toBytes("distance"), Bytes.toBytes(msg.getDistance()));
                put.addColumn(cf, Bytes.toBytes("message"), Bytes.toBytes(msg.getMessage()));
                // 6. Send the Put request
                table.put(put);
                // Show progress
                System.out.println(i + "/" + MAX);
            }
        }

Query

Query messages by the following three conditions:

  • Date

  • Sender

  • Receiver
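These three conditions become four predicates in the HBase implementation below: msg_time at or after the start of the day, msg_time at or before the end of the day, and exact matches on sender and receiver, all of which must pass. The plain-Java sketch mirrors that logic for a single row (`MsgFilterSketch` and `matches` are hypothetical names). It relies on "yyyy-MM-dd HH:mm:ss" being fixed-width and zero-padded, so string order equals time order; `BinaryComparator` compares raw bytes, which agrees with `String.compareTo` for these ASCII values.

```java
public class MsgFilterSketch {
    /**
     * Plain-Java equivalent of four value filters combined with "all must pass":
     * start <= msgTime <= end, sender equal, receiver equal.
     */
    static boolean matches(String msgTime, String senderAccount, String receiverAccount,
                           String date, String sender, String receiver) {
        String start = date + " 00:00:00";
        String end = date + " 23:59:59";
        // Fixed-width, zero-padded timestamps sort chronologically as strings
        return msgTime.compareTo(start) >= 0
                && msgTime.compareTo(end) <= 0
                && senderAccount.equals(sender)
                && receiverAccount.equals(receiver);
    }

    public static void main(String[] args) {
        System.out.println(matches("2023-03-20 10:15:00", "13029397618", "18874086861",
                "2023-03-20", "13029397618", "18874086861")); // true
        System.out.println(matches("2023-03-21 00:00:01", "13029397618", "18874086861",
                "2023-03-20", "13029397618", "18874086861")); // false
    }
}
```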

Create the interface

package cn.btks.momo_chat.service;

import cn.btks.momo_chat.entity.Msg;

import java.io.IOException;
import java.util.List;

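/**
 * Chat message data service
 */
public interface ChatMessageService {
    /**
     * Queries messages by date, sender and receiver
     * @param date date, in the form yyyy-MM-dd
     * @param sender sender account
     * @param receiver receiver account
     * @return the matching messages
     * @throws Exception if the query fails
     */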
    List<Msg> getMessage(String date, String sender, String receiver) throws Exception;
    void close() throws IOException;

}

Create the implementation class

package cn.btks.momo_chat.service.impl;

import cn.btks.momo_chat.entity.Msg;
import cn.btks.momo_chat.service.ChatMessageService;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Query implemented with the native HBase client API
 */

public class HBaseNativeChatMessageService  implements ChatMessageService {

    private Connection connection;
    private String timeFormatter;

    public HBaseNativeChatMessageService() throws IOException {
        Configuration configuration = HBaseConfiguration.create();
        connection = ConnectionFactory.createConnection(configuration);
        timeFormatter = "yyyy-MM-dd HH:mm:ss";

    }
    @Override
    public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
        // 1. Build the Scan object
        Scan scan = new Scan();
        // 2. Build the time range to query, e.g. 2020-10-05 00:00:00 to 2020-10-05 23:59:59
        String startDate = date + " 00:00:00";
        String endDate = date + " 23:59:59";

        // 3. Build two date filters (>= and <=); a SingleColumnValueFilter is enough to filter on one column
        SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
                Bytes.toBytes("msg_time"),
                CompareOperator.GREATER_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes(startDate)));
        SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
                Bytes.toBytes("msg_time"),
                CompareOperator.LESS_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes(endDate)));
        // 4. Build the sender filter
        SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
                Bytes.toBytes("sender_account"),
                CompareOperator.EQUAL,
                new BinaryComparator(Bytes.toBytes(sender)));
        // 5. Build the receiver filter
        SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
                Bytes.toBytes("receiver_account"),
                CompareOperator.EQUAL,
                new BinaryComparator(Bytes.toBytes(receiver)));
        // 6. Combine all filters with a FilterList (every filter must pass)
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL,
                startDateFilter,
                endDateFilter,
                senderFilter,
                receiverFilter);
        // 7. Set the filter on the scan
        scan.setFilter(filterList);

        List<Msg> msgList = new ArrayList<>();
        // 8. Get the Table and run the scan; try-with-resources closes the scanner and the table
        try (Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
             ResultScanner resultScanner = table.getScanner(scan)) {
            // 9. Each Result is one row (one Msg); iterate over its cells
            for (Result result : resultScanner) {
                Msg msg = new Msg();
                for (Cell cell : result.listCells()) {
                    // Map each column qualifier to the matching field
                    String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
                    String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                    switch (columnName) {
                        case "msg_time": msg.setMsgTime(value); break;
                        case "sender_nickyname": msg.setSenderNickyname(value); break;
                        case "sender_account": msg.setSenderAccount(value); break;
                        case "sender_sex": msg.setSenderSex(value); break;
                        case "sender_ip": msg.setSenderIp(value); break;
                        case "sender_os": msg.setSenderOs(value); break;
                        case "sender_phone_type": msg.setSenderPhoneType(value); break;
                        case "sender_network": msg.setSenderNetwork(value); break;
                        case "sender_gps": msg.setSenderGps(value); break;
                        case "receiver_nickyname": msg.setReceiverNickyname(value); break;
                        case "receiver_ip": msg.setReceiverIp(value); break;
                        case "receiver_account": msg.setReceiverAccount(value); break;
                        case "receiver_os": msg.setReceiverOs(value); break;
                        case "receiver_phone_type": msg.setReceiverPhoneType(value); break;
                        case "receiver_network": msg.setReceiverNetwork(value); break;
                        case "receiver_gps": msg.setReceiverGps(value); break;
                        case "receiver_sex": msg.setReceiverSex(value); break;
                        case "msg_type": msg.setMsgType(value); break;
                        case "distance": msg.setDistance(value); break;
                        case "message": msg.setMessage(value); break;
                        default: break;
                    }
                }
                msgList.add(msg);
            }
        }
        return msgList;
    }

    @Override
    public void close() throws IOException {
        connection.close();
    }
}

Test

package cn.btks.momo_chat.service;

import cn.btks.momo_chat.entity.Msg;
import cn.btks.momo_chat.service.impl.HBaseNativeChatMessageService;
import org.junit.Test;

import java.io.IOException;
import java.util.List;

public class ChatMessageServiceTest {
    private ChatMessageService chatMessageService;
    public ChatMessageServiceTest() throws IOException {
        chatMessageService = new HBaseNativeChatMessageService();
    }
    @Test
    public  void getMessage() throws Exception {
        List<Msg> message = chatMessageService.getMessage("2023-03-20", "13029397618", "18874086861");
        for (Msg msg : message) {

            System.out.println(msg);
        }

    }
}

Result:

Author: 天码村