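1 Momo Message Storage Case: Basic Information
1.1 Case Introduction
Momo has a very large volume of user message data that must be retained. Reads are relatively rare and writes are frequent. HBase is a good fit for storing data at this scale because it suits write-heavy, read-light workloads.
This case covers: 1. HBase table design, pre-splitting, and rowkey design; 2. HBase tuning; 3. Apache Phoenix; 4. paged queries; 5. data interface development.
1.2 Data Structure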
Field name | Description |
msg_time | Message time |
sender_nickyname | Sender nickname |
sender_account | Sender account |
sender_sex | Sender gender |
sender_ip | Sender IP |
sender_os | Sender operating system |
sender_phone_type | Sender phone model |
sender_network | Sender network type |
sender_gps | Sender GPS |
receiver_nickyname | Receiver nickname |
receiver_ip | Receiver IP |
receiver_account | Receiver account |
receiver_os | Receiver operating system |
receiver_phone_type | Receiver phone model |
receiver_network | Receiver network type |
receiver_gps | Receiver GPS |
receiver_sex | Receiver gender |
msg_type | Message type |
distance | Distance between the two parties |
message | Message |
2 Preparation
2.1 Create an IDEA Maven Project
Project directory structure
2.2 Create the Script File
In the project's hbase_shell folder, create a script file named create_ns_table.rb
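3 Table Structure Design
3.1 Namespace
Note: a cluster holds many tables, and grouping them into namespaces by business area makes them easier to manage. A namespace is similar to a database in Hive: different databases can hold different kinds of tables. The default HBase namespace is default. HBase also has a namespace named hbase that holds the built-in system tables (namespace and meta).
Creating, listing, and dropping namespaces
Syntax: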
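## Create a namespace
create_namespace 'MOMO_CHAT'
## List all namespaces
list_namespace
## Describe a specific namespace
describe_namespace 'MOMO_CHAT'
## Drop a previously created namespace
drop_namespace 'MOMO_CHAT'
Create a table in the namespace (the namespace must exist)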
create 'MOMO_CHAT:MSG','C1'
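A namespace that still contains tables cannot be dropped.
3.2 Column Family Design
In HBase, the fewer column families the better; performance tends to degrade with two or more. Each column family corresponds to one Store, which has its own MemStore. When a MemStore reaches its configured threshold it is flushed, and with several column families the others may be flushed along with it, causing unnecessary I/O. This case uses a single column family: C1.
3.3 Version Design
Chat history is never updated once it has been saved to HBase, so versioning is not a concern.
Keeping a single version is enough for this project and saves storage space.
HBase creates tables with VERSIONS set to 1 by default, so the default is kept.
3.4 Viewing the Table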
hbase(main):013:0> describe 'MOMO_CHAT:MSG'
Table MOMO_CHAT:MSG is ENABLED
MOMO_CHAT:MSG
COLUMN FAMILIES DESCRIPTION
{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DEL
ETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN
_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMOR
Y => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
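3.5 Data Compression
HBase supports several compression codecs, including the following:
Algorithm | Compressed size (% of original) | Compression speed | Decompression speed |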
GZIP | 13.4% | 21 MB/s | 118 MB/s |
LZO | 20.5% | 135 MB/s | 410 MB/s |
Zippy/Snappy | 22.2% | 172 MB/s | 409 MB/s |
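Compression applies only to data on disk; data in memory or in transit over the network is not compressed. This case uses GZ.
The COMPRESSION => 'NONE' in the output of the previous subsection means that compression is not enabled.
Setting data compression
# Enable compression when creating a new table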
create 'MOMO_CHAT:MSG',{NAME=>'C1',COMPRESSION=>'GZ'}
# Change the compression of an existing table
alter 'MOMO_CHAT:MSG',{NAME=>'C1',COMPRESSION=>'GZ'}
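Run alter with care if the table already contains data.
3.6 ROWKEY Design Principles
3.6.1 Official HBase Design Principles
Avoid monotonically increasing rowkeys and time-series keys: with increasing keys, heavy write traffic all lands on a single machine. Write load should be spread evenly across the RegionServers.
Keep rowkeys and column names short: the rowkey is stored again with every cell, so a long rowkey causes a lot of storage redundancy.
The column family name and column qualifier are stored with every cell as well, so keep them short too.
A long is more compact than a String: a long takes 8 bytes, whereas the same number written out as a string can take nearly 3 times as many bytes.
ROWKEY uniqueness: a rowkey design must guarantee uniqueness, because writing a duplicate rowkey overwrites the existing data.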
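The size difference between a binary long and its string form can be checked directly. The following is a minimal sketch that uses only the JDK; the class name KeySizeDemo and its helper methods are made up for illustration and are not part of HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of HBase.
final class KeySizeDemo {

    // Encode a long in its fixed 8-byte big-endian binary form.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Encode the same number as decimal text in UTF-8 (one byte per digit).
    static byte[] longToStringBytes(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long value = Long.MAX_VALUE; // 9223372036854775807, 19 digits
        System.out.println("binary: " + longToBytes(value).length + " bytes"); // binary: 8 bytes
        System.out.println("string: " + longToStringBytes(value).length + " bytes"); // string: 19 bytes
    }
}
```

Because the rowkey is repeated in every cell, the extra bytes are paid once per column of every row.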
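3.6.2 Avoiding Data Hotspots
A hotspot occurs when a large number of client requests (reads or writes) hit one or a few nodes in the cluster. The traffic can exceed what a single server can handle and degrade performance.
Pre-splitting: by default an HBase table starts with a single region.
Every region has two important attributes, startKey and endKey. By default both are empty, so the region is unbounded and all rows are stored together. Within a region, rowkeys are stored in lexicographic order. As the data grows, the region splits at a midpoint key.
Choose the number of pre-split regions as a multiple of the number of nodes. The default maximum region size is 10 GB.
Avoiding hotspots through ROWKEY design: because rows are stored in rowkey order, the goal is to keep writes from all landing on the same node. The following strategies are available:
Reversing: for values such as timestamps or phone numbers, where the leading digits are the same and the trailing digits differ, reverse the value so that rows spread across regions. Reversing gives a near-random distribution but sacrifices rowkey ordering, which hurts scan, since scan reads rows sequentially.
Salting: prepend a fixed-length random number to the rowkey; the randomness balances the load. Drawback: the random prefix is unknown at query time, so a row cannot be fetched directly and a scan is required.
Hashing: hash the rowkey and use the hash as a prefix. The same rowkey always produces the same hash. Common hash functions are MD5, SHA-1, SHA-256, and SHA-512. Drawback: hashing also hurts scan.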
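The three strategies can be compared side by side in a few lines. This is a minimal sketch using only the JDK, not code from this project: the class RowKeyStrategies, the two-digit salt format, and the 8-character hash prefix are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class for illustration; not part of HBase.
final class RowKeyStrategies {

    // Reversing: move the fast-changing trailing characters to the front.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Salting: prepend a random two-digit bucket number (buckets must be <= 100).
    static String salt(String key, int buckets) {
        int bucket = ThreadLocalRandom.current().nextInt(buckets);
        return String.format("%02d_%s", bucket, key);
    }

    // Hashing: prepend the first 8 hex characters of the key's MD5 digest.
    static String hashPrefix(String key) {
        return md5Hex(key).substring(0, 8) + "_" + key;
    }

    // Lowercase hex MD5 of the UTF-8 bytes of the text.
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String phone = "13812345678";
        System.out.println(reverse(phone));    // 87654321831
        System.out.println(salt(phone, 6));    // prefix differs from call to call
        System.out.println(hashPrefix(phone)); // prefix is the same on every call
    }
}
```

The difference that matters at read time is visible here: the hash prefix can be recomputed from the original key, so a direct get still works, while the salt cannot be recovered and every bucket would have to be checked.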
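3.6.3 Pre-splitting the Momo Greeting Data
Pre-splitting examples
Method 1: specify the split keys (the region startKey and endKey boundaries) explicitly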
create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
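Method 2: specify the number of regions and a split algorithm
# 15 regions
# HexStringSplit: rowkeys are prefixed with a hexadecimal string
# DecimalStringSplit: rowkeys are prefixed with a decimal digit string
# UniformSplit: rowkey prefixes are completely random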
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
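The number of regions is estimated from the data volume. Because this environment has only three virtual machines, six regions are used.
Rowkey design
To distribute the data evenly, an MD5 hash is used as the prefix.
ROWKEY = MD5Hash_senderAccount_receiverAccount_timestamp
Business pre-split script
## Drop the previously created table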
disable 'MOMO_CHAT:MSG'
drop 'MOMO_CHAT:MSG'
## Create the table (with pre-split regions)
create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}
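To get a feel for where the six region boundaries fall, the split points can be approximated by hand. The sketch below assumes that HexStringSplit divides the default key range 00000000 to FFFFFFFF into equal steps; the class HexSplitSketch is hypothetical, and the boundaries actually created should be checked in the HBase web UI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; it approximates HexStringSplit
// and is not the HBase implementation.
final class HexSplitSketch {

    // Return the numRegions - 1 boundary keys that cut 00000000..FFFFFFFF into equal steps.
    static List<String> splitPoints(int numRegions) {
        long last = 0xFFFFFFFFL;
        long step = last / numRegions; // integer division, as an assumption
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            points.add(String.format("%08x", step * i));
        }
        return points;
    }

    public static void main(String[] args) {
        // prints [2aaaaaaa, 55555554, 7ffffffe, aaaaaaa8, d5555552]
        System.out.println(splitPoints(6));
    }
}
```

A rowkey whose 8-character hexadecimal prefix sorts below 2aaaaaaa lands in the first region, one between 2aaaaaaa and 55555554 in the second, and so on.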
How this appears in HDFS
4 Project
4.1 Project Initialization
Import the dependencies
<repositories><!-- Repositories -->
<repository>
<id>aliyun</id>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- HBase client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.1.0</version>
</dependency>
<!-- XML handling -->
<dependency>
<groupId>com.github.cloudecho</groupId>
<artifactId>xmlbean</artifactId>
<version>1.5.5</version>
</dependency>
<!-- Office file library -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>4.0.1</version>
</dependency>
<!-- Office file library -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>4.0.1</version>
</dependency>
<!-- Office file library -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>4.0.1</version>
</dependency>
<!-- JSON handling -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.62</version>
</dependency>
<!-- phoenix core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>5.0.0-HBase-2.0</version>
</dependency>
<!-- phoenix client -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-queryserver-client</artifactId>
<version>5.0.0-HBase-2.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<target>1.8</target>
<source>1.8</source>
</configuration>
</plugin>
</plugins>
</build>
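4.2 Package Layout
cn.btks.momo_chat.service | Data service interface code, for example the query API |
cn.btks.momo_chat.service.impl | Implementation classes for the data service interfaces, for example the query API |
cn.btks.momo_chat.tool | Utility classes |
cn.btks.momo_chat.entity | Entity classes |
4.3 Import the Excel Utility Class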
package cn.btks.momo_chat.tool;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.util.*;
import java.util.logging.Logger;
public class ExcelReader {
private static Logger log = Logger.getLogger("client");
public static void main(String[] args) {
String xlxsPath = "E:\\002学习\\大数据\\HBase\\4.资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
Map<String, List<String>> mapData = readXlsx(xlxsPath, "陌陌数据");
for(int i = 0; i < 10000; ++i) {
System.out.println(randomColumn(mapData, "sender_nickyname"));
}
}
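/**
* Randomly pick a value from one column
* @param resultMap the data read from Excel, keyed by column name
* @param columnName column name
* @return a random value from that column
*/
public static String randomColumn(Map<String, List<String>> resultMap, String columnName) {
List<String> valList = resultMap.get(columnName);
if(valList == null) throw new RuntimeException("No data was read for column " + columnName + "!");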
Random random = new Random();
int randomIndex = random.nextInt(valList.size());
return valList.get(randomIndex);
}
/**
* Read an Excel file into a Map: <column_name, list>
* where column_name is the name found in row 4
* @param path path of the Excel file (must be in Excel 2007 .xlsx format)
* @param sheetName sheet name
* @return the Map
*/
public static Map<String, List<String>> readXlsx(String path, String sheetName)
{
// Number of columns
int columnNum = 0;
HashMap<String, List<String>> resultMap = new HashMap<String, List<String>>();
ArrayList<String> columnList = new ArrayList<String>();
try
{
OPCPackage pkg= OPCPackage.open(path);
XSSFWorkbook excel=new XSSFWorkbook(pkg);
// Get the sheet
XSSFSheet sheet=excel.getSheet(sheetName);
// Load the column names
XSSFRow columnRow = sheet.getRow(3);
if(columnRow == null) {
throw new RuntimeException("Failed to read the data file! Make sure row 4 contains the English column names!");
}
else {
Iterator<Cell> colIter = columnRow.iterator();
// Iterate over all columns
while(colIter.hasNext()) {
Cell cell = colIter.next();
String colName = cell.getStringCellValue();
columnList.add(colName);
columnNum++;
}
}
System.out.println("Columns read: " + columnNum);
System.out.println(Arrays.toString(columnList.toArray()));
// Initialize resultMap
for(String colName : columnList) {
resultMap.put(colName, new ArrayList<String>());
}
// Iterate over the sheet
Iterator<Row> iter = sheet.iterator();
int i = 0;
int rownum = 1;
while(iter.hasNext()) {
Row row = iter.next();
Iterator<Cell> cellIter = row.cellIterator();
// Skip the first 4 rows
if(rownum <= 4) {
++rownum;
continue;
}
while(cellIter.hasNext()) {
XSSFCell cell=(XSSFCell) cellIter.next();
// Read the value according to the cell type
if(cell.getCellType() == CellType.NUMERIC) {
resultMap.get(columnList.get(i % columnList.size())).add(Double.toString(cell.getNumericCellValue()));
}
else {
resultMap.get(columnList.get(i % columnList.size())).add(cell.getStringCellValue());
}
++i;
++rownum;
}
}
}
catch (Exception e) {
e.printStackTrace();
}
return resultMap;
}
}
Sample data
4.4 Coding
Create the entity class (the @Data annotation requires Lombok as a dependency)
package cn.btks.momo_chat.entity;
import lombok.Data;
@Data
public class Msg {
private String msgTime;// message time
private String senderNickyname;// sender nickname
private String senderAccount;// sender account
private String senderSex;// sender gender
private String senderIp;// sender IP
private String senderOs;// sender operating system
private String senderPhoneType;// sender phone model
private String senderNetwork;// sender network type
private String senderGps;// sender GPS
private String receiverNickyname;// receiver nickname
private String receiverIp;// receiver IP
private String receiverAccount;// receiver account
private String receiverOs;// receiver operating system
private String receiverPhoneType;// receiver phone model
private String receiverNetwork;// receiver network type
private String receiverGps;// receiver GPS
private String receiverSex;// receiver gender
private String msgType;// message type
private String distance;// distance between the two parties
private String message;// message
}
Randomly generate one record (note: the hutool library must be added as a dependency)
package cn.btks.momo_chat.tool;
import cn.btks.momo_chat.entity.Msg;
import cn.hutool.core.date.DateUtil;
import java.util.Date;
import java.util.List;
import java.util.Map;
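/**
* Random Momo message generator
* 1. Use the ExcelReader utility to pick random values from the Excel data and build a Msg object
* 2. Design the RowKey to avoid hotspots, so that rows spread as evenly as possible across the regions (6 regions were created)
* 3. Put the Msg object into HBase
* 4. Generate 100,000 test records
*/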
public class MoMoMsgGen {
public static void main(String[] args) {
// Read the data from the Excel file
String xlxsPath = "E:\\002学习\\大数据\\HBase\\4.资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
Map<String, List<String>> resultMap = ExcelReader.readXlsx(xlxsPath, "陌陌数据");
System.out.println(getOneMessage(resultMap));
}
/**
* Randomly build a Msg object from the data read from the Excel sheet
* @param resultMap the data read from Excel
* @return one message as a Msg object
*/
public static Msg getOneMessage(Map<String, List<String>> resultMap){
//1. Create the Msg entity object
Msg msg = new Msg();
//2. Use the current system time as the message time, formatted as yyyy-MM-dd HH:mm:ss
Date date = DateUtil.date();
msg.setMsgTime(DateUtil.format(date, "yyyy-MM-dd HH:mm:ss"));
//3. Call ExcelReader.randomColumn to pick a random value for every other column
msg.setSenderNickyname(ExcelReader.randomColumn(resultMap,"sender_nickyname"));
msg.setSenderAccount(ExcelReader.randomColumn(resultMap,"sender_account"));
msg.setSenderSex(ExcelReader.randomColumn(resultMap,"sender_sex"));
msg.setSenderIp(ExcelReader.randomColumn(resultMap,"sender_ip"));
msg.setSenderOs(ExcelReader.randomColumn(resultMap,"sender_os"));
msg.setSenderPhoneType(ExcelReader.randomColumn(resultMap,"sender_phone_type"));
msg.setSenderNetwork(ExcelReader.randomColumn(resultMap,"sender_network"));
msg.setSenderGps(ExcelReader.randomColumn(resultMap,"sender_gps"));
msg.setReceiverNickyname(ExcelReader.randomColumn(resultMap,"receiver_nickyname"));
msg.setReceiverIp(ExcelReader.randomColumn(resultMap,"receiver_ip"));
msg.setReceiverAccount(ExcelReader.randomColumn(resultMap,"receiver_account"));
msg.setReceiverOs(ExcelReader.randomColumn(resultMap,"receiver_os"));
msg.setReceiverPhoneType(ExcelReader.randomColumn(resultMap,"receiver_phone_type"));
msg.setReceiverNetwork(ExcelReader.randomColumn(resultMap,"receiver_network"));
msg.setReceiverGps(ExcelReader.randomColumn(resultMap,"receiver_gps"));
msg.setReceiverSex(ExcelReader.randomColumn(resultMap,"receiver_sex"));
msg.setMsgType(ExcelReader.randomColumn(resultMap,"msg_type"));
msg.setDistance(ExcelReader.randomColumn(resultMap,"distance"));
msg.setMessage(ExcelReader.randomColumn(resultMap,"message"));
return msg;
}
}
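Building the rowkey
rowkey = MD5Hash_senderAccount_receiverAccount_messageTimestamp
The MD5Hash is computed from: senderAccount_receiverAccount_messageTimestamp
Generate the MD5 with MD5Hash.getMD5AsHex
Take the first eight characters of the MD5
Then concatenate the parts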
/**
* Build the rowkey for a Msg entity as a byte[]
* rowkey = MD5Hash_senderAccount_receiverAccount_messageTimestamp
* Requires org.apache.hadoop.hbase.util.Bytes and org.apache.hadoop.hbase.util.MD5Hash
* @param msg message object
* @return the rowkey bytes
*/
public static byte[] getRowKey(Msg msg){
//1. Join the sender account, receiver account, and message timestamp with underscores (_)
StringBuilder builder = new StringBuilder();
String msgTime = msg.getMsgTime();
long time = DateUtil.parse(msgTime, "yyyy-MM-dd HH:mm:ss").getTime();
builder.append(msg.getSenderAccount()).
append("_").
append(msg.getReceiverAccount()).
append("_").
append(time);
//2. Convert the joined string to a byte[] with Bytes.toBytes (always UTF-8, unlike String.getBytes())
byte[] keyBody = Bytes.toBytes(builder.toString());
//3. Generate the MD5 with MD5Hash.getMD5AsHex and keep its first 8 characters
String md5AsHex = MD5Hash.getMD5AsHex(keyBody);
String md5AsHex8Bit = md5AsHex.substring(0, 8);
//4. Join the MD5 prefix and the string from step 1 with an underscore, then convert to a byte[]
String rowKeyString = md5AsHex8Bit+"_"+builder.toString();
System.out.println(rowKeyString);
return Bytes.toBytes(rowKeyString);
}
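Whether the MD5 prefix really spreads rows across the six regions can be checked offline before loading any data. The sketch below uses only the JDK and is an approximation, not project code: the class RowKeyDistribution is hypothetical, java.security.MessageDigest stands in for MD5Hash.getMD5AsHex, and regions are assumed to be equal slices of the 00000000 to FFFFFFFF prefix range.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of HBase or this project.
final class RowKeyDistribution {

    // rowkey = first 8 hex chars of MD5(body) + "_" + body, where body = sender_receiver_timestamp.
    static String rowKey(String sender, String receiver, long timestamp) {
        String body = sender + "_" + receiver + "_" + timestamp;
        return md5Hex(body).substring(0, 8) + "_" + body;
    }

    // Map the hex prefix to a region index, assuming equal-width regions.
    static int regionIndex(String rowKey, int numRegions) {
        long prefix = Long.parseLong(rowKey.substring(0, 8), 16);
        long step = 0xFFFFFFFFL / numRegions;
        return (int) Math.min(prefix / step, numRegions - 1);
    }

    // Count how many of `rows` sequentially timestamped keys fall into each region.
    static int[] countPerRegion(int rows, int numRegions) {
        int[] counts = new int[numRegions];
        long start = 1679270400000L; // arbitrary start timestamp in milliseconds
        for (int i = 0; i < rows; i++) {
            String key = rowKey("13029397618", "18874086861", start + i * 1000L);
            counts[regionIndex(key, numRegions)]++;
        }
        return counts;
    }

    // Lowercase hex MD5 of the UTF-8 bytes of the text.
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // One counter per region; the counts should be roughly equal.
        System.out.println(Arrays.toString(countPerRegion(6000, 6)));
    }
}
```

The input here is deliberately sequential (same accounts, timestamps one second apart), which is the pattern that would create a hotspot without the hash prefix.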
Writing the data to HBase
//1. Get the HBase connection
Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
//2. Get the HBase table MOMO_CHAT:MSG
Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
int i = 0;
int MAX = 100000;
while (i<MAX){
Msg msg = getOneMessage(resultMap);
i++;
//3. Initialize the variables needed for the HBase write (column family, column names)
byte[] rowKey = getRowKey(msg);
String cf = "C1";
String colMsgTime = "msg_time";
String colSenderNickyname = "sender_nickyname";
String colSenderAccount = "sender_account";
String colSenderSex = "sender_sex";
String colSenderIp = "sender_ip";
String colSenderOs = "sender_os";
String colSenderPhoneType = "sender_phone_type";
String colSenderNetwork = "sender_network";
String colSenderGps = "sender_gps";
String colReceiverNickyname = "receiver_nickyname";
String colReceiverIp = "receiver_ip";
String colReceiverAccount = "receiver_account";
String colReceiverOs = "receiver_os";
String colReceiverPhoneType = "receiver_phone_type";
String colReceiverNetwork = "receiver_network";
String colReceiverGps = "receiver_gps";
String colReceiverSex = "receiver_sex";
String colMsgType = "msg_type";
String colDistance = "distance";
String colMessage = "message";
//4. Build the put request
Put put = new Put(rowKey);
//5. Add every column of the Momo message one by one
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colMsgTime),Bytes.toBytes(msg.getMsgTime()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderNickyname),Bytes.toBytes(msg.getSenderNickyname()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderAccount),Bytes.toBytes(msg.getSenderAccount()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderSex),Bytes.toBytes(msg.getSenderSex()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderIp),Bytes.toBytes(msg.getSenderIp()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderOs),Bytes.toBytes(msg.getSenderOs()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderPhoneType),Bytes.toBytes(msg.getSenderPhoneType()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderNetwork),Bytes.toBytes(msg.getSenderNetwork()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colSenderGps),Bytes.toBytes(msg.getSenderGps()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverNickyname),Bytes.toBytes(msg.getReceiverNickyname()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverIp),Bytes.toBytes(msg.getReceiverIp()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverAccount),Bytes.toBytes(msg.getReceiverAccount()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverOs),Bytes.toBytes(msg.getReceiverOs()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverPhoneType),Bytes.toBytes(msg.getReceiverPhoneType()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverNetwork),Bytes.toBytes(msg.getReceiverNetwork()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverGps),Bytes.toBytes(msg.getReceiverGps()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colReceiverSex),Bytes.toBytes(msg.getReceiverSex()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colMsgType),Bytes.toBytes(msg.getMsgType()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colDistance),Bytes.toBytes(msg.getDistance()));
put.addColumn(Bytes.toBytes(cf),Bytes.toBytes(colMessage),Bytes.toBytes(msg.getMessage()));
//6. Send the put request
table.put(put);
// Print the progress
System.out.println(i+"/"+MAX);
}
table.close();
connection.close();
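The loop above sends one put request per message. For 100,000 rows it may be worth buffering the rows and sending them in groups, since the HBase Table interface also accepts a list through put(List<Put>). The following is a generic sketch of that buffering, assuming a batch size chosen by the caller; BatchWriter is a hypothetical class, and in the loader its sink would wrap the call to table.put(batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper class for illustration; not part of HBase.
final class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffer one item and hand the buffer to the sink once it is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hand any remaining items to the sink; call once after the last add.
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BatchWriter<Integer> writer =
                new BatchWriter<>(3, batch -> System.out.println("sending " + batch));
        for (int i = 0; i < 7; i++) {
            writer.add(i);
        }
        writer.flush(); // sends the final partial batch [6]
    }
}
```

The final flush matters: without it the last partial batch would never be written.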
Query
Query by the following three conditions
Date
Sender
Receiver
Create the interface
package cn.btks.momo_chat.service;
import cn.btks.momo_chat.entity.Msg;
import java.io.IOException;
import java.util.List;
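/**
* Chat message data service
*/
public interface ChatMessageService {
/**
* Query by date, sender, and receiver
* @param date date, in yyyy-MM-dd format
* @param sender sender account
* @param receiver receiver account
* @return matching messages
* @throws Exception if the query fails
*/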
List<Msg> getMessage(String date, String sender, String receiver) throws Exception;
void close() throws IOException;
}
Create the implementation class
package cn.btks.momo_chat.service.impl;
import cn.btks.momo_chat.entity.Msg;
import cn.btks.momo_chat.service.ChatMessageService;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
/**
* Query implementation that uses the native HBase API
*/
public class HBaseNativeChatMessageService implements ChatMessageService {
private Connection connection;
private String timeFormatter;
public HBaseNativeChatMessageService() throws IOException {
Configuration configuration = HBaseConfiguration.create();
connection = ConnectionFactory.createConnection(configuration);
timeFormatter = "yyyy-MM-dd HH:mm:ss";
}
@Override
public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
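//1. Build the scan object
Scan scan = new Scan();
//2. Build the time range for the query as two date strings with hours, minutes, and seconds, e.g. 2020-10-05 00:00:00 to 2020-10-05 23:59:59
String startDate = date+" 00:00:00";
String endDate = date+" 23:59:59";
//3. Build the two date filters (greater-or-equal and less-or-equal); SingleColumnValueFilter is sufficient for filtering on a single column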
SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
Bytes.toBytes("msg_time"),
CompareOperator.GREATER_OR_EQUAL,
new BinaryComparator(Bytes.toBytes(startDate)));
SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
Bytes.toBytes("msg_time"),
CompareOperator.LESS_OR_EQUAL,
new BinaryComparator(Bytes.toBytes(endDate)));
//4. Build the sender filter
SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
Bytes.toBytes("sender_account"),
CompareOperator.EQUAL,
new BinaryComparator(Bytes.toBytes(sender)));
//5. Build the receiver filter
SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1"),
Bytes.toBytes("receiver_account"),
CompareOperator.EQUAL,
new BinaryComparator(Bytes.toBytes(receiver)));
//6. Combine all filters with a FilterList
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL,
startDateFilter,
endDateFilter,
senderFilter,
receiverFilter);
//7. Set the filter on the scan object
scan.setFilter(filterList);
//8. Get the Table object and run the scan with getScanner
Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
ResultScanner resultScanner = table.getScanner(scan);
//9. Get the iterator, then iterate over every row and every cell in it
Iterator<Result> iterator = resultScanner.iterator();
List<Msg> msgList= new ArrayList<>();
while(iterator.hasNext()){
// Each row returned by the query becomes one Msg object
Result result = iterator.next();
Msg msg = new Msg();
// List of cells in the row
List<Cell> cells = result.listCells();
for (Cell cell : cells) {
// Set the Msg field that matches the cell's column name
String columnName = Bytes.toString(cell.getQualifierArray(),cell.getQualifierOffset(),cell.getQualifierLength());
String value = Bytes.toString(cell.getValueArray(),cell.getValueOffset(),cell.getValueLength());
switch (columnName) {
case "msg_time": msg.setMsgTime(value); break;
case "sender_nickyname": msg.setSenderNickyname(value); break;
case "sender_account": msg.setSenderAccount(value); break;
case "sender_sex": msg.setSenderSex(value); break;
case "sender_ip": msg.setSenderIp(value); break;
case "sender_os": msg.setSenderOs(value); break;
case "sender_phone_type": msg.setSenderPhoneType(value); break;
case "sender_network": msg.setSenderNetwork(value); break;
case "sender_gps": msg.setSenderGps(value); break;
case "receiver_nickyname": msg.setReceiverNickyname(value); break;
case "receiver_ip": msg.setReceiverIp(value); break;
case "receiver_account": msg.setReceiverAccount(value); break;
case "receiver_os": msg.setReceiverOs(value); break;
case "receiver_phone_type": msg.setReceiverPhoneType(value); break;
case "receiver_network": msg.setReceiverNetwork(value); break;
case "receiver_gps": msg.setReceiverGps(value); break;
case "receiver_sex": msg.setReceiverSex(value); break;
case "msg_type": msg.setMsgType(value); break;
case "distance": msg.setDistance(value); break;
case "message": msg.setMessage(value); break;
default: break;
}
}
msgList.add(msg);
}
resultScanner.close();
table.close();
return msgList;
}
@Override
public void close() throws IOException {
connection.close();
}
}
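The two date filters compare msg_time as raw bytes, not as dates. That works because the yyyy-MM-dd HH:mm:ss format is fixed-width and zero-padded, so byte order and chronological order agree. The sketch below reproduces the comparison with the JDK only; DayRangeSketch and its methods are hypothetical and only mimic what a byte-wise comparator does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of HBase.
final class DayRangeSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when msgTime (yyyy-MM-dd HH:mm:ss) falls on the given day (yyyy-MM-dd).
    static boolean inDay(String msgTime, String date) {
        byte[] value = msgTime.getBytes(StandardCharsets.UTF_8);
        byte[] start = (date + " 00:00:00").getBytes(StandardCharsets.UTF_8);
        byte[] end = (date + " 23:59:59").getBytes(StandardCharsets.UTF_8);
        return compareBytes(value, start) >= 0 && compareBytes(value, end) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inDay("2023-03-20 08:15:00", "2023-03-20")); // true
        System.out.println(inDay("2023-03-21 00:00:00", "2023-03-20")); // false
    }
}
```

A timestamp stored without zero padding, such as 2023-3-20 8:15:00, would break this ordering, so the write path and the query must use the same format.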
Test
package cn.btks.momo_chat.service;
import cn.btks.momo_chat.entity.Msg;
import cn.btks.momo_chat.service.impl.HBaseNativeChatMessageService;
import org.junit.Test;
import java.io.IOException;
import java.util.List;
public class ChatMessageServiceTest {
private ChatMessageService chatMessageService;
public ChatMessageServiceTest() throws IOException {
chatMessageService = new HBaseNativeChatMessageService();
}
@Test
public void getMessage() throws Exception {
List<Msg> message = chatMessageService.getMessage("2023-03-20", "13029397618", "18874086861");
for (Msg msg : message) {
System.out.println(msg);
}
}
}
Result: