Hbase之陌陌海量存储案例

Hbase之陌陌海量存储案例

1、案例介绍

在陌陌中,每天都有数千万的用户聊天消息需要存储。而且,这些消息都是需要进行大量地保存,而读取会少很多。想想:我们在使用微信的时候,大多数时候,我们都是在发消息,而不是每时每刻查询历史消息。要存储这样海量的数据,HBase就非常适合了,HBase本身也非常适合存储这种写多读少的应用场景。本案例,将结合陌陌聊天业务背景,以HBase来存储海量的数据。

2、打招呼消息数据集介绍

字段名说明
msg_time消息时间
sender_nickyname发件人昵称
sender_account发件人账号
sender_sex发件人性别
sender_ip发件人IP
sender_os发件人系统
sender_phone_type发件人手机型号
sender_network发件人网络制式
sender_gps发件人GPS
receiver_nickyname收件人昵称
receiver_ip收件人IP
receiver_account收件人账号
receiver_os收件人系统
receiver_phone_type收件人手机型号
receiver_network收件人网络制式
receiver_gps收件人GPS
receiver_sex收件人性别
msg_type消息类型
distance双方距离
message消息

3、准备工作

3.1 、创建IDEA Maven项目

groupIdcn.kefu
artifactIdmomo_chat_app

3.2、建表脚本

关于建表的设计可以参考:https://blog.csdn.net/qq_32677137/article/details/108535262

#创建名称空间
create_namespace 'MOMO_CHAT';
#创建hbase表,列簇为C1,压缩格式为GZ,分区数量为6
create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}

3.3、导入POM依赖

<repositories><!-- 代码库 -->
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.cn/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
            <updatePolicy>never</updatePolicy>
        </snapshots>
    </repository>
</repositories>

<dependencies>
    <!-- HBase客户端 -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- Xml操作相关 -->
    <dependency>
        <groupId>com.github.cloudecho</groupId>
        <artifactId>xmlbean</artifactId>
        <version>1.5.5</version>
    </dependency>
    <!-- 操作Office库 -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- 操作Office库 -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- 操作Office库 -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- 操作JSON -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
    <!-- phoenix core -->
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>
    <!-- phoenix 客户端 -->
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-queryserver-client</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>
</dependencies>


<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <target>1.8</target>
                <source>1.8</source>
            </configuration>
        </plugin>
    </plugins>
</build>

3.4、在resource路径下建立三个配置文件

  • core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- 用于设置Hadoop的文件系统,由URI指定 -->
 <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1.itcast.cn:8020</value>
 </property>
<!-- 配置Hadoop存储数据目录,默认/tmp/hadoop-${user.name} -->
 <property>
       <name>hadoop.tmp.dir</name>
       <value>/export/servers/hadoop-2.7.5/hadoopDatas/tempDatas</value>
</property>

<!--  缓冲区大小,实际工作中根据服务器性能动态调整 -->
 <property>
       <name>io.file.buffer.size</name>
       <value>4096</value>
 </property>

<!--  开启hdfs的垃圾桶机制,删除掉的数据可以从垃圾桶中回收,单位分钟 -->
 <property>
       <name>fs.trash.interval</name>
       <value>10080</value>
 </property>
<property>
	<name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

  • hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
	<!-- HBase数据在HDFS中的存放的路径 -->
	<property>
	    <name>hbase.rootdir</name>
	    <value>hdfs://node1.itcast.cn:8020/hbase</value>
	</property>
	<!-- Hbase的运行模式。false是单机模式,true是分布式模式。若为false,Hbase和Zookeeper会运行在同一个JVM里面 -->
	<property>
	    <name>hbase.cluster.distributed</name>
	    <value>true</value>
	</property>
	<!-- ZooKeeper的地址 -->
	<property>
	    <name>hbase.zookeeper.quorum</name>
	    <value>node1.itcast.cn,node2.itcast.cn,node3.itcast.cn</value>
	</property>
	<!-- ZooKeeper快照的存储位置 -->
	<property>
	    <name>hbase.zookeeper.property.dataDir</name>
	    <value>/export/servers/zookeeper-3.4.6/data</value>
	</property>
	<!--  V2.1版本,在分布式情况下, 设置为false -->
	<property>
	    <name>hbase.unsafe.stream.capability.enforce</name>
	    <value>false</value>
	</property>
        <property>
            <name>phoenix.schema.isNamespaceMappingEnabled</name>
            <value>true</value>
        </property>

</configuration>
  • log4j.properties
log4j.rootLogger=INFO,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n

3.4、创建包结构

cn.kefu.momo_chat.service用于存放数据服务接口相关代码,例如:查询的API代码
cn.kefu.momo_chat.service.impl用于存放数据服务接口实现类相关代码,例如:查询的API代码
cn.kefu.momo_chat.tool工具类
cn.kefu.momo_chat.entity存放实体类

3.5、导入ExcelReader工具类

在资料包中有一个ExcelReader.java文件,ExcelReader工具类可以读取Excel中的数据称为HashMap这样,方便我们快速生成数据。

ExcelReader工具类主要有两个方法:

  1. readXlsx——用于将指定路径的Excel文件中的工作簿读取为Map结构

  2. randomColumn——随机生成某一列中的数据。

将ExcelReader添加到cn.kefu.momo_chat.tool包中。

package cn.kefu.momo_chat.tool;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.util.*;
import java.util.logging.Logger;

public class ExcelReader {
    private static Logger log = Logger.getLogger("client");

    public static void main(String[] args) {
        String xlxsPath = "D:\\workspace\\idea-workspace\\momo_chat_app\\data\\测试数据集.xlsx";
        Map<String, List<String>> mapData = readXlsx(xlxsPath, "陌陌数据");

        for(int i = 0; i < 10000; ++i) {
            System.out.println(randomColumn(mapData, "sender_nickyname"));
        }
    }

    /**
     * 随机获取某一列的数据
     * @param columnName 列名
     * @return 随机数据
     */
    public static String randomColumn(Map<String, List<String>> resultMap, String columnName) {
        List<String> valList = resultMap.get(columnName);

        if(valList == null) throw new RuntimeException("未读取到列名为" + columnName + "的任何数据!");
        Random random = new Random();

        int randomIndex = random.nextInt(valList.size());

        return valList.get(randomIndex);
    }

    /**
     * 将Excel文件读取为Map结构: <column_name, list>
     * 其中column_name为第4行的名字
     * @param path Excel文件路径(要求Excel为2007)
     * @param sheetName 工作簿名称
     * @return Map结构
     */
    public static Map<String, List<String>> readXlsx(String path, String sheetName)
    {
        // 列的数量
        int columnNum = 0;
        HashMap<String, List<String>> resultMap = new HashMap<String, List<String>>();
        ArrayList<String> columnList = new ArrayList<String>();

        try
        {
            OPCPackage pkg= OPCPackage.open(path);
            XSSFWorkbook excel=new XSSFWorkbook(pkg);
            //获取sheet
            XSSFSheet sheet=excel.getSheet(sheetName);

            // 加载列名
            XSSFRow columnRow = sheet.getRow(3);
            if(columnRow == null) {
                throw new RuntimeException("数据文件读取错误!请确保第4行为英文列名!");
            }
            else {
                Iterator<Cell> colIter = columnRow.iterator();
                // 迭代所有列
                while(colIter.hasNext()) {
                    Cell cell = colIter.next();
                    String colName = cell.getStringCellValue();
                    columnList.add(colName);
                    columnNum++;
                }
            }

            System.out.println("读取到:" + columnNum + "列");
            System.out.println(Arrays.toString(columnList.toArray()));

            // 初始化resultMap
            for(String colName : columnList) {
                resultMap.put(colName, new ArrayList<String>());
            }

            // 迭代sheet
            Iterator<Row> iter = sheet.iterator();
            int i = 0;
            int rownum = 1;

            while(iter.hasNext()) {
                Row row = iter.next();
                Iterator<Cell> cellIter = row.cellIterator();

                // 跳过前4行
                if(rownum <= 4) {
                    ++rownum;
                    continue;
                }

                while(cellIter.hasNext()) {
                    XSSFCell cell=(XSSFCell) cellIter.next();
                    //根据单元的的类型,读取相应的结果
                    if(cell.getCellType() == CellType.NUMERIC) {
                        resultMap.get(columnList.get(i % columnList.size())).add(Double.toString(cell.getNumericCellValue()));
                    }
                    else {
                        resultMap.get(columnList.get(i % columnList.size())).add(cell.getStringCellValue());
                    }

                    ++i;
                    ++rownum;
                }
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }

        return resultMap;
    }
}

3.6、创建实体类

在cn.kefu.momo_chat.entity包中创建一个名为Msg的实体类,使用Java代码描述陌陌消息。

public class Msg {
    private String msg_time;
    private String sender_nickyname;
    private String sender_account;
    private String sender_sex;
    private String sender_ip;
    private String sender_os;
    private String sender_phone_type;
    private String sender_network;
    private String sender_gps;
    private String receiver_nickyname;
    private String receiver_ip;
    private String receiver_account;
    private String receiver_os;
    private String receiver_phone_type;
    private String receiver_network;
    private String receiver_gps;
    private String receiver_sex;
    private String msg_type;
    private String distance;
    private String message;

4、编写数据生成器

在cn.kefu.momo_chat.tool包中创建一个名为MoMoMsgGen类

4.1、随机生成一条数据

编写getOneMsg方法,调用ExcelReader工具类,随机生成一条陌陌消息数据

实现步骤:

  1. 构建Msg实体类对象

  2. 调用ExcelReader中的randomColumn随机生成一个列的数据

  3. 注意时间使用系统当前时间

private static Msg getOneMsg(Map<String, List<String>> resultMap) {
    Msg msg = new Msg();
    long timestamp = new Date().getTime();
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    msg.setMsg_time(sdf.format(timestamp));
    msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname"));
    msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account"));
    msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex"));
    msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip"));
    msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os"));
    msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type"));
    msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network"));
    msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps"));
    msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname"));
    msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip"));
    msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account"));
    msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os"));
    msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type"));
    msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network"));
    msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps"));
    msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex"));
    msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type"));
    msg.setDistance(ExcelReader.randomColumn(resultMap, "distance"));
    msg.setMessage(ExcelReader.randomColumn(resultMap, "message"));

    return msg;
}

4.2、构建ROWKEY

ROWKEY = MD5Hash_发件人账号_收件人账号_消息时间戳

  1. 其中MD5Hash的计算方式为:发送人账号 + “” + 收件人账号 + “” + 消息时间戳

  2. 使用MD5Hash.getMD5AsHex方法生成MD5值

  3. 取MD5值的前8位,避免过长

  4. 最后把发件人账号、收件人账号、消息时间戳和MD5拼接起来

private static byte[] getRowkey(Msg msg) throws ParseException {
    // 3. 构建ROWKEY
    // 发件人ID1反转
    StringBuilder stringBuilder = new StringBuilder(msg.getSender_account());
    stringBuilder.append("_");
    stringBuilder.append(msg.getReceiver_account());
    stringBuilder.append("_");
    // 转换为时间戳
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    stringBuilder.append(sdf.parse(msg.getMsg_time()).getTime());

    byte[] orginkey = Bytes.toBytes(stringBuilder.toString());
    // 为了避免ROWKEY过长,取前八位
    String md5AsHex = MD5Hash.getMD5AsHex(orginkey).substring(0, 8);

    return Bytes.toBytes(md5AsHex + "_" + stringBuilder.toString());
}

4.3、 数据写入HBase,生成10W条数据

实现步骤:

  1. 获取Hbase连接

  2. 获取HBase表MOMO_CHAT:MSG

  3. 初始化操作Hbase所需的变量(列蔟、列名)

  4. 构建put请求

  5. 挨个添加陌陌消息的所有列

  6. 发起put请求

/**
 * 陌陌数据生成器
 */
public class MoMoMsgGen {

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> resultMap = ExcelReader.readXlsx("D:\\课程研发\\39.HBaseV8.0\\4.资料\\数据集\\测试数据集.xlsx", "陌陌数据");

        // 1. 获取HBase连接
        Configuration configuration = HBaseConfiguration.create();

        Connection connection = ConnectionFactory.createConnection(configuration);

        // 2. 获取HTable
        String TABLE_NAME = "MOMO_CHAT:MSG";
        Table momoChatlTable = connection.getTable(TableName.valueOf(TABLE_NAME));

        String cf_name = "C1";
        String col_msg_time = "msg_time";
        String col_sender_nickyname = "sender_nickyname";
        String col_sender_account = "sender_account";
        String col_sender_sex = "sender_sex";
        String col_sender_ip = "sender_ip";
        String col_sender_os = "sender_os";
        String col_sender_phone_type = "sender_phone_type";
        String col_sender_network = "sender_network";
        String col_sender_gps = "sender_gps";
        String col_receiver_nickyname = "receiver_nickyname";
        String col_receiver_ip = "receiver_ip";
        String col_receiver_account = "receiver_account";
        String col_receiver_os = "receiver_os";
        String col_receiver_phone_type = "receiver_phone_type";
        String col_receiver_network = "receiver_network";
        String col_receiver_gps = "receiver_gps";
        String col_receiver_sex = "receiver_sex";
        String col_msg_type = "msg_type";
        String col_distance = "distance";
        String col_message = "message";

        int i = 0;
        int max = 100000;
        while(i < max) {
            Msg msg = getOneMsg(resultMap);
            Put put = new Put(getRowkey(msg));

            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_msg_time), Bytes.toBytes(msg.getMsg_time()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_nickyname), Bytes.toBytes(msg.getSender_nickyname()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_account), Bytes.toBytes(msg.getSender_account()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_sex), Bytes.toBytes(msg.getSender_sex()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_ip), Bytes.toBytes(msg.getSender_ip()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_os), Bytes.toBytes(msg.getSender_os()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_phone_type), Bytes.toBytes(msg.getSender_phone_type()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_network), Bytes.toBytes(msg.getSender_network()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_sender_gps), Bytes.toBytes(msg.getSender_gps()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_nickyname), Bytes.toBytes(msg.getReceiver_nickyname()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_ip), Bytes.toBytes(msg.getReceiver_ip()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_account), Bytes.toBytes(msg.getReceiver_account()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_os), Bytes.toBytes(msg.getReceiver_os()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_phone_type), Bytes.toBytes(msg.getReceiver_phone_type()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_network), Bytes.toBytes(msg.getReceiver_network()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_gps), Bytes.toBytes(msg.getReceiver_gps()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_receiver_sex), Bytes.toBytes(msg.getReceiver_sex()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_msg_type), Bytes.toBytes(msg.getMsg_type()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_distance), Bytes.toBytes(msg.getDistance()));
            put.addColumn(Bytes.toBytes(cf_name), Bytes.toBytes(col_message), Bytes.toBytes(msg.getMessage()));

            // 5. 执行put请求
            momoChatlTable.put(put);
            System.out.println(i + " / " + max);
            ++i;
        }

        // 6. 关闭连接
        momoChatlTable.close();
        momoChatlTable.close();

    }

    private static byte[] getRowkey(Msg msg) throws ParseException {
        // 3. 构建ROWKEY
        // 发件人ID1反转
        StringBuilder stringBuilder = new StringBuilder(msg.getSender_account());
        stringBuilder.append("_");
        stringBuilder.append(msg.getReceiver_account());
        stringBuilder.append("_");
        // 转换为时间戳
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        stringBuilder.append(sdf.parse(msg.getMsg_time()).getTime());

        byte[] orginkey = Bytes.toBytes(stringBuilder.toString());
        // 为了避免ROWKEY过长,取前八位
        String md5AsHex = MD5Hash.getMD5AsHex(orginkey).substring(0, 8);

        return Bytes.toBytes(md5AsHex + "_" + stringBuilder.toString());
    }

    private static Msg getOneMsg(Map<String, List<String>> resultMap) {
        Msg msg = new Msg();
        long timestamp = new Date().getTime();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        msg.setMsg_time(sdf.format(timestamp));
        msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname"));
        msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account"));
        msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex"));
        msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip"));
        msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os"));
        msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type"));
        msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network"));
        msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps"));
        msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname"));
        msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip"));
        msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account"));
        msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os"));
        msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type"));
        msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network"));
        msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps"));
        msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex"));
        msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type"));
        msg.setDistance(ExcelReader.randomColumn(resultMap, "distance"));
        msg.setMessage(ExcelReader.randomColumn(resultMap, "message"));

        return msg;
    }
}

5、编写数据服务查询数据

5.1、 需求

数据存储到HBase之后,用户可能会在某一时间按照日期来查询聊天记录。例如:用户点击某一个日期,那就需要将当天用户和另外一个用户的打招呼聊天记录查询出来。也就是需要按照以下几个字段来进行查询。

  • 日期

  • 发件人

  • 收件人

5.2、 接口

ChatMessageService接口

  • getMessage表示从Hbase中的消息记录根据日期、发送者账号、接受者账号查询数据,并返回一个List集合

  • close表示关闭打开的相关资源

/**
 * 陌陌消息服务
 */
public interface ChatMessageService {
    List<Msg> getMessage(String date, String sender, String receiver) throws Exception;
    void close();
}

5.3、实现类

要根据日期、发件人、收件人查询消息,我们需要使用scan+filter来进行扫描。我们需要使用多个Filter组合起来进行查询。

实现步骤:

  1. 构建scan对象

  2. 构建用于查询时间的范围,例如:2020-10-05 00:00:00 – 2020-10-05 23:59:59

  3. 构建查询日期的两个Filter,大于等于、小于等于,此处过滤单个列使用SingleColumnValueFilter即可。

  4. 构建发件人Filter

  5. 构建收件人Filter

  6. 使用FilterList组合所有Filter

  7. 设置scan对象filter

  8. 获取HTable对象,并调用getScanner执行

  9. 获取迭代器,迭代每一行,同时迭代每一个单元格

HbaseNativeChatMessageService实现类

/**
 * 使用HBase原生API实现数据服务
 */
public class HBaseNativeChatMessageService implements ChatMessageService {
    private Connection connection;
    private SimpleDateFormat sdf;
    private ExecutorService executorServiceMsg;
    private Table tableMsg;

    public HBaseNativeChatMessageService() {
        try {
            Configuration cfg = HBaseConfiguration.create();
            connection = ConnectionFactory.createConnection(cfg);
            sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

            executorServiceMsg = Executors.newFixedThreadPool(5);
            tableMsg = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"), executorServiceMsg);
        } catch (IOException e) {
            System.out.println("**获取HBase连接失败**");
            throw new RuntimeException(e);
        }
    }

    /**
     * 根据日期、发件人、收件人查询消息
     * @param date
     * @param sender
     * @param receiver
     * @return
     */
    @Override
    public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
        if(connection == null) throw new RuntimeException("未初始化HBase连接!");

        // 构建scan对象
        Scan scan = new Scan();
        // 构建Start ROWKEY
        String startDate = date + " 00:00:00";
        String endDate = date + " 23:59:59";

        // 构建Filter
        // 构建日期查询
        SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
                , Bytes.toBytes("msg_time")
                , CompareOperator.GREATER_OR_EQUAL
                , new BinaryComparator(Bytes.toBytes(startDate + "")));

        SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
                , Bytes.toBytes("msg_time")
                , CompareOperator.LESS_OR_EQUAL
                , new BinaryComparator(Bytes.toBytes(endDate + "")));

        SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
                , Bytes.toBytes("sender_account")
                , CompareOperator.EQUAL
                , new BinaryComparator(Bytes.toBytes(sender)));

        SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
                , Bytes.toBytes("receiver_account")
                , CompareOperator.EQUAL
                , new BinaryComparator(Bytes.toBytes(receiver)));


        Filter filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL
                , startDateFilter
                , endDateFilter
                , senderFilter
                , receiverFilter);
        scan.setFilter(filterList);
        ResultScanner scanner = tableMsg.getScanner(scan);
        Iterator<Result> iter = scanner.iterator();
        List<Msg> msgList = new ArrayList<>();

        while(iter.hasNext()) {
            Result result = iter.next();
            Msg msg = new Msg();
            // 遍历所有列
            while(result.advance()) {
                Cell cell = result.current();
                String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());

                if(columnName.equalsIgnoreCase("msg_time")){
                    msg.setMsg_time(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_nickyname")){
                    msg.setSender_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_account")){
                    msg.setSender_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_sex")){
                    msg.setSender_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_ip")){
                    msg.setSender_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_os")){
                    msg.setSender_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_phone_type")){
                    msg.setSender_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_network")){
                    msg.setSender_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("sender_gps")){
                    msg.setSender_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_nickyname")){
                    msg.setReceiver_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_ip")){
                    msg.setReceiver_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_account")){
                    msg.setReceiver_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_os")){
                    msg.setReceiver_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_phone_type")){
                    msg.setReceiver_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_network")){
                    msg.setReceiver_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_gps")){
                    msg.setReceiver_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("receiver_sex")){
                    msg.setReceiver_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("msg_type")){
                    msg.setMsg_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("distance")){
                    msg.setDistance(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }
                if(columnName.equalsIgnoreCase("message")){
                    msg.setMessage(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
                }

                msgList.add(msg);
            }
        }

        return msgList;
    }

    public void close() {
        try {
            connection.close();
            tableMsg.close();
            executorServiceMsg.shutdown();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

5.3、测试

public static void main(String[] args) throws Exception{
        HBaseNativeChatMessageService hbaseNativeChatMessageService = new HBaseNativeChatMessageService();
        List<Msg> message = hbaseNativeChatMessageService.getMessage("2020-08-24", "13504113666", "18182767005");

        for (Msg msg : message) {
            System.out.println(msg);
        }
        hbaseNativeChatMessageService.close();
    }

6、性能问题

  • Hbase默认只支持对行键的索引,那么如果要针对其它的列来进行查询,就只能全表扫描
  • 上述的查询是使用scan + filter组合来进行查询的,但查询地效率不高,因为要进行顺序全表扫描而没有其他索引。如果数据量较大,只能在客户端(client)来进行处理,如果要传输到Client大量的数据,然后交由客户端处理
  • 网络传输压力很大
  • 客户端的压力很大
  • 如果表存储的数据量很大时,效率会非常低下,此时需要使用二级索引
  • 也就是除了ROWKEY的索引外,还需要人为添加其他的方便查询的索引
    如果每次需要我们开发二级索引来查询数据,这样使用起来很麻烦。再者,查询数据都是HBase Java API,使用起来不是很方便。为了让其他开发人员更容易使用该接口。如果有一种SQL引擎,通过SQL语句来查询数据会更加方便。

针对以上的一些性能问题,我们可以使用Apache Phoenix,我的下一篇博客会具体的讲解Phoenix的安装和使用,已经如何对该案例精选优化。

  • 0
    点赞
  • 20
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值