1. Application Scenarios
Pros and cons of ES
- Pros: supports full-text indexing; any field can be indexed and queried as requirements demand
- Cons: with large data volumes, performance cannot meet strict real-time requirements, and the risk to data safety is relatively high
Pros and cons of HBase
- Pros: high-performance real-time reads and writes over very large datasets; data is relatively safe
- Cons: the rowkey is the only index, but in real business the query conditions vary widely;
  if the query condition is not a prefix of the rowkey, the index cannot be used, and the only option is to build secondary indexes
- Why not build secondary index tables inside HBase?
  - every single query requirement needs its own secondary index table:
    - original table: rowkey = id
    - index table: rowkey = name -> id
    - index table: rowkey = age -> id
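The rowkey-prefix limitation can be illustrated with a sorted map standing in for HBase's rowkey-ordered storage. This is a minimal sketch; the class name and all data are hypothetical:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowkeyPrefixDemo {
    public static void main(String[] args) {
        // A TreeMap stands in for HBase's rowkey-sorted storage (hypothetical rows).
        TreeMap<String, String> table = new TreeMap<>();
        table.put("001_zhangsan", "row1");
        table.put("002_lisi", "row2");
        table.put("003_wangwu", "row3");

        // Prefix query on the rowkey: an efficient range scan,
        // analogous to an HBase scan with startRow/stopRow.
        SortedMap<String, String> hit = table.subMap("001", "001\uffff");
        System.out.println(hit.size());

        // Query by a non-prefix field (here the name suffix) cannot use the
        // sorted order: every key must be examined, i.e. a full table scan.
        long matches = table.keySet().stream()
                .filter(k -> k.endsWith("lisi"))
                .count();
        System.out.println(matches);
    }
}
```

The prefix lookup touches only the keys inside the range, while the non-prefix lookup scans all of them; this is why a second, differently-keyed index is needed.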
Using ES as the index table
- Store all the query conditions in ES and index them, with each document pointing to the rowkey in HBase
- Data in ES

documentId | age | name | sex | rowkey:id
---|---|---|---|---
0 | 18 | zhangsan | male | 001

- Data in HBase

rowkey:id | age | name | sex
---|---|---|---
001 | 18 | zhangsan | male
002 | 19 | lisi | female

- This avoids building multiple secondary index tables in HBase
2. Requirements Analysis
- Requirement: retrieve the body of an article by querying on keywords in the title, the source, the time, the read count, and similar conditions
- Data fields
  - ID
  - title
  - source
  - time
  - read count
  - body
- Could everything be stored in ES?
  - it is possible
  - but there are problems
    - for performance and safety reasons there is no need to store all the data in ES
    - if a large volume of data is stored and fully indexed, performance suffers
Implementation
- Store the complete data in HBase
  - id as the rowkey
  - title, source, time, read count, body
- Store only the fields used as query conditions in ES, and index all of them
  - ID
  - title
  - source
  - time
  - read count
- Query path
  - step1: query ES with the search conditions to get the IDs of the matching documents
  - step2: use each ID to fetch the article body from HBase
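The two-step lookup above can be sketched with in-memory maps standing in for ES and HBase. This is only a conceptual sketch; the class name, keyword, and all data are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SecondaryIndexDemo {
    public static void main(String[] args) {
        // ES stand-in: an index over the title keyword -> matching article ids.
        Map<String, List<String>> esTitleIndex = new HashMap<>();
        esTitleIndex.put("italy", Arrays.asList("001", "003"));

        // HBase stand-in: rowkey (id) -> full article body.
        Map<String, String> hbase = new HashMap<>();
        hbase.put("001", "content of article 001");
        hbase.put("002", "content of article 002");
        hbase.put("003", "content of article 003");

        // step1: query the "ES" index by keyword to get the matching ids
        List<String> ids = esTitleIndex.getOrDefault("italy", Collections.emptyList());
        // step2: fetch each article body from "HBase" by id
        for (String id : ids) {
            System.out.println(hbase.get(id));
        }
    }
}
```

Only the ids travel between the two stores, so ES never has to hold the large article bodies.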
Workflow
- step1: read the Excel file, parse each record, and wrap it in a JavaBean
  - each record becomes one JavaBean object
  - all JavaBeans are collected into a list
- step2: write the JavaBeans to HBase and ES
  - ES: id, title, source, time, read count
  - HBase: all the fields, with the id as the rowkey
- step3: query the body of the matching articles by title
  - first query ES to get the IDs matching the title
  - then query HBase by ID and return the body
3. Code Implementation
- Create the corresponding index in ES
PUT /articles
{
"settings":{
"number_of_shards":3,
"number_of_replicas":1,
"analysis" : {
"analyzer" : {
"ik" : {
"tokenizer" : "ik_max_word"
}
}
}
},
"mappings":{
"article":{
"dynamic":"strict",
"_source": {
"includes": [
"id","title","from","readCount","time"
],
"excludes": [
"content"
]
},
"properties":{
"id":{"type": "keyword", "store": true},
"title":{"type": "text","store": true,"index" : true,"analyzer": "ik_max_word"},
"from":{"type": "keyword","store": true},
"readCount":{"type": "integer","store": true},
"content":{"type": "text","store": false,"index": false},
"time": {"type": "keyword", "index": false}
}
}
}
}
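Once the index exists, the mapping can be sanity-checked with a search against the analyzed title field. A minimal DSL sketch (the keyword is illustrative); it mirrors the term query the Java client issues later:

```
GET /articles/article/_search
{
  "query": {
    "term": { "title": "意大利" }
  }
}
```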
- Create the corresponding table in HBase
  - operate as the root user
  - start HDFS
    - start-dfs.sh
  - start Zookeeper
    - /export/servers/zookeeper-3.4.5-cdh5.14.0/bin/start-zk-all.sh
  - start HBase
    - start-hbase.sh
  - create the table
    - create 'articles','article'
- Build the Excel parsing utility class
package cn.hanjiaxiaozhi.util;
import cn.hanjiaxiaozhi.bean.EsArticle;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
 * @ClassName ExcelUtil
 * @Description Utility class that parses the data in an Excel file into JavaBeans
 * @Date 2020/7/5 15:46
 * @Create By Frank
 */
public class ExcelUtil {
    /**
     * Parses the data in the Excel file
     * @param path
     * @return
     */
    public static List<EsArticle> parseExcelData(String path) throws IOException {
        // build the return value
        List<EsArticle> lists = new ArrayList<>();
        // open the file as an input stream
        FileInputStream inputStream = new FileInputStream(path);
        // parse the Excel file into a workbook
        XSSFWorkbook sheets = new XSSFWorkbook(inputStream);
        // get the first sheet
        XSSFSheet sheet = sheets.getSheetAt(0);
        // get the total number of rows, then iterate over each row
        int lastRowNum = sheet.getLastRowNum();
        // row 0 holds the column names, so start from row 1
        for (int i = 1; i <= lastRowNum; i++) {
            // get the current row
            XSSFRow row = sheet.getRow(i);
            // read every cell of the row
            String id = row.getCell(0).toString();        // id
            String title = row.getCell(1).toString();     // title
            String from = row.getCell(2).toString();      // source
            String time = row.getCell(3).toString();      // time
            String readCount = row.getCell(4).toString(); // read count
            String content = row.getCell(5).toString();   // body
            // wrap the row in a JavaBean object
            EsArticle esArticle = new EsArticle(id, title, from, time, readCount, content);
            // add it to the list
            lists.add(esArticle);
        }
        // close the workbook and return the list
        sheets.close();
        return lists;
    }
}
- Build the HBase read/write utility class
package cn.hanjiaxiaozhi.util;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
/**
 * @ClassName HbaseUtil
 * @Description Utility class for reading from and writing to HBase
 * @Date 2020/7/5 16:13
 * @Create By Frank
 */
public class HbaseUtil {
    private static Table getHbaseTable(String tableName) throws IOException {
        // get an HBase connection
        Configuration conf = HBaseConfiguration.create();
        // point the client at the Zookeeper quorum
        conf.set("hbase.zookeeper.quorum","node-01:2181,node-02:2181,node-03:2181");
        Connection conn = ConnectionFactory.createConnection(conf);
        // build the table object
        // note: a new connection is opened on every call; a production version would share and close it
        Table table = conn.getTable(TableName.valueOf(tableName));
        return table;
    }
    /**
     * Writes one cell to HBase
     * @param tableName
     * @param rowkey
     * @param family
     * @param column
     * @param value
     */
    public static void writeToHbase(String tableName,String rowkey,String family,String column,String value) throws IOException {
        // build the HBase table object
        Table table = getHbaseTable(tableName);
        // build the Put object
        Put put = new Put(Bytes.toBytes(rowkey));
        // set the cell to write
        put.addColumn(Bytes.toBytes(family),Bytes.toBytes(column),Bytes.toBytes(value));
        // execute the write
        table.put(put);
    }
    /**
     * Returns the body content for the given rowkey
     * @param tableName
     * @param rowkey
     * @param family
     * @param column
     * @return
     * @throws IOException
     */
    public static String readFromHbase(String tableName,String rowkey,String family,String column) throws IOException {
        // get the table object
        Table table = getHbaseTable(tableName);
        // build a Get for this rowkey
        Get get = new Get(Bytes.toBytes(rowkey));
        // fetch all the data for this rowkey
        Result result = table.get(get);
        // return the value of the requested column
        byte[] content = result.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
        return Bytes.toString(content);
    }
}
- Build the ES read/write utility class
package cn.hanjiaxiaozhi.util;
import cn.hanjiaxiaozhi.bean.EsArticle;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
/**
 * @ClassName EsUtil
 * @Description Utility class for reading from and writing to ES
 * @Date 2020/7/5 16:13
 * @Create By Frank
 */
public class EsUtil {
    static String indexName = "articles";
    static String typeName = "article";
    public static TransportClient getESClient() throws UnknownHostException {
        Settings settings = Settings.builder().put("cluster.name","myes").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-01"),9300))
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-02"),9300))
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-03"),9300));
        return client;
    }
    /**
     * Writes the data into ES
     * @param esArticles
     * @throws UnknownHostException
     */
    public static void writeToES(List<EsArticle> esArticles) throws UnknownHostException {
        // get an ES client
        TransportClient esClient = getESClient();
        // write every record in the list to ES with one bulk request
        BulkRequestBuilder bulk = esClient.prepareBulk();
        // iterate over the records
        for (EsArticle esArticle : esArticles) {
            // turn each JavaBean into a JSON string
            String jsonString = JSON.toJSONString(esArticle);
            // build the index request
            IndexRequestBuilder requestBuilder = esClient.prepareIndex(indexName, typeName, esArticle.getId()).setSource(jsonString, XContentType.JSON);
            // add it to the bulk request
            bulk.add(requestBuilder);
        }
        // execute the bulk request
        bulk.get();
    }
    /**
     * Matches the title against the search keyword and returns the matching records as JavaBeans
     * @param keyword
     * @return
     */
    public static List<EsArticle> readFromEs(String keyword) throws UnknownHostException {
        // build the return value
        List<EsArticle> lists = new ArrayList<>();
        // get the client
        TransportClient esClient = getESClient();
        // build and run the search
        SearchResponse title = esClient.prepareSearch(indexName)
                .setTypes(typeName)
                .setQuery(QueryBuilders.termQuery("title", keyword))
                .get();
        // get the matching hits
        SearchHit[] hits = title.getHits().getHits();
        for (SearchHit hit : hits) {
            // get the JSON string of each hit
            String sourceAsString = hit.getSourceAsString();
            // convert it into a JavaBean
            EsArticle esArticle = JSON.parseObject(sourceAsString, EsArticle.class);
            // add it to the list
            lists.add(esArticle);
        }
        // return the result
        return lists;
    }
}
Build the main program
package cn.hanjiaxiaozhi.app;
import cn.hanjiaxiaozhi.bean.EsArticle;
import cn.hanjiaxiaozhi.util.EsUtil;
import cn.hanjiaxiaozhi.util.ExcelUtil;
import cn.hanjiaxiaozhi.util.HbaseUtil;
import java.io.IOException;
import java.util.List;
/**
 * @ClassName TestEsAndHbase
 * @Description Writes the Excel data into HBase and ES, then queries through the secondary index built in ES:
 * get the ids from the ES index,
 * then fetch the article body from HBase by id
 * @Date 2020/7/5 15:40
 * @Create By Frank
 */
public class TestEsAndHbase {
    static String tableName = "articles";
    static String family = "article";
    public static void main(String[] args) throws IOException {
        // todo:1 - read the Excel data and wrap each record in a JavaBean
        // path to the data file
        String path = "datas/excel/hbaseEs.xlsx";
        // turn every row of the file into a JavaBean
        List<EsArticle> esArticles = ExcelUtil.parseExcelData(path);
        // System.out.println(esArticles);
        // todo:2 - write the data into HBase and ES
        // writeData(esArticles);
        // todo:3 - query by title
        search("意大利");
    }
    /**
     * Returns the article bodies whose titles match the keyword
     * @param keyword
     */
    private static void search(String keyword) throws IOException {
        // match the keyword against the title index in ES and get the matching documents
        List<EsArticle> esArticles = EsUtil.readFromEs(keyword);
        // fetch the body of each match from HBase by id
        for (EsArticle esArticle : esArticles) {
            // get the id of the matching record
            String id = esArticle.getId();
            // look up the body in HBase by id
            String content = HbaseUtil.readFromHbase(tableName, id, family, "content");
            // print the result
            System.out.println(content);
        }
    }
    private static void writeData(List<EsArticle> esArticles) throws IOException {
        // write to ES
        EsUtil.writeToES(esArticles);
        // write to HBase
        writeDataToHbase(esArticles);
    }
    private static void writeDataToHbase(List<EsArticle> esArticles) throws IOException {
        // write each record to HBase, one cell at a time
        // note: combining the five cells into a single Put would be more efficient
        for (EsArticle esArticle : esArticles) {
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"title",esArticle.getTitle());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"from",esArticle.getFrom());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"time",esArticle.getTime());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"readCount",esArticle.getReadCount());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"content",esArticle.getContent());
        }
    }
}
Maven dependencies for the secondary index project
<!-- Repository locations: aliyun, cloudera, and jboss -->
<repositories>
<repository>
<id>aliyun</id>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
<id>jboss</id>
<url>http://repository.jboss.com/nexus/content/groups/public</url>
</repository>
</repositories>
<dependencies>
<!-- ES client -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.0.0</version>
</dependency>
<!-- logging -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.9.1</version>
</dependency>
<!-- JSON parsing -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.47</version>
</dependency>
<!-- unit testing -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<!-- Excel parsing -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>3.8</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.8</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.8</version>
</dependency>
<!-- HBase client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.0-cdh5.14.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>