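17. Integrating ES with HBase to implement a secondary index
Requirement: store massive amounts of data and still answer queries over that data within seconds.
In production, an article is split into a title and a body. The body is comparatively large, so we usually store the title in ES and the body in HBase (HBase is designed for storing massive data sets). A keyword search then uses the ES inverted index to find the IDs of the matching documents, and each document ID is used to look up the full body in HBase.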
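The two-step lookup described above can be sketched without either cluster. The following is a minimal in-memory illustration, not real Elasticsearch or HBase client code: the class name SecondaryIndexSketch, the whitespace tokenizer, and both maps are assumptions that stand in for the ES inverted index and the HBase table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SecondaryIndexSketch {
    // Stand-in for the ES inverted index: title term -> ids of matching articles.
    private final Map<String, Set<String>> titleIndex = new HashMap<>();
    // Stand-in for the HBase table: row key (article id) -> article body.
    private final Map<String, String> bodyStore = new HashMap<>();

    // Index the title terms and store the body under the article id.
    public void put(String id, String title, String content) {
        for (String term : title.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                titleIndex.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        bodyStore.put(id, content);
    }

    // Step 1: resolve the keyword to article ids through the index.
    // Step 2: fetch each body from the store by id.
    public List<String> search(String keyword) {
        Set<String> ids = titleIndex.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
        List<String> bodies = new ArrayList<>();
        for (String id : ids) {
            bodies.add(bodyStore.get(id));
        }
        return bodies;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch sketch = new SecondaryIndexSketch();
        sketch.put("1", "HBase secondary index", "Full text of article 1");
        sketch.put("2", "Elasticsearch basics", "Full text of article 2");
        System.out.println(sketch.search("hbase"));
    }
}
```

In a real deployment the first map would be replaced by an ES query and the second by an HBase Get on the row key; the point of the sketch is only that the index holds small searchable fields while the large body is fetched by ID.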
1. Storage design
Analysis: which fields of the data need an index? Article data: (id, title, author, describe, content)
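Field      Indexed?                                                      Stored?
Id         indexed by default                                            stored by default
Title      yes                                                           yes
Author     depends on requirements                                       depends on requirements
Describe   yes                                                           yes
Content    depends on requirements (needed for high-precision queries)   depends on requirements
Time       yes                                                           yes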
2. Index design
PUT /articles
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "ik": {
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "article": {
      "dynamic": "strict",
      "_source": {
        "includes": [
          "id", "title", "from", "readCounts", "times"
        ],
        "excludes": [
          "content"
        ]
      },
      "properties": {
        "id": {
          "type": "keyword", "store": true},
        "title": {
          "type": "text", "store": true, "index": true, "analyzer": "ik_max_word"},
        "from": {
          "type": "keyword", "store": true},
        "readCounts": {
          "type": "integer", "store": true},
        "content": {
          "type": "text", "store": false, "index": false},
        "times": {
          "type": "keyword", "index": false}
      }
    }
  }
}
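Given this mapping, each article has to be split before it is written: the fields listed under _source.includes go to ES, and the body goes to HBase. The following is a minimal sketch of that split using plain Java collections only. The class name ArticleSplitSketch, the helper names, and the choice to reuse the article id as the HBase row key are assumptions for illustration, not something the mapping itself prescribes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArticleSplitSketch {
    // Field names taken from _source.includes in the mapping; "content" is excluded there.
    static final List<String> ES_FIELDS =
            Arrays.asList("id", "title", "from", "readCounts", "times");

    // Build the document sent to ES: only the included fields, never the body.
    static Map<String, Object> toEsDocument(Map<String, Object> article) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (String field : ES_FIELDS) {
            if (article.containsKey(field)) {
                doc.put(field, article.get(field));
            }
        }
        return doc;
    }

    // Build the HBase side as {rowKey, body}; using the id as row key is an assumption.
    static String[] toHBaseCell(Map<String, Object> article) {
        return new String[] {
            String.valueOf(article.get("id")),
            String.valueOf(article.get("content"))
        };
    }

    public static void main(String[] args) {
        Map<String, Object> article = new LinkedHashMap<>();
        article.put("id", "1");
        article.put("title", "HBase secondary index");
        article.put("from", "blog");
        article.put("readCounts", 10);
        article.put("times", "2019-01-01");
        article.put("content", "Full text of the article");

        System.out.println(toEsDocument(article).keySet());
        System.out.println(Arrays.toString(toHBaseCell(article)));
    }
}
```

Because the mapping sets "dynamic" to "strict", a document carrying a field that is not declared under "properties" would be rejected, so filtering against a fixed field list before indexing is a reasonable safeguard.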
3. Import the JAR dependencies
Create a Maven project and add the following dependencies
<dependencies>
    <!-- Parse Excel files -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>3.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.8</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>transport</artifactId>
        <version>6.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.9.1</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</