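Original article by Zheng Jiawei (郑佳伟)
I have been working on search and recommendation recently, so I have put together some related material to share.
An important tool in search is Elasticsearch (ES for short). It is mainly used for inverted indexing: given the content you want to search for, it looks that content up in ES and returns the matching articles. As for why the lookup is not done in the database itself: searching there is too slow. For the details, see 《为什么需要 Elasticsearch》 ("Why Elasticsearch is needed"), https://zhuanlan.zhihu.com/p/73585202.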
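To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is a toy illustration rather than how ES is implemented internally: the whitespace tokenizer, the sample documents, and the function names are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for a real analyzer (assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "MongoDB stores the articles",
    2: "Elasticsearch searches the articles",
    3: "Monstache syncs MongoDB to Elasticsearch",
}
index = build_inverted_index(docs)
print(search(index, "mongodb"))
print(search(index, "elasticsearch articles"))
```

A query only touches the posting lists of its own terms instead of scanning every document, which is the main reason this kind of lookup is faster than a full scan over database rows.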
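In a typical application the data lives in a database, so searching in ES requires importing the database contents into ES first. This article introduces a tool for that job, Monstache, and shares how to install and use it, together with a Docker image and a docker-compose file that you can use directly.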
Steps for loading search content
- Sync the data in Mongo to Elasticsearch (ES for short)
- Rebuild the index in ES
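The second step can be done in several ways. The sketch below assumes it means copying documents into a new index through the Elasticsearch `_reindex` API; the index names `articles_raw` and `articles_search` are hypothetical, and the helper functions are illustrative only. The code builds the request without sending it, so it runs without a cluster.

```python
import json
import urllib.request


def reindex_body(source_index, dest_index):
    """Build the JSON body for POST /_reindex."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def reindex_request(es_url, source_index, dest_index):
    """Prepare (but do not send) the HTTP request for a reindex call."""
    data = json.dumps(reindex_body(source_index, dest_index)).encode("utf-8")
    return urllib.request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index names; the ES address is the one used in this article.
req = reindex_request("http://10.30.89.124:9200", "articles_raw", "articles_search")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually run the reindex, pass `req` to `urllib.request.urlopen` against a reachable cluster.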
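Prerequisites
- Create a Mongo replica set (see "docker 部署 MongoDB 副本集", "Deploying a MongoDB replica set with Docker")
- Set up the ES environment and install the tokenizer
Method: pull an image that bundles ES 7.5 and the hanlp tokenizer
Note: the ES version and the hanlp version must match
tomczhen/elasticsearch-with-hanlp:7.5.0 # image name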
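As a quick sanity check after starting the container, you could compare the version the node reports with the image tag. The sketch below assumes the standard Elasticsearch root endpoint (`GET /`), whose JSON response contains `version.number`; the helper names are made up for this example, and the check only confirms the ES release, not the hanlp plugin itself.

```python
import json
import urllib.request


def image_tag_version(image):
    """Return the tag of a Docker image reference, e.g. '7.5.0'."""
    _name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        raise ValueError(f"image reference has no tag: {image!r}")
    return tag


def reported_version(root_info):
    """Extract version.number from the JSON returned by GET / on ES."""
    return root_info["version"]["number"]


def fetch_root_info(es_url, timeout=5.0):
    """Query the ES root endpoint (needs a reachable cluster)."""
    with urllib.request.urlopen(es_url, timeout=timeout) as resp:
        return json.load(resp)


def versions_match(image, root_info):
    """True when the running node reports the version named in the image tag."""
    return image_tag_version(image) == reported_version(root_info)


# Offline example with a trimmed, hand-written response (assumption).
sample_info = {"version": {"number": "7.5.0"}}
print(versions_match("tomczhen/elasticsearch-with-hanlp:7.5.0", sample_info))
```

Against a live node, replace `sample_info` with `fetch_root_info("http://10.30.89.124:9200")`.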
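Syncing Mongo data to Elasticsearch
The quick way
The full process is fairly tedious, so for readers who do not have the patience to read the whole article, here is the docker-compose file up front:
To install docker-compose, see lianglin's "docker-compose的安装" ("Installing docker-compose"); for usage, see 纯洁的微笑's "Docker(四):Docker 三剑客之 Docker Compose".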
version: '3.7'
services:
  monstache:
    # image source
    image: zhengjiawei001/monstache
    container_name: zjw_monstache
    restart: always
    command: bash -c "source /etc/profile && cd /usr/local/monstache && monstache --mongo-url mongodb://10.30.89.124:27011 --elasticsearch-url http://10.30.89.124:9200 -f config.toml"
networks:
  default:
    external:
      name: serving-database_default
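Note 1: parameters can also be passed on the command line in addition to those set in config.toml. For example, --mongo-url mongodb://10.30.89.124:27011 sets the Mongo address, and --elasticsearch-url http://10.30.89.124:9200 sets the ES address.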
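If the addresses differ between environments, the command string can be generated rather than edited by hand. The following is a minimal sketch: the function name is hypothetical, and only the flags that appear in the compose file above (`--mongo-url`, `--elasticsearch-url`, `-f`) are used.

```python
import shlex


def monstache_command(mongo_url, es_url, config_file="config.toml"):
    """Assemble the monstache invocation as a shell-safe string."""
    args = [
        "monstache",
        "--mongo-url", mongo_url,
        "--elasticsearch-url", es_url,
        "-f", config_file,
    ]
    # shlex.join (Python 3.8+) quotes any argument that needs it.
    return shlex.join(args)


cmd = monstache_command("mongodb://10.30.89.124:27011", "http://10.30.89.124:9200")
print(cmd)
```

The printed string can be dropped into the `command:` entry of the compose file after the `cd /usr/local/monstache &&` part.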
Full list of options:
monstache_1 | Usage of monstache:
monstache_1 |   -change-stream-namespace value
monstache_1 |         A list of change stream namespaces
monstache_1 |   -cluster-name string
monstache_1 |         Name of the monstache process cluster
monstache_1 |   -config-database-name string
monstache_1 |         The MongoDB database name that monstache uses to store metadata
monstache_1 |   -debug
monstache_1 |         True to enable verbose debug information
monstache_1 |   -delete-index-pattern string
monstache_1 |         An Elasticsearch index-pattern to restric the scope of stateless deletes
monstache_1 |   -delete-strategy value
monstache_1 |         Stategy to use for deletes. 0=stateless,1=stateful,2=ignore
monstache_1 |   -direct-read-bounded
monstache_1 |         True to limit direct reads to the docs present at query start time
monstache_1 |   -direct-read-concur int
monstache_1 |         Max number of direct-read-namespaces to read concurrently. By default all givne are read concurrently
monstache_1 |   -direct-read-dynamic-exclude-regex string
monstache_1 |         A regex to use for excluding namespaces when using dynamic direct reads
monstache_1 |   -direct-read-dynamic-include-regex string
monstache_1 |         A regex to use for including namespaces when using dynamic direct reads
monstache_1 |   -direct-read-namespace value
monstache_1 |         A list of direct read namespaces
monstache_1 |   -direct-read-no-timeout
monstache_1 |         True to set the no cursor timeout flag for direct reads
monstache_1 |   -direct-read-split-max int
monstache_1 |         Max number of times to split a collection for direct reads
monstache_1 |   -direct-read-stateful
monstache_1 |         True to mark direct read namespaces as complete and not sync them in future runs
monstache_1 |   -disable-change-events
monstache_1 |         True to disable listening for changes. You must provide direct-reads in this case
monstache_1 |   -disable-delete-protection
monstache_1 |         True to disable delete protection and allow multiple deletes in Elasticsearch per event in MongoDB
monstache_1 |   -disable-file-pipeline-put
monstache_1 |         True to disable auto-creation of the ingest plugin pipeline
monstache_1 |   -dropped-collections
monstache_1 |         True to delete indexes from dropped collections (default true)
monstache_1 |   -dropped-databases
monstache_1 |         True to delete indexes from dropped databases (default true)
monstache_1 |   -elasticsearch-client-timeout int
monstache_1 |         Number of seconds before a request to Elasticsearch is timed out
monstache_1 |   -elasticsearch-max-bytes int
monstache_1 |         Number of bytes to hold before flushing to Elasticsearch
monstache_1 |   -elasticsearch-max-conns int
monstache_1 |         Elasticsearch max connections
monstache_1 |   -elasticsearch-max-docs int
monstache_1 |         Number of docs to hold before flushing to Elasticsearch
monstache_1 |   -elasticsearch-max-seconds int
monstache_1 |         Number of seconds before flushing to Elasticsearch
monstache_1 |   -elasticsearch-password string
monstache_1 |         The elasticsearch password for basic auth
monstache_1 |   -elasticsearch-pem-file string
monstach