Syncing MongoDB Data to Elasticsearch with Monstache

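Installing and Configuring Monstache

Monstache is a daemon written in Go that syncs MongoDB data to Elasticsearch in real time.

Prerequisites

A MongoDB 4.4.6 replica set is already set up.
An Elasticsearch 7 environment is already set up.
The Monstache source code is on GitHub: https://github.com/rwynn/monstache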

Version compatibility:

Monstache   MongoDB   Elasticsearch
5           3.6+      7
6           3.6+      8

Step 1: Install Monstache

1. Install Go and configure environment variables
(1) Download and extract the Go release archive

wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz

(2) Open the environment configuration file with vim /etc/profile and append the following line

export PATH=$PATH:/usr/local/go/bin

(3) Apply the environment variables

source /etc/profile
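
The manual edit in step (2) can also be scripted so that rerunning the setup does not add a duplicate line. A minimal sketch, assuming a hypothetical helper named append_once (it is not part of Go or Monstache):

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE only if it is not already present.
# (append_once is a hypothetical helper name used for illustration.)
append_once() {
    local file="$1" line="$2"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once /etc/profile 'export PATH=$PATH:/usr/local/go/bin'
```

Calling it twice with the same arguments leaves a single copy of the line in the file.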

(4) Check the Go environment

[root@localhost ~]# go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build567870495=/tmp/go-build -gno-record-gcc-switches"

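(5) The default GOPROXY (https://proxy.golang.org) is hosted outside China and is often unreachable from mainland China, so goproxy.cn is recommended as a replacement

export GO111MODULE=on
export GOPROXY=https://goproxy.cn,direct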
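
GOPROXY takes a comma-separated list of proxy URLs, and the direct keyword has to be an entry of its own for Go to fall back to fetching from the source repository. A small sanity check along those lines, assuming a hypothetical helper named goproxy_has_direct:

```shell
#!/usr/bin/env bash
# goproxy_has_direct VALUE: succeed if the comma-separated VALUE contains
# "direct" as a separate entry.
# (Hypothetical helper for illustration, not part of the Go toolchain.)
goproxy_has_direct() {
    local entry entries
    # Split VALUE on commas into an array
    IFS=',' read -r -a entries <<< "$1"
    for entry in "${entries[@]}"; do
        [ "$entry" = "direct" ] && return 0
    done
    return 1
}

# Usage:
#   goproxy_has_direct "$(go env GOPROXY)" || echo "GOPROXY has no direct fallback"
```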

2. Install Monstache
(1) Change to the installation directory

cd /usr/local

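(2) Download the source, using either of the following methods
i. Clone the repository with git and switch branches. Because Elasticsearch 7 corresponds to Monstache 5, check out the rel5 branch

git clone https://github.com/rwynn/monstache.git
git -C monstache checkout rel5

ii. Download the rel5 branch archive from GitHub, extract it, and use Xftp to place the extracted files under /usr/local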

(3) Change to the monstache directory

cd monstache

(4) Build and install Monstache

go install

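(5) Check the Monstache version. go install places the binary in $GOPATH/bin (/root/go/bin according to the go env output above), so add that directory to PATH first

export PATH=$PATH:$(go env GOPATH)/bin
monstache -v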

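Step 2: Configure the real-time sync task

  1. Change to the Monstache installation directory, then create and edit a configuration file (config.toml in this article).
  2. Adapt the following example to your environment.
    This is a simple configuration; for the full set of options, see Monstache Usage.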
# settings

# connect to MongoDB using the following URL
mongo-url = "mongodb://192.168.10.134:27017/"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://192.168.2.128:9200"]

# frequently required settings

# if you need to seed an index from a collection and not just listen and sync changes events
# you can copy entire collections or views from MongoDB to Elasticsearch
direct-read-namespaces = ["lawdb.law"]
# direct-read-namespaces = ["lawdb.test","lawdb.law"]

# if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
# change streams require at least MongoDB API 3.6+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# in this case you usually don't need regexes in your config to filter collections unless you target the deployment.
# to listen to an entire db use only the database name.  For a deployment use an empty string.
# change-stream-namespaces = ["lawdb.law"]

# additional settings

# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is only a filter on the change event listener
namespace-regex = '^lawdb\.law$'
# compress requests to Elasticsearch
#gzip = true
# generate indexing statistics
#stats = true
# index statistics into Elasticsearch
#index-stats = true
# use the following PEM file for connections to MongoDB
#mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
#mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
elasticsearch-user = "elastic"
# use the following password for Elasticsearch basic auth
elasticsearch-password = "elasticsearch"
# use 4 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 4
# use the following PEM file for connections to Elasticsearch
#elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
#elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
#replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
#resume-write-unsafe = false
# override the name under which resume state is saved
#resume-name = "default"
# use a custom resume strategy (tokens) instead of the default strategy (timestamps)
# tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
resume-strategy = 0
# exclude documents whose namespace matches the following pattern
#namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
#index-files = true
# turn on search result highlighting of GridFS content
#file-highlighting = true
# index GridFS files inserted into the following collections
#file-namespaces = ["users.fs.files"]
# print detailed information including request traces
verbose = true
# enable clustering mode
cluster-name = 'es-cn-mp91kzb8m00******'
# do not exit after full-sync, rather continue tailing the oplog
#exit-after-direct-reads = false
[[mapping]]
namespace = "lawdb.law"
index = "newlaw_ver2"
type = "doc"

#[[mapping]]
#namespace = "lawdb.test"
#index = "newlaw_ver2"
#type = "doc"
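
As the comments in the example say, namespace-regex only filters the change event listener, so a collection listed in direct-read-namespaces but not matched by the regex would presumably be copied once and then stop receiving updates. A rough pre-flight check, assuming a hypothetical ns_matches helper; it uses grep -E, whose POSIX syntax is close to, but not identical with, the Go regular expressions Monstache evaluates:

```shell
#!/usr/bin/env bash
# ns_matches REGEX NAMESPACE: succeed if NAMESPACE matches REGEX.
# (Hypothetical helper; grep -E only approximates Go's regexp syntax.)
ns_matches() {
    grep -Eq -- "$1" <<< "$2"
}

# Same pattern as namespace-regex in the example configuration
regex='^lawdb\.law$'
for ns in lawdb.law lawdb.test; do
    if ns_matches "$regex" "$ns"; then
        echo "$ns: change events will be processed"
    else
        echo "$ns: change events will be filtered out"
    fi
done
```

With the pattern above, lawdb.law is reported as processed and lawdb.test as filtered out, so enabling the commented-out lawdb.test entries would also call for a wider regex such as '^lawdb\.(law|test)$'.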
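Parameter descriptions

mongo-url: Connection address of the primary node of the MongoDB instance.
elasticsearch-urls: Address of the Elasticsearch instance.
direct-read-namespaces: Collections to copy in full. This article syncs the law collection in the lawdb database; the commented-out line shows how to add lawdb.test as well.
namespace-regex: Regular expression selecting the collections to listen to. Only data changes in collections whose namespace matches the expression are monitored.
elasticsearch-user: User name for accessing the Elasticsearch instance. This article uses the built-in elastic user.
elasticsearch-password: Password of that user. The password of the elastic user is set when the instance is created; if you forget it, you can reset it. For precautions and steps, see Reset the instance access password.
elasticsearch-max-conns: Number of connections to Elasticsearch. Default: 4, meaning 4 goroutines push data to Elasticsearch concurrently.
dropped-collections: Default: true. When a MongoDB collection is dropped, the corresponding Elasticsearch index is deleted as well.
dropped-databases: Default: true. When a MongoDB database is dropped, the corresponding Elasticsearch indexes are deleted as well.
resume: Default: false. When set to true, Monstache writes the timestamps of the MongoDB operations it has successfully synced to Elasticsearch into the monstache.monstache collection. If Monstache stops unexpectedly, it can resume the sync task from that timestamp and avoid losing data. If cluster-name is specified, this parameter is enabled automatically. For details, see resume.
resume-strategy: Resume strategy. Takes effect only when resume is true. The example uses 0, the default timestamp-based strategy. For details, see resume-strategy.
verbose: Default: false, meaning debug logging is disabled. The example sets it to true to print detailed logs.
cluster-name: Cluster name. When it is specified, Monstache runs in high-availability mode, and processes with the same cluster name coordinate with each other. For details, see cluster-name.
mapping: Elasticsearch index mapping. By default, data synced from MongoDB is written to an index named <database>.<collection>. Use this parameter to change the index name. For details, see Index Mapping.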
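
The index naming rule can be pictured as a lookup with a fallback. The sketch below only illustrates the rule as described here, using the single [[mapping]] entry from the example configuration; it is not Monstache's implementation, and index_for is a hypothetical name:

```shell
#!/usr/bin/env bash
# index_for NAMESPACE: print the Elasticsearch index a namespace is written to.
# (Illustrative only; mirrors the [[mapping]] block in the example config.)
index_for() {
    case "$1" in
        lawdb.law) printf '%s\n' "newlaw_ver2" ;;  # explicit [[mapping]] entry
        *)         printf '%s\n' "$1" ;;           # default: <database>.<collection>
    esac
}

index_for lawdb.law    # namespace with a mapping
index_for lawdb.test   # namespace without a mapping
```

The first call prints newlaw_ver2 and the second prints lawdb.test.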

3. Run Monstache

monstache -f config.toml
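
Once Monstache is running, one way to spot-check the initial copy is to ask Elasticsearch how many documents the target index holds, through the _count API. A minimal sketch, assuming the host, credentials, and index name from the example configuration; es_count and extract_count are hypothetical helper names:

```shell
#!/usr/bin/env bash
# extract_count: read a _count JSON response on stdin and print the "count" value.
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# es_count INDEX: query the Elasticsearch _count API for INDEX.
# Host and credentials are the ones from the example configuration;
# replace them with your own.
es_count() {
    curl -s -u 'elastic:elasticsearch' \
        "http://192.168.2.128:9200/$1/_count" | extract_count
}

# Usage:
#   es_count newlaw_ver2
```

The number can then be compared with the result of db.law.countDocuments({}) run against the lawdb database in the mongo shell.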