There are relatively few MongoDB data synchronization tools available online; a while ago I used monstache to implement real-time data sync from MongoDB to Elasticsearch.
Because monstache implements synchronization by tailing MongoDB's oplog, and the oplog is only available when MongoDB is configured as a replica set;
for enabling a replica set, see: https://blog.csdn.net/jack_brandy/article/details/88887795
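For reference, a minimal single-node replica set can be brought up roughly as follows (the data path and the replica set name rs0 are placeholders; adjust them for your environment):

# start mongod with a replica set name
mongod --dbpath /data/db --replSet rs0

# then, in the mongo shell, initialize the replica set
rs.initiate()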
Next, download the monstache release that matches your Elasticsearch and MongoDB versions: https://rwynn.github.io/monstache-site/start/
Create a config.toml configuration file in the monstache directory with the following contents:
# connection settings

# connect to MongoDB using the following URL
mongo-url = "mongodb://localhost:27017"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://localhost:9200"]

# frequently required settings

# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is a filter on the oplog change listener only
namespace-regex = '^aaa\.bbb$'  # aaa is the MongoDB database, bbb is the collection; together they form the namespace to match

# additionally, if you need to seed an index from a collection and not just listen for changes from the oplog
# you can copy entire collections or views from MongoDB to Elasticsearch
# direct-read-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# if you want to use MongoDB change streams instead of legacy oplog tailing add the following
# in this case you don't need regexes to filter collections.
# change streams require MongoDB version 3.6+
# change streams can only be combined with resume, replay, or cluster-name options on MongoDB 4+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# to listen to an entire db use only the database name. For a deployment use an empty string.
# change-stream-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# additional settings

# compress requests to Elasticsearch
# gzip = true
# generate indexing statistics
# stats = true
# index statistics into Elasticsearch
# index-stats = true
# use the following PEM file for connections to MongoDB
# mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
# mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
# elasticsearch-user = "someuser"
# use the following password for Elasticsearch basic auth
# elasticsearch-password = "somepassword"
# use 4 go routines concurrently pushing documents to Elasticsearch
# elasticsearch-max-conns = 4
# use the following PEM file for connections to Elasticsearch
# elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
# elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
# replay = false
# resume processing from a timestamp saved in a previous run
resume = true  # resume from the timestamp of the previous sync
# do not validate that progress timestamps have been saved
# resume-write-unsafe = false
# override the name under which resume state is saved
# resume-name = "default"
# exclude documents whose namespace matches the following pattern
# namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
# index-files = true
# turn on search result highlighting of GridFS content
# file-highlighting = true
# index GridFS files inserted into the following collections
# file-namespaces = ["users.fs.files"]
# print detailed information including request traces
# verbose = true
# enable clustering mode
cluster-name = 'tzg'  # name used for monstache's clustering mode
# do not exit after full-sync, rather continue tailing the oplog
# exit-after-direct-reads = false
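Stripped of the commented-out options, the settings actually in effect in the file above are just the following:

mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]
namespace-regex = '^aaa\.bbb$'
dropped-collections = true
dropped-databases = true
resume = true
cluster-name = 'tzg'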
Run the command ./monstache -f config.toml; if everything is configured correctly you will see monstache's startup output.
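To verify that the sync is working, you can insert a test document into the watched collection and then query Elasticsearch. The index name aaa.bbb below assumes monstache's default behavior of naming the index after the MongoDB namespace; adjust it if you have configured a different index mapping:

# in the mongo shell
use aaa
db.bbb.insertOne({ name: "sync-test" })

# query Elasticsearch for the synced document
curl "http://localhost:9200/aaa.bbb/_search?pretty"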
If Elasticsearch reports that the queue size is insufficient, add the following to the Elasticsearch configuration (elasticsearch.yml):
thread_pool:
  bulk:
    queue_size: 200
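Note that in newer Elasticsearch versions the bulk thread pool has been renamed to write, so the equivalent setting there may be:

thread_pool:
  write:
    queue_size: 200

Restart Elasticsearch after changing this setting for it to take effect.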