Installing Transporter
Transporter is written in Go, so Go has to be installed first. Windows is used as the example below.
Installing Go
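- Download the Go package from https://golang.org/dl/, e.g. go1.10.3.windows-amd64.zip.
- Unzip the downloaded package go1.10.3.windows-amd64.zip.
- Set the Go system environment variable GOROOT=E:\softwares\go (use your actual installation path) and add %GOROOT%\bin to PATH so that the go command can be found.
- Under the installation path, create the pkg, bin and src directories; they usually already exist after unzipping.
- Under src, create the directory Go install path\src\github.com\compose, e.g. E:\softwares\go\src\github.com\compose.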
C:\Users\flypig>go version
go version go1.10.3 windows/amd64
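Run go version to check the setup. If output like the above appears, Go is installed successfully.
Installing Git
If Git is already installed on the machine, skip this step; otherwise install it. Git download: https://git-for-windows.github.io/
Cloning the transporter source
Change to the directory just created, E:\softwares\go\src\github.com\compose, and clone the transporter source: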
git clone https://github.com/compose/transporter
Building the transporter source
Once the source is cloned, change into the transporter directory, E:\softwares\go\src\github.com\compose\transporter, and run:
go get -a ./...
go build ./cmd/transporter
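After the build, a transporter.exe file appears in the current directory.
Transporter is now installed.
Preparing the MongoDB data
In MongoDB, create a test collection named testcol.
Insert a few documents into testcol, for example: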
/* 1 */
{
"_id" : ObjectId("5b47061b7228335bdf97ba88"),
"firstName" : "Robert",
"lastName" : "Baratheon"
}
/* 2 */
{
"_id" : ObjectId("5b4706327228335bdf97ba89"),
"firstName" : "John",
"lastName" : "Snow"
}
/* 3 */
{
"_id" : ObjectId("5b4745d97228335bdf97ba8a"),
"firstName" : "张三",
"lastName" : "李四"
}
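The same three documents can also be inserted from the mongo shell, which is scripted in JavaScript. This is a minimal sketch assuming MongoDB 3.2 or later (where insertMany exists) and a shell already connected to the target database, e.g. mytestdb from the pipeline URI used later. _id is omitted so that MongoDB generates an ObjectId for each document.

```javascript
// The three sample documents from above. _id is omitted, so MongoDB
// generates a fresh ObjectId for each document on insert.
var sampleDocs = [
  { firstName: "Robert", lastName: "Baratheon" },
  { firstName: "John", lastName: "Snow" },
  { firstName: "张三", lastName: "李四" }
];

// Inside the mongo shell, `db` is the current database (e.g. mytestdb).
// The guard lets the same file be loaded by plain Node.js without failing.
if (typeof db !== "undefined") {
  db.testcol.insertMany(sampleDocs);
}
```

Saved to a file (the name insert_testcol.js is hypothetical), it can be run with mongo mytestdb insert_testcol.js.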
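Next, transporter will be used to migrate the testcol data to Elasticsearch.
Creating the Elasticsearch index
During migration, transporter automatically creates a default index for testcol, which is generally not recommended, so first create a custom index for testcol in Elasticsearch by sending PUT http://localhost:9200/index_gltestcol_v1/ with the request body below. aliases sets index_testcol as an alias of index_gltestcol_v1. The index has 3 shards, each with 1 replica. mappings defines the field mappings for testcol; firstName uses the ik_max_word Chinese analyzer, so if the ik plugin is not installed in Elasticsearch, replace it with an analyzer of your choice. Adapt the index to your actual needs; testcol is only an example. Note that the string type and "index": "not_analyzed" in this mapping are pre-6.0 Elasticsearch syntax (deprecated in 5.0 and removed in 6.0, where text and keyword replace them).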
{
"aliases": {
"index_testcol": {
}
},
"settings": {
"index": {
"refresh_interval": "30s",
"number_of_shards": "3",
"number_of_replicas": "1"
}
},
"mappings": {
"testcol": {
"properties": {
"id": {
"type": "string"
},
"firstName": {
"type": "string",
"store": "true",
"analyzer": "ik_max_word"
},
"lastName": {
"type": "string",
"index": "not_analyzed",
"store": true
}
}
}
}
}
Once the index is created, it can be inspected at http://localhost:9200/index_gltestcol_v1?pretty.
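If the index has to be created repeatedly (for example in several environments), the request can be scripted. The following is a minimal Node.js sketch, assuming Node.js 18+ for the built-in fetch; buildIndexBody and createIndex are hypothetical helpers, and the mapping mirrors the JSON above (store is written as the boolean true for both fields).

```javascript
// Hypothetical helper: builds the index-creation body shown above so that the
// alias, shard/replica counts and analyzer can be changed in one place.
function buildIndexBody({ alias, type, shards = 3, replicas = 1, analyzer = "ik_max_word" }) {
  return {
    aliases: { [alias]: {} },
    settings: {
      index: {
        refresh_interval: "30s",
        number_of_shards: String(shards),
        number_of_replicas: String(replicas),
      },
    },
    mappings: {
      [type]: {
        properties: {
          id: { type: "string" },
          firstName: { type: "string", store: true, analyzer },
          lastName: { type: "string", index: "not_analyzed", store: true },
        },
      },
    },
  };
}

// Hypothetical helper: sends PUT <esUrl>/<indexName> with the JSON body.
// Requires Node.js 18+ for the global fetch.
async function createIndex(esUrl, indexName, body) {
  const res = await fetch(`${esUrl}/${indexName}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`index creation failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Usage: createIndex("http://localhost:9200", "index_gltestcol_v1", buildIndexBody({ alias: "index_testcol", type: "testcol" })). Keeping the counts as parameters makes per-environment differences explicit instead of hand-editing the JSON.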
Configuring Transporter
1. Initialize a basic pipeline
Change to the E:\softwares\go\src\github.com\compose\transporter directory and run transporter init mongodb elasticsearch. This creates a pipeline.js configuration file in the current directory.
transporter init mongodb elasticsearch
pipeline.js:
var source = mongodb({
"uri": "${MONGODB_URI}"
// "timeout": "30s",
// "tail": false,
// "ssl": false,
// "cacerts": ["/path/to/cert.pem"],
// "wc": 1,
// "fsync": false,
// "bulk": false,
// "collection_filters": "{}",
// "read_preference": "Primary"
})
var sink = elasticsearch({
"uri": "${ELASTICSEARCH_URI}"
// "timeout": "10s", // defaults to 30s
// "aws_access_key": "ABCDEF", // used for signing requests to AWS Elasticsearch service
// "aws_access_secret": "ABCDEF" // used for signing requests to AWS Elasticsearch service
// "parent_id": "elastic_parent" // defaults to "elastic_parent" parent identifier for Elasticsearch
})
t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")
2. Set up the pipeline's source, transform and sink
var source = mongodb({
"uri": "mongodb://user:password@host:30011/mytestdb",
// "timeout": "30s",
// "tail": false,
// "ssl": false,
// "cacerts": ["/path/to/cert.pem"],
// "wc": 1,
// "fsync": false,
// "bulk": false,
"collection_filters": "{\"testcol\":{\"firstName\":\"Robert\"}}",
// "read_preference": "Primary"
})
var sink = elasticsearch({
"uri": "http://localhost:9200/index_testcol"
// "timeout": "10s", // defaults to 30s
// "aws_access_key": "ABCDEF", // used for signing requests to AWS Elasticsearch service
// "aws_access_secret": "ABCDEF" // used for signing requests to AWS Elasticsearch service
})
//t.Source(source).Save(sink)
// t.Source("source", source).Save("sink", sink)
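// Note: the source MongoDB collection name must match the index type name. Also, every collection matching gl_topic* was imported, so no single collection could be selected for migration to ES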
t.Source("source", source, "/^testcol$/").Transform(goja({"filename":"mytransform/addfullname.js"})).Save("sink", sink, "/^testcol$/")
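The namespace argument matters here. Assuming transporter treats "/^testcol$/" as a regular expression matched against collection names (as the "/.*/" default produced by init suggests), the anchors ^ and $ are what restrict the migration to exactly testcol. This plain JavaScript sketch, using a hypothetical list of collection names, shows the difference:

```javascript
// Hypothetical collection names that might exist in mytestdb.
const collections = ["testcol", "testcol_bak", "old_testcol", "gl_topic1"];

// "/.*/" (the init default) matches every collection name.
const matchAll = /.*/;
// "/^testcol$/" is anchored at both ends, so only the exact name matches.
const onlyTestcol = /^testcol$/;
// Without anchors, "testcol" also matches any name that merely contains it.
const unanchored = /testcol/;

// Keeps the names the pattern matches (non-global regexes, so .test is stateless).
function selectCollections(names, pattern) {
  return names.filter((name) => pattern.test(name));
}

console.log(selectCollections(collections, matchAll));    // all four names
console.log(selectCollections(collections, onlyTestcol)); // [ 'testcol' ]
console.log(selectCollections(collections, unanchored));  // [ 'testcol', 'testcol_bak', 'old_testcol' ]
```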
collection_filters is a filter. For example, to migrate only documents where firstName is "Robert", set "collection_filters": "{\"testcol\":{\"firstName\":\"Robert\"}}". MongoDB is the source and Elasticsearch the sink. The sink URI uses the alias index_testcol, which makes it easy to point the alias at a new Elasticsearch index later. Transform(goja({"filename":"mytransform/addfullname.js"})) configures the transformer: the native goja transformer runs the JavaScript script addfullname.js, shown below:
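Because collection_filters takes a JSON document encoded as a string, escaping the quotes by hand is error-prone. A minimal Node.js sketch that produces the value instead: stringifying the filter object once gives the filter string, and stringifying that string again gives the quoted, escaped literal exactly as written in the pipeline above. The filter is the same example; the variable names are illustrative.

```javascript
// Filter definition: collection name -> MongoDB query applied to that collection.
const filters = {
  testcol: { firstName: "Robert" },
};

// collection_filters expects the filter as a JSON string.
const collectionFilters = JSON.stringify(filters);
console.log(collectionFilters); // {"testcol":{"firstName":"Robert"}}

// Stringifying once more yields the quoted, escaped literal to paste into pipeline.js.
console.log(JSON.stringify(collectionFilters)); // "{\"testcol\":{\"firstName\":\"Robert\"}}"
```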
function transform(doc) {
//doc._id = doc._id['$oid'];
doc["data"]["fullName"] = doc["data"]["firstName"] + " " + doc["data"]["lastName"];
return doc;
}
This adds a fullName field whose value is firstName and lastName joined by a space.
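Since addfullname.js is plain JavaScript, its logic can be checked in Node.js before running the pipeline. This sketch assumes only what the script itself relies on, namely that the document fields arrive under doc.data. The sample message and the transformSafe variant (which skips missing name parts) are hypothetical additions; both transform functions are written in ES5 style.

```javascript
// Same logic as mytransform/addfullname.js: document fields live under doc.data.
function transform(doc) {
  doc["data"]["fullName"] = doc["data"]["firstName"] + " " + doc["data"]["lastName"];
  return doc;
}

// Hypothetical variant: drops missing name parts, so a document without
// lastName gets "Robert" instead of "Robert undefined".
function transformSafe(doc) {
  var d = doc["data"];
  d["fullName"] = [d["firstName"], d["lastName"]].filter(Boolean).join(" ");
  return doc;
}

// Hypothetical input; only the data field is read by the functions above.
const sample = { data: { firstName: "Robert", lastName: "Baratheon" } };
const out = transform(sample);
console.log(out.data.fullName); // Robert Baratheon
```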
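3. Test the connection
Run transporter test pipeline-test.js to check that the connections work (pipeline-test.js being the pipeline file configured in step 2).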
e:\softwares\go\src\github.com\compose\transporter>transporter test pipeline-test.js
Transporter:
- Source: source mongodb ^testcol$
- Sink: sink elasticsearch ^testcol$
4. Migrate the data
Run transporter run pipeline-test.js to start the migration.
e:\softwares\go\src\github.com\compose\transporter>transporter run pipeline-test.js
INFO[0000] adaptor Listening... name=sink path=source/sink type=elasticsearch
INFO[0000] starting with metadata map[] name=source path=source type=mongodb
INFO[0000] boot map[sink:elasticsearch source:mongodb] ts=1533038590485455300
INFO[0000] adaptor Starting... name=source path=source type=mongodb
...
INFO[0000] metrics source records: 1 path=source ts=1533038591316015900
INFO[0000] metrics source/sink records: 1 path=source/sink ts=1533038591316015900
INFO[0000] exit map[source:mongodb sink:elasticsearch] ts=1533038591316015900
records: 1 means that one record matched the filter and was migrated.
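For larger collections it can help to compare the source and sink counts automatically. A hypothetical helper, parseRecordCounts (not part of transporter), is sketched below in Node.js; it assumes the "metrics <path> records: <n>" line format shown above, which may differ between transporter versions.

```javascript
// Hypothetical helper: collects "<path> records: <n>" counts from transporter's
// metrics log lines. Later lines overwrite earlier ones for the same path.
function parseRecordCounts(logText) {
  const counts = {};
  const re = /metrics (\S+) records: (\d+)/g;
  let m;
  while ((m = re.exec(logText)) !== null) {
    counts[m[1]] = Number(m[2]);
  }
  return counts;
}

// The two metrics lines from the run above.
const log = [
  "INFO[0000] metrics source records: 1 path=source ts=1533038591316015900",
  "INFO[0000] metrics source/sink records: 1 path=source/sink ts=1533038591316015900",
].join("\n");

const counts = parseRecordCounts(log);
console.log(counts); // { source: 1, 'source/sink': 1 }
console.log(counts["source"] === counts["source/sink"]); // true: everything read was written
```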
If the run finishes without errors, check in Elasticsearch whether the data was migrated successfully.
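One way to check is to query the alias directly. This minimal Node.js sketch assumes Node.js 18+ (built-in fetch) and Elasticsearch's standard search response, in which matching documents are returned under hits.hits[]._source; searchAlias and extractSources are hypothetical helpers.

```javascript
// Extracts the stored documents (_source) from an Elasticsearch search response.
function extractSources(searchResponse) {
  return searchResponse.hits.hits.map((hit) => hit._source);
}

// Hypothetical helper: runs a query-string search against an index or alias.
// Requires Node.js 18+ for the global fetch.
async function searchAlias(esUrl, alias, query) {
  const url = `${esUrl}/${alias}/_search?q=${encodeURIComponent(query)}`;
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`search failed: ${res.status}`);
  }
  return extractSources(await res.json());
}
```

Usage: searchAlias("http://localhost:9200", "index_testcol", "lastName:Baratheon").then(console.log) would be expected to list the migrated Robert Baratheon document, including its fullName field.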
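In the migrated document, fullName is the newly added field. Besides adding fields, a transformer can do much more, such as omit, pick and skip; see the earlier blog post introducing transporter: https://blog.csdn.net/zhujq_icode/article/details/81297388.
References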
1. https://blog.csdn.net/zhujq_icode/article/details/81297388
2. https://github.com/compose/transporter/blob/master/READMEWINDOWS.md