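I. What an ingest node does
An ingest node applies a configured set of preprocessing steps to each document before it is indexed, similar to ETL in big-data systems.
II. Setting up ingest
By default, every node in the cluster can act as an ingest node. It is generally recommended to configure dedicated ingest nodes that carry only that one responsibility.
Once the ingest nodes are chosen, define a pipeline; the pipeline specifies the actual processing logic.
III. Ingest usage summary
1. Define pipelines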
# Stamp each document with the time it passed through the pipeline.
PUT _ingest/pipeline/indexed_at
{
  "processors": [
    {
      "script": {
        "source": "ctx.indexed_at = new Date()"
      }
    }
  ]
}
# Add a second timestamp, but only to documents that already have indexed_at.
PUT _ingest/pipeline/update_pipeline
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.indexed_at != null) {
            ctx.indexed_at2 = new Date()
          }
        """
      }
    }
  ]
}
# Write a random integer into a source field named "index" (not the _index metadata field).
PUT _ingest/pipeline/reindex_ingest
{
  "processors": [
    {
      "script": {
        "source": """
          ctx.index = new Random().nextInt()
        """
      }
    }
  ]
}
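To make the logic of these three pipelines concrete, here is a rough Python model that treats a pipeline as a list of functions applied in order to a document dict. It is only a local sketch under stated assumptions: `run_pipeline` and the three processor functions are hypothetical names, nothing here talks to Elasticsearch, and the timestamps are Python datetime values rather than whatever the Painless script would store.

```python
import random
from datetime import datetime, timezone


def indexed_at(doc):
    # Mirrors: ctx.indexed_at = new Date()
    doc["indexed_at"] = datetime.now(timezone.utc)
    return doc


def update_pipeline(doc):
    # Mirrors: if (ctx.indexed_at != null) { ctx.indexed_at2 = new Date() }
    if doc.get("indexed_at") is not None:
        doc["indexed_at2"] = datetime.now(timezone.utc)
    return doc


def reindex_ingest(doc):
    # Mirrors: ctx.index = new Random().nextInt()  (a signed 32-bit integer)
    doc["index"] = random.randint(-2**31, 2**31 - 1)
    return doc


def run_pipeline(processors, doc):
    """Apply each processor in order to a shallow copy of the document."""
    result = dict(doc)
    for processor in processors:
        result = processor(result)
    return result


if __name__ == "__main__":
    # A document that never received indexed_at is left untouched by update_pipeline.
    fresh = run_pipeline([update_pipeline], {"content": "ingest test"})
    print("indexed_at2" in fresh)  # False

    # Running indexed_at first makes the condition in update_pipeline true.
    stamped = run_pipeline([indexed_at, update_pipeline], {"content": "ingest test"})
    print(sorted(stamped))  # ['content', 'indexed_at', 'indexed_at2']
```

The point of the model is the ordering: a processor only sees what earlier processors (or earlier writes) have already put into the document.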
2. Ways to apply a pipeline
- Set a default pipeline when creating the index
PUT index_ingest
{
  "settings": {
    "index.default_pipeline": "indexed_at"
  }
}
- Name the pipeline when indexing a document
POST /index_ingest/_doc/1?pipeline=indexed_at
{
  "content": "ingest test"
}
- Run a pipeline during update by query
POST index_ingest/_update_by_query?pipeline=update_pipeline
{
  "query": {
    "match_all": {}
  }
}
- Run a pipeline during reindex
POST _reindex
{
  "source": {
    "index": "index_ingest"
  },
  "dest": {
    "index": "index_ingest_new",
    "pipeline": "reindex_ingest"
  }
}
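Before wiring a pipeline into any of the four entry points above, it can help to dry-run it against a sample document. The sketch below only builds a request for Elasticsearch's simulate pipeline API (`POST /_ingest/pipeline/<id>/_simulate`) and prints it in console form. `build_simulate_request` is a hypothetical helper introduced for illustration, and actually sending the request to a cluster is deliberately left out.

```python
import json


def build_simulate_request(pipeline_id, sources):
    """Return (method, path, body) for a simulate call against an existing pipeline.

    Each entry in `sources` becomes one sample document under "docs".
    """
    body = {"docs": [{"_source": source} for source in sources]}
    path = f"/_ingest/pipeline/{pipeline_id}/_simulate"
    return "POST", path, json.dumps(body, indent=2)


if __name__ == "__main__":
    method, path, body = build_simulate_request(
        "indexed_at", [{"content": "ingest test"}]
    )
    # Printed in a shape that can be pasted into the Kibana console.
    print(method, path)
    print(body)
```

Pasting the printed request into the console should return each sample document as it would look after the pipeline ran, without indexing anything.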
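IV. Using Elasticsearch's built-in pipeline processors
Elasticsearch ships with 28 built-in processors (the exact count varies by version); they are powerful and simple to use.
For example, the json processor converts a JSON string stored in a document field into a JSON object.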
PUT _ingest/pipeline/json_pipeline
{
  "processors": [
    {
      "json": {
        "field": "user_info",
        "target_field": "user_info_json"
      }
    }
  ]
}
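# Drop any old test index, then index one document whose user_info holds JSON as an escaped string.
DELETE users
POST users/_doc/1
{
  "user_info": [
    "{ \"password\": \"test\",\n \"username\": \"zy\"}"
  ]
}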
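# Run the pipeline over the existing documents.
POST users/_update_by_query?pipeline=json_pipeline
{
  "query": {
    "match_all": {}
  }
}
# Check that user_info_json now holds the parsed value.
GET users/_search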
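As a rough picture of what the json processor does to a document, the following Python sketch parses a JSON string field and writes the result to a target field. It is a local model only, not Elasticsearch's implementation: `json_processor` is a hypothetical function, and it assumes `user_info` holds a single JSON string rather than the one-element array used in the example above.

```python
import json


def json_processor(doc, field, target_field=None):
    """Parse the JSON string in doc[field] and store it under target_field.

    When target_field is omitted, the parsed value replaces the original
    field, matching the processor's documented default.
    """
    parsed = json.loads(doc[field])
    result = dict(doc)
    result[target_field or field] = parsed
    return result


if __name__ == "__main__":
    doc = {"user_info": "{ \"password\": \"test\",\n \"username\": \"zy\"}"}
    out = json_processor(doc, "user_info", "user_info_json")
    print(out["user_info_json"])  # {'password': 'test', 'username': 'zy'}
    print(type(out["user_info"]).__name__)  # str
```

The original string stays in `user_info`; only `user_info_json` carries the structured object, so queries can address `user_info_json.username` directly.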