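When an index mapping changes, the usual practice for a production index is to create a new index with the new mapping and then point the index alias at the new index.
1. Create the new index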
PUT /tax_law_clause_library_v5
{
"settings": {
"analysis": {
"filter": {
"by_tfr": {
"type": "stop",
"stopwords": [
" "
]
},
"by_tfr_nbsp": {
"type": "stop",
"stopwords": [
" ",
" "
]
},
"local_synonym": {
"type": "synonym",
"synonyms_path": "analysis/synonym_v1.txt"
}
},
"analyzer": {
"html_analyze": {
"filter": [
"by_tfr_nbsp",
"local_synonym"
],
"char_filter": [
"my_char_filter",
"by_cfr"
],
"type": "custom",
"tokenizer": "ik_max_word"
},
"plain_analyze": {
"filter": [
"by_tfr",
"local_synonym"
],
"char_filter": [
"by_cfr"
],
"type": "custom",
"tokenizer": "ik_max_word"
},
"comma_analyze": {
"type": "pattern",
"pattern": ","
}
},
"char_filter": {
"my_char_filter": {
"escaped_tags": [],
"type": "html_strip"
},
"by_cfr": {
"type": "mapping",
"mappings": [
"| => |"
]
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"keywords": {
"type": "text",
"analyzer": "html_analyze",
"search_analyzer": "html_analyze"
},
"taxCode": {
"type": "text",
"analyzer": "html_analyze"
},
"taxCodeDisposed": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "ik_max_word"
},
"title": {
"type": "text",
"analyzer": "html_analyze",
"search_analyzer": "html_analyze"
},
"pushDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
"pushOffice": {
"type": "text",
"analyzer": "html_analyze"
},
"whetherValid": {
"type": "text",
"analyzer": "html_analyze"
},
"takeEffectDate": {
"type": "text",
"analyzer": "html_analyze"
},
"plainText": {
"type": "text",
"analyzer": "html_analyze",
"search_analyzer": "html_analyze"
},
"fullText": {
"type": "text",
"analyzer": "html_analyze",
"search_analyzer": "html_analyze"
},
"useAreaOnePartNames": {
"type": "text",
"analyzer": "html_analyze"
},
"industryLargeClassNames": {
"type": "text",
"analyzer": "html_analyze"
},
"industryTotalClassNames": {
"type": "text",
"analyzer": "html_analyze"
},
"taxesType": {
"type": "text",
"analyzer": "html_analyze"
},
"belongCompany": {
"type": "text",
"analyzer": "html_analyze"
},
"belongLabel": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "comma_analyze"
},
"remark": {
"type": "text",
"analyzer": "html_analyze"
},
"insertTime": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss||epoch_millis"
},
"clickNum": {
"type": "integer"
}
}
}
}
2. Import the data
Copy the data from the old index into the new one:
POST _reindex
{
"source": {
"index": "tax_law_clause_library_v4"
},
"dest": {
"index": "tax_law_clause_library_v5",
"op_type": "create"
}
}
This request failed with Gateway Time-out:
{
"statusCode": 504,
"error": "Gateway Time-out",
"message": "Client request timeout"
}
So I ran the reindex again as a retry, and this time it failed with a different error: version conflict, document already exists (current version [1])
"failures": [
{
"index": "tax_law_clause_library_v6",
"type": "_doc",
"id": "47484",
"cause": {
"type": "version_conflict_engine_exception",
"reason": "[47484]: version conflict, document already exists (current version [1])",
"index_uuid": "neZu6lOjQB6hFF_e-ihAVw",
"shard": "0",
"index": "tax_law_clause_library_v6"
},
"status": 409
},
{
"index": "tax_law_clause_library_v6",
"type": "_doc",
"id": "47485",
"cause": {
"type": "version_conflict_engine_exception",
"reason": "[47485]: version conflict, document already exists (current version [1])",
"index_uuid": "neZu6lOjQB6hFF_e-ihAVw",
"shard": "0",
"index": "tax_law_clause_library_v6"
},
"status": 409
}
]
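The most likely cause is this: the 504 "Client request timeout" only means that the HTTP request timed out on the client side. The first reindex was not cancelled; it kept running on the cluster and kept writing documents into the new index. Because the request sets "op_type": "create", which only creates documents that do not exist yet, the second reindex tried to create documents whose IDs the first run had already written, and every such collision was rejected with a 409 version conflict.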
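When a reindex response carries many failures, it helps to check whether they are all of the same kind. The following is a minimal Python sketch, assuming the response body has already been parsed into a dict; `summarize_failures` is a hypothetical helper name, and the sample is a trimmed copy of the failures shown above.

```python
from collections import Counter


def summarize_failures(response):
    """Count reindex failures by cause type and collect the conflicting IDs."""
    failures = response.get("failures", [])
    by_type = Counter(f["cause"]["type"] for f in failures)
    conflict_ids = [
        f["id"]
        for f in failures
        if f["cause"]["type"] == "version_conflict_engine_exception"
    ]
    return dict(by_type), conflict_ids


# Trimmed copy of the two failures from the error output above.
sample = {
    "failures": [
        {
            "index": "tax_law_clause_library_v6",
            "id": "47484",
            "cause": {"type": "version_conflict_engine_exception"},
            "status": 409,
        },
        {
            "index": "tax_law_clause_library_v6",
            "id": "47485",
            "cause": {"type": "version_conflict_engine_exception"},
            "status": 409,
        },
    ]
}

by_type, conflict_ids = summarize_failures(sample)
print(by_type)
print(conflict_ids)
```

If every failure is a version conflict on a create operation, that points to documents that were already written, for example by an earlier run that was still in progress.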
The simplest way out was to delete the new index and start over:
DELETE /tax_law_clause_library_v5
Then repeat step 1 to create the index again. After some reading, I tackled the problem from two sides:
1) Change the settings of the new index
PUT /tax_law_clause_library_v5/_settings
{
"number_of_replicas": 0,
"refresh_interval": -1
}
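- Disable replicas: during a bulk import, every write also has to be copied to the replica shards, which costs extra resources. Set the replica count to 0 for the import and restore it once the data is loaded.
- Change the refresh interval: if search results do not need to be near real time, there is no need to refresh the index so often. The default is 1s. For a large import such as a reindex, raise refresh_interval to something like 30s, or set it to -1 to disable refresh entirely, and reset it to the value you need when the import is done.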
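The two settings are changed before the import and put back afterwards, so it can be handy to keep both request bodies side by side. The following is a small Python sketch; `bulk_load_settings` and `restore_settings` are hypothetical helper names, and the restore defaults (1 replica, default refresh interval) are assumptions that should match what your index normally uses.

```python
import json


def bulk_load_settings():
    """Settings body to apply before a large import: no replicas, no refresh."""
    return {"number_of_replicas": 0, "refresh_interval": -1}


def restore_settings(replicas=1, refresh_interval=None):
    """Settings body to apply after the import.

    refresh_interval=None is serialized as JSON null, which resets the
    setting to its default value.
    """
    return {"number_of_replicas": replicas, "refresh_interval": refresh_interval}


before = json.dumps(bulk_load_settings())
after = json.dumps(restore_settings())
print(before)
print(after)
```

Both bodies are meant to be sent with PUT /<index>/_settings.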
2) Change the reindex parameters
POST _reindex?slices=auto&refresh&wait_for_completion=false
{
"source": {
"index": "tax_law_clause_library_v4"
},
"dest": {
"index": "tax_law_clause_library_v5",
"op_type": "create"
}
}
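Notes on choosing slices:
1) slices can be set to an explicit number or to auto. With auto, Elasticsearch uses one slice per shard (up to a limit) for a single source index; for multiple source indices it bases the number on the index with the fewest shards.
2) Query performance is best when the number of slices equals the number of shards in the index. Setting slices higher than the shard count does not improve efficiency and only adds overhead.
3) If the shard count is large (for example, 500), choose a lower number of slices, because too many slices hurt performance.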
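The auto rule from the notes above can be written down as a tiny function. This is only an illustrative Python sketch of the rule as described here, not Elasticsearch's actual code; `auto_slices` is a hypothetical name, and the cap of 20 is an assumption used to show the effect of an upper limit.

```python
def auto_slices(shards_per_index, limit=20):
    """Pick a slice count the way slices=auto is described above.

    shards_per_index maps each source index name to its primary shard count.
    The result is the smallest shard count among the sources, capped at
    `limit` (the default cap is an assumption for illustration).
    """
    if not shards_per_index:
        raise ValueError("at least one source index is required")
    return min(min(shards_per_index.values()), limit)


single = auto_slices({"tax_law_clause_library_v4": 5})
multiple = auto_slices({"index_a": 3, "index_b": 8})
capped = auto_slices({"index_with_many_shards": 500})
print(single, multiple, capped)
```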
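Because of wait_for_completion=false, this request returns a task ID right away instead of waiting for the reindex to finish. Check on the task with: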
GET _tasks/a9Aa_I_ZSl-4bjR5vZLnSA:247906
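The task response reports whether the task has completed and, under task.status, counters such as total, created and version_conflicts. The following is a minimal Python sketch for turning that response into a progress summary, assuming it has been parsed into a dict; `reindex_progress` is a hypothetical helper, and the sample values are made up for illustration.

```python
def reindex_progress(task_response):
    """Summarize a GET _tasks/<task_id> response for a reindex task."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    processed = sum(status.get(k, 0) for k in ("created", "updated", "deleted"))
    percent = round(100.0 * processed / total, 1) if total else 0.0
    return {
        "completed": task_response.get("completed", False),
        "processed": processed,
        "total": total,
        "percent": percent,
        "version_conflicts": status.get("version_conflicts", 0),
    }


# Made-up sample in the shape of a task response (not real output).
sample = {
    "completed": False,
    "task": {
        "action": "indices:data/write/reindex",
        "status": {
            "total": 1000,
            "created": 250,
            "updated": 0,
            "deleted": 0,
            "version_conflicts": 0,
        },
    },
}

progress = reindex_progress(sample)
print(progress)
```

Wait until completed is true and version_conflicts is 0 before moving on to the alias switch.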
3. Switch the alias
Move the alias from the old index to the new index:
POST /_aliases
{
"actions": [
{
"remove": {
"index": "tax_law_clause_library_v4",
"alias": "tax_law_clause_library_alias"
}
},
{
"add": {
"index": "tax_law_clause_library_v5",
"alias": "tax_law_clause_library_alias"
}
}
]
}
Check where the alias points:
GET /*/_alias/tax_law_clause_library_alias
{
"tax_law_clause_library_v5" : {
"aliases" : {
"tax_law_clause_library_alias" : { }
}
}
}
As shown, the alias now points at the new index.
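The remove and add actions go into a single _aliases request so that the switch is applied atomically and the alias never points at nothing. The following is a Python sketch that builds that request body and checks the result of the alias lookup; `swap_alias_actions` and `alias_targets` are hypothetical helper names, and the lookup response is the one shown above.

```python
def swap_alias_actions(alias, old_index, new_index):
    """Build the POST /_aliases body that moves an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def alias_targets(alias_response, alias):
    """Return the sorted index names that carry the alias in a lookup response."""
    return sorted(
        index
        for index, body in alias_response.items()
        if alias in body.get("aliases", {})
    )


body = swap_alias_actions(
    "tax_law_clause_library_alias",
    "tax_law_clause_library_v4",
    "tax_law_clause_library_v5",
)

# Response of GET /*/_alias/tax_law_clause_library_alias, as shown above.
lookup = {
    "tax_law_clause_library_v5": {
        "aliases": {"tax_law_clause_library_alias": {}}
    }
}
targets = alias_targets(lookup, "tax_law_clause_library_alias")
print(body)
print(targets)
```

The switch is complete when the new index is the only target returned.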
4. Restore the new index's settings
PUT /tax_law_clause_library_v5/_settings
{
"number_of_replicas": 1,
"refresh_interval": null
}
And with that, the job is done!