Chapter 1.2: Elasticsearch reindex

Latest recommended article published 2024-09-21 09:21:57

warrah

Category column: 岁月云 (big data miscellany). Tags: elasticsearch

Article link: https://blog.csdn.net/warrah/article/details/83068183

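Our applications used to reference Elasticsearch indices directly instead of going through index aliases, which caused a lot of trouble later on. Doing the migration with a Python script was already covered in the earlier post on deleting fields in Elasticsearch. Today I found another way: reindex.
Reindex is quite useful. Elasticsearch releases new versions quickly, and a cross-version migration can be done with reindex.
In the simplest case you do not even have to create the new index first: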

POST _reindex
{
  "source":{"index":"edata"},
  "dest":{"index":"edata_v3"}
}
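
For readers who script this rather than use the Kibana console, the same request can be built with Python's standard library. This is only a sketch: the node address `http://localhost:9200` and the helper name `build_reindex_request` are assumptions of the example, not part of the original setup.

```python
import json
import urllib.request


def build_reindex_request(base_url, source_index, dest_index):
    """Build (but do not send) the HTTP equivalent of POST _reindex."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node address; adjust it for your own cluster.
request = build_reindex_request("http://localhost:9200", "edata", "edata_v3")
# To actually run the copy: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the body before anything touches the cluster.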

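Skipping index creation only works when there are no special types such as date. If every field is text, the request above simply copies the data over. If there is a date field, the same request leaves you with the message below, presumably because the dynamically created mapping treats the date string as text. Searching online for "Fielddata is disabled" turns up articles that suggest fixing this by changing the field type.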

Discover: Fielddata is disabled on text fields by default. Set fielddata=true on [createTime] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword …

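In practice, however, that approach does not work. What does work is creating the new index with explicit mappings, as in the example below. If the index contains several types, copy the mappings of every one of them so that nothing is missed.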

PUT ww_v2
{
  "mappings": {
      "bwKnowledge": {
        "properties": {
          "answer": {
            "type": "keyword"
          },
          "createTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "id": {
            "type": "text"
          },
          "isTop": {
            "type": "text",
            "fielddata": true
          },
          "label": {
            "type": "text"
          },
          "question": {
            "type": "keyword"
          },
          "status": {
            "type": "text"
          }
        }
      }
    }
}
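
To reduce the risk of leaving a type out, the body of the create-index request can be derived from the old index's `GET <index>/_mapping` response instead of being typed by hand. The sketch below assumes the 5.x/6.x response shape `{index: {"mappings": {type: {...}}}}`; the old index name `ww`, the helper name and the sample data are hypothetical.

```python
import copy


def build_create_index_body(mapping_response, old_index, overrides):
    """Copy every type's mapping and apply per-field overrides.

    `overrides` has the shape {type_name: {field_name: field_mapping}}.
    An override for a type that does not exist raises KeyError, so a typo
    in a type name fails loudly instead of being silently ignored.
    """
    mappings = copy.deepcopy(mapping_response[old_index]["mappings"])
    for type_name, fields in overrides.items():
        properties = mappings[type_name].setdefault("properties", {})
        for field_name, field_mapping in fields.items():
            properties[field_name] = field_mapping
    return {"mappings": mappings}


# Hypothetical response from GET ww/_mapping, trimmed to two fields.
old_mapping = {
    "ww": {
        "mappings": {
            "bwKnowledge": {
                "properties": {
                    "answer": {"type": "text"},
                    "createTime": {"type": "text"},
                }
            }
        }
    }
}

create_body = build_create_index_body(
    old_mapping,
    "ww",
    {"bwKnowledge": {"createTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}}},
)
```

The resulting `create_body` is what would be sent with `PUT ww_v2`; the deep copy keeps the original response untouched for comparison.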

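Once that is done, repoint the index reference as described in the post on deleting fields in Elasticsearch. After confirming that the new reference works safely, delete the old index so that no duplicate data is left behind. This was a painful lesson from operating on production without really understanding Elasticsearch. A healthy respect for the system is essential, so I am writing this down to keep others from running into the same problem.
So the correct approach is still to create the new index first and only then run reindex.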
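
If the application reads through an index alias, which is what the opening paragraph wishes had been done, repointing the reference can be a single `POST _aliases` call that removes the alias from the old index and adds it to the new one. A minimal sketch of that request body follows; the alias name `ww_alias` and the old index name `ww` are hypothetical.

```python
def build_alias_switch(alias, old_index, new_index):
    """Build the body for POST _aliases that moves an alias in one call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: alias "ww_alias" moves from "ww" to "ww_v2".
alias_body = build_alias_switch("ww_alias", "ww", "ww_v2")
```

Because both actions travel in one request, readers of the alias should not see a moment where it points at neither index.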

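Running the reindex from the Kibana console returned the result below, even though the data had in fact been copied into the new index. The copy was probably slow enough to trigger a timeout; my test data set was about 330,000 documents.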

{
  "statusCode": 504,
  "error": "Gateway Timeout",
  "message": "Client request timeout"
}

A normal response should look like this:

{
  "took": 476,
  "timed_out": false,
  "total": 1433,
  "updated": 0,
  "created": 1433,
  "deleted": 0,
  "batches": 2,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
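
When the copy is scripted, a response like the one above can be checked automatically before the old index is deleted. The helper below is a sketch for a plain copy into an empty index, and its name and rules are my own assumptions: it treats the run as complete only if nothing timed out, nothing failed, there were no version conflicts, and every document was created or updated.

```python
def reindex_looks_complete(response):
    """Return True when a _reindex response reports a clean, full copy."""
    if response.get("timed_out", False):
        return False
    if response.get("failures"):
        return False
    if response.get("version_conflicts", 0) != 0:
        return False
    written = response.get("created", 0) + response.get("updated", 0)
    return written == response.get("total")


# Fields taken from the normal response shown above.
sample = {
    "timed_out": False,
    "total": 1433,
    "updated": 0,
    "created": 1433,
    "version_conflicts": 0,
    "failures": [],
}
ok = reindex_looks_complete(sample)
```

A Kibana 504 body has no `total` field, so the same check rejects it rather than mistaking it for success.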

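The request timeout in kibana.yml defaults to 30 s, so the request is cut off after 30 seconds. A timeout on the client side does not make the server stop working on the request, so the result is not affected.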

elasticsearch.requestTimeout: 30000
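
Another way to avoid the client timeout, rather than raising the limit, is to start the reindex with `wait_for_completion=false` and poll the task API until it reports completion. The sketch below assumes the caller supplies a `send(method, path, body)` function that performs the HTTP call and returns decoded JSON; that function, the helper name and the polling interval are assumptions of the example.

```python
import time


def reindex_in_background(send, source_index, dest_index, poll_seconds=5.0):
    """Start a reindex as a background task and poll until it completes.

    `send(method, path, body)` is a caller-supplied transport function
    (hypothetical) that returns the decoded JSON response.
    """
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    started = send("POST", "/_reindex?wait_for_completion=false", body)
    task_id = started["task"]
    while True:
        status = send("GET", "/_tasks/" + task_id, None)
        if status.get("completed"):
            # The final reindex summary is expected under "response".
            return status.get("response", status)
        time.sleep(poll_seconds)
```

Each poll is a short request, so none of them comes close to the 30 s limit even when the copy itself takes minutes.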

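So I looked online for ways to speed up reindex; see the article "Elasticsearch Reindex性能提升10倍+实战" (a hands-on guide to a 10x+ reindex speedup).
1 Adjust the bulk write batch size
Setting size to 5000 is easy to understand, but I did not understand what "routing":"=cat" was for. After reading "Elasticsearch 5.x Document Reindex", it is clear that routing should not be used casually; see also "Elasticsearch的路由（Routing）特性" (on Elasticsearch's routing feature).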

POST _reindex
{
  "source":{"index":"edata_new","size":5000},
  "dest":{"index":"edata_v5","routing":"=cat"}
}

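Checking the result afterwards, the documents carry an extra field (highlighted in red in the original screenshot), presumably the routing value set by "=cat".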

So routing is not needed here; the following request is enough:

POST _reindex
{
  "source":{"index":"edata_new","size":5000},
  "dest":{"index":"edata_v5"}
}

2 Sliced writes
Still investigating...
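
One direction this could take, assuming "sliced writes" refers to the sliced scroll that reindex supports, is to split the copy into several requests that each carry a `slice` with an `id` and a `max`, and to run them in parallel. The sketch below only builds the request bodies; the helper name is hypothetical, and the untested assumption is that it behaves as expected on the version used here.

```python
def build_sliced_bodies(source_index, dest_index, slice_count, batch_size=5000):
    """Build one _reindex body per slice for manual slicing."""
    if slice_count < 2:
        raise ValueError("slice_count must be at least 2")
    bodies = []
    for slice_id in range(slice_count):
        bodies.append({
            "source": {
                "index": source_index,
                "size": batch_size,
                "slice": {"id": slice_id, "max": slice_count},
            },
            "dest": {"index": dest_index},
        })
    return bodies


# Two slices over the same indices used in the requests above.
sliced = build_sliced_bodies("edata_new", "edata_v5", 2)
```

Each body would be sent as its own `POST _reindex`, and the copy is complete only once every slice has finished.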
