Elasticsearch in Practice: Exporting and Importing Data with elasticdump

用心去追梦

Published 2024-04-02 10:25:23

Tags: elasticsearch, big data, search engine

Article link: https://blog.csdn.net/qq_33240556/article/details/137261150

elasticdump is a command-line tool for backing up and migrating Elasticsearch data. The steps below show how to export and import data with it in practice:

Make sure Node.js is installed, then install elasticdump with npm:

npm install -g elasticdump

Suppose you want to export the index my_index to a local JSON file named my_index_dump.json:

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_dump.json \
  --type=data

To export only the index mapping (its structure) without the data, specify --type=mapping:

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_mapping.json \
  --type=mapping

To export the index settings, use --type=settings:

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_settings.json \
  --type=settings
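
The three exports above can be wrapped in one helper. This is only a sketch: the function name export_index, its arguments, and the DRY_RUN switch are illustrative and not part of elasticdump; the output file names follow the ones used in this article.

```shell
#!/usr/bin/env bash
# Sketch: export settings, mapping and data of one index into separate files.
# export_index and DRY_RUN are hypothetical helpers, not elasticdump features.
export_index() {
  local es_url="${1:-http://localhost:9200}" index="${2:-my_index}"
  local pair kind suffix
  # Each pair is <elasticdump type>:<file suffix used in this article>.
  for pair in settings:settings mapping:mapping data:dump; do
    kind="${pair%%:*}"
    suffix="${pair##*:}"
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} elasticdump \
      --input="${es_url}/${index}" \
      --output="${index}_${suffix}.json" \
      --type="${kind}" || return 1
  done
}
```

Calling `DRY_RUN=1 export_index` previews the three commands; calling `export_index` without DRY_RUN runs them against http://localhost:9200/my_index.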

To import the previously exported data file my_index_dump.json into an empty index my_index in the target Elasticsearch environment:

elasticdump \
  --input=my_index_dump.json \
  --output=http://target_es_host:9200/my_index \
  --type=data

Make sure the target index does not exist or has been emptied, to avoid data conflicts.
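
Whether the target index already exists can be checked before importing. The sketch below relies on Elasticsearch answering a HEAD request on the index URL with 200 when the index exists and 404 when it does not; the helper names index_state and check_target_index are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether a target index exists before importing into it.
# index_state and check_target_index are hypothetical helper names.

# Map the HTTP status of "HEAD /<index>" to a readable label.
index_state() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unknown" ;;   # e.g. 000 when the host is unreachable, 401 when auth is required
  esac
}

# Send the HEAD request with curl and print the label for the given index URL.
check_target_index() {
  local url="$1" code
  code="$(curl -s -o /dev/null -I -w '%{http_code}' "$url")"
  index_state "$code"
}
```

For example, `check_target_index http://target_es_host:9200/my_index` prints `missing` when the index has not been created yet.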

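Note that --overwrite does not apply to indices: elasticdump documents it as overwriting an existing output file. When importing into an index that already holds data, documents are by default indexed with their original _id, so existing documents with the same ID are replaced. Use --overwrite when re-exporting to a dump file that already exists:

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_dump.json \
  --type=data \
  --overwrite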

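elasticdump can be used for incremental exports, but it does not track changes by itself. The usual approach is to keep a timestamp field in the documents (the _timestamp metadata field was removed in Elasticsearch 5.0) and pass a range query through --searchBody so that only documents modified since the last run are exported. This is typically used for periodic incremental backup and restore. See the elasticdump documentation for detailed usage.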
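
One way to approximate an incremental export is to filter the export with elasticdump's --searchBody option and a range query. The sketch below assumes the documents carry a date field named updated_at; that field name, the helper names, and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: export only documents changed since a given date.
# updated_at, build_since_query, incremental_export and DRY_RUN are assumptions.

# Build a range query selecting documents whose FIELD is >= SINCE.
build_since_query() {
  printf '{"query":{"range":{"%s":{"gte":"%s"}}}}' "$1" "$2"
}

# Export matching documents of my_index into a file named after the date.
incremental_export() {
  local since="$1"
  # With DRY_RUN set, the command is printed instead of executed.
  ${DRY_RUN:+echo} elasticdump \
    --input="http://localhost:9200/my_index" \
    --output="my_index_since_${since}.json" \
    --type=data \
    --searchBody="$(build_since_query updated_at "$since")"
}
```

Running `incremental_export 2024-04-01` would write the documents updated on or after that date to my_index_since_2024-04-01.json, provided the field exists and is mapped as a date.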

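Concurrency and batching: use --limit to adjust how many documents are read or written per batch, which can speed up imports and exports, and use --concurrency to set the number of concurrent requests.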
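Authentication: if the Elasticsearch cluster has security enabled, include the credentials in the URL (for example http://user:password@target_es_host:9200/my_index) or load them from a file with --httpAuthFile. Custom authentication headers such as a JWT can be passed with --headers.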
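SSL: for Elasticsearch clusters served over HTTPS, specify the CA certificate with --input-ca or --output-ca. To skip certificate verification in non-production environments, set the environment variable NODE_TLS_REJECT_UNAUTHORIZED=0 before running elasticdump.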
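Index state: before importing data, make sure the mapping and settings of the target index match the exported data. If necessary, import the mapping and settings first, then the data.
Disk space: exporting a large dataset can take up a lot of disk space. Make sure there is enough room for temporary files and the final export files.
Performance impact: a large migration can affect cluster performance. Run it during off-peak hours, or consider other methods such as snapshot-based migration.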
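
The notes on index state and batching can be combined into one import helper. This is a sketch under assumptions: the order settings, then mapping, then data is chosen on the assumption that static settings need to be applied when the index is created; the batch size of 1000 is an arbitrary starting point; import_index and DRY_RUN are illustrative names, and the input files are the ones exported earlier in this article.

```shell
#!/usr/bin/env bash
# Sketch: restore settings, mapping and data into a target index, in that order.
# import_index, LIMIT and DRY_RUN are hypothetical helpers, not elasticdump features.
import_index() {
  local target="${1:-http://target_es_host:9200/my_index}"
  local pair kind file
  # Each pair is <elasticdump type>:<dump file produced by the export steps>.
  for pair in settings:my_index_settings.json \
              mapping:my_index_mapping.json \
              data:my_index_dump.json; do
    kind="${pair%%:*}"
    file="${pair##*:}"
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} elasticdump \
      --input="${file}" \
      --output="${target}" \
      --type="${kind}" \
      --limit="${LIMIT:-1000}" || return 1
  done
}
```

`DRY_RUN=1 import_index` previews the commands. A larger LIMIT moves more documents per batch but also puts more load on both clusters, so it is worth tuning on a small index first.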