Ambari Operations Guide (Ambari Operations), Part 6

Author: devalone

Original link: https://blog.csdn.net/devalone/article/details/80854431

Continued from (Ambari Operations Guide (Ambari Operations), Part 5)

9.2 Ambari Log Search (Technical Preview)

The following sections describe Ambari Log Search, which is a Technical Preview. Use it only in non-production clusters with fewer than 150 nodes.

9.2.1 Log Search Architecture

Ambari Log Search lets you search the logs generated by Ambari-managed HDP components. Ambari Log Search relies on the Ambari Infra service to provide Apache Solr indexing services.

The Log Search solution is made up of two components:

   • Log Feeder
   • Log Search Server

9.2.1.1 Log Feeder
-----------------------------------------------------------------------------------------------------------------------------------------
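The Log Feeder component parses component logs. A Log Feeder is deployed to every node in the cluster and interacts with all component logs on that node. When started, the Log Feeder begins to parse all known component logs. It then sends them to the Apache Solr instances (managed by the Ambari Infra service) to be indexed.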

By default, the Log Feeder captures only FATAL, ERROR, and WARN logs. You can add other log levels temporarily or permanently using the Log Search UI filter settings.

9.2.1.2 Log Search Server
-----------------------------------------------------------------------------------------------------------------------------------------
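The Log Search Server hosts the Log Search UI web application. It also provides the API that Ambari and the Log Search UI use to access the indexed component logs. After logging in as a local or LDAP user, you can use the Log Search UI to visualize, explore, and search the indexed component logs.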

9.2.2 Installing Log Search
-----------------------------------------------------------------------------------------------------------------------------------------
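Log Search is a built-in service in Ambari 2.4 and later. You can add it during a new installation by using the +Add Service menu. The Log Feeders are installed automatically on all nodes in the cluster.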

You can manually install the Log Search Server on the same host as the Ambari Server.

9.2.3 Using Log Search
-----------------------------------------------------------------------------------------------------------------------------------------
Using Log Search includes the following activities:

   • Accessing Log Search
   • Using Log Search to Troubleshoot
   • Viewing Service Logs
   • Viewing Access Logs

9.2.3.1 Accessing Log Search
-----------------------------------------------------------------------------------------------------------------------------------------
After Log Search is installed, you can search the indexed logs in any of the following three ways:

   • Ambari Background Ops Log Search Link
   • Host Detail Logs Tab
   • Log Search UI

9.2.3.1.1 Ambari Background Ops Log Search Link
-----------------------------------------------------------------------------------------------------------------------------------------
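When you perform lifecycle operations such as starting or stopping services, you need access to logs that can help you recover from potential failures. These logs are now available in Background Ops. Background Ops also links to the Host Detail Logs tab. That tab lists all indexed log files that can be viewed for a host.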

9.2.3.1.2 Host Detail Logs Tab
-----------------------------------------------------------------------------------------------------------------------------------------
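A Logs tab has been added to the host detail page of each host. It contains a list of indexed, viewable log files, organized by service, component, and type. You can open and search each of these files through a link to the Log Search UI.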

9.2.3.1.3 Log Search UI
-----------------------------------------------------------------------------------------------------------------------------------------
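The Log Search UI is a purpose-built web application for searching HDP component logs. It focuses on fast access to logs and searching them from a single location. Logs can be filtered by log level, by component, and by searchable keywords.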

You can open the Log Search UI from the Quick Links of the Log Search service in Ambari Web.

9.2.3.2 Using Log Search to Troubleshoot
-----------------------------------------------------------------------------------------------------------------------------------------
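To find the logs related to a specific problem, open the Troubleshooting tab in the UI. Select the service, components, and time frame related to the problem. For example, if you select HDFS, the UI automatically searches for HDFS-related components. You can select a time frame of yesterday or last week, or specify a custom value. When you are ready to view the matching logs, click Go to Logs.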

9.2.3.3 Viewing Service Logs
-----------------------------------------------------------------------------------------------------------------------------------------
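The Service Logs tab lets you search across all component logs by keyword and filter them by log level, component, and time range. The UI is organized so that you can quickly see how many logs were captured at each log level. You can also search for keywords, exclude components, and view the logs that match your query.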

9.2.3.4 Viewing Access Logs
-----------------------------------------------------------------------------------------------------------------------------------------
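When troubleshooting HDFS-related issues, it can help to search for trends in HDFS user access. The Access Logs tab lets you view HDFS audit log entries. It aggregates the data to show the top ten HDFS users and the top ten file system resources accessed. This can help you find anomalies, as well as hot and cold data sets.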

9.3 Ambari Infra
-----------------------------------------------------------------------------------------------------------------------------------------
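Many services in HDP depend on a core service to index data. For example, Apache Atlas uses the indexing service for lineage-free text search, and Apache Ranger uses it to index audit data. The role of Ambari Infra is to provide these common indexing services to the components installed in the stack.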

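Currently, the Ambari Infra service has only one component: the Infra Solr Instance. The Infra Solr Instance is a fully managed Apache Solr installation. By default, installing the Ambari Infra service deploys a single-node SolrCloud installation. You can also install multiple Infra Solr Instances to provide distributed indexing and search for Atlas, Ranger, and Log Search.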

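To install multiple Infra Solr Instances, add them to existing cluster hosts through Ambari's +Add Service capability. How many Infra Solr Instances you deploy depends on the number of nodes in the cluster and the services deployed.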

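One Ambari Infra Solr Instance serves multiple HDP components, so be careful when restarting the service to avoid disrupting the services that depend on it. In HDP 2.5 and later, Atlas, Ranger, and Log Search depend on the Ambari Infra Solr Instance.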


   Note:
       The Infra Solr Instance is intended for use only by HDP components. Use by third-party components or applications is not supported.

9.3.1 Archiving & Purging Data
-----------------------------------------------------------------------------------------------------------------------------------------
Large clusters produce a lot of log content. Ambari Infra provides a convenient utility for archiving and purging logs that are no longer needed.

This utility is called the Solr Data Manager. The Solr Data Manager is a Python program installed at /usr/bin/infra-solr-data-manager. It lets users quickly archive, delete, or save data from a Solr collection.

9.3.1.1 Command Line Options
-----------------------------------------------------------------------------------------------------------------------------------------

   ● Operation Modes
   -------------------------------------------------------------------------------------------------------------------------------------
   -m MODE, --mode=MODE archive | delete | save

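   The mode to use depends on the operation you want to perform:
   archive : stores the data on the storage medium, then deletes it once storing has completed
   delete  : deletes the data
   save    : similar to archive, except that the data is not deleted after it has been saved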



   ● Connecting to Solr
   -------------------------------------------------------------------------------------------------------------------------------------
   -s SOLR_URL, --solr-url=SOLR_URL

   The URL used to connect to the specific Solr Cloud instance.
   For example: http://c6401.ambari.apache.org:8886/solr



   ● -c COLLECTION, --collection=COLLECTION
   -------------------------------------------------------------------------------------------------------------------------------------
   The name of the Solr collection, for example 'hadoop_logs'.




   ● -k SOLR_KEYTAB, --solr-keytab=SOLR_KEYTAB
   -------------------------------------------------------------------------------------------------------------------------------------
   The keytab file to use with a kerberized Solr instance.




   ● -n SOLR_PRINCIPAL, --solr-principal=SOLR_PRINCIPAL
   -------------------------------------------------------------------------------------------------------------------------------------
   The principal name to use with a kerberized Solr instance.




   ● Record Schema
   -------------------------------------------------------------------------------------------------------------------------------------
   -i ID_FIELD, --id-field=ID_FIELD
   The name of the field in the Solr schema that uniquely identifies each record.

   -f FILTER_FIELD, --filter-field=FILTER_FIELD
   The name of the field in the Solr schema used to filter records, for example 'logtime'.

   -o DATE_FORMAT, --date-format=DATE_FORMAT
   The custom date format to use with the -d DAYS option to match log entries that are older than a certain number of days.

   -e END
   Based on the filter field and date format, this argument configures the date used as the end of the date range. If you
   use '2018-08-29T12:00:00.000Z', then any records whose filter field is before that date will be saved, deleted, or archived,
   depending on the mode.

   -d DAYS, --days=DAYS
   Based on the filter field and date format, this argument configures the number of days before today used as the end of the range.
   If you use '30', then any records whose filter field is older than 30 days will be saved, deleted, or archived, depending on the mode.

   -q ADDITIONAL_FILTER, --additional-filter=ADDITIONAL_FILTER
   Any additional filter criteria to use to match records in the collection
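   The -e and -d options only decide where the date range ends. The sketch below turns -e END or -d DAYS into an end timestamp. It then shows a Solr range query on the filter field that selects the records at or before that point, optionally ANDed with the -q criteria. This is an assumption about the equivalent query, not the tool's internal code. It ignores -o DATE_FORMAT and assumes Solr's default UTC timestamp format with milliseconds. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def range_end(end: Optional[str] = None, days: Optional[int] = None,
              now: Optional[datetime] = None) -> str:
    """End of the date range as a Solr UTC timestamp (hypothetical helper).

    -e END is used as given; -d DAYS means "DAYS days before now".
    Exactly one of the two must be supplied.
    """
    if (end is None) == (days is None):
        raise ValueError("specify exactly one of end (-e) or days (-d)")
    if end is not None:
        return end
    current = now or datetime.now(timezone.utc)
    cutoff = (current - timedelta(days=days)).astimezone(timezone.utc)
    # Solr dates carry milliseconds; %f yields microseconds, so drop 3 digits.
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def filter_query(filter_field: str, end: str,
                 additional_filter: Optional[str] = None) -> str:
    """Solr query for records whose filter field is at or before `end`."""
    query = f"{filter_field}:[* TO {end}]"
    if additional_filter:  # the -q criteria, combined with AND (assumed)
        query = f"({query}) AND ({additional_filter})"
    return query


fixed_now = datetime(2017, 9, 28, 12, 0, tzinfo=timezone.utc)
print(filter_query("logtime", range_end(days=30, now=fixed_now)))
# logtime:[* TO 2017-08-29T12:00:00.000Z]
```

   With a fixed "now" of 2017-09-28 12:00 UTC, -d 30 selects the same range as -e 2017-08-29T12:00:00.000Z in the delete example below.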

   ● Extracting Records
   -------------------------------------------------------------------------------------------------------------------------------------
   -r READ_BLOCK_SIZE, --read-block-size=READ_BLOCK_SIZE
   The number of records to read at a time from Solr. For example: ‘10’ to read 10 records at a time.

   -w WRITE_BLOCK_SIZE, --write-block-size=WRITE_BLOCK_SIZE
   The number of records to write per output file. For example: ‘100’ to write 100 records per file.

   -j NAME, --name=NAME
   A name included in the result files: an additional name added to the final filename created in save or archive mode.

   --json-file
   By default, the output format is one valid JSON document per record, delimited by newlines. This option instead writes out a single
   valid JSON document containing all of the records.


   -z COMPRESSION, --compression=COMPRESSION none | tar.gz | tar.bz2 | zip | gz
   Depending on how the output files will be analyzed, you can choose the optimal compression and file format for the output
   files. Gzip compression is used by default.
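   The default output layout (one JSON document per line, gzip-compressed) can be read back with the standard library alone. This is a minimal sketch. It assumes a file produced with the default gz compression and without --json-file; other -z values or --json-file output need different handling. The function names are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON: one record per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines such as a trailing newline
            yield json.loads(line)


def read_records(path: str) -> Iterator[dict]:
    """Read a gzip-compressed output file written in the default format."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        yield from iter_records(handle)


# Example (the path is illustrative):
# for record in read_records("/tmp/hadoop_logs_-_2017-10-28T01_25_40.693Z.json.gz"):
#     print(record)
```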




   ● Writing Data to HDFS
   -------------------------------------------------------------------------------------------------------------------------------------

   -a HDFS_KEYTAB, --hdfs-keytab=HDFS_KEYTAB
   The keytab file to use when writing data to a kerberized HDFS instance.


   -l HDFS_PRINCIPAL, --hdfs-principal=HDFS_PRINCIPAL
   The principal name to use when writing data to a kerberized HDFS instance


   -u HDFS_USER, --hdfs-user=HDFS_USER
   The user to connect to HDFS as


   -p HDFS_PATH, --hdfs-path=HDFS_PATH
   The path in HDFS to write data to in save or archive mode.





   ● Writing Data to S3
   -------------------------------------------------------------------------------------------------------------------------------------
   -t KEY_FILE_PATH, --key-file-path=KEY_FILE_PATH
   The path to the file on the local file system that contains the AWS Access and Secret Keys. The file should contain the keys in this
   format: <accessKey>,<secretKey>


   -b BUCKET, --bucket=BUCKET
   The name of the bucket that data should be uploaded to in save or archive mode.


   -y KEY_PREFIX, --key-prefix=KEY_PREFIX
   The key prefix allows you to create a logical grouping of the objects in an S3 bucket. The prefix value is similar to a directory name
   enabling you to store data in the same directory in a bucket. For example, if your Amazon S3 bucket name is logs, and you set prefix
   to hadoop/, and the file on your storage device is hadoop_logs_-_2017-10-28T01_25_40.693Z.json.gz, then the file would be identified
   by this URL: http://s3.amazonaws.com/logs/hadoop/hadoop_logs_-_2017-10-28T01_25_40.693Z.json.gz


   -g, --ignore-unfinished-uploading
   To deal with connectivity issues, uploading extracted data can be retried. If you do not wish to resume uploads, use the -g flag to
   disable this behaviour.
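   The sketch below shows the key file format and how the prefix forms the object URL. It assumes a path-style S3 URL, as in the example above, and that the prefix is joined to the file name exactly as given, so it should end with "/". The function names are hypothetical, and this does not show how the tool itself reads the file.

```python
from pathlib import Path
from typing import Tuple


def parse_key_line(text: str) -> Tuple[str, str]:
    """Split the documented '<accessKey>,<secretKey>' content."""
    access_key, sep, secret_key = text.strip().partition(",")
    if not sep or not access_key.strip() or not secret_key.strip():
        raise ValueError("expected '<accessKey>,<secretKey>'")
    return access_key.strip(), secret_key.strip()


def read_key_file(path: str) -> Tuple[str, str]:
    """Read the -t key file from the local file system."""
    return parse_key_line(Path(path).read_text(encoding="utf-8"))


def object_url(bucket: str, key_prefix: str, filename: str) -> str:
    """Path-style URL: bucket, then the -y prefix, then the file name."""
    return f"http://s3.amazonaws.com/{bucket}/{key_prefix}{filename}"


print(object_url("logs", "hadoop/", "hadoop_logs_-_2017-10-28T01_25_40.693Z.json.gz"))
# http://s3.amazonaws.com/logs/hadoop/hadoop_logs_-_2017-10-28T01_25_40.693Z.json.gz
```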




   ● Writing Data Locally
   -------------------------------------------------------------------------------------------------------------------------------------
   -x LOCAL_PATH, --local-path=LOCAL_PATH
   The path on the local file system that should be used to write data to in save or archive mode




   ● Examples
   -------------------------------------------------------------------------------------------------------------------------------------


   □ Deleting Indexed Data:

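   In delete mode (-m delete), the program deletes data from the Solr collection. This mode uses the filter field option (-f FILTER_FIELD) to control
   which data is removed from the index. The following command deletes log entries created before August 29, 2017 from the hadoop_logs collection.
   The -f option specifies the Solr collection field to use as the filter field, and the -e option marks the end of the range to delete: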

   infra-solr-data-manager -m delete -s http://c6401.ambari.apache.org:8886/solr -c hadoop_logs -f logtime -e 2017-08-29T12:00:00.000Z


   □ Archiving Indexed Data

   In archive mode, the program fetches data from the Solr collection, writes it out to HDFS or S3, and then deletes the data.

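   The program fetches records from Solr. It creates a file when the write block size is reached, or when there is no more matching data in Solr.
   The program tracks its progress by fetching records ordered by the filter field and the id field, and it always saves their last values. Once a
   file has been written, it is compressed using the configured compression type.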

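   After the compressed file is created, the program creates a command file containing instructions for the next step. If any interruption or
   error occurs during the next step, the program starts by executing the saved command file, so all the data remains consistent. If the error
   is caused by an invalid configuration and the failures persist, the -g option can be used to ignore the saved command file. The program
   supports writing data to HDFS, S3, or local files.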
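   The fetch-and-write loop described above can be shown with a small simulation. This is a sketch of the documented behavior, not the tool's source. fetch stands in for a Solr query sorted by the filter field and the id field, and the position of the last record is kept as the progress marker. Compression and the command file are left out. All names are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, str]
Position = Tuple[str, str]  # (filter field value, id value) of the last record


def archive_batches(fetch: Callable[[Optional[Position], int], List[Record]],
                    read_block_size: int, write_block_size: int,
                    filter_field: str = "logtime",
                    id_field: str = "id") -> Iterator[List[Record]]:
    """Yield the records for each output file, in order.

    `fetch(after, n)` stands in for a Solr query returning up to n records
    sorted by (filter_field, id_field) that come strictly after `after`.
    """
    last: Optional[Position] = None
    pending: List[Record] = []
    while True:
        block = fetch(last, read_block_size)      # one -r sized read
        if not block:
            break                                 # no more matching data
        for record in block:
            pending.append(record)
            last = (record[filter_field], record[id_field])  # progress marker
            if len(pending) == write_block_size:  # one -w sized file
                yield pending
                pending = []
    if pending:
        yield pending                             # last, partially filled file


def make_fetch(records: List[Record]) -> Callable[[Optional[Position], int], List[Record]]:
    """In-memory stand-in for Solr, used only to exercise the loop."""
    ordered = sorted(records, key=lambda r: (r["logtime"], r["id"]))

    def fetch(after: Optional[Position], n: int) -> List[Record]:
        remaining = [r for r in ordered
                     if after is None or (r["logtime"], r["id"]) > after]
        return remaining[:n]

    return fetch


sample = [{"logtime": "2017-08-01T00:00:00.000Z", "id": f"{i:04d}"} for i in range(250)]
print([len(batch) for batch in archive_batches(make_fetch(sample), 10, 100)])
# [100, 100, 50]
```

   Because the position is saved after each record, a restarted run could resume from it instead of re-reading from the beginning. This is the role the saved filter and id values play in the description above.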

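   The following command accesses the Solr collection hadoop_logs at http://c6401.ambari.apache.org:8886/solr, using logtime as the filter field.
   It extracts the records older than 1 day, reads 10 documents at a time, writes 100 documents per file, and copies the compressed files to the local /tmp directory.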


   infra-solr-data-manager -m archive -s http://c6401.ambari.apache.org:8886/solr -c hadoop_logs -f logtime -d 1 -r 10 -w 100 -x /tmp -v



   □ Saving Indexed Data
   -------------------------------------------------------------------------------------------------------------------------------------
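   Saving data is similar to archiving data, except that the data is not deleted after the files are created and uploaded. It is recommended to test with save mode first, to verify that the data is written as expected, before running archive mode.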

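   The following command saves HDFS audit log records older than 3 days to the HDFS path "/" as the hdfs user, fetching the data from a kerberized Solr. It uses -q to select only HDFS audit records and -j to add hdfs_audit to the file names.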

   infra-solr-data-manager -m save -s http://c6401.ambari.apache.org:8886/solr -c audit_logs -f logtime -d 3 -r 10 -w 100 \
   -q type:\"hdfs_audit\" -j hdfs_audit -k /etc/security/keytabs/ambari-infra-solr.service.keytab \
   -n infra-solr/c6401.ambari.apache.org@AMBARI.APACHE.ORG -u hdfs -p /
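   When the save command is run from a script, passing the arguments as a list avoids the shell quoting the -q filter needs. This is a minimal sketch that only assembles options documented in this section. The function name is hypothetical. It prints the command for review rather than running it.

```python
import shlex
from typing import List, Optional


def save_command(solr_url: str, collection: str, filter_field: str, days: int,
                 read_block: int, write_block: int, hdfs_user: str, hdfs_path: str,
                 query: Optional[str] = None, name: Optional[str] = None,
                 solr_keytab: Optional[str] = None,
                 solr_principal: Optional[str] = None) -> List[str]:
    """Assemble argv for infra-solr-data-manager in save mode with HDFS output."""
    argv = ["infra-solr-data-manager", "-m", "save", "-s", solr_url,
            "-c", collection, "-f", filter_field, "-d", str(days),
            "-r", str(read_block), "-w", str(write_block)]
    if query:
        argv += ["-q", query]            # one list element: no shell escaping
    if name:
        argv += ["-j", name]
    if solr_keytab and solr_principal:   # only for a kerberized Solr
        argv += ["-k", solr_keytab, "-n", solr_principal]
    argv += ["-u", hdfs_user, "-p", hdfs_path]
    return argv


argv = save_command(
    "http://c6401.ambari.apache.org:8886/solr", "audit_logs", "logtime", 3, 10, 100,
    hdfs_user="hdfs", hdfs_path="/", query='type:"hdfs_audit"', name="hdfs_audit",
    solr_keytab="/etc/security/keytabs/ambari-infra-solr.service.keytab",
    solr_principal="infra-solr/c6401.ambari.apache.org@AMBARI.APACHE.ORG")
print(shlex.join(argv))  # review the shell-quoted command before running it
# On a host with the tool installed, subprocess.run(argv, check=True) would run it.
```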





9.3.2 Performance Tuning for Ambari Infra
-----------------------------------------------------------------------------------------------------------------------------------------
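When using Ambari Infra to index and store Ranger audit logs, tune Solr to handle the number of audit records stored per day. The following sections give recommendations for tuning the operating system and Solr, depending on how you use Ambari Infra and Ranger in your environment.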




9.3.2.1 Operating System Tuning
-----------------------------------------------------------------------------------------------------------------------------------------
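Solr uses many network connections when indexing and searching. To avoid too many open network connections, the following sysctl parameters are recommended: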

   net.ipv4.tcp_max_tw_buckets = 1440000
   net.ipv4.tcp_tw_recycle = 1
   net.ipv4.tcp_tw_reuse = 1

These settings can be made permanent by placing them in the /etc/sysctl.d/net.conf file, or set at runtime with the following sysctl commands:

   sysctl -w net.ipv4.tcp_max_tw_buckets=1440000
   sysctl -w net.ipv4.tcp_tw_recycle=1
   sysctl -w net.ipv4.tcp_tw_reuse=1
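To confirm which values a node is actually running with, a short script can read them back from /proc/sys. Each dotted key maps to a file path there; for example, net.ipv4.tcp_tw_reuse is /proc/sys/net/ipv4/tcp_tw_reuse. This is a minimal sketch, and the function names are hypothetical. It also prints the settings in the "key = value" form used by files under /etc/sysctl.d/. net.ipv4.tcp_tw_recycle was removed from the Linux kernel in version 4.12, so on newer kernels the script reports that key as unavailable rather than failing.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

RECOMMENDED: Dict[str, str] = {
    "net.ipv4.tcp_max_tw_buckets": "1440000",
    "net.ipv4.tcp_tw_recycle": "1",
    "net.ipv4.tcp_tw_reuse": "1",
}


def render_conf(settings: Dict[str, str]) -> str:
    """Lines in 'key = value' form, suitable for a file under /etc/sysctl.d/."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def read_proc(key: str, proc_root: str = "/proc/sys") -> Optional[str]:
    """Current value of a sysctl key (dots map to directories), or None if absent."""
    path = Path(proc_root, *key.split("."))
    return path.read_text().strip() if path.exists() else None


def check(settings: Dict[str, str],
          read: Callable[[str], Optional[str]] = read_proc) -> List[str]:
    """List every key whose running value differs from the recommended one."""
    problems = []
    for key, wanted in settings.items():
        current = read(key)
        if current is None:
            problems.append(f"{key}: not available on this kernel")
        elif current != wanted:
            problems.append(f"{key}: {current} (recommended {wanted})")
    return problems


if __name__ == "__main__":
    print(render_conf(RECOMMENDED), end="")
    for problem in check(RECOMMENDED):
        print(problem)
```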

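In addition, increase the number of user processes allowed for Solr to avoid exceptions when creating new native threads. To do this, create a new file named /etc/security/limits.d/infra-solr.conf
with the following content: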

   infra-solr - nproc 6000
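Python's standard resource module can check whether the new limit is in effect for the infra-solr user. This minimal sketch assumes it is run as the infra-solr user in a new session. Entries under /etc/security/limits.d/ are applied by pam_limits when a session starts, so processes that were already running keep their previous limit. The 6000 threshold comes from the file above. The function name is hypothetical.

```python
import resource
from typing import Optional, Tuple


def nproc_ok(required: int = 6000,
             limits: Optional[Tuple[int, int]] = None) -> bool:
    """True if the soft max-user-processes limit is at least `required`.

    `limits` defaults to this process's RLIMIT_NPROC (soft, hard) pair;
    RLIM_INFINITY ("unlimited") always passes.
    """
    soft, _hard = limits if limits is not None else resource.getrlimit(resource.RLIMIT_NPROC)
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    print("nproc limit OK" if nproc_ok() else "nproc limit below 6000")
```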

9.3.2.2 JVM - GC Settings
-----------------------------------------------------------------------------------------------------------------------------------------
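Heap size and garbage collection settings are very important for Solr instances that index large numbers of Ranger audit logs in production. For production deployments, we recommend setting "Infra Solr Minimum Heap Size"
and "Infra Solr Maximum Heap Size" to 12 GB. To apply these settings, follow these steps: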

   ① In Ambari Web, browse to Services > Ambari Infra > Configs.
   ② On the Settings tab, you will see two slider bars that control the Infra Solr heap size.
   ③ Set Infra Solr Minimum Heap Size to 12GB or 12,288MB.
   ④ Set Infra Solr Maximum Heap Size to 12GB or 12,288MB.
   ⑤ Click Save to save the configuration, then restart the affected services as prompted by Ambari.

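   Using G1 as the garbage collector is also recommended for production deployments. To configure G1 garbage collection for the Ambari Infra Solr instances, follow these steps: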

   ① In Ambari Web, browse to Services > Ambari Infra > Configs.
   ② On the Advanced tab, expand Advanced infra-solr-env.
   ③ In the infra-solr-env template, locate the multi-line GC_TUNE environment variable definition and replace it with the following:

       GC_TUNE="-XX:+UseG1GC
           -XX:+PerfDisableSharedMem
           -XX:+ParallelRefProcEnabled
           -XX:G1HeapRegionSize=4m
           -XX:MaxGCPauseMillis=250
           -XX:InitiatingHeapOccupancyPercent=75
           -XX:+UseLargePages
           -XX:+AggressiveOpts"

   The value used for -XX:G1HeapRegionSize is based on a 12GB Solr maximum heap size. If you choose a different heap size for Solr, use the following table for recommendations:

           +-----------------+--------------------+
           | Heap Size       | G1HeapRegionSize   |
           +-----------------+--------------------+
           | < 4GB           | 1MB                |
           +-----------------+--------------------+
           | 4-8GB           | 2MB                |
           +-----------------+--------------------+
           | 8-16GB          | 4MB                |
           +-----------------+--------------------+
           | 16-32GB         | 8MB                |
           +-----------------+--------------------+
           | 32-64GB         | 16MB               |
           +-----------------+--------------------+
           | > 64GB          | 32MB               |
           +-----------------+--------------------+
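   The table can be encoded as a simple lookup, for example to generate the G1HeapRegionSize flag when you choose a different heap size. This is a minimal sketch. The table's ranges share their endpoints (4-8GB and 8-16GB both mention 8GB), so treating each range as including its lower bound and excluding its upper bound is an assumption. The function names are hypothetical.

```python
def g1_region_size_mb(heap_gb: float) -> int:
    """Recommended -XX:G1HeapRegionSize in MB for a maximum heap of heap_gb GB."""
    # (exclusive upper bound in GB, region size in MB), following the table rows
    thresholds = [(4, 1), (8, 2), (16, 4), (32, 8), (64, 16)]
    for upper_gb, region_mb in thresholds:
        if heap_gb < upper_gb:
            return region_mb
    return 32  # 64GB and above


def gc_flag(heap_gb: float) -> str:
    return f"-XX:G1HeapRegionSize={g1_region_size_mb(heap_gb)}m"


print(gc_flag(12))  # -XX:G1HeapRegionSize=4m, the value used in GC_TUNE above
```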



9.3.2.3 Environment-Specific Tuning Parameters
-----------------------------------------------------------------------------------------------------------------------------------------
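Each of the following recommendations depends on the number of audit records indexed per day. To quickly determine this number, use an HTTP client such as curl to run the following command: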

   curl -g "http://<ambari infra hostname>:8886/solr/ranger_audits/select?q=(evtTime:[NOW-7DAYS+TO+*])&wt=json&indent=true&rows=0"

You should receive a response similar to the following:

   {
     "responseHeader":{
       "status":0,
       "QTime":1,
       "params":{
         "q":"evtTime:[NOW-7DAYS TO *]",
         "indent":"true",
         "rows":"0",
         "wt":"json"}},
     "response":{"numFound":306,"start":0,"docs":[]
     }}

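   Divide the value of the numFound element in the response by 7 to get the number of audit records indexed per day. If needed, you can
   replace '7DAYS' in the curl request with a wider time range, using keywords such as: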

       • 1MONTHS
       • 7DAYS

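   If you change the query's time range, be sure to divide by the matching number of days. The average number of records per day determines which of the following recommendations applies to your environment.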
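   The same measurement can be scripted, for example to re-check the daily volume periodically. This minimal sketch uses only the Python standard library. It assumes the Solr endpoint shown above can be reached without Kerberos authentication. Treating 1MONTHS as 30 days is an approximation, because Solr date math subtracts a calendar month. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WINDOW_DAYS = {"7DAYS": 7, "1MONTHS": 30}  # 1MONTHS taken as ~30 days


def count_url(host: str, window: str = "7DAYS", port: int = 8886) -> str:
    """URL of the rows=0 count query against the ranger_audits collection."""
    params = {"q": f"(evtTime:[NOW-{window} TO *])", "wt": "json", "rows": "0"}
    return (f"http://{host}:{port}/solr/ranger_audits/select?"
            + urllib.parse.urlencode(params))


def records_per_day(response_body: str, window: str = "7DAYS") -> float:
    """Divide numFound by the number of days in the query window."""
    num_found = json.loads(response_body)["response"]["numFound"]
    return num_found / WINDOW_DAYS[window]


def fetch_records_per_day(host: str, window: str = "7DAYS") -> float:
    """Run the count query against a live Solr host and return records per day."""
    with urllib.request.urlopen(count_url(host, window), timeout=30) as resp:
        return records_per_day(resp.read().decode("utf-8"), window)


sample = '{"responseHeader":{"status":0},"response":{"numFound":306,"start":0,"docs":[]}}'
print(round(records_per_day(sample), 1))  # 43.7
```

   urlencode percent-encodes the brackets in the query, so the request works without the curl -g switch that the command above needs.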




   ● Less Than 50 Million Audit Records Per Day
   -------------------------------------------------------------------------------------------------------------------------------------
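   If the Solr REST API call shows an average of fewer than 50 million records per day, apply the following recommendations. Each
   recommendation takes into account the time to live (TTL), which controls how long a document is kept in the index before it is removed.
   The default TTL is 90 days, but some users are more aggressive and remove documents from the index after 30 days. For this reason,
   recommendations are given for both TTL settings.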

   These recommendations assume the recommended 12GB heap size for each Solr server instance.

   Default Time To Live (TTL) 90 days:

   • Estimated total index size: ~150 GB to 450 GB
   • Total number of primary/leader shards: 6
   • Total number of shards including 1 replica each: 12
   • Total number of co-located Solr nodes: ~3 nodes, up to 2 shards per node (does not include replicas)
   • Total number of dedicated Solr nodes: ~1 node, up to 12 shards per node (does not include replicas)



   ● 50 - 100 Million Audit Records Per Day
   -------------------------------------------------------------------------------------------------------------------------------------
   50 to 100 million records ~ 5 - 10 GB of data per day.

   Default Time To Live (TTL) 90 days:
   • Estimated total index size: ~ 450 - 900 GB for 90 days
   • Total number of primary/leader shards: 18-36
   • Total number of shards including 1 replica each: 36-72
   • Total number of co-located Solr nodes: ~9-18 nodes, up to 2 shards per node (does not include replicas)
   • Total number of dedicated Solr nodes: ~3-6 nodes, up to 12 shards per node (does not include replicas)

   Custom Time To Live (TTL) 30 days:
   • Estimated total index size: 150 - 300 GB for 30 days
   • Total number of primary/leader shards: 6-12
   • Total number of shards including 1 replica each: 12-24
   • Total number of co-located Solr nodes: ~3-6 nodes, up to 2 shards per node (does not include replicas)
   • Total number of dedicated Solr nodes: ~1-2 nodes, up to 12 shards per node (does not include replicas)

   ● 100 - 200 Million Audit Records Per Day
   -------------------------------------------------------------------------------------------------------------------------------------
   100 to 200 million records ~ 10 - 20 GB of data per day.

   Default Time To Live (TTL) 90 days:
   • Estimated total index size: ~ 900 - 1800 GB for 90 days
   • Total number of primary/leader shards: 36-72
   • Total number of shards including 1 replica each: 72-144
   • Total number of co-located Solr nodes: ~18-36 nodes, up to 2 shards per node (does not include replicas)
   • Total number of dedicated Solr nodes: ~3-6 nodes, up to 12 shards per node (does not include replicas)

   Custom Time To Live (TTL) 30 days:
   • Estimated total index size: 300 - 600 GB for 30 days
   • Total number of primary/leader shards: 12-24
   • Total number of shards including 1 replica each: 24-48
   • Total number of co-located Solr nodes: ~6-12 nodes, up to 2 shards per node (does not include replicas)
   • Total number of dedicated Solr nodes: ~1-3 nodes, up to 12 shards per node (does not include replicas)
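   The figures in these tables follow a fairly regular pattern, which the sketch below reproduces as a rough calculator. The two constants are inferred from the tables rather than stated in them, so treat the output only as a starting estimate:
   • about 100 bytes per record, from 50-100 million records ≈ 5-10 GB per day
   • about 25 GB of index per primary shard, from 450-900 GB ≈ 18-36 shards

   The sketch matches the 50-200 million rows. The dedicated-node counts do not follow a single ratio, so they are left out. The names are hypothetical.

```python
import math
from typing import NamedTuple

BYTES_PER_RECORD = 100  # inferred: 50-100 million records ~ 5-10 GB per day
GB_PER_SHARD = 25       # inferred: 450-900 GB ~ 18-36 primary shards


class Sizing(NamedTuple):
    index_gb: float
    primary_shards: int
    shards_with_replica: int  # 1 replica per primary shard
    colocated_nodes: int      # up to 2 primary shards per co-located node


def estimate(records_per_day: int, ttl_days: int) -> Sizing:
    """Rough Solr sizing for Ranger audits, from the ratios in the tables."""
    daily_gb = records_per_day * BYTES_PER_RECORD / 1e9
    index_gb = daily_gb * ttl_days
    primary = math.ceil(index_gb / GB_PER_SHARD)
    return Sizing(index_gb, primary, primary * 2, math.ceil(primary / 2))


print(estimate(100_000_000, 90))
# Sizing(index_gb=900.0, primary_shards=36, shards_with_replica=72, colocated_nodes=18)
```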

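If you choose to use at least 1 replica for high availability, increase the number of nodes accordingly. If high availability is a requirement, consider using no fewer than 3 Solr nodes in any configuration.
As these examples show, a lower TTL requires fewer resources. To keep data for the long term, you can use the Solr Data Manager to archive it to long-term storage (HDFS, S3) and provide Hive tables for easy querying.
With this strategy, hot data stays in Solr for fast access from the Ranger UI, while cold data is archived to HDFS or S3 and remains accessible through Ranger.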

9.3.2.4 Adding New Shards
-----------------------------------------------------------------------------------------------------------------------------------------
If, after reviewing the recommendations above, you need to add shards to your existing deployment, see the following Solr documentation for how to do this:

   https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf




9.3.2.5 Out of Memory Exceptions
-----------------------------------------------------------------------------------------------------------------------------------------
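When using Ambari Infra with Ranger Audit, many Solr instances may exit with Java "Out Of Memory" exceptions. One solution is to update the Ranger Audit schema to use less heap memory by enabling DocValues.
This change requires re-indexing the data and is disruptive, but it helps significantly with memory consumption. See this article: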

   https://community.hortonworks.com/articles/156933/restore-backup-ranger-audits-to-newly-collection.html