[Hive] HiveServer2 Out-of-Memory Issues: A Summary

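1. Introduction

Users ran offline SQL jobs against HiveServer2 (version 3.1.2) through Beeline. After about a week of continuous operation, HiveServer2 started to hit OOM, which seriously disrupted data queries and report generation. The problem was finally resolved after several rounds of fixes. I have collected the issues that were fixed here, so that others who run into the same problem have somewhere to start.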

2. Cases

2.1 HIVE-16455

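HiveServer2 leaks file handles when the ADD JAR statement is used. The leaked handles show up in lsof as open files marked "(deleted)":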

[root@host-10-17-80-111 ~]# lsof -p 29588 | grep "(deleted)"
java    29588 hive  391u   REG              252,3    125987  2099944 /tmp/57d98f5b-1e53-44e2-876b-6b4323ac24db_resources/hive-contrib.jar (deleted)
java    29588 hive  392u   REG              252,3    125987  2099946 /tmp/eb3184ad-7f15-4a77-a10d-87717ae634d1_resources/hive-contrib.jar (deleted)
java    29588 hive  393r   REG              252,3    125987  2099825 /tmp/e29dccfc-5708-4254-addb-7a8988fc0500_resources/hive-contrib.jar (deleted)
java    29588 hive  394r   REG              252,3    125987  2099833 /tmp/5153dd4a-a606-4f53-b02c-d606e7e56985_resources/hive-contrib.jar (deleted)
java    29588 hive  395r   REG              252,3    125987  2099827 /tmp/ff3cdb05-917f-43c0-830a-b293bf397a23_resources/hive-contrib.jar (deleted)
java    29588 hive  396r   REG              252,3    125987  2099822 /tmp/60531b66-5985-421e-8eb5-eeac31fdf964_resources/hive-contrib.jar (deleted)
java    29588 hive  397r   REG              252,3    125987  2099831 /tmp/78878921-455c-438c-9735-447566ed8381_resources/hive-contrib.jar (deleted)
java    29588 hive  399r   REG              252,3    125987  2099835 /tmp/0e5d7990-30cc-4248-9058-587f7f1ff211_resources/hive-contrib.jar (deleted)
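To see which files dominate a listing like the one above, the lsof output can be summarized per file name. The following is a minimal Python sketch and not part of the original investigation. The function name `count_deleted_by_name` is made up for illustration, and the parsing assumes lsof's default layout, where the path is the last field before the "(deleted)" marker and contains no spaces.

```python
from collections import Counter
import os


def count_deleted_by_name(lsof_lines):
    """Count open-but-deleted files per base name in lsof output lines."""
    counts = Counter()
    for line in lsof_lines:
        line = line.rstrip()
        if not line.endswith("(deleted)"):
            continue
        # Drop the trailing "(deleted)" marker; the path is then the last field.
        # Assumption: the path itself contains no whitespace.
        fields = line[: -len("(deleted)")].split()
        if not fields:
            continue
        counts[os.path.basename(fields[-1])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "java 29588 hive 391u REG 252,3 125987 2099944 "
        "/tmp/57d98f5b-1e53-44e2-876b-6b4323ac24db_resources/hive-contrib.jar (deleted)",
        "java 29588 hive 392u REG 252,3 125987 2099946 "
        "/tmp/eb3184ad-7f15-4a77-a10d-87717ae634d1_resources/hive-contrib.jar (deleted)",
    ]
    for name, n in count_deleted_by_name(sample).most_common():
        print(name, n)
```

Running this periodically against the output of `lsof -p <pid>` and watching whether the count for a given JAR keeps growing is one way to confirm that handles are leaking rather than just temporarily open.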
2.2 HIVE-24236

This one is hard to reproduce. The connection leak may only occur under certain specific conditions.

2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to select from transaction database org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general error
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
        at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
        at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
        at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2.3 HIVE-24552

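When loadDynamicPartitions (Hive.java) is called, it spawns multiple threads to handle the file moves. These threads may each open a HiveMetaStore connection, and if those connections are not closed promptly, a large number of them pile up.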

2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43901
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43900
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43899
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43898
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43897
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="transport.TIOStreamTransport" level="WARN" thread="Finalizer"] Error closing output stream.
java.net.SocketException: Socket closed
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
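The "current connections: N" counter in the log lines above is a convenient signal for spotting this kind of pile-up. Below is a small Python sketch that extracts the counter from HiveServer2 log lines and reports the peak. It is only an illustration: the function names and the `threshold` default are hypothetical, and the regular expression relies solely on the "current connections: N" text visible in the excerpt.

```python
import re

# Matches the counter that HiveMetaStoreClient prints in the excerpt above.
CONN_RE = re.compile(r"current connections: (\d+)")


def connection_counts(log_lines):
    """Return the connection counts reported in the log, in order of appearance."""
    counts = []
    for line in log_lines:
        match = CONN_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def summarize(log_lines, threshold=1000):
    """Summarize the counter; `threshold` is an arbitrary, hypothetical alert level."""
    counts = connection_counts(log_lines)
    if not counts:
        return None
    peak = max(counts)
    return {
        "samples": len(counts),
        "peak": peak,
        "last": counts[-1],
        "above_threshold": peak > threshold,
    }


if __name__ == "__main__":
    sample = [
        'class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] '
        "Closed a connection to metastore, current connections: 43901",
        'class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] '
        "Closed a connection to metastore, current connections: 43900",
    ]
    print(summarize(sample))
```

A peak in the tens of thousands, as in the excerpt, combined with connections being closed from the Finalizer thread rather than by the code that opened them, matches the description of connections that were not closed promptly.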
2.4 HIVE-24858

If a UDF JAR is registered in a session and a temporary function is created from it, the UDFClassLoader is not reclaimed by GC when the session is closed.

Class Name                                                                                                                          | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
contextClassLoader org.apache.hive.service.server.ThreadWithGarbageCleanup @ 0x7164deb50  HiveServer2-Handler-Pool: Thread-72 Thread|          128 |        79,072
referent java.util.WeakHashMap$Entry @ 0x7164e67d0                                                                                  |           40 |           824
'- [6] java.util.WeakHashMap$Entry[16] @ 0x71581aac0                                                                                |           80 |         5,056
   '- table java.util.WeakHashMap @ 0x71580f510                                                                                     |           48 |         6,920
      '- CACHE_CLASSES class org.apache.hadoop.conf.Configuration @ 0x71580f3d8                                                     |           64 |        74,528
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
2.5 HIVE-26404

HiveMetaStore becomes unresponsive because of long JVM garbage-collection pauses. org.apache.hadoop.conf.Configuration objects take up too much heap, which creates an OOM risk.

 Class Name                                                                             | Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------------
org.apache.hadoop.fs.FileSystem$Cache @ 0x45403fe70                                    |           32 |   108,671,824
|- <class> class org.apache.hadoop.fs.FileSystem$Cache @ 0x45410c3e0                   |            8 |           544
'- map java.util.HashMap @ 0x453ffb598                                                 |           48 |    92,777,232
   |- <class> class java.util.HashMap @ 0x4520382c8 System Class                       |           40 |           168
   |- entrySet java.util.HashMap$EntrySet @ 0x454077848                                |           16 |            16
   '- table java.util.HashMap$Node[32768] @ 0x463585b68                                |      131,088 |    92,777,168
      |- class java.util.HashMap$Node[] @ 0x4520b7790                                  |            0 |             0
      '- [1786] java.util.HashMap$Node @ 0x451998ce0                                   |           32 |         9,968
         |- <class> class java.util.HashMap$Node @ 0x4520b7728 System Class            |            8 |            32
         '- value org.apache.hadoop.hdfs.DistributedFileSystem @ 0x452990178           |           56 |         4,976
            |- <class> class org.apache.hadoop.hdfs.DistributedFileSystem @ 0x45402e290|            8 |         4,664
            |- uri java.net.URI @ 0x451a05cd0  hdfs://nameservice1                     |           80 |           432
            |- dfs org.apache.hadoop.hdfs.DFSClient @ 0x451f5d9b8                      |          128 |         3,824
            '- conf org.apache.hadoop.hive.conf.HiveConf @ 0x453a34b38                 |           80 |       250,160
----------------------------------------------------------------------------------------------------------------------
2.6 HIVE-22275

When a single Hive session executes multiple SQL statements, OperationManager.queryIdOperation is not cleaned up properly, which creates an OOM risk.

2019-09-13T08:37:36,785 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9]
2019-09-13T08:37:38,432 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083736_c49cf3cc-cfe8-48a1-bd22-8b924dfb0396 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9] with tag: null
2019-09-13T08:37:38,469 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb]
2019-09-13T08:37:52,662 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b]
2019-09-13T08:37:56,239 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c]
2019-09-13T08:38:30,791 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b697c801-7da0-4544-bcfa-442eb1d3bd77]
2019-09-13T08:39:10,187 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=bda93c8f-0822-4592-a61c-4701720a1a5c]
2019-09-13T08:39:15,471 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb] with tag: null
2019-09-13T08:39:15,507 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b] with tag: null
2019-09-13T08:39:15,538 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c] with tag: null
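In the excerpt above, the same queryId appears in "Removed queryId" lines for three different operation handles, which suggests that the entries for the other queries in that session were never removed. The Python sketch below looks for that pattern in a log. It is an illustration only: the function names are hypothetical, and the regular expression is written against the message format shown in the excerpt.

```python
import re
from collections import defaultdict

# Written against the "Removed queryId: ... corresponding to operation: ..." lines above.
REMOVED_RE = re.compile(
    r"Removed queryId: (\S+) corresponding to operation: "
    r"OperationHandle \[opType=\w+, getHandleIdentifier\(\)=([0-9a-f-]+)\]"
)


def handles_per_removed_query_id(log_lines):
    """Map each removed queryId to the set of operation handles it was removed for."""
    result = defaultdict(set)
    for line in log_lines:
        match = REMOVED_RE.search(line)
        if match:
            query_id, handle = match.groups()
            result[query_id].add(handle)
    return result


def reused_query_ids(log_lines):
    """Return queryIds that were reported as removed for more than one operation."""
    grouped = handles_per_removed_query_id(log_lines)
    return {qid: sorted(h) for qid, h in grouped.items() if len(h) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python check_query_ids.py < hiveserver2.log
    for qid, handles in reused_query_ids(sys.stdin).items():
        print(qid, "removed for", len(handles), "operations:", ", ".join(handles))
```

Any queryId reported here is a hint, not proof: it shows that several operations resolved to the same queryId at cleanup time, which is consistent with other queryIds staying behind in the map.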
2.7 HIVE-24590

Log output files are not closed or deleted properly, so Log4j RandomAccessFileManager instances take up too much heap space, which creates an OOM risk.

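3. Summary

I am running HiveServer2 3.1.2. This version has quite a few memory leak issues, so you can apply the fixes for the cases above and rebuild. If you run into other bugs or performance problems, it is worth checking with the Hive community.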
