impala java 堆内存配置_【原创】大叔问题定位分享(38)impala报错内存不足

impala有时查询报错内存不足,并持续一段时间后自动恢复,报错时日志如下:

org.apache.hive.service.cli.HiveSQLException: ExecQueryFInstances rpc query_id=834c3b2376181f0e:a901620f00000000 failed: Failed to get minimum memory reservation of 204.00 MB on daemon 192.168.0.1:22000 for query 834c3b2376181f0e:a901620f00000000 due to following error: Failed to increase reservation by 204.00 MB because it would exceed the applicable reservation limit for the "Process" ReservationTracker: reservation_limit=8.50 GB reservation=8.50 GB used_reservation=0 child_reservations=8.50 GB

The top 5 queries that allocated memory under this tracker are:

Query(404fa28252334daf:5bb8cef500000000): Reservation=974.00 MB ReservationLimit=8.00 GB OtherMemory=31.14 MB Total=1005.14 MB Peak=1.00 GB

Query(3443bd95eb37b73a:885bbe6f00000000): Reservation=784.00 MB ReservationLimit=8.00 GB OtherMemory=31.14 MB Total=815.14 MB Peak=826.21 MB

Query(894b4caf9b397385:16df941300000000): Reservation=558.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=573.77 MB Peak=595.57 MB

Query(6c42b50ef6e5ba57:42aa5de200000000): Reservation=558.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=573.77 MB Peak=595.57 MB

Query(6142bb9f76823b8a:97e6451900000000): Reservation=508.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=523.77 MB Peak=534.67 MB

Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error.

at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)

at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)

at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)

at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)

at org.apache.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:109)

at com.alibaba.druid.pool.DruidPooledPreparedStatement.executeQuery(DruidPooledPreparedStatement.java:227)

at com.dataone.xishaoye.backend.dataoneapi.config.impala.RoundRobinPattern.winning(RoundRobinPattern.java:56)

at com.dataone.xishaoye.backend.dataoneapi.common.util.ConvertQueryUtils.convertcustomerexecQueryMapList(ConvertQueryUtils.java:92)

at com.dataone.xishaoye.backend.dataoneapi.businessAnalysis_v2.timingTasks.RealTimeTotalYear.realTimeTotalYear(RealTimeTotalYear.java:48)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at com.dataone.xishaoye.backend.dataoneapi.config.quartz.SystemJobFactory.execute(SystemJobFactory.java:29)

at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)

可以看到有5个query正在进行导致当前query获取不到足够的资源,查看impala日志 /var/log/impalad/impalad.INFO,

这5个query执行简略过程为:

第1个

I0731 13:15:03.254120 11667 impala-server.cc:972] Registered query query_id=404fa28252334daf:5bb8cef500000000 session_id=ea422a1e110dec91:7de05e982c2b4c90

...

I0731 13:15:04.135802 11720 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=404fa28252334daf:5bb8cef500000000 refcnt=4

I0731 13:21:35.669724 22421 coordinator.cc:956] Backend completed: host=192.168.0.1:22000 remaining=3 query_id=404fa28252334daf:5bb8cef500000000

...

I0731 13:22:09.125800 11667 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=404fa28252334daf:5bb8cef500000000 refcnt=1

第2个

I0731 13:09:02.447206 22390 impala-server.cc:972] Registered query query_id=3443bd95eb37b73a:885bbe6f00000000 session_id=ff4e12e38ce19b8d:56bef858ad9def93

...

I0731 13:09:02.657199 22396 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=3443bd95eb37b73a:885bbe6f00000000 refcnt=4

I0731 13:21:30.155854 9378 coordinator.cc:956] Backend completed: host=192.168.0.1:22000 remaining=3 query_id=3443bd95eb37b73a:885bbe6f00000000

...

I0731 13:22:13.168628 22390 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=3443bd95eb37b73a:885bbe6f00000000 refcnt=1

第3个

I0731 13:14:02.140969 3886 impala-server.cc:972] Registered query query_id=894b4caf9b397385:16df941300000000 session_id=6e47d4c6c94a8e19:b6384942e957718c

...

I0731 13:14:02.598412 3927 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=894b4caf9b397385:16df941300000000 refcnt=7

I0731 13:21:18.426283 3931 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=894b4caf9b397385:16df941300000011, node=18

...

I0731 13:22:06.124980 3886 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=894b4caf9b397385:16df941300000000 refcnt=1

第4个

I0731 13:07:01.442411 20737 impala-server.cc:972] Registered query query_id=6c42b50ef6e5ba57:42aa5de200000000 session_id=ba4cab33df74b6e4:94e56665c8bf36a1

...

I0731 13:07:01.692301 20752 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6c42b50ef6e5ba57:42aa5de200000000 refcnt=7

I0731 13:21:18.492522 20755 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=6c42b50ef6e5ba57:42aa5de200000011, node=18

...

I0731 13:22:03.314045 20737 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6c42b50ef6e5ba57:42aa5de200000000 refcnt=1

第5个

I0731 13:12:02.287282 29199 impala-server.cc:972] Registered query query_id=6142bb9f76823b8a:97e6451900000000 session_id=6b497f1f4c2a1000:2fca9a9feb8a268d

...

I0731 13:12:02.876965 29214 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6142bb9f76823b8a:97e6451900000000 refcnt=7

I0731 13:21:18.524863 29219 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=6142bb9f76823b8a:97e6451900000011, node=18

...

I0731 13:22:08.481279 29199 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6142bb9f76823b8a:97e6451900000000 refcnt=1

每个query具体执行的sql都可以在所有impalad中的一台的impalad.INFO中找到,这里不再列举

发现一些规律:这5个query都持续十几分钟,而且查询过程中都有很长的空白(没有任何日志输出),都从13点21开始陆续恢复,进一步查看空白过程的日志,发现有很多报错

java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.1:42194 remote=/192.168.0.2:50010]

在impala日志查看发现异常SocketTimeoutException持续时间从13点06到13点21

# tail -400000 impalad.INFO|grep SocketTimeoutException -B 2

W0731 13:06:00.506495 31507 BlockReaderFactory.java:570] BlockReaderFactory(fileName=/part-00001-a55afd98-d28f-4493-985d-0cf1f8577cfe.c000, block=BP-692799849-192.168.0.2-1556106550

816:blk_1084633665_10894033): I/O error requesting file descriptors. Disabling domain socket DomainSocket(fd=800,path=/var/run/hdfs-sockets/dn)

Java exception follows:

java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable

...

W0731 13:21:09.401382 31508 DFSInputStream.java:704] Failed to connect to /192.168.0.1:50010 for block BP-692799849-192.168.0.2-1556106550816:blk_1084742948_11003322, add to deadNodes and continue.

Java exception follows:

java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.2:35019 remote=/192.168.0.1:50010]

192.168.0.1上的datanode在这段时间响应异常,查看datanode日志 /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-192.168.0.1.log.out

在这段时间有很多异常

2020-07-31 13:02:50,431 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1306ms

在datanode日志查看发现异常持续时间从13点02到13点21

# tail -347000 hadoop-cmf-hdfs-DATANODE-192.168.0.1.log.out.1|grep JvmPauseMonitor

2020-07-31 13:02:50,431 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1306ms

...

2020-07-31 13:21:16,102 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2399ms

调整datanode内存(Java Heap Size of DataNode in Bytes),问题解决

总结: datanode内存不足导致gc,进而导致impala查询中部分查询(用到该datanode上数据的查询)卡住,进而导致impala大量内存被占用,进而导致impala查询报错内存不足

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值