hive占用CPU100%2

今天又再次出现了CPU100%的情况(严格来讲是4个CPU都被占满了达到了400%)

通过剖析当前进程发现,有7个线程全部堵塞在HashMap.getEntry这个方法上面,部分thread运行信息如下:

18326:

===== FULL THREAD DUMP ===============
Tue Jan  8 09:40:49 2013
Oracle JRockit(R) R28.1.5-20-146757-1.6.0_29-20111004-1750-linux-x86_64

"Main Thread" id=1 idx=0x4 tid=18419 prio=5 alive, sleeping, native_waiting
    at java/lang/Thread.sleep(J)V(Native Method)
    at org/apache/hadoop/hive/ql/Driver.pollTasks(Driver.java:1364)
    at org/apache/hadoop/hive/ql/Driver.execute(Driver.java:1122)
    at org/apache/hadoop/hive/ql/Driver.run(Driver.java:951)
    at org/apache/hadoop/hive/cli/CliDriver.processLocalCmd(CliDriver.java:258)
    at org/apache/hadoop/hive/cli/CliDriver.processCmd(CliDriver.java:215)
    at org/apache/hadoop/hive/cli/CliDriver.processLine(CliDriver.java:406)
    at org/apache/hadoop/hive/cli/CliDriver.processLine(CliDriver.java:341)
    at org/apache/hadoop/hive/cli/CliDriver.processReader(CliDriver.java:439)
    at org/apache/hadoop/hive/cli/CliDriver.processFile(CliDriver.java:449)
    at org/apache/hadoop/hive/cli/CliDriver.run(CliDriver.java:647)
    at org/apache/hadoop/hive/cli/CliDriver.main(CliDriver.java:557)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
    at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
    at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java/lang/reflect/Method.invoke(Method.java:597)
    at org/apache/hadoop/util/RunJar.main(RunJar.java:156)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"(Signal Handler)" id=2 idx=0x8 tid=18420 prio=5 alive, native_blocked, daemon

"(OC Main Thread)" id=3 idx=0xc tid=18421 prio=5 alive, native_waiting, daemon

"(GC Worker Thread 1)" id=? idx=0x10 tid=18422 prio=5 alive, daemon

"(GC Worker Thread 2)" id=? idx=0x14 tid=18423 prio=5 alive, daemon

"(GC Worker Thread 3)" id=? idx=0x18 tid=18424 prio=5 alive, daemon

"(GC Worker Thread 4)" id=? idx=0x1c tid=18425 prio=5 alive, daemon

"(Code Generation Thread 1)" id=4 idx=0x20 tid=18426 prio=5 alive, native_waiting, daemon

"(Code Optimization Thread 1)" id=5 idx=0x24 tid=18427 prio=5 alive, native_waiting, daemon

"(VM Periodic Task)" id=6 idx=0x28 tid=18428 prio=10 alive, native_waiting, daemon

"Finalizer" id=7 idx=0x2c tid=18429 prio=8 alive, native_waiting, daemon
    at jrockit/memory/Finalizer.waitForFinalizees(J[Ljava/lang/Object;)I(Native Method)
    at jrockit/memory/Finalizer.access$700(Finalizer.java:12)
    at jrockit/memory/Finalizer$4.run(Finalizer.java:189)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"Reference Handler" id=8 idx=0x30 tid=18430 prio=10 alive, native_waiting, daemon
    at java/lang/ref/Reference.waitForActivatedQueue(J)Ljava/lang/ref/Reference;(Native Method)
    at java/lang/ref/Reference.access$100(Reference.java:11)
    at java/lang/ref/Reference$ReferenceHandler.run(Reference.java:82)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"(Sensor Event Thread)" id=9 idx=0x34 tid=18431 prio=5 alive, native_blocked, daemon

"VM JFR Buffer Thread" id=10 idx=0x38 tid=18432 prio=5 alive, in native, daemon

"LeaseChecker" id=21 idx=0x58 tid=18443 prio=5 alive, sleeping, native_waiting, daemon
    at java/lang/Thread.sleep(J)V(Native Method)
    at org/apache/hadoop/hdfs/DFSClient$LeaseChecker.run(DFSClient.java:1379)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"Timer-19" id=416 idx=0x2a8 tid=19295 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0xe07c5070[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0xe07c5070[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"Thread-434" id=456 idx=0x30c tid=19808 prio=5 alive, native_blocked
    at java/util/HashMap.getEntry(HashMap.java:426)
    at java/util/HashMap.containsKey(HashMap.java:415)
    at java/util/HashSet.contains(HashSet.java:184)
    at org/apache/hadoop/hive/ql/QueryPlan.updateCountersInQueryPlan(QueryPlan.java:310)
    at org/apache/hadoop/hive/ql/QueryPlan.getQueryPlan(QueryPlan.java:426)[inlined]
    at org/apache/hadoop/hive/ql/QueryPlan.toString(QueryPlan.java:598)[inlined]
    at org/apache/hadoop/hive/ql/history/HiveHistory.logPlanProgress(HiveHistory.java:497)[optimized]
    at org/apache/hadoop/hive/ql/exec/Task.executeTask(Task.java:132)
    at org/apache/hadoop/hive/ql/exec/TaskRunner.runSequential(TaskRunner.java:57)
    at org/apache/hadoop/hive/ql/exec/TaskRunner.run(TaskRunner.java:47)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

 

注意红色字体部分。

当前top监控使用状态,如下图:

 

目前此问题是间歇性的,需要对各种指标进行分析。

2013-01-16问题再次出现,检查当前GC运行状态,如下图:

ganglia监控页面,如下图:

今天查问题,我自己有个缺陷,就是对top命令不是很熟悉,没有去查看到底是那些thread导致CPU占有率高。后面会对top命令再复习一次,并对这次的查问题的过程做一个总结,为以后提高检查问题的效率节约时间(现在只能等下一次了)。

今天再次出现,通过top -H -p <pid>的命令截图如下:

 

部分栈运行截图如下:

评论 9
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值