Diagnosing a Hive Query OOM via the MySQL Slow Query Log

One of our Hive queries suddenly started failing with an OutOfMemoryError:

hive> select upstream_addr from lb_log where
    > domain_name='xxxxx' and host='xxxxxx' and dt=121117 and hour=20 limit 10;
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2882)
	at java.lang.StringValue.from(StringValue.java:24)
	at java.lang.String.<init>(String.java:178)
	at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:286)
	at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:262)
	at com.mysql.jdbc.ResultSet.extractStringFromNativeColumn(ResultSet.java:837)
	at com.mysql.jdbc.ResultSet.getNativeConvertToString(ResultSet.java:3296)
	at com.mysql.jdbc.ResultSet.getNativeString(ResultSet.java:3780)
	at com.mysql.jdbc.ResultSet.getStringInternal(ResultSet.java:4979)
	at com.mysql.jdbc.ResultSet.getString(ResultSet.java:4810)
	at org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:213)
	at org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:213)
	at org.datanucleus.store.rdbms.mapping.CharRDBMSMapping.getObject(CharRDBMSMapping.java:460)
	at org.datanucleus.store.mapped.mapping.SingleFieldMapping.getObject(SingleFieldMapping.java:216)
	at org.datanucleus.store.rdbms.query.ResultClassROF.processScalarExpression(ResultClassROF.java:583)
	at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:361)
	at org.datanucleus.store.rdbms.query.legacy.LegacyForwardQueryResult.nextResultSetElement(LegacyForwardQueryResult.java:137)
	at org.datanucleus.store.rdbms.query.legacy.LegacyForwardQueryResult$QueryResultIterator.next(LegacyForwardQueryResult.java:305)
	at org.apache.hadoop.hive.metastore.ObjectStore.listPartitionNames(ObjectStore.java:1200)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$26.run(HiveMetaStore.java:1569)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$26.run(HiveMetaStore.java:1566)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_names(HiveMetaStore.java:1566)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionNames(HiveMetaStoreClient.java:734)
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionNames(Hive.java:1361)
	at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:182)
	at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:112)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
	at org.apache.hadoop.hive.ql.optimizer.pcr.PartitionConditionRemover.transform(PartitionConditionRemover.java:78)

Running explain on the query is also very slow, and sometimes it times out entirely:

hive> explain select upstream_addr from lb_log where
    > domain_name='xxxxx' and dt='121117' and hour='20'  and host='xxxxx' limit 10;
FAILED: Error in semantic analysis: javax.jdo.JDODataStoreException: Transaction failed to commit
NestedThrowables:
org.datanucleus.transaction.NucleusTransactionException: Transaction failed to flush

hive> explain select upstream_addr from lb_log where
    > domain_name='xxxxx' and host='xxxxx' and dt=121117 and hour=20 limit 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME lb_log))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL upstream_addr))) (TOK_WHERE (and (and (and (= (TOK_TABLE_OR_COL domain_name) 'xxxxx') (= (TOK_TABLE_OR_COL host) 'xxxxx')) (= (TOK_TABLE_OR_COL dt) 121117)) (= (TOK_TABLE_OR_COL hour) 20))) (TOK_LIMIT 10)))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        lb_log
          TableScan
            alias: lb_log
            Filter Operator
              predicate:
                  expr: (host = 'xxxxx')
                  type: boolean
              Filter Operator
                predicate:
                    expr: ((((domain_name = 'xxxxx') and (host = 'xxxxx')) and (dt = 121117)) and (hour = 20))
                    type: boolean
                Select Operator
                  expressions:
                        expr: upstream_addr
                        type: string
                  outputColumnNames: _col0
                  Limit
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: 10

Time taken: 82.76 seconds

The affected tables all share one trait: an extremely large number of partitions. The Java stack trace shows the failure occurs while listing partitions (ObjectStore.listPartitionNames), i.e. during partition pruning at query-planning time.

Our Hive metastore is backed by MySQL, so the natural next step was to check the MySQL slow query log:

# Time: 121120 14:47:49
# User@Host: hiveuser[hiveuser] @  [10.208.10.131]
# Query_time: 5.904098  Lock_time: 0.000051 Rows_sent: 660794  Rows_examined: 1982384
use metastore;
SET timestamp=1353394069;
SELECT `THIS`.`PART_NAME` AS NUCORDER0 FROM `PARTITIONS` `THIS` LEFT OUTER JOIN `TBLS` `THIS_TABLE_DATABASE` ON `THIS`.`TBL_ID` = `THIS_TABLE_DATABASE`.`TBL_ID` LEFT OUTER JOIN `DBS` `THIS_TABLE_DATABASE_DATABASE_NAME` ON `THIS_TABLE_DATABASE`.`DB_ID` = `THIS_TABLE_DATABASE_DATABASE_NAME`.`DB_ID` LEFT OUTER JOIN `TBLS` `THIS_TABLE_TABLE_NAME` ON `THIS`.`TBL_ID` = `THIS_TABLE_TABLE_NAME`.`TBL_ID` WHERE `THIS_TABLE_DATABASE_DATABASE_NAME`.`NAME` = 'default' AND `THIS_TABLE_TABLE_NAME`.`TBL_NAME` = 'lb_log' ORDER BY NUCORDER0;

Sure enough: this query took nearly 6 seconds and returned a huge result set (Rows_sent: 660794). Hive apparently reads all of these partition names into memory before it can choose an execution plan.
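A back-of-the-envelope estimate suggests why this alone can blow the heap. The row count comes from the slow log above; the average name length and the per-String overhead are assumptions (the sample PART_NAME shown later is about 50 characters, and Java strings are UTF-16 with object-header overhead), not measurements:

```python
# Rough heap estimate for holding all partition names in the Hive client JVM.
ROWS = 660_794                 # Rows_sent from the MySQL slow query log
AVG_NAME_LEN = 50              # assumed, based on the sample PART_NAME
CHAR_BYTES = 2                 # Java strings are UTF-16
PER_STRING_OVERHEAD = 56       # assumed String + char[] header/padding bytes

bytes_per_name = AVG_NAME_LEN * CHAR_BYTES + PER_STRING_OVERHEAD
total_mb = ROWS * bytes_per_name / (1024 * 1024)
print(f"~{total_mb:.0f} MB just for the raw names")  # roughly 98 MB
```

And that is only the bare strings; the DataNucleus result wrappers and JDBC buffers on top of this easily push a default-sized client heap over the edge.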

Digging further into the metastore tables in MySQL:

[root@mysql.sock]metastore 16:01:36>select * from tbls where tbl_name='lb_log'\G
*************************** 1. row ***************************
            TBL_ID: 17202
       CREATE_TIME: 1349850710
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: pplive
         RETENTION: 0
             SD_ID: 1421577
          TBL_NAME: lb_log
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
1 row in set (0.00 sec)

[root@mysql.sock]metastore 16:01:47>select count(*) from partitions where tbl_id=17202;
+----------+
| count(*) |
+----------+
|   654118 |
+----------+
1 row in set (0.38 sec)

[root@mysql.sock]metastore 16:02:12>select * from partitions where tbl_id=17202 limit 1\G
*************************** 1. row ***************************
         PART_ID: 1431461
     CREATE_TIME: 1349891125
LAST_ACCESS_TIME: 0
       PART_NAME: domain_name=xx.com/dt=121011/hour=00/serverip=xx.xx
           SD_ID: 1465631
          TBL_ID: 17202
1 row in set (0.00 sec)

The problem is now obvious: this single table has more than 650,000 partitions. It is partitioned on four dimensions (domain_name/dt/hour/serverip), and those four dimensions produced 650,000+ partitions in little over a month.
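The explosion is simple combinatorics: partition counts multiply across dimensions. The real per-dimension cardinalities are not recorded here, so the numbers below are purely hypothetical, chosen only to show how modest cardinalities reach the observed order of magnitude:

```python
# Hypothetical per-dimension cardinalities (illustrative, not measured).
n_domains = 30      # assumed number of domain_name values
n_days    = 35      # "a little over a month" of dt values
n_hours   = 24      # one partition per hour
n_servers = 26      # assumed number of serverip values

total = n_domains * n_days * n_hours * n_servers
print(total)  # 655200 -- same order of magnitude as the observed 654118
```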

Solutions:

1. Reduce the number of partition dimensions. Not an option in our case.

2. Delete old data. We added a cleanup job that keeps only the last 20 days.

3. Roll tables automatically, e.g. one table per month, so that the partition count of any single table stays bounded.
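Option 2 can be sketched as a small script that emits the cleanup DDL. This is a hypothetical sketch, not our actual job: it assumes dt uses the yymmdd format visible in the PART_NAME sample above (dt=121011), that a 40-day lookback covers everything still present, and that dropping a dt value removes all sub-partitions beneath it:

```python
from datetime import date, timedelta

def drop_statements(table, today, keep_days=20, oldest_days=40):
    """Emit ALTER TABLE ... DROP PARTITION statements for dt values
    older than keep_days (hypothetical sketch; dt assumed yymmdd)."""
    stmts = []
    for back in range(keep_days, oldest_days):
        dt = (today - timedelta(days=back)).strftime("%y%m%d")
        stmts.append(
            f"ALTER TABLE {table} DROP IF EXISTS PARTITION (dt='{dt}');")
    return stmts

# Example: as of 2012-11-20, drop everything 20..39 days old.
for stmt in drop_statements("lb_log", date(2012, 11, 20))[:2]:
    print(stmt)
```

Feeding the emitted statements to `hive -e` (or the CLI) once a day would keep the partition count of the table roughly constant.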
