在hive 执行与dyanmodb相关的操作时,遇到了下面错误
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1481976368061_0022_1_00, diagnostics=[Task failed, taskId=task_1481976368061_0022_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:261)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Caused by: java.lang.NoSuchFieldError: INSTANCE
at com.amazonaws.http.conn.SdkConnectionKeepAliveStrategy.getKeepAliveDuration(SdkConnectionKeepAliveStrategy.java:48)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:535)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2000)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1970)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1255)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:90)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:87)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
at org.apache.hadoop.dynamodb.DynamoDBClient.describeTable(DynamoDBClient.java:86)
at org.apache.hadoop.hive.dynamodb.DynamoDBSerDe.verifyDynamoDBWriteThroughput(DynamoDBSerDe.java:180)
at org.apache.hadoop.hive.dynamodb.DynamoDBSerDe.initialize(DynamoDBSerDe.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:364)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:545)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:498)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:368)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:545)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:498)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:368)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:443)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:266)
... 15 more
分析原因是由于Caused by: java.lang.NoSuchFieldError: INSTANCE at com.amazonaws.http.conn.SdkConnectionKeepAliveStrategy.getKeepAliveDuration(SdkConnectionKeepAliveStrategy.java:48) 这个错误导致的,review了下代码,发现没有引入com.anazonaws的jar,说明用的是hive自行加载的aws sdk,因此先替换尝试下,添加如下maven
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-dynamodb</artifactId>
<version>1.10.20</version>
</dependency>
然后重新打包再运行,OK,问题解决了~
PS:要确保项目依赖的的hadoop Jar和hive-jdbc jar都是和集群一致的,这样能更好的排除异常
类似问题:https://community.hortonworks.com/questions/72569/vertex-failed-while-overwrite-the-hive-table.html