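When running a job with Hive on Tez, the job failed with the error shown below. This kind of failure is typically caused by the task's container being preempted or by the cluster running short of resources. Note that a task may fail at most 4 times by default, and the AM itself is retried at most 2 times by default.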
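To check whether containers were actually preempted, it can help to tally the exit statuses that the AM log records for stopped containers. The following is a minimal sketch, not an official tool: the function name `tally_exit_statuses` and the regular expression are my own assumptions based on the `CONTAINER_STOPPED` history lines that appear in the AM log, and the name table is only a subset of YARN's `ContainerExitStatus` constants.

```python
import re
from collections import Counter

# Subset of YARN ContainerExitStatus constants (name table is illustrative).
EXIT_STATUS_NAMES = {
    0: "SUCCESS",
    -1000: "INVALID",  # placeholder value: no real exit status was recorded
    -100: "ABORTED",
    -101: "DISKS_FAILED",
    -102: "PREEMPTED",
    -103: "KILLED_EXCEEDED_VMEM",
    -104: "KILLED_EXCEEDED_PMEM",
}

# Assumed layout of a Tez AM history line for a stopped container.
CONTAINER_STOPPED = re.compile(
    r"\[Event:CONTAINER_STOPPED\]: containerId=(\S+?), "
    r"stoppedTime=\d+, exitStatus=(-?\d+)"
)


def tally_exit_statuses(log_text):
    """Count CONTAINER_STOPPED events in log_text, grouped by exit status name."""
    counts = Counter()
    for _container_id, status in CONTAINER_STOPPED.findall(log_text):
        code = int(status)
        counts[EXIT_STATUS_NAMES.get(code, f"UNKNOWN({code})")] += 1
    return counts
```

Calling `tally_exit_statuses(open("am.log").read())` (the file name is hypothetical) returns a `Counter` keyed by status name. A large `PREEMPTED` count would support the preemption explanation, while `KILLED_EXCEEDED_PMEM` or `KILLED_EXCEEDED_VMEM` would point to container memory limits instead.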
Error log:
Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:13 Vertex vertex_1588914175897_18178_1_00 [Map 7] killed/failed due to:OTHER_VERTEX_FAILURE, counters=Counters: 35, File System Counters, FILE_BYTES_READ=47800, FILE_BYTES_WRITTEN=1107572220, HDFS_BYTES_READ=745850999, HDFS_READ_OPS=60, HDFS_OP_OPEN=60, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=16203009, GC_TIME_MILLIS=3322, CPU_MILLISECONDS=239810, PHYSICAL_MEMORY_BYTES=17135828992, VIRTUAL_MEMORY_BYTES=32139993088, COMMITTED_HEAP_BYTES=17135828992, INPUT_RECORDS_PROCESSED=16203009, INPUT_SPLIT_LENGTH_BYTES=4733999156, OUTPUT_RECORDS=16203009, OUTPUT_BYTES=2399220587, OUTPUT_BYTES_WITH_OVERHEAD=2442634582, OUTPUT_BYTES_PHYSICAL=1107524420, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_CHUNK_COUNT=5, HIVE, DESERIALIZE_ERRORS=0, RECORDS_IN_Map_7=16203009, RECORDS_OUT_INTERMEDIATE_Map_7=16203009, TaskCounter_Map_7_INPUT_company_baseinfo_complex, INPUT_RECORDS_PROCESSED=16203009, INPUT_SPLIT_LENGTH_BYTES=4733999156, TaskCounter_Map_7_OUTPUT_Reducer_8, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=2399220587, OUTPUT_BYTES_PHYSICAL=1107524420, OUTPUT_BYTES_WITH_OVERHEAD=2442634582, OUTPUT_RECORDS=16203009, SHUFFLE_CHUNK_COUNT=5, SPILLED_RECORDS=16203009, vertexStats=firstTaskStartTime=1589836372219, firstTasksToStart=[ task_1588914175897_18178_1_00_000012,task_1588914175897_18178_1_00_000013,task_1588914175897_18178_1_00_000010,task_1588914175897_18178_1_00_000011 ], lastTaskFinishTime=1589836672777, lastTasksToFinish=[ task_1588914175897_18178_1_00_000008,task_1588914175897_18178_1_00_000009,task_1588914175897_18178_1_00_000006,task_1588914175897_18178_1_00_000007 ], minTaskDuration=38304, maxTaskDuration=61170, avgTaskDuration=49326.8, numSuccessfulTasks=5, shortestDurationTasks=[ task_1588914175897_18178_1_00_000001 ], longestDurationTasks=[ task_1588914175897_18178_1_00_000000 ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=0, numCompletedTasks=18, numSucceededTasks=5, numKilledTasks=13, numFailedTasks=0}
2020-05-19 05:17:52,787 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1588914175897_18178_1_00 [Map 7] transitioned from TERMINATING to KILLED due to event V_TASK_COMPLETED
2020-05-19 05:17:52,788 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1588914175897_18178_1][Event:CONTAINER_STOPPED]: containerId=container_e57_1588914175897_18178_02_000007, stoppedTime=1589836672788, exitStatus=-1000
2020-05-19 05:17:52,789 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Vertex vertex_1588914175897_18178_1_04 [Reducer 2] completed., numCompletedVertices=7, numSuccessfulVertices=2, numFailedVertices=1, numKilledVertices=4, numVertices=8
2020-05-19 05:17:52,789 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Checking vertices for DAG completion, numCompletedVertices=7, numSuccessfulVertices=2, numFailedVertices=1, numKilledVertices=4, numVertices=8, commitInProgress=0, terminationCause=VERTEX_FAILURE
2020-05-19 05:17:52,789 [INFO] [Dispatcher thread {Central}] |node.AMNodeImpl|: Attempt failed on node: storm-node-20:45454 TA: attempt_1588914175897_18178_1_00_000002_0 failed: false container: container_e57_1588914175897_18178_02_000007 numFailedTAs: 0
2020-05-19 05:17:52,789 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1588914175897_18178_1][Event:CONTAINER_STOPPED]: containerId=container_e57_1588914175897_18178_02_000004, stoppedTime=1589836672789, exitStatus=-1000
2020-05-19 05:17:52,789 [INFO] [Dispatcher thread {Central}] |node.AMNodeImpl|: Attempt failed on node: storm-node-15:45454 TA: attempt_1588914175897_18178_1_00_000006_0 failed: false container: container_e57_1588914175897_18178_0