Spark provides a RESTful API for monitoring Spark applications. The API is accessed at:
http://history-ip:18088/api/v1/
Official documentation: http://spark.apache.org/docs/latest/monitoring.html
Spark executes an application as a series of stages, so there is no single endpoint that directly reports whether a job succeeded or failed. Instead, call /applications to obtain the application_id and attempt_id, then call /applications/[app-id]/jobs to get the execution status of each job and its stages; from these you can derive the status of the whole application, and likewise monitor metrics such as execution time. The two endpoints described below are /applications and /applications/[app-id]/jobs, where [app-id] is: 1) application_id/attempt_id when an attempt_id exists, or 2) application_id alone when it does not.
1. /applications
The request parameters are described in the official documentation linked above. Note that date parameters are interpreted in GMT+0, not local time, so when filtering by time the timezone must be set to UTC; in Python the current UTC time can be obtained with datetime.datetime.utcnow().
Response:
[ {
"id" : "application_1597231412870_3608",
"name" : "DwMicbiTrafficLogStatOrigMain: 20200813",
"attempts" : [ {
"attemptId" : "1",
"startTime" : "2020-08-13T03:29:37.826GMT",
"endTime" : "2020-08-13T03:32:04.860GMT",
"lastUpdated" : "2020-08-13T03:32:04.948GMT",
"duration" : 147034,
"sparkUser" : "root",
"completed" : true,
"appSparkVersion" : "2.4.0-cdh6.2.0",
"endTimeEpoch" : 1597289524860,
"lastUpdatedEpoch" : 1597289524948,
"startTimeEpoch" : 1597289377826
} ]
}]
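As a minimal sketch of querying this endpoint with a UTC minDate filter (the base URL reuses the placeholder `history-ip` host shown above; the function names `utc_min_date` and `list_applications` are illustrative, and only the Python standard library is used):

```python
import datetime
import json
from urllib.request import urlopen

# Placeholder host from the access URL above; replace with the real history server.
BASE_URL = "http://history-ip:18088/api/v1"

def utc_min_date(hours_back=24):
    """Build a minDate filter string in the zero-offset (GMT) form the API expects,
    e.g. 2020-08-13T03:29:37.000GMT."""
    dt = datetime.datetime.utcnow() - datetime.timedelta(hours=hours_back)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000GMT")

def list_applications(hours_back=24):
    """Fetch applications whose start time falls within the last `hours_back` hours."""
    url = "%s/applications?minDate=%s" % (BASE_URL, utc_min_date(hours_back))
    with urlopen(url) as resp:
        return json.load(resp)
```

Because the filter is compared in GMT+0, building it from `utcnow()` rather than `now()` avoids silently missing (or double-counting) applications when the client machine is in another timezone.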
The response wraps the basic information about the application; the key fields are the applicationId and attemptId. Note that there may be multiple attempts. These two values are concatenated to build [app-id].
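The [app-id] construction rule above can be sketched as a small helper (the function name `build_app_id_paths` is hypothetical; its input is one entry from the /applications response):

```python
def build_app_id_paths(app):
    """Return the [app-id] path segment(s) for one /applications entry.

    If an attempt carries an attemptId, the segment is application_id/attempt_id;
    otherwise it is just the application_id. One path per attempt is returned,
    since an application may have multiple attempts.
    """
    paths = []
    for attempt in app.get("attempts", []):
        attempt_id = attempt.get("attemptId")
        if attempt_id:
            paths.append("%s/%s" % (app["id"], attempt_id))
        else:
            paths.append(app["id"])
    return paths
```

For the sample response above this yields `["application_1597231412870_3608/1"]`, which can then be substituted into /applications/[app-id]/jobs.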
2. /applications/[app-id]/jobs
The request parameters are described in the official documentation linked above.
Response:
[ {
"jobId" : 1,
"name" : "runJob at SparkHadoopWriter.scala:78",
"submissionTime" : "2020-08-10T22:19:58.391GMT",
"completionTime" : "2020-08-10T22:24:00.128GMT",
"stageIds" : [ 1, 2 ],
"status" : "SUCCEEDED",
"numTasks" : 10,
"numActiveTasks" : 0,
"numCompletedTasks" : 10,
"numSkippedTasks" : 0,
"numFailedTasks" : 4,
"numKilledTasks" : 0,
"numCompletedIndices" : 10,
"numActiveStages" : 0,
"numCompletedStages" : 2,
"numSkippedStages" : 0,
"numFailedStages" : 0,
"killedTasksSummary" : { }
}, {
"jobId" : 0,
"name" : "runJob at SparkHadoopWriter.scala:78",
"submissionTime" : "2020-08-10T22:19:37.736GMT",
"completionTime" : "2020-08-10T22:19:58.281GMT",
"stageIds" : [ 0 ],
"status" : "SUCCEEDED",
"numTasks" : 5,
"numActiveTasks" : 0,
"numCompletedTasks" : 5,
"numSkippedTasks" : 0,
"numFailedTasks" : 0,
"numKilledTasks" : 0,
"numCompletedIndices" : 5,
"numActiveStages" : 0,
"numCompletedStages" : 1,
"numSkippedStages" : 0,
"numFailedStages" : 0,
"killedTasksSummary" : { }
} ]
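Rolling the per-job statuses up into one status for the whole application, as described earlier, could look like the sketch below (the helper `overall_status` and its FAILED-before-RUNNING precedence are assumptions of this example, not part of the API):

```python
def overall_status(jobs):
    """Collapse a /applications/[app-id]/jobs response into one status.

    FAILED if any job failed, RUNNING if any job is still active, SUCCEEDED
    once every job reports SUCCEEDED, and UNKNOWN otherwise (e.g. no jobs yet).
    """
    statuses = {job["status"] for job in jobs}
    if "FAILED" in statuses:
        return "FAILED"
    if "RUNNING" in statuses:
        return "RUNNING"
    if statuses == {"SUCCEEDED"}:
        return "SUCCEEDED"
    return "UNKNOWN"
```

Applied to the two-job sample above (both "SUCCEEDED"), this returns "SUCCEEDED" even though the first job shows numFailedTasks = 4: Spark retries failed tasks, so task-level failures do not necessarily fail the job.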