Is Job Running ?
1. If a Spark application is still running, use the Spark UI. This UI is hosted by the Spark driver:
– In YARN cluster mode, the driver runs inside the YARN ApplicationMaster, which is launched on one of the core nodes.
– In YARN client mode, the driver runs on the master node itself (assuming that is where you ran spark-submit).
To access the Spark UI, go to the YARN ResourceManager UI first. Then navigate to the corresponding Spark application and use the “Application Master” link to open the Spark UI. If you look at the link, it takes you to the application master’s web UI at port 20888. This is a proxy running on the master node and listening on port 20888; it exposes the Spark UI, which itself runs on either a core node or the master node.
2. You can also access the Spark UI by going directly to the driver hostname and port where it is hosted.
For example, when I ran spark-submit in cluster mode, it spun up application_1569345960040_0007. In my driver logs I see the messages below:
19/09/24 22:29:15 INFO Utils: Successfully started service 'SparkUI' on port 35395.
19/09/24 22:29:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-10-0-0-69.myermdomain.com:35395
Here ip-10-0-0-69.myermdomain.com is one of my core nodes.
So I can go to
http://ip-10-0-0-69.myermdomain.com:35395
This automatically redirects me to the proxy server on the master node, listening on port 20888:
http://ip-10-0-0-113.ec2.internal:20888/proxy/application_1569345960040_0007/
Please note that these links are temporary and only show the UI while the Spark application is running.
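The two routes above can be sketched in a few lines of Python. This is only an illustration, not an EMR or Spark tool: the helper names find_spark_ui_url and yarn_proxy_url are made up for this post, the regular expression assumes the driver log wording shown above, and the default port 20888 is the proxy port observed on my cluster.

```python
import re
from typing import Optional

# Assumes the driver log wording shown above:
# "Bound SparkUI to <address>, and started at <url>"
_UI_LINE = re.compile(r"Bound SparkUI to \S+ and started at\s+(https?://\S+)")


def find_spark_ui_url(driver_log: str) -> Optional[str]:
    """Return the Spark UI URL announced in the driver log, or None if absent."""
    match = _UI_LINE.search(driver_log)
    return match.group(1) if match else None


def yarn_proxy_url(master_host: str, application_id: str, port: int = 20888) -> str:
    """Build the proxy link that the "Application Master" link points to.

    The port defaults to 20888, the value seen on the cluster in this post;
    pass a different one if your cluster is configured differently.
    """
    return f"http://{master_host}:{port}/proxy/{application_id}/"


if __name__ == "__main__":
    log = (
        "19/09/24 22:29:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at "
        "http://ip-10-0-0-69.myermdomain.com:35395"
    )
    print(find_spark_ui_url(log))
    print(yarn_proxy_url("ip-10-0-0-113.ec2.internal", "application_1569345960040_0007"))
```

Both helpers only build or extract strings; whether the resulting URL is reachable still depends on the application being alive and on your network access to the cluster.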
Is Job Completed ?
If you want to see the UI after the Spark job has completed, use the Spark History Server UI directly at http://master-public-dns-name:18080/.
The Spark History Server can also be used for running jobs via the “Show Incomplete Applications” button. It does this by reading the Spark event logs, which are enabled on EMR by default.
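The same completed/running split is also available programmatically: the History Server serves a REST API under /api/v1, and its applications endpoint accepts a status filter of running or completed. Below is a minimal sketch, assuming the History Server address from this post and network access to port 18080. The function names are hypothetical, and only the "id" field of each returned application is used.

```python
import json
from typing import List
from urllib.request import urlopen


def applications_url(history_server: str, status: str) -> str:
    """Build the REST URL listing applications; status is 'running' or 'completed'."""
    return f"{history_server.rstrip('/')}/api/v1/applications?status={status}"


def application_ids(payload: str) -> List[str]:
    """Extract application IDs from the JSON array returned by the endpoint."""
    return [app["id"] for app in json.loads(payload)]


def fetch_application_ids(history_server: str, status: str, timeout: float = 10.0) -> List[str]:
    """Query the History Server over HTTP (requires access to its port)."""
    with urlopen(applications_url(history_server, status), timeout=timeout) as response:
        return application_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host name from this post; replace with your master node's DNS name.
    print(applications_url("http://master-public-dns-name:18080/", "running"))
```

Calling fetch_application_ids with status "running" should roughly correspond to what the “Show Incomplete Applications” view lists, subject to the same event-log lag as the UI.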
Differences between Spark UI and Spark History UI
However, the Spark History Server differs from the Spark UI in a few ways (for running applications, of course). Some that I observed:
– The Spark UI has a “Kill” button, so you can kill Spark stages, while the Spark History Server doesn’t.
– The Spark UI has a “SQL” tab that shows more information about spark-sql jobs, while the Spark History Server doesn’t.
– The Spark UI can pull up live thread dumps for executors, while the Spark History Server can’t.
– The Spark UI gives the most up-to-date information on tasks (such as “Total Uptime”), while the Spark History Server UI can lag a bit.