How to display the execution progress bar in Spark 2.3 / 2.4

The original article follows.

Original source: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sparkcontext-ConsoleProgressBar.html

ConsoleProgressBar

ConsoleProgressBar shows the progress of active stages on standard error, i.e. stderr. It uses SparkStatusTracker to poll the status of stages periodically and prints out active stages with more than one task. It keeps overwriting itself on a single line, showing at most the first 3 concurrent stages at a time.

[Stage 0:====>          (316 + 4) / 1000][Stage 1:>                (0 + 0) / 1000][Stage 2:>                (0 + 0) / 1000]

The progress includes the stage id and the numbers of completed, active, and total tasks.
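You can poll the same numbers yourself through SparkStatusTracker (available as sc.statusTracker). The following is only a minimal sketch, not how ConsoleProgressBar is implemented; the sample job and the 200 ms polling loop are illustrative, and in spark-shell you would reuse the existing sc instead of creating a new one:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("status-tracker-demo"))

// Run a job asynchronously so the driver thread is free to poll the status.
val job = sc.parallelize(1 to 100, 10).map { n => Thread.sleep(100); n }.countAsync()

while (!job.isCompleted) {
  for (stageId <- sc.statusTracker.getActiveStageIds;
       info    <- sc.statusTracker.getStageInfo(stageId)) {
    // The same numbers the progress bar shows: (completed + active) / total tasks
    println(s"Stage $stageId: (${info.numCompletedTasks} + ${info.numActiveTasks}) / ${info.numTasks}")
  }
  Thread.sleep(200)
}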

Tip

ConsoleProgressBar may be useful when you ssh to workers and want to see the progress of active stages.

ConsoleProgressBar is created when SparkContext starts with spark.ui.showConsoleProgress enabled and the logging level of the org.apache.spark.SparkContext logger set to WARN or higher (i.e. fewer messages are printed out, so there is "space" for ConsoleProgressBar).

import org.apache.log4j._
// Raise the SparkContext logger to WARN so INFO messages do not overwrite the progress bar
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)

To print the progress nicely, ConsoleProgressBar uses the COLUMNS environment variable to know the width of the terminal. If COLUMNS is not set, it assumes 80 columns.
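COLUMNS is usually a shell variable rather than an exported environment variable, so if you want the bar to use the full width of a wider terminal you can pass it explicitly when launching the shell (120 below is just an example value):

$ COLUMNS=120 ./bin/spark-shell --conf spark.ui.showConsoleProgress=true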

The progress bar prints out the status after a stage has run for at least 500 milliseconds, and then updates every spark.ui.consoleProgress.update.interval milliseconds.

Note

The initial delay of 500 milliseconds before ConsoleProgressBar shows the progress is not configurable.
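The refresh interval, however, can be changed through spark.ui.consoleProgress.update.interval, a value in milliseconds (1000 below is just an example that makes the bar refresh once per second):

$ ./bin/spark-shell --conf spark.ui.consoleProgress.update.interval=1000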

See the progress bar in Spark shell with the following:

$ ./bin/spark-shell --conf spark.ui.showConsoleProgress=true  (1)

scala> sc.setLogLevel("OFF")  (2)

scala> import org.apache.log4j._
import org.apache.log4j._

scala> Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)  (3)

scala> sc.parallelize(1 to 4, 4).map { n => Thread.sleep(500 + 200 * n); n }.count  (4)
[Stage 2:>                                                          (0 + 4) / 4]
[Stage 2:==============>                                            (1 + 3) / 4]
[Stage 2:=============================>                             (2 + 2) / 4]
[Stage 2:============================================>              (3 + 1) / 4]
  1. Make sure spark.ui.showConsoleProgress is true. It is enabled by default.

  2. Disable (OFF) the root logger (which includes Spark's loggers).

  3. Make sure org.apache.spark.SparkContext logger is at least WARN.

  4. Run a job with 4 tasks with 500ms initial sleep and 200ms sleep chunks to see the progress bar.

In short:

1. If you write your code in an IDE such as IntelliJ IDEA or Eclipse, two steps are needed:
      1) Before creating the SparkSession / SparkConf, add the following code:

import org.apache.log4j._
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)

      2) When creating the SparkSession, set spark.ui.showConsoleProgress to true (see the sketch below).
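A minimal sketch of both steps together (the app name, master and the sample job are placeholders, not required settings):

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Step 1): quiet the SparkContext logger so INFO output does not scroll the progress bar away
Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)

val spark = SparkSession.builder()
  .appName("console-progress-demo")               // placeholder app name
  .master("local[*]")                             // placeholder master
  .config("spark.ui.showConsoleProgress", "true") // step 2): enable the progress bar
  .getOrCreate()

// A small job just to see the bar in action
spark.sparkContext.parallelize(1 to 4, 4).map { n => Thread.sleep(500 + 200 * n); n }.count()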

2. If you use the spark-shell, do the following:

$ ./bin/spark-shell --conf spark.ui.showConsoleProgress=true  (1)

scala> sc.setLogLevel("OFF")  (2)

scala> import org.apache.log4j._
import org.apache.log4j._

scala> Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.WARN)  (3)

scala> sc.parallelize(1 to 4, 4).map { n => Thread.sleep(500 + 200 * n); n }.count  (4)
[Stage 2:>                                                          (0 + 4) / 4]
[Stage 2:==============>                                            (1 + 3) / 4]
[Stage 2:=============================>                             (2 + 2) / 4]
[Stage 2:============================================>              (3 + 1) / 4]