[Spark] Spark exception: execute, tree: XXXX

I'm just a rookie. I constantly run into one error or another while writing code, and my memory is bad too, so I have to write things down; hopefully I'll get less clueless over time!

Recently, while running a Spark job, it threw an exception that looked roughly like this:

execute, tree:
Exchange hashpartitioning(field1#2374, field2#2378, field3#2373, 5000)
+- *HashAggregate(keys=[field1#2374, field2#2378, field3#2373], functions=[partial_count(1)], output=[field1#2374, field2#2378, field3#2373, count#2407L])
   +- *Project [field1#2374, field2#2378, CASE WHEN (field4#2377 IN (XXXX).....
............
............
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
	at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:252)
	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryRelation.scala:91)
	at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryRelation.scala:86)
	at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryRelation.scala:42)
	at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:100)
	at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68)
	at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92)
	at org.apache.spark.sql.Dataset.persist(Dataset.scala:2518)
        ........
        ........
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

At first I kept staring at the "execute, tree" part, and no amount of digging there got me anywhere. Eventually I realized the real clue was elsewhere: the line that matters is Caused by: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

The cause:

It turned out that by the time this computation kicked off, there was no usable SparkContext left to execute the plan tree, hence the error.

So why was there no usable SparkContext?

Because some earlier code had already failed, and in its try-catch handler I called sparkSession.close().
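
To make the failure mode concrete, here is a minimal sketch of the pattern I had (the failing query and names below are hypothetical stand-ins, not my real code): the catch block closes the session, the program keeps running, and a later action then hits the stopped context.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stopped-context-demo")
  .getOrCreate()

try {
  // Hypothetical first stage; suppose this throws (the table doesn't exist).
  spark.sql("SELECT * FROM no_such_table").show()
} catch {
  case e: Exception =>
    e.printStackTrace()
    spark.close() // this stops the underlying SparkContext!
}

// Execution continues here. The `spark` reference is still a valid object,
// but its SparkContext is stopped, so the next action fails with
// "Cannot call methods on a stopped SparkContext."
val counts = spark.range(100).groupBy(col("id") % 2).count()
counts.persist() // roughly matches the Dataset.persist in the stack trace above
counts.show()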

The source below explains what close() does: it simply stops the SparkContext.

  /**
   * Stop the underlying `SparkContext`.
   *
   * @since 2.0.0
   */
  def stop(): Unit = {
    sparkContext.stop()
  }

  /**
   * Synonym for `stop()`.
   *
   * @since 2.1.0
   */
  override def close(): Unit = stop()

So I went back to the earlier Spark error, fixed it, and that resolved this one.

A side note:

Within a single Spark application launched by one spark-submit command, creating multiple SparkSessions is not as simple as calling the following code several times:

spark = SparkSession
    .builder()
    .master("local[*]")
    .appName(appName)
    .enableHiveSupport()
    .getOrCreate();

That's because, looking at the getOrCreate() method, the newly built SparkSession is stored into an AtomicReference via defaultSession.set(session)!

 /**
     * Gets an existing [[SparkSession]] or, if there is no existing one, creates a new
     * one based on the options set in this builder.
     *
     * This method first checks whether there is a valid thread-local SparkSession,
     * and if yes, return that one. It then checks whether there is a valid global
     * default SparkSession, and if yes, return that one. If no valid global default
     * SparkSession exists, the method creates a new SparkSession and assigns the
     * newly created SparkSession as the global default.
     *
     * In case an existing SparkSession is returned, the config options specified in
     * this builder will be applied to the existing SparkSession.
     *
     * @since 2.0.0
     */
    def getOrCreate(): SparkSession = synchronized {
      // Get the session from current thread's active session.
      var session = activeThreadSession.get()
      if ((session ne null) && !session.sparkContext.isStopped) {
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        if (options.nonEmpty) {
          logWarning("Using an existing SparkSession; some configuration may not take effect.")
        }
        return session
      }

      // Global synchronization so we will only set the default session once.
      SparkSession.synchronized {
        // If the current thread does not have an active session, get it from the global session.
        session = defaultSession.get()
        if ((session ne null) && !session.sparkContext.isStopped) {
          options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
          if (options.nonEmpty) {
            logWarning("Using an existing SparkSession; some configuration may not take effect.")
          }
          return session
        }

        // No active nor global default session. Create a new one.
        val sparkContext = userSuppliedContext.getOrElse {
          // set app name if not given
          val randomAppName = java.util.UUID.randomUUID().toString
          val sparkConf = new SparkConf()
          options.foreach { case (k, v) => sparkConf.set(k, v) }
          if (!sparkConf.contains("spark.app.name")) {
            sparkConf.setAppName(randomAppName)
          }
          val sc = SparkContext.getOrCreate(sparkConf)
          // maybe this is an existing SparkContext, update its SparkConf which maybe used
          // by SparkSession
          options.foreach { case (k, v) => sc.conf.set(k, v) }
          if (!sc.conf.contains("spark.app.name")) {
            sc.conf.setAppName(randomAppName)
          }
          sc
        }

        // Initialize extensions if the user has defined a configurator class.
        val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
        if (extensionConfOption.isDefined) {
          val extensionConfClassName = extensionConfOption.get
          try {
            val extensionConfClass = Utils.classForName(extensionConfClassName)
            val extensionConf = extensionConfClass.newInstance()
              .asInstanceOf[SparkSessionExtensions => Unit]
            extensionConf(extensions)
          } catch {
            // Ignore the error if we cannot find the class or when the class has the wrong type.
            case e @ (_: ClassCastException |
                      _: ClassNotFoundException |
                      _: NoClassDefFoundError) =>
              logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
          }
        }

        session = new SparkSession(sparkContext, None, None, extensions)
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        defaultSession.set(session)

        // Register a successfully instantiated context to the singleton. This should be at the
        // end of the class definition so that the singleton is updated only if there is no
        // exception in the construction of the instance.
        sparkContext.addSparkListener(new SparkListener {
          override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
            defaultSession.set(null)
            sqlListener.set(null)
          }
        })
      }

      return session
    }

This way, every call to getOrCreate() hands back a usable SparkSession and avoids creating one from scratch each time.
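
A quick way to convince yourself (a minimal sketch; the differing appName on the second call is deliberate, to show it is effectively ignored):

import org.apache.spark.sql.SparkSession

val s1 = SparkSession.builder().master("local[*]").appName("first").getOrCreate()
val s2 = SparkSession.builder().master("local[*]").appName("second").getOrCreate()

// Both calls return the same global default session; the second call just logs
// "Using an existing SparkSession; some configuration may not take effect."
assert(s1 eq s2)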

So what if you want to create multiple, mutually isolated SparkSessions?

Note: what does "mutually isolated" mean? For example, if I register a temp view via createOrReplaceTempView in the first session, the second session should not be able to see that table!

That led me to SparkSession.clearDefaultSession():

  /**
   * Clears the default SparkSession that is returned by the builder.
   *
   * @since 2.0.0
   */
  def clearDefaultSession(): Unit = {
    defaultSession.set(null)
  }

So you can call this method at the right point, and the next call to getOrCreate() will create a brand-new SparkSession.
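
Here is a minimal sketch of the idea, written against the getOrCreate() source quoted above (note that getOrCreate() checks a thread-local active session first, so if anything has called setActiveSession on the current thread you would need SparkSession.clearActiveSession() as well):

import org.apache.spark.sql.SparkSession

val first = SparkSession.builder().master("local[*]").getOrCreate()
first.range(5).createOrReplaceTempView("t")

SparkSession.clearDefaultSession()

// With the default cleared, getOrCreate() builds a new session. It still
// reuses the running SparkContext, but gets its own session state.
val second = SparkSession.builder().master("local[*]").getOrCreate()
assert(first ne second)

first.sql("SELECT count(*) FROM t").show() // works: the view lives in `first`
// second.sql("SELECT count(*) FROM t")    // would fail: Table or view not found: t

By the way, SparkSession also offers newSession(), which returns a new session sharing the same SparkContext but with its own SQL conf, temp views, and registered UDFs; if isolation is all you want, that is usually the more direct route.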

Of course, don't create multiple SparkSessions unless you really need to; it costs time and resources.

Reference: https://blog.csdn.net/cjuexuan/article/details/53207023 (author: cjuexuan)

Not much skill, just a rookie: adrift on a vast sea, under a sky with no road.
