The exception
My code looks like this:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SparkSession}

object Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
    val df_tb_test1: DataFrame = Prop.readTab("tb_test1", spark)
    df_tb_test1.foreach(cols => {
      val col1 = cols.getString(0)
      Prop.readTab("tb_test2", spark) // using the SparkSession again inside this foreach
    })
  }
}
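Prop.readTab is a helper from my project whose source isn't shown in this post; a minimal sketch of what such a helper might look like, assuming it simply reads a table registered in the catalog by name:

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch only -- the real Prop helper is not shown in this post.
// Assumption: readTab loads a catalog table by name into a DataFrame.
object Prop {
  def readTab(table: String, spark: SparkSession): DataFrame =
    spark.table(table)
}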
When the SparkSession object is used inside a Spark operator like this, the job fails with the following NullPointerException:
java.lang.NullPointerException
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:135)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
at com.mycase.test.TestSpark$.queryPersonByAge(TestSpark.scala:39)
at com.mycase.test.TestSpark$$anonfun$1.apply(TestSpark.scala:28)
at com.mycase.test.TestSpark$$anonfun$1.apply(TestSpark.scala:27)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/12/29 21:44:32 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.NullPointerException
        ... (same stack trace as above, repeated for every failed task)
18/12/29 21:44:32 ERROR Executor: Exception in task 7.0 in stage 0.0 (TID 7)
Why the exception happens
Our operator needs the SparkSession, so the first thing to get straight is where Spark operators and the SparkSession each actually live.
The SparkSession clearly lives on the driver, while our operators run inside the executors on the workers. So while the program is running, an operator that tries to call the SparkSession cannot actually reach it: the closure carries a reference to the object, but on the executor that reference is null, and a NullPointerException is thrown.
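The mechanism is plain Java serialization: SparkSession's reference to its SparkContext is marked @transient, so it is dropped when the closure is serialized and shipped to an executor, which is why sessionState$lzycompute blows up. A minimal, Spark-free sketch of the same effect:

import java.io._

// Spark-free sketch: a @transient field stays set on the original object
// but comes back as null after a serialization round trip -- exactly what
// happens to a SparkSession captured in a closure shipped to an executor.
class Holder(@transient val session: AnyRef) extends Serializable

object TransientDemo {
  def main(args: Array[String]): Unit = {
    val original = new Holder(new Object)
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(original)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    val copy = in.readObject().asInstanceOf[Holder]
    println(original.session == null) // false: still set on the "driver" side
    println(copy.session == null)     // true: null on the deserialized "executor" copy
  }
}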
Solutions
There are two ways to handle this:
- collect the data you want to foreach over first, producing a local array, then iterate over that array on the driver
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
    // collect brings the rows back to the driver, where spark is valid
    val arr_tb_test1: Array[(String, String)] =
      Prop.readTab("tb_test1", spark).rdd.map(row => (row(0).toString, row(1).toString)).collect
    arr_tb_test1.foreach(cols => {
      val col1 = cols._1
      Prop.readTab("tb_test2", spark) // runs on the driver: no NPE
    })
  }
}
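If the collected result is too big to hold in driver memory all at once, a variant of the same idea (my own suggestion, not part of the original fix) is to stream the rows with toLocalIterator, which pulls one partition at a time to the driver:

import scala.collection.JavaConverters._

// Same driver-side iteration as above, but without materializing the whole
// array: toLocalIterator brings partitions to the driver one at a time.
val rows = Prop.readTab("tb_test1", spark).toLocalIterator.asScala
rows.foreach { row =>
  val col1 = row.getString(0)
  Prop.readTab("tb_test2", spark) // still runs on the driver, so spark is usable
}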
- wrap spark and sc in a standalone singleton object and fetch them through that singleton
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

class Demo {
  val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
  val sc = new SparkContext(sparkConf)
  val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
}

object Demo {
  private val demo = new Demo
  def getContext(): Demo = demo
}
import org.apache.spark.sql.DataFrame

object Test {
  def main(args: Array[String]): Unit = {
    val df_tb_test1: DataFrame = Prop.readTab("tb_test1", Demo.getContext.spark)
    df_tb_test1.foreach(cols => {
      val col1 = cols.getString(0)
      Prop.readTab("tb_test2", Demo.getContext.spark) // fetch the SparkSession through the singleton inside the foreach
    })
  }
}
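Both fixes come down to the same rule: only touch the SparkSession in a JVM where it was actually built. Solution 1 moves the iteration onto the driver with collect. Solution 2 works because a Scala object is never serialized into the closure; the executor resolves Demo statically and, under local[*], it lives in the same JVM as the driver, so Demo.getContext.spark hands back the live session instead of a null field.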