6(1). Problems encountered with Spark

This article lists a number of problems encountered while using Spark, including a port 4040 conflict caused by exiting spark-shell abnormally, handling RDD element types, configuration issues when creating a SparkSession, dependency problems across Kafka versions, notes on submitting Spark jobs in different modes, errors when connecting Spark to HBase and Redis, SparkSQL-to-MySQL connection errors, and troubleshooting long-running Spark Streaming jobs, together with the corresponding fixes.

1. After exiting spark-shell abnormally (rather than with quit), running the spark-shell command again reports:

19/04/11 13:42:32 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

jps shows the old SparkSubmit process is still alive, and port 4040 is indeed still occupied by it.
Solution: https://blog.csdn.net/wawa8899/article/details/81016029

When spark-shell starts, it also starts a Spark Web UI. Because no appName was specified when spark-shell was launched, the top-right corner of the Web UI shows "Spark shell application UI" (defined in $SPARK_HOME/bin/spark-shell). If an AppName is specified, that name is shown instead.
The Spark Web UI uses port 4040 by default. If 4040 is taken, it automatically adds 1 and tries 4041; if 4041 is also taken, it keeps incrementing, and so on.
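If the stale SparkSubmit process cannot be cleaned up right away, another option is to pin the UI to a known free port through the standard spark.ui.port setting. A minimal sketch (the app name and port 4050 are arbitrary choices):

import org.apache.spark.sql.SparkSession

// Pin the Web UI to an explicit free port instead of relying on the 4040 -> 4041 -> ... fallback.
val spark = SparkSession.builder()
  .appName("ui-port-example")          // hypothetical app name
  .master("local[*]")
  .config("spark.ui.port", "4050")     // spark.ui.port is the standard Web UI port setting
  .getOrCreate()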

2.
for (elem <- rdd.collect()) { println(elem.getClass.getSimpleName); print(elem); println("---------") }
This prints each elem with type String (here rdd is a MapPartitionsRDD).

scala> val list01=List(1,2,3,4)
list01: List[Int] = List(1, 2, 3, 4)
scala> val rdd1 = sc.parallelize(list01)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at parallelize at <console>:26
scala> rdd1.collect()
res2: Array[Int] = Array(1, 2, 3, 4)

If what you pass to parallelize is not a List or an Array, it fails:

scala> sc.parallelize(set01).collect()
<console>:27: error: type mismatch;
found : scala.collection.immutable.Set[Int]
required: Seq[?]
sc.parallelize(set01).collect()
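parallelize expects a Seq, so converting the Set first resolves the error. A minimal sketch (assuming set01 = Set(1, 2, 3, 4)):

val set01 = Set(1, 2, 3, 4)
val rdd2 = sc.parallelize(set01.toSeq)   // Set -> Seq satisfies parallelize's signature
rdd2.collect()                           // Array(1, 2, 3, 4) (element order may vary for a Set)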

3. When creating a SparkSession instance:
val spark: SparkSession = SparkSession.builder().appName("node01").master("master").enableHiveSupport().getOrCreate()
it fails with:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.

After removing enableHiveSupport(), it fails with:

Exception in thread "main" org.apache.spark.SparkException: Could not parse Master URL: 'node01'

After changing the master parameter to local, it fails with:

Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
com.jenny.spark.SparkStreamingKafkaReceiver$.main(SparkStreamingKafkaReceiver.scala:17)
com.jenny.spark.SparkStreamingKafkaReceiver.main(SparkStreamingKafkaReceiver.scala)
	at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2472)
	at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2468)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2468)
	at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2557)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:85)
	at com.jenny.spark.SparkStreamingKafkaReceiver$.main(SparkStreamingKafkaReceiver.scala:24)
	at com.jenny.spark.SparkStreamingKafkaReceiver.main(SparkStreamingKafkaReceiver.scala)
19/04/12 15:10:31 INFO spark.SparkContext: Invoking stop() from shutdown hook
19/04/12 15:10:31 INFO server.AbstractConnector: Stopped Spark@6ca0256d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/04/12 15:10:31 INFO ui.SparkUI: Stopped Spark web UI at http://172.18.94.121:4040
19/04/12 15:10:31 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/04/12 15:10:31 INFO memory.MemoryStore: MemoryStore cleared
19/04/12 15:10:31 INFO storage.BlockManager: BlockManager stopped
19/04/12 15:10:31 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/04/12 15:10:31 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/04/12 15:10:31 INFO spark.SparkContext: Successfully stopped SparkContext
19/04/12 15:10:31 INFO util.ShutdownHookManager: Shutdown hook called
19/04/12 15:10:31 INFO util.ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-fcc174bd-0722-4b43-9852-522997a84140

Process finished with exit code 1
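The stack trace shows two SparkContexts being created in the same main method: one through SparkSession.getOrCreate (line 17) and one through new SparkContext (line 24). A minimal sketch of the usual fix, assuming the job also needs a StreamingContext: build the SparkSession once and derive everything else from its SparkContext, so only one SparkContext ever exists in the JVM.

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// One SparkSession per JVM; reuse its SparkContext instead of calling new SparkContext again.
val spark = SparkSession.builder()
  .appName("SparkStreamingKafkaReceiver")
  .master("local[2]")
  .getOrCreate()

val ssc = new StreamingContext(spark.sparkContext, Seconds(10))   // 10s batch interval is an assumption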

4.
The Spark job was written against the Kafka 0.8 dependency, but the CDH cluster runs Kafka 1.0, so running the jar on the cluster fails with:

spark Exception in thread "main" java.lang.NoSuchMethodError: kafka.api.TopicMetadata.errorCode

Switching to the Kafka 1.0 dependency fixed it.
Differences between using the 0.8 Kafka dependency and the 1.0 Kafka dependency:
pom dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<!--<dependency>-->
    <!--<groupId>org.apache.spark</groupId>-->
    <!--<artifactId>spark-streaming-kafka-0-8_2.11</artifactId>-->
    <!--<version>${spark.version}</version>-->
<!--</dependency>-->

package com.jenny.spark

import cn.just.spark.domain.Test02
//import kafka.serializer.StringDecoder                         // 0.8-era API, not needed with the 0-10 connector
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, ConsumerStrategy, LocationStrategies, LocationStrategy}
//import org.apache.spark.streaming.kafka.KafkaUtils             // 0.8 connector
import org.apache.spark.streaming.kafka010.KafkaUtils            // 0.10 connector
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkContext

object SparkStreamingKafkaDirect {
  def main(args: Array[String]): Unit = {
    // using the settings below
    val spark: SparkSession = SparkSession.builder().appName("SparkStreamingKafkaDirect").master("local[2]").getOrCreate()
    val sc: SparkContext = spark.sparkContext
    sc.setLogLevel("WARN")

    // schema for the incoming records
    val schema = StructType(
      Seq(
        StructField("col1", StringType, true),
        StructField("col2", StringType, true),
        StructField("col3", StringType, true),
        StructField("col4", StringType, true),
        StructField("update_time", StringType, true)
      )
    )

    // 1. create the StreamingContext
    val ssc: StreamingContext = new StreamingContext(sc, Seconds(10))

    // 2. prepare the Kafka parameters
    // val kafkaparams = Map("metadata.broker.list" -> "node01:9092", "group.id" -> "spark_direct")   // 0.8 Kafka
    // 1.0 Kafka (0.10 consumer API). The original post is cut off after "ConsumerConfig.BOOTSTRAP_S";
    // the entries below follow the standard 0.10 consumer configuration, reusing the broker/group from the 0.8 line.
    val kafkaparams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "node01:9092",
      ConsumerConfig.GROUP_ID_CONFIG -> "spark_direct",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
    )

    // 3. create the direct stream (the topic name is a placeholder; the original post does not show it)
    val topics = Set("test_topic")
    val kafkaStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics, kafkaparams)
    )

    // at least one output operation is required before starting the context
    kafkaStream.map(_.value()).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
