Problems hit when testing Structured Streaming + Kafka integration locally in IDEA

The code is as follows:
```scala
def main(args: Array[String]): Unit = {
  val sparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("demoPro")
    //.config("spark.debug.maxToStringFields", "200")
    .getOrCreate()
  import sparkSession.implicits._

  val lines: DataFrame = sparkSession
    .readStream.format("kafka")
    .option("kafka.bootstrap.servers", "10.*.*.1:9092,10.*.*.2:9092,10.*.*.3:9092")
    .option("startingOffsets", "earliest")
    .option("subscribe", "zyftest")
    .load()

  val query = lines
    .selectExpr("CAST(topic AS STRING) as topic", "CAST(offset AS STRING) as offset", "CAST(value AS STRING) as value")
    .filter($"value".contains("\"op\":\"ins\"") || $"value".contains("\"op\":\"upd\"") || $"value".contains("\"op\":\"del\""))
    .as[(String, String, String)]
    .writeStream
    .outputMode("append")
    .format("console")
    .start()

  query.awaitTermination()
}
```
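The `filter` keeps only CDC-style messages whose raw JSON payload contains one of three op markers. Pulled out as a plain function (a hypothetical helper, not part of the original code), the predicate can be unit-tested without a Spark session:

```scala
// Returns true when the raw Kafka value looks like an insert/update/delete
// change event, i.e. the JSON text contains one of the three "op" markers.
// Note: this is plain substring matching, exactly like the .filter(...) above,
// so it would also match an "op" field nested anywhere inside the payload.
def isChangeEvent(value: String): Boolean =
  Seq("\"op\":\"ins\"", "\"op\":\"upd\"", "\"op\":\"del\"")
    .exists(value.contains)
```

Substring matching is cheap, but if the messages are well-formed JSON, parsing `value` with `from_json` and filtering on the extracted `op` column would be more robust.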
This is simple code that just pulls data from Kafka, yet it kept failing with:

```
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/reader/SupportsScanUnsafeRow
```

A missing class from the DataSource V2 reader API like this typically indicates a version mismatch between the `spark-sql-kafka` connector jar and the Spark libraries on the classpath. Advice found online, removing the `<scope>provided</scope>` tags from pom.xml, did not fix it either. What finally worked was correcting the dependencies in pom.xml. Here is mine; every one of the five dependencies below is required:
```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
```
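All five artifacts must share the same Spark version (`2.4.0`) and Scala binary version (`2.11`). A small refactor that makes this harder to get wrong (my suggestion, not part of the original pom) is to factor both into Maven properties and reference them from each dependency, e.g.:

```xml
<properties>
  <scala.binary.version>2.11</scala.binary.version>
  <spark.version>2.4.0</spark.version>
</properties>

<!-- Each Spark dependency then reuses the shared versions: -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
```

With this layout, upgrading Spark means changing one property instead of five version tags, which avoids exactly the kind of mismatch that produces `NoClassDefFoundError`.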
The solution was inspired by:
> https://www.it1352.com/1935721.html