1. Import the es-spark dependency
<!-- elasticsearch-spark-20 -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>7.7.0</version>
</dependency>
<!-- elasticsearch; ${elasticsearch.version} must be defined in the POM's <properties>
     and should stay in line with the es-spark artifact version (7.7.0 here) -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${elasticsearch.version}</version>
</dependency>
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql.EsSparkSQL

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName(this.getClass.getSimpleName.filter(!_.equals('$')))
  .config("spark.debug.maxToStringFields", 5000)
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("es.nodes", "1xx.xxx.xxx.225,1xx.xxx.xxx.226,1xx.xxx.xxx.227")
  .config("es.port", "9200")
  .config("es.index.auto.create", "false")
  // With es.nodes.wan.only, all traffic goes only through the client nodes listed
  // above, so their load becomes very high and performance is poor; left disabled.
  // .config("es.nodes.wan.only", "true")
  .config("es.mapping.date.rich", "false") // keep date fields raw, no conversion
  .getOrCreate()

// Use the DataFrame's "id" column as the Elasticsearch document _id.
// dataFrame is the DataFrame to be written, built elsewhere.
val map = Map("es.mapping.id" -> "id")
EsSparkSQL.saveToEs(dataFrame, "cell_link_highway_wea", map)
spark.close()
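
To sanity-check the write before calling spark.close(), the same connector can read the index back into a DataFrame. A minimal sketch, assuming the "es" data-source shortname registered by elasticsearch-spark and the index name used above:

// Read the index back through the connector; connection settings
// (es.nodes, es.port, ...) are inherited from the SparkSession config.
val readBack = spark.read.format("es").load("cell_link_highway_wea")
println(s"documents in index: ${readBack.count()}")
readBack.show(10)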
2. The program fails at runtime

The error output (originally shown as a screenshot) is not reproduced here; as the summary below notes, the root cause is a missing commons-httpclient dependency, which with the elasticsearch-hadoop connector typically surfaces as a java.lang.NoClassDefFoundError for an org.apache.commons.httpclient class.
3. Solution
The connector's HTTP transport is built on the legacy Apache Commons HttpClient 3.x API, which is not on the classpath by default, so the artifact has to be declared explicitly:
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>
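
If in doubt whether the jar actually made it onto the runtime classpath, a quick check from the Spark shell or driver will tell; the class name below is the main entry class of the commons-httpclient artifact:

// Throws ClassNotFoundException if commons-httpclient is missing;
// otherwise prints the jar the class was loaded from.
val cls = Class.forName("org.apache.commons.httpclient.HttpClient")
println(cls.getProtectionDomain.getCodeSource.getLocation)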

This post showed how to import the elasticsearch-spark dependency into a Spark project for data exchange with Elasticsearch, along with a sample SparkSession configuration. The runtime error encountered was caused by the missing commons-httpclient dependency; adding the matching commons-httpclient artifact to the project resolves it.