java spark es: Reading ES from Spark with the elasticsearch-spark connector returns all fields

weixin_39865102

Published 2021-02-24 09:55:04

Tags: java spark es

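This post describes reading Elasticsearch data from Spark with the elasticsearch-spark connector. The problem: even when the query asks for only some of the fields, the result still contains all of them. The solution is to use a DataFrame instead of an RDD, enable predicate pushdown, select the fields with `select()`, and replace the `size` parameter with `limit()` to cap the number of records returned.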

I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking Spark:

$SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar

In the Scala shell:

scala> import org.elasticsearch.spark._

scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")

It works well: the result contains the expected records as specified in myquery. The only problem is that I get all the fields, even if I specify a subset of them in the query. Example:

myquery = """{"query":..., "fields":["a","b"], "size":10}"""

returns all the fields, not only a and b. (By the way, I noticed that the size parameter is not taken into account either: the result contains more than 10 records.) Maybe it's important
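
The DataFrame approach mentioned in the summary can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix from the original answer: it assumes Spark 2.x with the same connector jar on the classpath, an Elasticsearch node reachable at localhost:9200 (an assumption), and a placeholder match_all query standing in for the elided myquery. The field names a and b and the index myindex/mytype come from the question.

import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("es-field-subset")
  .master("local[2]")
  .getOrCreate()

// Hypothetical stand-in for the question's elided query body.
// Note: no "fields" or "size" entries here; those are expressed on the DataFrame.
val myquery = """{"query":{"match_all":{}}}"""

// Load the index as a DataFrame through the connector's Spark SQL data source.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "localhost")   // assumption: local ES node
  .option("es.port", "9200")         // assumption: default HTTP port
  .option("es.query", myquery)
  .option("pushdown", "true")        // let the connector translate Spark SQL filters to ES
  .load("myindex/mytype")

// Keep only fields a and b, and cap the result at 10 rows on the Spark side.
val subset = df.select("a", "b").limit(10)
subset.show()

Here select() plays the role that "fields" was meant to play in the query, and limit() takes the place of "size". Whether the row limit is applied inside Elasticsearch or only in Spark depends on the connector version, so it is worth checking the query plan with subset.explain() if the data volume matters.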
