Importing Search Data into Elasticsearch with Spark

I'm getting more and more forgetful, so I'd better write down what I did!

ES and Spark versions:

spark-1.6.0-bin-hadoop2.6

Elasticsearch for Apache Hadoop 2.1.2

With other version combinations, writing the index data may fail.

First, once ES is running, start the Spark shell with the es-hadoop jar loaded:

cp elasticsearch-hadoop-2.1.2/dist/elasticsearch-spark* spark-1.6.0-bin-hadoop2.6/lib/
cd spark-1.6.0-bin-hadoop2.6/bin
./spark-shell --jars ../lib/elasticsearch-spark-1.2_2.10-2.1.2.jar

The shell session looks like this:

import org.elasticsearch.spark._
// spark-shell already provides sc, and a SparkConf created here would not change it,
// so the ES settings are passed directly to saveToEs.
val esConf = Map("es.index.auto.create" -> "true", "es.nodes" -> "127.0.0.1")
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
// Write both maps as documents to index "spark", type "docs".
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esConf)
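The write can also be checked without leaving the shell. The following is a minimal sketch, assuming the same shell session (so sc exists), the same es-hadoop jar, and ES on 127.0.0.1. The name docs is only for illustration. Newly indexed documents may take a moment to become searchable, so an immediate read can come back empty.

```scala
import org.elasticsearch.spark._

// Read the documents back from the resource ("index/type") written above.
// esRDD returns pairs of (document id, document fields).
val docs = sc.esRDD("spark/docs", Map("es.nodes" -> "127.0.0.1"))

// The data set here is tiny, so collecting it to the driver is fine.
docs.collect().foreach { case (id, fields) =>
  println(s"$id -> $fields")
}
```

Each printed line should pair an auto-generated document id with one of the two maps that were saved.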

Then check the data in ES:

http://127.0.0.1:9200/spark/docs/_search?q=*

The result:

{"took":71,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"spark","_type":"docs","_id":"AVfhVqPBv9dlWdV2DcbH","_score":1.0,"_source":{"OTP":"Otopeni","SFO":"San Fran"}},{"_index":"spark","_type":"docs","_id":"AVfhVqPOv9dlWdV2DcbI","_score":1.0,"_source":{"one":1,"two":2,"three":3}}]}}
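The same search request can be issued from the Spark shell instead of a browser. This is only a sketch using the Scala standard library. searchUrl is a hypothetical helper introduced here, and the host and port 9200 are taken from the URL above.

```scala
import scala.io.Source

// Hypothetical helper: builds the search URL for a resource such as "spark/docs".
// Port 9200 matches the URL used above; adjust it if ES listens elsewhere.
def searchUrl(host: String, resource: String, query: String = "*"): String =
  s"http://$host:9200/$resource/_search?q=$query"

// Fetch the raw JSON response and make sure the connection is closed afterwards.
val source = Source.fromURL(searchUrl("127.0.0.1", "spark/docs"), "UTF-8")
val response = try source.mkString finally source.close()
println(response)
```

The printed JSON has the same shape as the result above, although the took value and the document ids will differ from run to run.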

 

 

References:

https://www.elastic.co/guide/en/elasticsearch/hadoop/2.1/spark.html#spark-installation

http://spark.apache.org/docs/latest/programming-guide.html

http://chenlinux.com/2014/09/04/spark-to-elasticsearch/

Reposted from: https://www.cnblogs.com/bonelee/p/5981699.html
