Reading and Writing Elasticsearch with Spark SQL

1、Required Maven dependency

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.2.0-m1</version>
</dependency>
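
If the project builds with sbt instead of Maven, the same coordinate can be expressed as follows (a sketch; same version assumed):

libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "2.2.0-m1"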

2、Configuration
Place the downloaded elasticsearch-hadoop jar under $SPARK_HOME/lib/.
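
Alternatively, the Elasticsearch connection settings can be supplied programmatically on the SparkConf (the jar still has to be on the classpath, e.g. via --jars). A minimal sketch, assuming a local single-node ES at localhost:9200:

import org.apache.spark.{SparkConf, SparkContext}

// es.nodes / es.port are standard elasticsearch-hadoop settings;
// the localhost values are assumptions for a local single-node setup
val conf = new SparkConf()
  .setAppName("spark-es-demo")   // hypothetical application name
  .set("es.nodes", "localhost")
  .set("es.port", "9200")
val sc = new SparkContext(conf)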

3、Prepare the test data
Create /home/admin/people.txt (vim /home/admin/people.txt) and add the following test content:

zhang,san,20
li,si,30
wang,wu,40
li,bai,100
du,fu,101

4、Writing to ES from Spark: code example
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._

// create the SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// define the Person case class
case class Person(name: String, surname: String, age: Int)

// build the DataFrame from the test file and write it to ES
val people = sc.textFile("file:///home/admin/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1), p(2).trim.toInt))
  .toDF()
people.saveToEs("spark/people")
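
saveToEs also accepts a per-call configuration map, which is useful when the connection settings are not set globally. A minimal sketch (the localhost values are assumptions):

// write with explicit connection settings for this call only
people.saveToEs("spark/people", Map("es.nodes" -> "localhost", "es.port" -> "9200"))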

Query the data inserted into Elasticsearch:

GET /spark/people/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      {
        "_index": "spark",
        "_type": "people",
        "_id": "AVDrt5MyJpFYJkWM4nwP",
        "_score": 1,
        "_source": {
          "name": "zhang",
          "surname": "san",
          "age": 20
        }
      },
      {
        "_index": "spark",
        "_type": "people",
        "_id": "AVDrt5SEJpFYJkWM4nwS",
        "_score": 1,
        "_source": {
          "name": "li",
          "surname": "bai",
          "age": 100
        }
      },
      {
        "_index": "spark",
        "_type": "people",
        "_id": "AVDrt5MyJpFYJkWM4nwR",
        "_score": 1,
        "_source": {
          "name": "wang",
          "surname": "wu",
          "age": 40
        }
      },
      {
        "_index": "spark",
        "_type": "people",
        "_id": "AVDrt5MyJpFYJkWM4nwQ",
        "_score": 1,
        "_source": {
          "name": "li",
          "surname": "si",
          "age": 30
        }
      },
      {
        "_index": "spark",
        "_type": "people",
        "_id": "AVDrt5SEJpFYJkWM4nwT",
        "_score": 1,
        "_source": {
          "name": "du",
          "surname": "fu",
          "age": 101
        }
      }
    ]
  }
}

5、Reading via load and read
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext._

val sqlContext = new SQLContext(sc)
// options for Spark 1.3 need to include the target path/resource
val options13 = Map("path" -> "spark/people",
  "pushdown" -> "true",
  "es.nodes" -> "localhost", "es.port" -> "9200")

// Spark 1.3 style
val spark13DF = sqlContext.load("org.elasticsearch.spark.sql", options13)

// options for Spark 1.4 - the path/resource is specified separately
val options = Map("pushdown" -> "true", "es.nodes" -> "localhost", "es.port" -> "9200")

// Spark 1.4 style
val spark14DF = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("spark/people")

Query name and age:
spark14DF.select("name", "age").collect().foreach(println(_))
[zhang,20]
[li,100]
[wang,40]
[li,30]
[du,101]

Register a temporary table and query name:
spark14DF.registerTempTable("people")
val results = sqlContext.sql("SELECT name FROM people")
results.map(t => "Name: " + t(0)).collect().foreach(println)
Name: zhang
Name: li
Name: wang
Name: li
Name: du
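
Because pushdown is set to true in the options above, elasticsearch-hadoop translates simple projections and filters into Elasticsearch queries instead of evaluating them in Spark. A small sketch with the same DataFrame, which, given the sample data, should print the two people aged 100 or more:

// this filter should be pushed down to ES as a range query
spark14DF.filter("age >= 100").select("name", "age").collect().foreach(println)
[li,100]
[du,101]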

6、Read Elasticsearch data and create the temporary table myPeople

sqlContext.sql(
  "CREATE TEMPORARY TABLE myPeople " +
  "USING org.elasticsearch.spark.sql " +
  "OPTIONS ( resource 'spark/people', nodes 'localhost:9200')" )

sqlContext.sql("select * from myPeople").collect.foreach(println)

[20,zhang,san]
[100,li,bai]
[40,wang,wu]
[30,li,si]
[101,du,fu]
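
Any Spark SQL statement works against the temporary table; for example, a filtered projection (a sketch; the output follows from the sample data, though the row order may differ):

sqlContext.sql("SELECT name, surname FROM myPeople WHERE age > 50").collect.foreach(println)
[li,bai]
[du,fu]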

The table can also be created with additional read options, e.g. a scroll size:

sqlContext.sql(
  "CREATE TEMPORARY TABLE myPeople " +
  "USING org.elasticsearch.spark.sql " +
  "OPTIONS ( resource 'spark/people', nodes 'localhost:9200', scroll_size '20')" )
Because a `.` in an option name causes a syntax exception in the SQL OPTIONS clause, the `_` style must be used instead; hence es.scroll.size becomes scroll_size in this example (the leading es. prefix can be dropped, as elasticsearch-hadoop restores it when loading the data). Note that this only applies to Spark 1.3/1.4, which have a stricter SQL parser.
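
The stricter parser only affects the SQL OPTIONS clause; when reading through the DataFrameReader API in Spark 1.4, the dotted key can be passed as-is. A minimal sketch:

// no SQL parsing is involved here, so the dotted key is accepted
val scrollDF = sqlContext.read.format("org.elasticsearch.spark.sql")
  .option("es.scroll.size", "20")
  .load("spark/people")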

7、Reading Elasticsearch data via esDF:
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._

val sqlContext = new SQLContext(sc)
val people = sqlContext.esDF("spark/people")
// check the associated schema
println(people.schema.treeString)

root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
|-- surname: string (nullable = true)

// retrieve only the entries matching "wang"
val wangs = sqlContext.esDF("spark/people", "?q=wang")

wangs.show()

+---+----+-------+
|age|name|surname|
+---+----+-------+
| 40|wang|     wu|
+---+----+-------+
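
The DataFrame returned by esDF supports the usual column expressions in addition to the URI query string shown above; for example (a sketch over the sample data):

// standard DataFrame filtering on the ES-backed frame
people.filter(people("age") <= 40).show()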

