Spark SQL (4): Converting Between RDD and DataFrame

DataFrame to RDD:

Converting a DataFrame to an RDD is simple: just call .rdd on it.

scala> val df = spark.table("people_result")

df: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

scala> val rdd1 = df.rdd

rdd1: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[9] at rdd at <console>:25

scala> rdd1.collect

res8: Array[org.apache.spark.sql.Row] = Array([3,c,22], [2,b,24])

scala> rdd1.map(x=>(x(1),x(0),x(2))).collect

res7: Array[(Any, Any, Any)] = Array((c,3,22), (b,2,24))
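Note that Row's apply method returns Any, which is why the tuples above come back as (Any, Any, Any). Row also provides typed getters such as getInt and getString, so you can extract properly typed values instead. A minimal sketch, assuming the third column of people_result is an Int:

scala> rdd1.map(row => (row.getString(1), row.getInt(0), row.getInt(2))).collect
// expected, given the data above: Array[(String, Int, Int)] = Array((c,3,22), (b,2,24))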

 

 

 

RDD to DataFrame:

There are two ways:

1. Infer the schema via reflection, then call toDF.

2. Specify the schema explicitly with StructType, then call createDataFrame.

 

For details, see the official guide: http://spark.apache.org/docs/latest/sql-programming-guide.html

 

A table is stored on HDFS at /test/person.txt, with the following contents:

[root@master ~]# hadoop fs -cat /test/person.txt

1,tingting,23,80

2,ningning,25,90

3,ruhua,27,60

4,mimi,33,85

 

Creating a DataFrame from an RDD:

(1) Create the RDD

scala> val line = sc.textFile("hdfs://master:9000/test/person.txt").map(_.split(","))

(2) Create the DataFrame

Attach the schema (column header) information.

Inferring the schema via reflection:

scala> case class Person(id: Int, name: String, age: Int, faceValue: Int)

scala> val personRDD = line.map(x => Person(x(0).toInt, x(1), x(2).toInt, x(3).toInt))

scala> val personDF = personRDD.toDF

scala> personDF.show

 

Or specify the schema directly with StructType:

import org.apache.spark.sql.types._

import org.apache.spark.sql.Row

Specify the schema directly; the third argument (true) indicates whether the field can be null:

val schema = StructType(
  List(
    StructField("id", IntegerType, true),
    StructField("name", StringType, true),
    StructField("age", IntegerType, true),
    StructField("faceValue", IntegerType, true)
  )
)

val rowRDD = line.map(x => Row(x(0).toInt, x(1), x(2).toInt, x(3).toInt))
val personDF = spark.createDataFrame(rowRDD, schema)

 

 

+---+--------+---+---------+
| id|    name|age|faceValue|
+---+--------+---+---------+
|  1|tingting| 23|       80|
|  2|ningning| 25|       90|
|  3|   ruhua| 27|       60|
|  4|    mimi| 33|       85|
+---+--------+---+---------+

Here, case class Person declares a case class that acts as the constructor for each row.

personRDD then looks like this:

scala> personRDD.collect

res2: Array[Person] = Array(Person(1,tingting,23,80), Person(2,ningning,25,90), Person(3,ruhua,27,60), Person(4,mimi,33,85))
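As an aside, if you would rather not define a case class, an RDD of tuples also converts with toDF by passing column names explicitly (in spark-shell the required implicits are already imported):

scala> val tupleDF = line.map(x => (x(0).toInt, x(1), x(2).toInt, x(3).toInt)).toDF("id", "name", "age", "faceValue")
scala> tupleDF.show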

 

 


IDEA code: inferring the schema via reflection

Starting with Spark 2.0, Spark uses the SparkSession interface in place of SQLContext and HiveContext.

SparkSession is the entry point for Spark SQL programs.

The spark-shell command automatically creates a SparkContext and a SparkSession for you.

 

import org.apache.spark.sql.SparkSession

// Case class defining the column fields, used when building the DataFrame
case class Person(id: Int, name: String, age: Int, faceValue: Int)

object sparkSQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql test")
      .master("local")
      .getOrCreate()
    // Implicit conversions; RDD-to-DataFrame methods such as toDF need this import
    import spark.implicits._
    // Read the source file
    val line = spark.sparkContext.textFile("hdfs://master:9000/test/person.txt")
      .map(_.split(","))
    // Associate the RDD with the Person case class
    val personRDD = line.map(x => Person(x(0).toInt, x(1), x(2).toInt, x(3).toInt))
    // Convert the RDD to a DataFrame
    val personDF = personRDD.toDF

    // Inspect the DataFrame contents
    personDF.show

    // Register a temporary view
    personDF.createOrReplaceTempView("person")
    // Query
    val df = spark.sql("select * from person order by age desc limit 2")
    // Write the result out as JSON
    df.write.json("hdfs://master:9000/result")
  }
}
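One caveat: by default, df.write fails if the target directory already exists (SaveMode.ErrorIfExists). If you rerun the job, either delete /result first or set a save mode, e.g.:

// Overwrite any output left over from a previous run
df.write.mode("overwrite").json("hdfs://master:9000/result")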

 

 

Points to note:

1. SparkSession:

Since Spark 2.0, SparkConf, SparkContext, and SQLContext are all wrapped by SparkSession.

val test = SparkSession.builder()

2. import test.implicits._

This imports the implicit conversions that methods such as toDF rely on.

3. Registration:

To run spark.sql("<sql statement>"), you must first register the DataFrame as a temporary view

with createOrReplaceTempView.

As of Spark 2.0, registerTempTable is deprecated.

 

Result:

Download the output files from HDFS to view them.
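Alternatively, you can read the JSON back and print it directly in Spark; a minimal sketch using the same path the job wrote to:

// Read the job's JSON output back into a DataFrame and display it
spark.read.json("hdfs://master:9000/result").show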

 

 

IDEA code: specifying the schema directly with StructType

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

object sparkSQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql test")
      .master("local")
      .getOrCreate()
    // Implicit conversions (not strictly needed here, since we use createDataFrame rather than toDF)
    import spark.implicits._
    // Read the source file
    val line = spark.sparkContext.textFile("hdfs://master:9000/test/person.txt")
      .map(_.split(","))

    // Specify the schema directly; the third argument marks the field as nullable
    val schema = StructType(
      List(
        StructField("id", IntegerType, true),
        StructField("name", StringType, true),
        StructField("age", IntegerType, true),
        StructField("faceValue", IntegerType, true)
      )
    )
    // Build an RDD[Row] that matches the schema, then create the DataFrame
    val rowRDD = line.map(x => Row(x(0).toInt, x(1), x(2).toInt, x(3).toInt))
    val personDF = spark.createDataFrame(rowRDD, schema)

    // Register a temporary view
    personDF.createOrReplaceTempView("person")
    // Query
    val df = spark.sql("select * from person order by age desc limit 2")
    // Print the result
    df.show
  }
}

 

Result:
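I have not pasted the console output here, but given the sample data, df.show should print the two oldest rows:

+---+-----+---+---------+
| id| name|age|faceValue|
+---+-----+---+---------+
|  4| mimi| 33|       85|
|  3|ruhua| 27|       60|
+---+-----+---+---------+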

 

 
