SparkSQL: Converting an RDD to a DataFrame

This article shows how to read JSON files with SparkSQL, then walks through two ways of converting an RDD to a DataFrame: calling toDF(), and converting a raw RDD to RDD[Row] matched against a StructType. The examples include the full code and verification steps.

1. Reading JSON files with SparkSQL

First, create two sample JSON data files.

[hadoop@vm01 data]$ vi stu1.json 
{"id":"1","name":"zhangsan","phone":"13721442689","email":"1@qq.com"}
{"id":"2","name":"lisi","phone":"13721442687","email":"2@qq.com"}
{"id":"3","name":"wangwu","phone":"13721442688","email":"3@qq.com"}
{"id":"4","name":"xiaoming","phone":"13721442686","email":"4@qq.com"}
{"id":"5","name":"xiaowang","phone":"13721442685","email":"5@qq.com"}
[hadoop@vm01 data]$ vi banji.json 
{"id":"1","banji":"601"}
{"id":"2","banji":"602"}
{"id":"3","banji":"603"}
{"id":"4","banji":"604"}
{"id":"5","banji":"605"}
[hadoop@vm01 bin]$  ./spark-shell \
--jars /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/mysql-connector-java-5.1.47.jar

scala> val df=spark.read.json("file:///home/hadoop/data/stu1.json")
scala> df.show
+--------+---+--------+-----------+
|   email| id|    name|      phone|
+--------+---+--------+-----------+
|1@qq.com|  1|zhangsan|13721442689|
|2@qq.com|  2|    lisi|13721442687|
|3@qq.com|  3|  wangwu|13721442688|
|4@qq.com|  4|xiaoming|13721442686|
|5@qq.com|  5|xiaowang|13721442685|
+--------+---+--------+-----------+

# Print the schema
scala> df.printSchema
root
 |-- email: string (nullable = true)
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- phone: string (nullable = true)
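Note that spark.read.json infers the schema by scanning the data, listing the columns alphabetically. If the structure is known up front, you can pass an explicit schema instead; a sketch using the same stu1.json fields (skipping inference saves a pass over the data):

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Explicit schema matching the stu1.json fields shown above
val stuSchema = StructType(Seq(
  StructField("email", StringType, nullable = true),
  StructField("id",    StringType, nullable = true),
  StructField("name",  StringType, nullable = true),
  StructField("phone", StringType, nullable = true)
))

val dfWithSchema = spark.read.schema(stuSchema)
  .json("file:///home/hadoop/data/stu1.json")
```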

If you read a plain text file instead, each line comes back as a single record with one String-typed column:

scala> val emp=spark.read.text("file:///home/hadoop/data/test.txt")
emp: org.apache.spark.sql.DataFrame = [value: string]

scala> emp.show
+-----------+
|      value|
+-----------+
|hello spark|
|   hello mr|
| hello yarn|
| hello hive|
|hello spark|
+-----------+

scala> emp.printSchema
root
 |-- value: string (nullable = true)
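This single value column can be reshaped into named columns via the first conversion route mentioned in the summary: calling toDF() on an RDD of case-class instances. A minimal sketch, where Word is a hypothetical case class and the split assumes the two-word, space-delimited lines of test.txt:

```scala
import spark.implicits._  // brings toDF() into scope in spark-shell

case class Word(greeting: String, target: String)  // hypothetical case class

val wordsDF = spark.sparkContext
  .textFile("file:///home/hadoop/data/test.txt")
  .map(_.split(" "))
  .map(a => Word(a(0), a(1)))
  .toDF()

wordsDF.printSchema()  // two string columns: greeting, target
```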

Register the DataFrame as a temporary view and query it with SQL:

scala> df.createOrReplaceTempView("people")
scala> spark.sql("select * from people").show
+--------+---+--------+-----------+
|   email| id|    name|      phone|
+--------+---+--------+-----------+
|1@qq.com|  1|zhangsan|13721442689|
|2@qq.com|  2|    lisi|13721442687|
|3@qq.com|  3|  wangwu|13721442688|
|4@qq.com|  4|xiaoming|13721442686|
|5@qq.com|  5|xiaowang|13721442685|
+--------+---+--------+-----------+
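The banji.json file created earlier can be queried the same way; a sketch that registers it as a second view and joins the two on id:

```scala
val banjiDF = spark.read.json("file:///home/hadoop/data/banji.json")
banjiDF.createOrReplaceTempView("banji")

// Join each student with their class number (banji) on id
spark.sql(
  """select p.id, p.name, b.banji
    |from people p join banji b on p.id = b.id
    |""".stripMargin).show()
```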

scala> df.select("name").show 
+--------+
|    name|
+--------+
|zhangsan|
|    lisi|
|  wangwu|
|xiaoming|
|xiaowang|
+--------+
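The second conversion route from the summary, building an RDD[Row] and matching it against a StructType with spark.createDataFrame, can be sketched as follows (the input path and comma-delimited layout are illustrative assumptions):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Hypothetical comma-delimited input, e.g. "1,zhangsan"
val rowRDD = spark.sparkContext
  .textFile("file:///home/hadoop/data/stu.txt")
  .map(_.split(","))
  .map(a => Row(a(0), a(1)))

// Schema whose fields line up positionally with each Row
val schema = StructType(Seq(
  StructField("id",   StringType, nullable = true),
  StructField("name", StringType, nullable = true)
))

val stuDF = spark.createDataFrame(rowRDD, schema)
stuDF.show()
```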