1. Reading JSON Files with Spark SQL
First, create two small JSON data files.
[hadoop@vm01 data]$ vi stu1.json
{"id":"1","name":"zhangsan","phone":"13721442689","email":"1@qq.com"}
{"id":"2","name":"lisi","phone":"13721442687","email":"2@qq.com"}
{"id":"3","name":"wangwu","phone":"13721442688","email":"3@qq.com"}
{"id":"4","name":"xiaoming","phone":"13721442686","email":"4@qq.com"}
{"id":"5","name":"xiaowang","phone":"13721442685","email":"5@qq.com"}
[hadoop@vm01 data]$ vi banji.json
{"id":"1","banji":"601"}
{"id":"2","banji":"602"}
{"id":"3","banji":"603"}
{"id":"4","banji":"604"}
{"id":"5","banji":"605"}
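Note that `spark.read.json` expects each record to be one complete JSON object on a single line (the JSON Lines format); a pretty-printed, multi-line object will end up in a `_corrupt_record` column instead. As a pure-Scala sketch, a hypothetical helper that renders a record in that one-object-per-line shape (it assumes the field values need no JSON escaping):

```scala
// Hypothetical helper: render one record as a single JSON Lines row,
// the one-object-per-line shape that spark.read.json expects.
// (Assumes values contain no characters that need JSON escaping.)
def toJsonLine(fields: (String, String)*): String =
  fields.map { case (k, v) => s""""$k":"$v"""" }.mkString("{", ",", "}")

println(toJsonLine("id" -> "1", "banji" -> "601"))
// {"id":"1","banji":"601"}
```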
[hadoop@vm01 bin]$ ./spark-shell \
--jars /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/mysql-connector-java-5.1.47.jar
scala> val df=spark.read.json("file:///home/hadoop/data/stu1.json")
scala> df.show
+--------+---+--------+-----------+
|   email| id|    name|      phone|
+--------+---+--------+-----------+
|1@qq.com|  1|zhangsan|13721442689|
|2@qq.com|  2|    lisi|13721442687|
|3@qq.com|  3|  wangwu|13721442688|
|4@qq.com|  4|xiaoming|13721442686|
|5@qq.com|  5|xiaowang|13721442685|
+--------+---+--------+-----------+
# Print the schema information
scala> df.printSchema
root
|-- email: string (nullable = true)
|-- id: string (nullable = true)
|-- name: string (nullable = true)
|-- phone: string (nullable = true)
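All four columns are inferred as string because every value in the file is quoted, and inference itself costs an extra pass over the data. A sketch, assuming the same spark-shell session as above, of supplying the schema up front instead of letting Spark infer it:

```scala
import org.apache.spark.sql.types._

// Explicit schema: spark.read.schema(...) skips the inference pass.
// The field names/types here mirror what inference produced above.
val stuSchema = StructType(Seq(
  StructField("id", StringType),
  StructField("name", StringType),
  StructField("phone", StringType),
  StructField("email", StringType)
))

val typedDf = spark.read
  .schema(stuSchema)
  .json("file:///home/hadoop/data/stu1.json")
```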
If you read a plain text file instead, each line comes back as one record with a single String-typed column:
scala> val emp=spark.read.text("file:///home/hadoop/data/test.txt")
emp: org.apache.spark.sql.DataFrame = [value: string]
scala> emp.show
+-----------+
|      value|
+-----------+
|hello spark|
|   hello mr|
| hello yarn|
| hello hive|
|hello spark|
+-----------+
scala> emp.printSchema
root
|-- value: string (nullable = true)
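Because the text source delivers only that single `value` column, any structure has to be recovered by hand, typically with a `map` before naming columns via `toDF`. A pure-Scala sketch of the kind of per-line split you would apply, assuming a comma-delimited layout (hypothetical; not the format of the `test.txt` shown above):

```scala
// Hypothetical per-line parser for a comma-delimited layout; you would
// map this over the lines before naming columns with toDF.
def parseLine(line: String): Array[String] =
  line.split(",").map(_.trim)

println(parseLine("1, zhangsan ,13721442689").mkString("|"))
// 1|zhangsan|13721442689
```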
Register the DataFrame as a temporary view and query it with SELECT:
scala> df.createOrReplaceTempView("people")
scala> spark.sql("select * from people").show
+--------+---+--------+-----------+
|   email| id|    name|      phone|
+--------+---+--------+-----------+
|1@qq.com|  1|zhangsan|13721442689|
|2@qq.com|  2|    lisi|13721442687|
|3@qq.com|  3|  wangwu|13721442688|
|4@qq.com|  4|xiaoming|13721442686|
|5@qq.com|  5|xiaowang|13721442685|
+--------+---+--------+-----------+
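The banji.json file created earlier can be registered as a second temporary view, after which the two can be joined in plain SQL. A sketch, assuming the same spark-shell session where the `people` view already exists:

```scala
// Register the class file as a second view and join it to people on id.
val banjiDf = spark.read.json("file:///home/hadoop/data/banji.json")
banjiDf.createOrReplaceTempView("banji")

spark.sql(
  """SELECT p.id, p.name, b.banji
    |FROM people p
    |JOIN banji b ON p.id = b.id""".stripMargin).show
```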
scala> df.select("name").show
+--------+
| name|