Environment: Spark 2.1. I wanted to use Spark SQL from spark-shell, but everything on the internet and in books covers Spark 1.x, so I went through the official documentation instead.
1. Prepare a JSON file
Copied this one off the web:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
File name: jsonfile (couldn't even be bothered to give it a proper name).
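If you would rather create the file programmatically, here is a minimal sketch in plain Scala (the path people.json is my own choice, not from the original post):

```scala
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

// Write the three sample records as line-delimited JSON,
// which is the format spark.read.json expects by default.
val records = Seq(
  """{"name":"Michael"}""",
  """{"name":"Andy", "age":30}""",
  """{"name":"Justin", "age":19}"""
)
Files.write(Paths.get("people.json"),
  records.mkString("\n").getBytes(StandardCharsets.UTF_8))
```
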
2. At the spark-shell prompt, various sources say you must first import the package:

import spark.implicits._

(That turns out to be wrong — it is not required for reading JSON; it only enables implicit conversions such as the $-column syntax.) Then:
val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age| name|
// +----+-------+
// |null|Michael|
// | 30| Andy|
// | 19| Justin|
// +----+-------+
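Beyond show(), a few other basic DataFrame operations can be tried right away. A sketch, self-contained for a standalone program (in spark-shell the session and df already exist, so only the last five lines apply; the in-memory records stand in for the JSON file):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")   // local mode; spark-shell already provides this session
  .appName("df-ops-sketch")
  .getOrCreate()
import spark.implicits._  // enables the $-column syntax used below

// Same three records as the JSON file, built in-memory
val df = Seq(("Michael", None: Option[Long]),
             ("Andy", Some(30L)),
             ("Justin", Some(19L))).toDF("name", "age")

df.printSchema()                       // print the schema
df.select("name").show()               // project a single column
df.select($"name", $"age" + 1).show()  // expression: everybody's age plus one
df.filter($"age" > 21).show()          // rows with age > 21
df.groupBy("age").count().show()       // count per age value
```
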
After that, you are on your own to explore. (This was written while dozing off during overtime, mostly cribbed from the official docs — thanks, everyone!)
Of course, if you are writing a standalone program rather than using spark-shell, you first need to create the SparkSession yourself (I have not tested this part; it is straight from the official docs):
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
3. Create a temporary view and play with plain SQL
The Spark 1.x way (now deprecated):

df.toDF().registerTempTable("people")

The Spark 2.x way:

df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
// +----+-------+
// | age| name|
// +----+-------+
// |null|Michael|
// | 30| Andy|
// | 19| Justin|
// +----+-------+

Below is my own tinkering; I never figured out why the final show() printed an empty table (people with colds are always dozing off):
scala> df.toDF().registerTempTable("people")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> val sqlDF = spark.sql("SELECT * FROM people")
sqlDF: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> sqlDF.show()
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
scala> df.createOrReplaceTempView("people")
scala> val sqlDf = spark.sql("SELECT * FROM people")
sqlDf: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> sqlDf.show()
+----+-------+
| age| name|
+----+-------+
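For reference, the deprecation warning above is Spark 2.x saying that registerTempTable has been replaced by createOrReplaceTempView. Views created that way are scoped to the current SparkSession; the docs also offer global temporary views that survive across sessions in the same application. A sketch, assuming the df from the examples above (people_global is my own name for illustration):

```scala
// Session-scoped: disappears when this SparkSession ends
df.createOrReplaceTempView("people")

// Application-scoped: shared across SparkSessions, and must be
// referenced through the reserved global_temp database
df.createGlobalTempView("people_global")
spark.sql("SELECT * FROM global_temp.people_global").show()
```
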
A pretty shambolic write-up, I know.