Spark 2.2.1 Parquet File Handling: Examples and Walkthrough
(1) Loading data
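Load the Parquet data source, register the loaded people DataFrame as a temporary view with the createOrReplaceTempView method, run a SQL statement against that temporary view, and finally print the result.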
scala> val people = spark.read.parquet("/resources/people.parquet")
18/02/18 08:51:40 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
people: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> people.createOrReplaceTempView("parquetFile")
scala> val teenagers = spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
teenagers: org.apache.spark.sql.DataFrame = [name: