Reposted from: http://www.k6k4.com/chapter/show/aafliljce1474164458328
1. Sample data
Each line of the file stores one JSON object.
The file path is example/input/data
- { "name": "Andy", "age": 30 }
- { "name": "Justin", "age": 19 }
- { "name": "tom", "age": 21 }
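
To reproduce the session, the sample file has to exist first. The following is a minimal sketch that writes the three records above to the path given in the text; the names `samplePath` and `sampleLines` are illustrative, and a local filesystem is assumed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Path taken from the text; the records are the three sample rows above.
val samplePath = Paths.get("example/input/data")
val sampleLines = Seq(
  """{ "name": "Andy", "age": 30 }""",
  """{ "name": "Justin", "age": 19 }""",
  """{ "name": "tom", "age": 21 }"""
)

// Create example/input/ if it is missing, then write one JSON object per line.
Files.createDirectories(samplePath.getParent)
Files.write(samplePath, sampleLines.mkString("\n").getBytes(StandardCharsets.UTF_8))
```

The one-object-per-line layout matters: by default `spark.read.json` treats every line as a separate record.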
2. Load the data
- scala> val df=spark.read.json("example/input/data")
- ...
- df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
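
The session above runs in spark-shell, where a `spark` value is already defined. Outside the shell it has to be created. The sketch below shows one way to do that; the app name, the `local[*]` master, and the helper `loadJson` are assumptions made for illustration and do not come from the original.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Outside spark-shell no `spark` value is predefined, so one has to be built.
// The app name and the local[*] master are illustrative assumptions.
val spark: SparkSession = SparkSession.builder()
  .appName("json-example")
  .master("local[*]")
  .getOrCreate()

// Hypothetical helper wrapping the same call used in the shell session above.
def loadJson(path: String): DataFrame = spark.read.json(path)
```

Calling `loadJson("example/input/data")` should then give the same DataFrame as the `df` in the shell session.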
3. View the data
- scala> df.show
- +---+------+
- |age|  name|
- +---+------+
- | 30|  Andy|
- | 19|Justin|
- | 21|   tom|
- +---+------+
4. View the table schema
- scala> df.printSchema
- root
- |-- age: long (nullable = true)
- |-- name: string (nullable = true)
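
The schema printed above was inferred by Spark from the file contents. As a hedged alternative, the same schema can be declared by hand and passed to the reader. The names `peopleSchema` and `loadWithSchema`, the app name, and the `local[*]` master are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Session settings are illustrative; in spark-shell the existing `spark` would be reused.
val spark = SparkSession.builder()
  .appName("json-schema-example")
  .master("local[*]")
  .getOrCreate()

// Field names and types mirror the printSchema output above.
val peopleSchema = StructType(Seq(
  StructField("age", LongType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// With a schema supplied, Spark does not have to scan the file to infer one.
def loadWithSchema(path: String): DataFrame =
  spark.read.schema(peopleSchema).json(path)
```

Declaring the schema also fixes the column types in advance, which can be useful when an input file is empty or some records lack a field.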
5. Basic query operations
- scala> df.select("name").show
- +------+
- |  name|
- +------+
- |  Andy|
- |Justin|
- |   tom|
- +------+
- scala> df.select($"name",$"age"+1).show
- +------+---------+
- |  name|(age + 1)|
- +------+---------+
- |  Andy|       31|
- |Justin|       20|
- |   tom|       22|
- +------+---------+
- scala> df.filter($"age">21).show
- +---+----+
- |age|name|
- +---+----+
- | 30|Andy|
- +---+----+
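
The `$"name"` syntax works in spark-shell because the shell imports `spark.implicits._` automatically. The sketch below repeats the three queries in a form that should also run outside the shell, using `col(...)` from `org.apache.spark.sql.functions`. The in-memory `people` DataFrame is a stand-in for the sample file; the session settings and variable names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("json-query-example")
  .master("local[*]")
  .getOrCreate()
// Needed for toDF (and for the $"..." column syntax) outside spark-shell.
import spark.implicits._

// In-memory stand-in for the sample file, so the sketch does not depend on a path.
val people = Seq(("Andy", 30L), ("Justin", 19L), ("tom", 21L)).toDF("name", "age")

// The same three queries as above, written with col(...) instead of $"...".
val names = people.select("name")
val agePlusOne = people.select(col("name"), col("age") + 1)
// The comparison is strict, so tom (age 21) is excluded and only Andy remains.
val olderThan21 = people.filter(col("age") > 21)
```

Each result is itself a DataFrame, so calling `.show` on any of them prints a table like the ones in the session above.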