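Spark version: 2.1.0
Result: a DataFrame does store duplicate rows.
Experiment
The experiment uses the JavaSparkSQLExample.java example that ships with Spark.
Code path: $SPARK_HOME/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
Part of that example is reproduced below: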
// $example on:create_df$
Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");
// (some code omitted)
// $example on:run_sql$
// Register the DataFrame as a SQL temporary view
df.createOrReplaceTempView("people");
Dataset<Row> sqlDF = spark.sql("SELECT * FROM people");
sqlDF.show();
Experiment data 1: the original people.json contains:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
Output:
17/02/24 10:56:00 INFO scheduler.DAGScheduler: Job 10 finished: show at JavaSparkSQLExample.java:171, took 0.028799 s
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
Experiment data 2: people.json changed to contain repeated records:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
{"name":"Justin", "age":19}
{"name":"Justin", "age":19}
{"name":"Michael"}
{"name":"Michael"}
Output:
17/02/24 11:00:58 INFO scheduler.DAGScheduler: Job 10 finished: show at JavaSparkSQLExample.java:171, took 0.023679 s
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|null|Michael|
|  30|   Andy|
|  19| Justin|
|  19| Justin|
|  19| Justin|
|null|Michael|
|null|Michael|
+----+-------+
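The experiments above show that a DataFrame keeps duplicate rows: identical records in the input appear as separate rows in the result and are not merged.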
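Put differently, a DataFrame behaves like a list (bag) of rows rather than a set. The sketch below is only a plain-Java analogy of that behaviour and does not use Spark at all: the class BagSemanticsSketch, the helpers row and distinctRows, and the four sample rows are hypothetical and made up for illustration. In Spark itself, removing repeats would be a job for Dataset.distinct() or Dataset.dropDuplicates(), which this post did not test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java analogy only; none of this is Spark API.
public class BagSemanticsSketch {

    // Hypothetical helper: one row as a (name, age) list; age may be null.
    public static List<Object> row(String name, Integer age) {
        return Arrays.<Object>asList(name, age);
    }

    // Keeps the first occurrence of each row and preserves input order.
    public static List<List<Object>> distinctRows(List<List<Object>> rows) {
        Set<List<Object>> seen = new LinkedHashSet<>(rows);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Illustrative sample: the three original records plus one repeat.
        List<List<Object>> rows = new ArrayList<>();
        rows.add(row("Michael", null));
        rows.add(row("Andy", 30));
        rows.add(row("Justin", 19));
        rows.add(row("Justin", 19)); // repeated on purpose

        List<List<Object>> unique = distinctRows(rows);

        // A list keeps the repeat; deduplication has to be asked for explicitly.
        System.out.println("rows kept: " + rows.size());
        System.out.println("distinct rows: " + unique.size());
    }
}
```

With the sample above, the list holds four rows and the deduplicated copy holds three, which mirrors the fact that the repeated records only disappear when deduplication is requested explicitly.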