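load operation: loads data from a source and creates a DataFrame from it
save operation: writes the data in a DataFrame out to files
Code example (the default data source type is parquet)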
package wujiadong_sparkSQL
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by Administrator on 2017/2/3.
*/
object GenericLoadSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GenericLoadSave")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // With no format specified, load reads the input as parquet
    val usersDF = sqlContext.read.load("hdfs://master:9000/student/2016113012/spark/users.parquet")
    // save likewise writes parquet by default; the default save mode fails if the output path already exists
    usersDF.write.save("hdfs://master:9000/student/2016113012/parquet_out1")
    sc.stop()
  }
}
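The example above relies on the default parquet format. As a minimal sketch of how the format and save mode can be set explicitly, the following reads JSON and writes parquet using format(...) and mode(...). The object name ExplicitFormatLoadSave and both HDFS paths are hypothetical and are not part of the original example.

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: object name and paths are placeholders for illustration
object ExplicitFormatLoadSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ExplicitFormatLoadSave")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // format("json") overrides the default parquet source when reading
    val peopleDF = sqlContext.read
      .format("json")
      .load("hdfs://master:9000/path/to/people.json") // assumed input path

    // Write as parquet; SaveMode.Overwrite replaces any existing output
    // instead of failing, which is what the default mode would do
    peopleDF.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .save("hdfs://master:9000/path/to/people_parquet") // assumed output path

    sc.stop()
  }
}
```

Other save modes (SaveMode.Append, SaveMode.Ignore, SaveMode.ErrorIfExists) can be passed to mode(...) in the same way, depending on how existing output should be treated.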
Submit to the cluster and run
hadoop@master:~/wujiadong$ spark-submit --class wujiadong_sparkSQL.GenericLoadSave --executor-memory 500m --total-executor-cores 2 /h