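4. Reading data, writing data, and SaveMode
4.1 Reading data
The examples below read three common data sources: JSON, CSV, and parquet. For parquet, the path is the directory that an earlier job wrote its parquet output to, and Spark reads every file in that directory. The two sample input files are: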
student.json
{"name":"jack", "age":"22"}
{"name":"rose", "age":"21"}
{"name":"mike", "age":"19"}
product.csv
phone,5000,100
xiaomi,3000,300
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName(this.getClass.getSimpleName)
  .getOrCreate()

// Method 1: format-specific shortcut methods
val jsonSource: DataFrame = spark.read.json("E:\\student.json")
val csvSource: DataFrame = spark.read.csv("e://product.csv")
val parquetSource: DataFrame = spark.read.parquet("E:/parquetOutput/*")

// Method 2: format(...) followed by load(...)
val jsonSource1: DataFrame = spark.read.format("json").load("E:\\student.json")
val csvSource1: DataFrame = spark.read.format("csv").load("e://product.csv")
val parquetSource1: DataFrame = spark.read.format("parquet").load("E:/parquetOutput/*")

// Method 3: load(...) without a format; the default format is parquet.
// (SQLContext.load was removed in Spark 2.0, so use spark.read.load.)
val df: DataFrame = spark.read.load("E:/parquetOutput/*")
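Since product.csv has no header row, reading it with a plain spark.read.csv call yields columns named _c0, _c1, _c2, all typed as string. The following is a minimal sketch of one way to attach names and types at read time. The column names name, price, and stock are assumptions made for illustration (the sample file does not say what its columns mean), and CsvSchemaSketch and readProducts are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvSchemaSketch {
  // Assumed column names and types: product.csv itself carries no header.
  val productSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("price", IntegerType, nullable = true),
    StructField("stock", IntegerType, nullable = true)
  ))

  // Reads a header-less CSV file using the explicit schema above.
  def readProducts(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "false") // the sample file starts directly with data
      .schema(productSchema)     // replaces the default _c0, _c1, _c2 string columns
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CsvSchemaSketch")
      .getOrCreate()

    val products = readProducts(spark, "e://product.csv")
    products.printSchema()
    products.show()

    spark.stop()
  }
}
```

Supplying the schema up front also avoids the extra pass over the file that the inferSchema option would need.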
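4.2 Writing data
// The three methods are alternatives. With the default save mode, writing to a
// path that already exists fails, so use only one of them per output path.

// Method 1: format-specific shortcut methods
jsonSource.write.json("./jsonOutput")
jsonSource.write.parquet("./parquetOutput")
jsonSource.write.csv("./csvOut")

// Method 2: format(...) followed by save(...)
jsonSource.write.format("json").save("./jsonOutput")
jsonSource.write.format("parquet").save("./parquetOutput")
jsonSource.write.format("csv").save("./csvOut")

// Method 3: save(...) without a format; the default format is parquet
jsonSource.write.save("./parquetOutput")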
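Each write call above produces a directory of part files, not a single file, and that directory can be passed straight back to the matching reader. The sketch below illustrates this round trip with CSV under a few assumptions: the rows mirror student.json, the output goes to a fresh temporary directory, and WriteRoundTripSketch and writeAndReadBack are hypothetical names.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object WriteRoundTripSketch {
  // Writes three sample rows as CSV with a header row, then reads the directory back.
  def writeAndReadBack(spark: SparkSession, outDir: String): DataFrame = {
    import spark.implicits._
    val students = Seq(("jack", "22"), ("rose", "21"), ("mike", "19")).toDF("name", "age")

    students.write
      .option("header", "true") // emit column names as the first line of each part file
      .csv(outDir)              // outDir must not exist yet under the default save mode

    spark.read
      .option("header", "true") // treat the first line of each part file as column names
      .csv(outDir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("WriteRoundTripSketch")
      .getOrCreate()

    // A path inside a new temp directory, so the target does not exist beforehand.
    val outDir = Files.createTempDirectory("csvOut").resolve("students").toString
    writeAndReadBack(spark, outDir).show()

    spark.stop()
  }
}
```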
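4.3 Save modes
import org.apache.spark.sql.SaveMode

// result1 is a DataFrame that is not defined in this section.
// SaveMode.Append adds the new rows to whatever already exists at the path.
result1.write.mode(SaveMode.Append).json("spark_day01/jsonOutput1")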
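Besides Append, SaveMode also offers ErrorIfExists (the default), Ignore, and Overwrite. The sketch below is one way to observe how they differ: it writes the same two-row DataFrame to one path under each mode in turn and counts the rows after each write. The sample rows, the temporary output path, and the names SaveModeSketch and rowCountsPerMode are assumptions made for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveModeSketch {
  // Writes a two-row DataFrame to outDir under each save mode and
  // returns the row count found at outDir after each write.
  def rowCountsPerMode(spark: SparkSession, outDir: String): Seq[Long] = {
    import spark.implicits._
    val df = Seq(("jack", "22"), ("rose", "21")).toDF("name", "age")
    def rowsOnDisk(): Long = spark.read.json(outDir).count()

    // ErrorIfExists (default): succeeds only because outDir does not exist yet.
    df.write.mode(SaveMode.ErrorIfExists).json(outDir)
    val afterFirstWrite = rowsOnDisk() // 2

    // Append: keeps the existing files and adds new ones.
    df.write.mode(SaveMode.Append).json(outDir)
    val afterAppend = rowsOnDisk() // 4

    // Ignore: outDir already exists, so this write does nothing.
    df.write.mode(SaveMode.Ignore).json(outDir)
    val afterIgnore = rowsOnDisk() // 4

    // Overwrite: replaces the existing contents with the new data.
    df.write.mode(SaveMode.Overwrite).json(outDir)
    val afterOverwrite = rowsOnDisk() // 2

    Seq(afterFirstWrite, afterAppend, afterIgnore, afterOverwrite)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SaveModeSketch")
      .getOrCreate()

    val outDir = Files.createTempDirectory("saveMode").resolve("jsonOutput").toString
    println(rowCountsPerMode(spark, outDir).mkString(", "))

    spark.stop()
  }
}
```

Running a second ErrorIfExists write against the same path would throw an exception, which is why the default mode appears only once, as the first write.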