1. SparkSQL Input and Output
1.1 SparkSQL Input
Style 1 (spark below is a SparkSession instance):
spark.read.json("path")
spark.read.jdbc(url, "tableName", properties)
spark.read.csv("path")
spark.read.parquet("path") Parquet is widely used across the Hadoop ecosystem, and it supports all Spark SQL data types
spark.read.orc("path")
spark.read.table("tableName")
spark.read.text("path")
spark.read.textFile("path") returns a Dataset[String] rather than a DataFrame
Style 2:
spark.read.format("json").load("path")
Note: if no format is specified, the default is parquet.
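A minimal sketch putting both read styles together (the input paths are hypothetical):

import org.apache.spark.sql.SparkSession

object ReadStylesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadStylesDemo").master("local").getOrCreate()

    // Style 1: dedicated shortcut methods
    val jsonDF = spark.read.json("data/people.json")
    val csvDF = spark.read.option("header", "true").csv("data/people.csv")

    // Style 2: generic format(...).load(...)
    val parquetDF = spark.read.format("parquet").load("data/people.parquet")
    // Equivalent to spark.read.load("data/people.parquet"), since parquet is the default format

    jsonDF.show()
    spark.stop()
  }
}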
1.2 SparkSQL Output
Style 1 (df below is a DataFrame or Dataset):
df.write.json("path")
df.write.jdbc(url, "tableName", properties)
df.write.csv("path")
df.write.parquet("path")
df.write.orc("path")
df.write.saveAsTable("tableName")
df.write.text("path")
Style 2:
df.write.format("jdbc").<other options as needed>.save()
Note: the typical extra setting is the save mode, set via the mode method.
For example: df.write.mode(SaveMode.Append).save("f://bigdata/out") (format is omitted here)
If no format is specified, the default is parquet.
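A minimal sketch of both write styles with an explicit save mode (the output paths are hypothetical):

import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteStylesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WriteStylesDemo").master("local").getOrCreate()
    import spark.implicits._
    val df = Seq(("tom", 20), ("jerry", 18)).toDF("name", "age")

    // Style 1: shortcut method
    df.write.json("out/people_json")

    // Style 2: format(...).save(...), with the save mode set explicitly
    df.write.mode(SaveMode.Overwrite).format("csv").option("header", "true").save("out/people_csv")

    spark.stop()
  }
}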
2. SparkSQL Reading and Writing MySQL
2.1 Reading from MySQL with Spark
Two approaches:
package scalaBase.day15

import java.util.Properties

import org.apache.spark.sql.SparkSession

object SparkSQLReadMySQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSQLReadMySQL").master("local").getOrCreate()

    // Method 1: read.jdbc(url, table, properties)
    val url = "jdbc:mysql://localhost:3306/test"
    val prop = new Properties()
    prop.put("user", "root")
    prop.put("password", "")
    val df = spark.read.jdbc(url, "student", prop)
    // df.show()

    // Method 2: generic format("jdbc") with options
    val df2 = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("dbtable", "student")
      .option("user", "root")
      .option("password", "")
      .load()
    df2.show()

    spark.stop()
  }
}
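Both methods require the MySQL JDBC driver (mysql-connector-java) on the classpath. For larger tables the JDBC source can also read in parallel; a hedged sketch using the standard partitioning options, assuming the student table has a numeric id column roughly in the range 1 to 10000:

val partitionedDF = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("dbtable", "student")
  .option("user", "root")
  .option("password", "")
  .option("partitionColumn", "id") // assumed numeric column
  .option("lowerBound", "1")
  .option("upperBound", "10000")   // assumed id range
  .option("numPartitions", "4")    // each partition reads one slice of [lowerBound, upperBound]
  .load()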
2.2 Writing Data to MySQL with SparkSQL
Two methods are shown:
package scalaBase.day15

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SparkSQLWriteMySQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("sparkWriteMySQL")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    val sc = spark.sparkContext

    // Build an RDD[Row] from a text file of "name, age" lines
    val rdd = sc.textFile("data/people.txt").map { x =>
      val f = x.split(",")
      Row(f(0), f(1).trim.toInt)
    }

    // Schema matching the Row fields: name (non-nullable) and age (nullable)
    val schema = StructType(
      Array(
        StructField("name", StringType, false),
        StructField("age", IntegerType, true)
      )
    )

    val df = spark.createDataFrame(rdd, schema)

    // The target table need not exist; it is created from the DataFrame's schema
    // Method 1: write.jdbc(url, table, properties)
    val url = "jdbc:mysql://localhost:3306/test"
    val prop = new Properties()
    prop.put("user", "root")
    prop.put("password", "")
    // df.write.jdbc(url, "people", prop)

    // Method 2: generic format("jdbc") with options
    df.write.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("dbtable", "people")
      .option("user", "root")
      .option("password", "")
      .mode(SaveMode.Append) // optional; the default is ErrorIfExists
      .save()

    spark.stop()
  }
}
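For reference, the four SaveMode values control what happens when the target table or path already exists:
SaveMode.Append: add the new rows to the existing data
SaveMode.Overwrite: replace the existing data
SaveMode.ErrorIfExists: throw an exception (the default)
SaveMode.Ignore: silently skip the write
The mode method also accepts the equivalent strings, e.g. df.write.mode("append").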