Connecting Spark DataFrames to Various Data Sources
A DataFrame in a SparkSession is analogous to a table in a relational database: the queries you would run against a single table in a relational database can all be expressed on a DataFrame by calling its API.
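As a minimal sketch of that equivalence (assuming `ss` is an existing SparkSession; the `users` table and its columns are illustrative placeholders, not from any of the sources below):

```scala
import ss.implicits._

// A tiny in-memory table registered as a SQL view
val users = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
users.createTempView("users")

// The same query, once as SQL and once as DataFrame API calls
ss.sql("select name from users where age > 26").show()
users.filter($"age" > 26).select("name").show()
```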
I. Creating the DataFrame object
import org.apache.spark.sql.SparkSession

val ss = SparkSession.builder()
  .appName("ta")
  .master("local[4]")
  // MongoDB connection settings
  .config("spark.mongodb.input.uri", "mongodb://username:password@192.168.1.3:27017/log")
  .config("spark.mongodb.output.uri", "mongodb://username:password@192.168.1.3:27017/log")
  // Elasticsearch connection settings
  .config("es.index.auto.create", "true")
  .config("es.nodes", "192.168.1.1")
  .config("es.port", "9200")
  .getOrCreate()
1. Reading and writing MySQL data
val url = "jdbc:mysql://m000:3306/test"
// Read: all connection settings can be supplied in the options map
val jdbcDF = ss.read.format("jdbc")
  .options(Map("url" -> url, "user" -> "xxx", "password" -> "xxx", "dbtable" -> "xxx"))
  .load()
// Write: overwrite the target table
jdbcDF.write.mode("overwrite").format("jdbc")
  .options(Map("url" -> url, "user" -> "xxx", "password" -> "xxx", "dbtable" -> "TableName"))
  .save()
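For large tables, the JDBC source can split the read across executors with the standard partitioning options. A hedged sketch (the numeric column `id` and its bounds are assumptions about the source table, not something the connector discovers for you):

```scala
// Partitioned JDBC read: Spark issues one query per partition over the
// `id` range. Column name, bounds, and table name are illustrative.
val partitionedDF = ss.read.format("jdbc")
  .options(Map("url" -> url, "user" -> "xxx", "password" -> "xxx", "dbtable" -> "xxx"))
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .option("numPartitions", "4")
  .load()
```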
2. Reading and writing SQL Server data
val sqlUrl = "jdbc:sqlserver://192.168.1.3:1433;DatabaseName=mytable;user=xxxx;password=xxx"
val data2DF = ss.read.format("jdbc").options(Map("url" -> sqlUrl, "dbtable" -> "TableName")).load()
data2DF.write.mode("overwrite").format("jdbc").options(Map("url" -> sqlUrl, "dbtable" -> "TableName")).save()
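Credentials can also be kept out of the URL and passed as separate options, with the driver class pinned explicitly. A sketch of the same read (the driver class is the standard Microsoft JDBC driver; the other values mirror the URL above):

```scala
// Same read with user/password supplied as options rather than
// embedded in the connection URL.
val msDF = ss.read.format("jdbc")
  .option("url", "jdbc:sqlserver://192.168.1.3:1433;DatabaseName=mytable")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("user", "xxxx")
  .option("password", "xxx")
  .option("dbtable", "TableName")
  .load()
```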
3. Reading and writing Oracle with Spark SQL
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("SDS").setMaster("local[2]")
val sc = new SparkContext(conf)
val sparkSql = new SQLContext(sc)
// Connect by Oracle service name
val url3 = "jdbc:oracle:thin:@192.168.1.1:11521/racdb"
// Connect by Oracle SID
// val url3 = "jdbc:oracle:thin:@localhost:1521:orcl"
val table = "test3"
// Method 1: pass the credentials through a Properties object
val pro = new Properties()
pro.setProperty("user", "eic2")
pro.setProperty("password", "eic2")
val df = sparkSql.read.jdbc(url3, table, pro)
// Method 2: pass everything as options
val df2 = sparkSql.read.format("jdbc")
  .option("url", url3)
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", table)
  .option("user", "root")
  .option("password", "123456")
  .load()
df2.createTempView("test")
sparkSql.sql("select * from test").show()
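The `dbtable` option also accepts a parenthesized subquery with an alias, so filtering runs inside Oracle instead of in Spark. A minimal sketch (the `rownum` predicate is illustrative, and `url3`/`table` are the values defined above):

```scala
// Push the filter down to Oracle by handing the JDBC source a
// subquery instead of a bare table name.
val dfSub = sparkSql.read.format("jdbc")
  .option("url", url3)
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", s"(select * from $table where rownum <= 100) t")
  .option("user", "eic2")
  .option("password", "eic2")
  .load()
```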
4. Reading and writing MongoDB data
import com.mongodb.spark._
import com.mongodb.spark.config.ReadConfig
// Read, method 1: reuse the connection configured on the SparkSession
val data1DF = MongoSpark.load(ss, ReadConfig(Map("collection" -> "TableName"), Some(ReadConfig(ss))))
// Read, method 2: supply a full connection URI
val data2 = ss.sparkContext.loadFromMongoDB(ReadConfig(Map("uri" -> readUrl))).toDF()
// The first approach reads from the database already configured on the SparkSession;
// to read from a different database, use the second approach
// Write: append the DataFrame to the given collection
MongoSpark.save(datas.write.option("collection", "documentName").mode("append"))
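Writes can likewise target a database other than the one configured on the SparkSession by passing an explicit WriteConfig. A hedged sketch (the URI, database, and collection names are illustrative, and `datas` stands for the DataFrame being persisted, as above):

```scala
import com.mongodb.spark.config.WriteConfig

// Write to an explicitly named database/collection instead of the
// spark.mongodb.output.uri configured on the session.
val writeConfig = WriteConfig(Map(
  "uri" -> "mongodb://username:password@192.168.1.3:27017/otherdb",
  "collection" -> "documentName"))
MongoSpark.save(datas.write.mode("append"), writeConfig)
```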