def jdbc(url: String, table: String, properties: Properties): DataFrame
Construct a DataFrame representing the database table accessible via JDBC URL url named table and connection properties.
Since
1.4.0
import java.util.Properties
val prop = new Properties
// mysql
val url = "jdbc:mysql://mysqlHost:3306/database"
// oracle
//val url = "jdbc:oracle:thin:@//oracleHost:1521/database"
val tableName = "table"
// set connection user & password
prop.setProperty("user","username")
prop.setProperty("password","pwd")
// read the table into a DataFrame
val jdbcDF = spark.read.jdbc(url, tableName, prop)
def jdbc(url: String, table: String, predicates: Array[String], connectionProperties: Properties): DataFrame
Construct a DataFrame representing the database table accessible via JDBC URL url named table using connection properties. The predicates parameter gives a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.
Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
url
JDBC database url of the form jdbc:subprotocol:subname
table
Name of the table in the external database.
predicates
Condition in the where clause for each partition.
connectionProperties
JDBC database connection arguments, a list of arbitrary string tag/value. Normally at least a "user" and "password" property should be included. "fetchsize" can be used to control the number of rows per fetch.
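A minimal sketch of such a `Properties` object, including the optional "fetchsize" argument (user name, password, and fetch size below are placeholder values):

```scala
import java.util.Properties

// Connection arguments as string tag/value pairs; all values here are placeholders.
val connProps = new Properties
connProps.setProperty("user", "username")
connProps.setProperty("password", "pwd")
connProps.setProperty("fetchsize", "1000") // rows fetched per round trip to the database
```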
Since
1.4.0
MySQL:
val mysql_predicates = "to_days(START_TIME)>=to_days('2018-01-01') and to_days(START_TIME)<to_days('2018-10-01')#to_days(START_TIME)>=to_days('2018-10-01') and to_days(START_TIME)<to_days('2019-01-01')#to_days(START_TIME)>=to_days('2019-01-01') and to_days(START_TIME)<to_days('2019-11-08')"
val mysql_predicates_array = mysql_predicates.split("#")
Oracle:
val oracle_predicates = "to_char(START_TIME)>='2018-01-01' and to_char(START_TIME)<'2018-10-01'#to_char(START_TIME)>='2018-10-01' and to_char(START_TIME)<'2019-01-01'#to_char(START_TIME)>='2019-01-01' and to_char(START_TIME)<='2019-11-01'"
val oracle_predicates_array = oracle_predicates.split("#")
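Instead of hand-writing a `#`-joined string and splitting it, the partition predicates can be built programmatically from a list of date boundaries. A sketch, using the same placeholder column `START_TIME` and dates as the example above:

```scala
// Consecutive date boundaries; each adjacent pair defines one partition's date range.
val bounds = Seq("2018-01-01", "2018-10-01", "2019-01-01", "2019-11-08")

// One MySQL-style predicate per partition (4 boundaries -> 3 partitions).
val predicates = bounds.sliding(2).map { case Seq(lo, hi) =>
  s"to_days(START_TIME) >= to_days('$lo') and to_days(START_TIME) < to_days('$hi')"
}.toArray

// The array is then passed directly to the overload described here:
// val jdbcDF = spark.read.jdbc(url, tableName, predicates, prop)
```

Each predicate is executed as its own query, so the ranges should be disjoint and together cover the data you want; overlapping ranges would duplicate rows across partitions.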
def jdbc(url: String, table: String, columnName: String, lowerBound: Long, upperBound: Long, numPartitions: Int, connectionProperties: Properties): DataFrame
Construct a DataFrame representing the database table accessible via JDBC URL url named table. Partitions of the table will be retrieved in parallel based on the parameters passed to this function.
Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
url
JDBC database url of the form jdbc:subprotocol:subname.
table
Name of the table in the external database.
columnName
the name of a column of numeric, date, or timestamp type that will be used for partitioning.
lowerBound
the minimum value of columnName used to decide partition stride.
upperBound
the maximum value of columnName used to decide partition stride.
numPartitions
the number of partitions. This, along with lowerBound (inclusive), upperBound (exclusive), form partition strides for generated WHERE clause expressions used to split the column columnName evenly. When the input is less than 1, the number is set to 1.
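The stride logic can be illustrated with a simplified sketch. This is an approximation for intuition, not Spark's exact internal code; the column name and bounds are arbitrary:

```scala
// Simplified sketch of how numPartitions, lowerBound and upperBound turn into
// per-partition WHERE clauses (approximation of Spark's partitioning, not its exact code).
def partitionClauses(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val parts = math.max(numPartitions, 1)  // inputs less than 1 are clamped to 1
  val stride = (upper - lower) / parts    // width of each partition
  (0 until parts).map { i =>
    val lo = lower + i * stride
    val hi = lower + (i + 1) * stride
    if (i == 0) s"$column < $hi or $column is null"  // first partition also catches values below lowerBound
    else if (i == parts - 1) s"$column >= $lo"       // last partition catches values above upperBound
    else s"$column >= $lo and $column < $hi"
  }
}

partitionClauses("colName", 0L, 100L, 4)
```

Note that lowerBound and upperBound only shape the strides; rows outside the range are not filtered out, they simply all land in the first or last partition, which can skew those partitions badly if the bounds are far from the column's real min/max.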
connectionProperties
JDBC database connection arguments, a list of arbitrary string tag/value. Normally at least a "user" and "password" property should be included. "fetchsize" can be used to control the number of rows per fetch and "queryTimeout" can be used to wait for a Statement object to execute to the given number of seconds.
Since
1.4.0
val url = "jdbc:mysql://mysqlHost:3306/database"
val tableName = "table"
val columnName = "colName"
val lowerBound = 1L
val upperBound = 10000000L
val numPartitions = 10
// set connection user & password
val prop = new java.util.Properties
prop.setProperty("user","username")
prop.setProperty("password","pwd")
// read the table into a DataFrame, partitioned on colName
val jdbcDF = spark.read.jdbc(url, tableName, columnName, lowerBound, upperBound, numPartitions, prop)
// further operations on jdbcDF