The Underwhelming JdbcRDD (MySQL to Spark RDD)

Today I set out to move some MySQL data into an RDD. I had known about JdbcRDD for quite a while, so I finally decided to try it, only to find it rather underwhelming.

First, let's look at the definition of JdbcRDD:

/**
 * An RDD that executes an SQL query on a JDBC connection and reads results.
 * For usage example, see test case JdbcRDDSuite.
 *
 * @param getConnection a function that returns an open Connection.
 *   The RDD takes care of closing the connection.
 * @param sql the text of the query.
 *   The query must contain two ? placeholders for parameters used to partition the results.
 *   E.g. "select title, author from books where ? <= id and id <= ?"
 * @param lowerBound the minimum value of the first placeholder
 * @param upperBound the maximum value of the second placeholder
 *   The lower and upper bounds are inclusive.
 * @param numPartitions the number of partitions.
 *   Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2,
 *   the query would be executed twice, once with (1, 10) and once with (11, 20)
 * @param mapRow a function from a ResultSet to a single row of the desired result type(s).
 *   This should only call getInt, getString, etc; the RDD takes care of calling next.
 *   The default maps a ResultSet to an array of Object.
 */
class JdbcRDD[T: ClassTag](
    sc: SparkContext,
    getConnection: () => Connection,
    sql: String,
    lowerBound: Long,
    upperBound: Long,
    numPartitions: Int,
    mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)
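The comment above already hints at how the two placeholders get filled in. To make it concrete, here is a sketch of the range-splitting arithmetic, paraphrased from JdbcRDD.getPartitions in the Spark source (variable names are mine, so treat it as an illustration rather than a verbatim copy):

// Sketch: split [lowerBound, upperBound] into numPartitions inclusive
// sub-ranges; each pair is bound to the two ? placeholders of one
// partition's query. Paraphrased from the Spark source.
val length = BigInt(1) + upperBound - lowerBound   // +1 because bounds are inclusive
val ranges = (0 until numPartitions).map { i =>
  val start = lowerBound + ((i * length) / numPartitions)
  val end   = lowerBound + (((i + 1) * length) / numPartitions) - 1
  (start.toLong, end.toLong)
}
// lowerBound = 1, upperBound = 20, numPartitions = 2  =>  Vector((1,10), (11,20))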

Here is a sample program:

package test

import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.{SparkConf, SparkContext}

object spark_mysql {

  def main(args: Array[String]) {
    //val conf = new SparkConf().setAppName("spark_mysql").setMaster("local")
    val sc = new SparkContext("local", "spark_mysql")

    // Returns an open connection; JdbcRDD takes care of closing it.
    def createConnection() = {
      Class.forName("com.mysql.jdbc.Driver").newInstance()
      DriverManager.getConnection("jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")
    }

    // Maps one ResultSet row to a tuple; JdbcRDD itself calls next().
    def extractValues(r: ResultSet) = {
      (r.getString(1), r.getString(2))
    }

    // The two ? placeholders are filled with per-partition bounds derived
    // from lowerBound and upperBound.
    val data = new JdbcRDD(sc, createConnection,
      "SELECT id, aa FROM bbb WHERE ? <= id AND id <= ?",
      lowerBound = 3, upperBound = 5, numPartitions = 1, mapRow = extractValues)

    println(data.collect().toList)
    sc.stop()
  }
}
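One practical note: the MySQL JDBC driver jar (mysql-connector-java) has to be on the classpath of the driver and of every executor, otherwise Class.forName fails with ClassNotFoundException; with spark-submit the usual way is the --jars option.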

The MySQL table used for the test contains the following data:

[screenshot: contents of the MySQL test table]

Running the program produces:

[screenshot: program output]

As you can see, JdbcRDD's sql parameter must contain exactly two ? placeholders, and those placeholders exist solely so that lowerBound and upperBound can define the boundaries of the WHERE clause. If that were the whole story it would be tolerable; the real letdown is that lowerBound and upperBound are both typed Long. How many real-world key or filter columns are actually of type Long? Still, using the JdbcRDD source as a reference, you can write a variant that fits your own needs, which is some consolation.
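For example, if the natural partitioning key is a date column rather than a numeric id, you can bypass JdbcRDD altogether and hand each partition its own WHERE predicate. The sketch below is my own illustration of that idea, not part of Spark's API; it reuses the connection string from the sample above, and the bbb/id/aa table layout plus the day column are assumptions:

import java.sql.DriverManager
import scala.collection.mutable.ListBuffer
import org.apache.spark.SparkContext

// One WHERE predicate per partition, so the partition key can be any SQL
// type (here a hypothetical date column "day"). Illustrative sketch only.
def queryByPredicates(sc: SparkContext, predicates: Seq[String]) =
  sc.parallelize(predicates, predicates.size).mapPartitions { preds =>
    Class.forName("com.mysql.jdbc.Driver").newInstance()
    val conn = DriverManager.getConnection(
      "jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")
    val rows = ListBuffer[(String, String)]()
    for (p <- preds) {
      val rs = conn.createStatement().executeQuery("SELECT id, aa FROM bbb WHERE " + p)
      while (rs.next()) rows += ((rs.getString(1), rs.getString(2)))
    }
    conn.close()
    rows.iterator
  }

// e.g. queryByPredicates(sc, Seq("day = '2014-05-01'", "day = '2014-05-02'"))

Each partition opens its own connection, so the list of predicates also controls the degree of parallelism.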

Lately I have been busy with the 炼数成金 (Dataguru) Spark course and have not had much time for the blog.

For anyone who wants to dig deeper into Spark, I recommend a friend's blog, http://www.cnblogs.com/cenyuhai/ , which has quite a few source-code walkthroughs that help with understanding Spark's internals.
