System and software configuration
Install Ubuntu 16.04
http://blog.csdn.net/wyx100/article/details/51582617
Install Hadoop (Ubuntu 16.04)
Install Spark (Ubuntu 16.04)
Install MySQL (Ubuntu 16.04)
For MySQL installation and verification, see:
http://blog.csdn.net/fighter_yy/article/details/40753889
Install IDEA (Ubuntu 16.04)
Create the database
Database name: sparkdb
Account: root
Password: root
Create the database table: sparktest
| id | name |
| 1 | a |
| 2 | b |
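id data type: long (a BIGINT column in MySQL)
name data type: string (a VARCHAR(20) column in MySQL)
SQL to create the database
mysql> CREATE DATABASE sparkdb;
mysql> USE sparkdb;
mysql> CREATE TABLE sparktest (ID BIGINT, NAME VARCHAR(20));
mysql> INSERT INTO sparktest (ID, NAME) VALUES (1, 'a'), (2, 'b');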
Scala code
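package scala

import java.sql.DriverManager

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

object SparkToJDBC {
  def main(args: Array[String]): Unit = {
    // Run Spark locally; "mysql" is the application name.
    val sc = new SparkContext("local", "mysql")
    val rdd = new JdbcRDD(
      sc,
      // Connection factory, called once for each partition.
      () => {
        Class.forName("com.mysql.jdbc.Driver").newInstance()
        DriverManager.getConnection("jdbc:mysql://localhost:3306/sparkdb", "root", "root")
      },
      // The two ? placeholders receive each partition's lower and upper ID bounds.
      "SELECT name FROM sparktest WHERE ID >= ? AND ID <= ?",
      // lowerBound = 1, upperBound = 100, numPartitions = 3
      1, 100, 3,
      // Map each row of the ResultSet to its first column (name).
      r => r.getString(1)).cache()
    // Count the names that contain the substring "success".
    print(rdd.filter(_.contains("success")).count())
    sc.stop()
  }
}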
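The arguments 1, 100, 3 in the code above are the lower bound, the upper bound, and the number of partitions. The following is a minimal sketch, in plain Scala without Spark, of how an inclusive ID range could be divided so that each partition runs the query with its own pair of bounds. The arithmetic follows my understanding of what JdbcRDD does internally; the names JdbcRangeSketch and partitionBounds are hypothetical and are not part of the Spark API.

```scala
// Hypothetical helper illustrating range splitting; not a Spark API.
object JdbcRangeSketch {

  // Splits the inclusive range [lowerBound, upperBound] into numPartitions
  // contiguous, non-overlapping (start, end) pairs, both ends inclusive.
  def partitionBounds(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
    // Both bounds are inclusive, hence the + 1.
    val length = BigInt(upperBound) - BigInt(lowerBound) + 1
    (0 until numPartitions).map { i =>
      val start = BigInt(lowerBound) + (BigInt(i) * length) / numPartitions
      val end = BigInt(lowerBound) + (BigInt(i + 1) * length) / numPartitions - 1
      (start.toLong, end.toLong)
    }
  }

  def main(args: Array[String]): Unit = {
    // Show the bounds that would replace the two ? placeholders per partition.
    partitionBounds(1, 100, 3).zipWithIndex.foreach { case ((start, end), i) =>
      println(s"partition $i: ID >= $start AND ID <= $end")
    }
  }
}
```

With the values from this article, the sketch yields the ranges 1 to 33, 34 to 66, and 67 to 100. Both rows of sparktest (ID 1 and ID 2) fall into the first range, so under this assumption the other two partitions would return no rows.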
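The statement
print(rdd.filter(_.contains("success")).count())
prints 0, because neither name in sparktest ("a" or "b") contains the substring "success".
For a detailed explanation of rdd.filter(_.contains("success")).count(), see: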
http://blog.csdn.net/wyx100/article/details/51973712
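The filter-and-count step can also be tried without Spark or MySQL. The sketch below applies the same predicate to an ordinary Scala collection holding the two names from sparktest; it assumes the RDD would contain exactly the strings "a" and "b". The names FilterCountSketch and countContaining are hypothetical.

```scala
// Hypothetical stand-in for rdd.filter(_.contains(...)).count() on a local Seq.
object FilterCountSketch {

  // Counts the strings in names that contain the given substring.
  def countContaining(names: Seq[String], substring: String): Long =
    names.filter(_.contains(substring)).size.toLong

  def main(args: Array[String]): Unit = {
    // The two NAME values inserted into sparktest.
    val names = Seq("a", "b")
    println(countContaining(names, "success")) // no name contains "success"
    println(countContaining(names, "a"))       // only "a" contains "a"
  }
}
```

Changing the substring from "success" to "a" gives a count of 1, which is a quick way to confirm that the rows are actually being read and that the 0 comes from the predicate rather than from an empty result.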