1. JDBC
Spark SQL can create a DataFrame by loading data from a relational database over JDBC; after running a series of computations on that DataFrame, the results can be written back to the relational database.
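As a rough overview of that round trip, the sketch below is a minimal example (assuming a SparkSession named spark, the bigdata database used later in this section at jdbc:mysql://hadoop10:3306/bigdata, and placeholder credentials) that reads a table into a DataFrame, filters it, and writes the result to a new table:

import java.util.Properties
import org.apache.spark.sql.SaveMode

// JDBC connection properties; user and password are placeholders
val props = new Properties()
props.put("user", "root")
props.put("password", "123456")
props.put("driver", "com.mysql.jdbc.Driver")

val url = "jdbc:mysql://hadoop10:3306/bigdata"

// Read the person table into a DataFrame
val personDF = spark.read.jdbc(url, "person", props)

// A simple computation: keep only rows with age >= 18
val adults = personDF.filter("age >= 18")

// Write the result back to MySQL into a (hypothetical) person_adult table
adults.write.mode(SaveMode.Append).jdbc(url, "person_adult", props)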
1.1. Loading data from MySQL (Spark Shell)
1. Start the Spark Shell; the MySQL connector driver jar must be specified:
[root@hadoop1 spark-2.1.1-bin-hadoop2.7]# bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077 --jars /home/tuzq/software/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.38.jar --driver-class-path /home/tuzq/software/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.38.jar
2. Load data from MySQL
In MySQL, create the bigdata database and the person table:
CREATE DATABASE bigdata CHARACTER SET utf8;
USE bigdata;
CREATE TABLE person ( id INT(10) AUTO_INCREMENT PRIMARY KEY, name varchar(100), age INT(3) ) ENGINE=INNODB DEFAULT CHARSET=utf8;
and insert a few rows of test data. Then, back in the Spark Shell, create an SQLContext and load the table over JDBC:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop10:3306/bigdata", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "person", "user" -> "root", "password" -> "123456")).load()
(the user and password values above are placeholders; substitute the credentials of your own MySQL instance)
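To confirm that the table was loaded, you can inspect the resulting DataFrame in the same shell, for example:

scala> jdbcDF.printSchema()
scala> jdbcDF.show()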