Neither the official example nor the material I found online worked exactly as written, so here is a cleaned-up walkthrough.
1. First, download the MySQL JDBC driver jar from: https://dev.mysql.com/downloads/connector/j/
2. The database is `test`, the table is `people`, with two columns: `name` and `age`.
3. Start the Spark shell with the driver jar on the classpath:
./spark-shell --driver-class-path /path/to/jdbc/jar/mysql-connector-java-5.1.34-bin.jar
4. Define the MySQL URL:
scala> val url="jdbc:mysql://localhost:3306/test"
5. Create the connection properties:
scala> val prop = new java.util.Properties
scala> prop.setProperty("user","root")
scala> prop.setProperty("password","pwd for root")
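As a side note, `java.util.Properties` is just a string-to-string map, so any additional JDBC flags can be passed the same way. A minimal sketch (the `useSSL` key below is my own illustrative assumption, not something the original steps require):

```scala
// Build JDBC connection properties; java.util.Properties is a plain string map.
val prop = new java.util.Properties
prop.setProperty("user", "root")
prop.setProperty("password", "pwd for root")
// Illustrative extra flag (an assumption, often useful with a local MySQL):
prop.setProperty("useSSL", "false")

// Values can be read back the same way:
println(prop.getProperty("user")) // prints "root"
```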
6. Define the sqlContext. The official example is slightly off here: `sqlContext` needs to be defined first (in the Spark 2.x shell, `spark` is the pre-built SparkSession):
scala> val sqlContext = spark.sqlContext
7. Load the table into a DataFrame via JDBC:
scala> val df = sqlContext.read.jdbc(url,"people",prop)
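Equivalently, the same read can be expressed with the option-based `DataFrameReader` API; this is a sketch that assumes the same running local MySQL instance and the same credentials as above:

```scala
// Option-style JDBC read, equivalent to sqlContext.read.jdbc(url, "people", prop).
// Assumes a reachable MySQL at localhost:3306 with database `test`.
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("dbtable", "people")
  .option("user", "root")
  .option("password", "pwd for root")
  .load()
```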
8. Inspect the DataFrame's contents:
scala> df.show()
9. Inspect the DataFrame's schema:
scala> df.printSchema()
10. Group by `age` and count:
scala> val countsByAge = df.groupBy("age").count()
scala> countsByAge.show()
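Putting the steps above together, the whole session can be run as one script (for example with `spark-shell -i`, where the script file name is your own choice); it assumes the same local database, table, and credentials as the walkthrough:

```scala
// End-to-end sketch of the steps above; assumes a local MySQL database
// `test` with a table `people(name, age)` and the credentials shown earlier.
val url = "jdbc:mysql://localhost:3306/test"

val prop = new java.util.Properties
prop.setProperty("user", "root")
prop.setProperty("password", "pwd for root")

val sqlContext = spark.sqlContext               // `spark` is the shell's SparkSession

val df = sqlContext.read.jdbc(url, "people", prop)
df.show()                                        // print the rows
df.printSchema()                                 // print column names and types

val countsByAge = df.groupBy("age").count()      // row count per distinct age
countsByAge.show()
```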
11. References:
http://spark.apache.org/examples.html
http://www.infoobjects.com/spark-connecting-to-a-jdbc-data-source-using-dataframes/
https://stackoverflow.com/questions/40537035/error-not-found-value-sqlcontext-on-emr