Spark-shell连接MySQL
- 将hive/conf里面的 hive-site.xml复制到spark/conf/
- 将hive/lib里面的mysql-connector-java-5.1.38.jar复制到spark/jars/
- 读取MySQL文件,返回一个dataFrame
- 读取student表
val studentDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Student", "user" -> "root", "password" -> "ok")).load()
5. 读取score表
val scoreDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Score", "user" -> "root", "password" -> "ok")).load()
6.读取Teacher表
val teacherDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Teacher", "user" -> "root", "password" -> "ok")).load()
7. 读取Course表
val courseDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Course", "user" -> "root", "password" -> "ok")).load()
MySQL50题
1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数:
scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score>s2.s_score").join(studentDF,"s_id").show
2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数:
scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score<s2.s_score").join(studentDF,"s_id").show
3、查询平均成绩大于等于60 分的同学的学生编号和学生姓名和平均成绩:
scoreDF.as("s1").groupBy("s_id").avg("s_score").join(studentDF.as("s2"),"s_id").filter($"avg(s_score)">=60).show
4、查询平均成绩小于60 分的同学的学生编号和学生姓名和平均成绩(包括有成绩的和无成绩的):
studentDF.as("s2").join((scoreDF.as("s1").groupBy("s_id").avg("s_score")).as("s3"),Seq("s_id"),"left_outer").as("s").withColumnRenamed("avg(s_score)","A").where((col("A")<60) || (col("A").isNull)).show
5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩:
studentDF.join(scoreDF.groupBy("s_id").count,Seq("s_id"