Spark-shell连接MySQL
- 将hive/conf里面的 hive-site.xml复制到spark/conf/
- 将hive/lib里面的mysql-connector-java-5.1.38.jar复制到spark/jars/
- 读取MySQL文件,返回一个dataFrame
- 读取student表
val studentDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Student", "user" -> "root", "password" -> "ok")).load()

5. 读取score表
val scoreDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Score", "user" -> "root", "password" -> "ok")).load()

6.读取Teacher表
val teacherDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Teacher", "user" -> "root", "password" -> "ok")).load()

7. 读取Course表
val courseDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hadoop001:3306/school", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "school.Course", "user" -> "root", "password" -> "ok")).load()

MySQL50题
1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数:
scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score>s2.s_score").join(studentDF,"s_id").show

2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数:
scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score<s2.s_score").join(studentDF,"s_id").show

3、查询平均成绩大于等于60 分的同学的学生编号和学生姓名和平均成绩:
scoreDF.as("s1").groupBy("s_id").avg("s_score").join(studentDF.as("s2"),"s_id").filter($"avg(s_score)">=60).show

4、查询平均成绩小于60 分的同学的学生编号和学生姓名和平均成绩(包括有成绩的和无成绩的):
studentDF.as("s2").join((scoreDF.as("s1").groupBy("s_id").avg("s_score")).as("s3"),Seq("s_id"),"left_outer").as("s").withColumnRenamed("avg(s_score)","A").where((col("A")<60) || (col("A").isNull)).show

5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩:
studentDF.join(scoreDF.groupBy("s_id").coun

本文通过Spark Shell连接MySQL,展示了如何利用Spark SQL解决50道MySQL经典题目,涉及学生、教师、课程和成绩等多个方面的问题,涵盖了成绩对比、统计分析等多种查询场景。
最低0.47元/天 解锁文章
1110

被折叠的 条评论
为什么被折叠?



