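50 SQL Practice Problems, Spark RDD Edition (including how to connect spark-shell to MySQL)

Connecting spark-shell to MySQL

  1. Copy hive-site.xml from hive/conf into spark/conf/
  2. Copy mysql-connector-java-5.1.38.jar from hive/lib into spark/jars/
    spark-shell must be restarted afterwards
  3. Read the MySQL tables; each read returns a DataFrame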
[root@hadoop001 software]# mysql -uroot -pok

mysql> create database school;
Query OK, 1 row affected (0.00 sec)

mysql> source /software/schoolmysql50Bak.sql

[root@hadoop001 sbin]# spark-shell

Read the Student table
scala> val studentDF=spark.read.format("jdbc").options(Map("url"->"jdbc:mysql://hadoop001:3306/school","driver"-> "com.mysql.jdbc.Driver","dbtable"->"school.Student","user"->"root","password"->"ok")).load() 

Read the Score table
val scoreDF=spark.read.format("jdbc").options(Map("url"->"jdbc:mysql://hadoop001:3306/school","driver"->"com.mysql.jdbc.Driver","dbtable"->"school.Score","user"->"root","password"->"ok")).load

Read the Teacher table
val teacherDF=spark.read.format("jdbc").options(Map("url"->"jdbc:mysql://hadoop001:3306/school","driver"->"com.mysql.jdbc.Driver","dbtable"->"school.Teacher","user"->"root","password"->"ok")).load

Read the Course table
val courseDF=spark.read.format("jdbc").options(Map("url"->"jdbc:mysql://hadoop001:3306/school","driver"->"com.mysql.jdbc.Driver","dbtable"->"school.Course","user"->"root","password"->"ok")).load
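
The four reads above differ only in the table name, so the shared settings can be factored out. The following is a minimal sketch, not part of the original session: `jdbcBase` and `jdbcOptions` are hypothetical names, and the connection values are simply copied from the commands above.

```scala
// Connection settings copied from the read commands above.
val jdbcBase: Map[String, String] = Map(
  "url"      -> "jdbc:mysql://hadoop001:3306/school",
  "driver"   -> "com.mysql.jdbc.Driver",
  "user"     -> "root",
  "password" -> "ok"
)

// Hypothetical helper: builds the option map for one table of the school database.
def jdbcOptions(table: String): Map[String, String] =
  jdbcBase + ("dbtable" -> s"school.$table")

// In spark-shell, each table could then be loaded with one short call, e.g.
//   val studentDF = spark.read.format("jdbc").options(jdbcOptions("Student")).load()
```

Keeping the credentials in one map means a changed host or password only has to be edited once.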

1. List the student info and both scores for students whose score in course "01" is higher than in course "02":
scala> scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score>s2.s_score").join(studentDF,"s_id").show
+----+----+-------+----+-------+------+----------+-----+
|s_id|c_id|s_score|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+----+-------+------+----------+-----+
|  02|  01|     70|  02|     60|  钱电|1990-12-21||
|  04|  01|     50|  02|     30|  李云|1990-08-06||
+----+----+-------+----+-------+------+----------+-----+

 
 
2. List the student info and both scores for students whose score in course "01" is lower than in course "02":
scala> scoreDF.as("s1").join(scoreDF.as("s2"),"s_id").filter("s1.c_id=1 and s2.c_id=2 and s1.s_score<s2.s_score").join(studentDF,"s_id").show
+----+----+-------+----+-------+------+----------+-----+
|s_id|c_id|s_score|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+----+-------+------+----------+-----+
|  01|  01|     80|  02|     90|  赵雷|1990-01-01||
|  05|  01|     76|  02|     87|  周梅|1991-12-01||
+----+----+-------+----+-------+------+----------+-----+

 
 
3. List the student ID, name, and average score of students whose average score is at least 60:
scala> scoreDF.as("s1").groupBy("s_id").avg("s_score").join(studentDF.as("s2"),"s_id").filter($"avg(s_score)">=60).show
+----+-----------------+------+----------+-----+
|s_id|     avg(s_score)|s_name|   s_birth|s_sex|
+----+-----------------+------+----------+-----+
|  07|             93.5|  郑竹|1989-07-01||
|  01|89.66666666666667|  赵雷|1990-01-01||
|  05|             81.5|  周梅|1991-12-01||
|  03|             80.0|  孙风|1990-05-20||
|  02|             70.0|  钱电|1990-12-21||
+----+-----------------+------+----------+-----+

4. List the student ID, name, and average score of students whose average score is below 60 (including students with no scores at all):
scala> studentDF.as("s2")
.join((scoreDF.as("s1").groupBy("s_id").avg("s_score"))
.as("s3"),Seq("s_id"),"left_outer").as("s")
.withColumnRenamed("avg(s_score)","A").where((col("A")<60)||(col("A").isNull)).show
+----+------+----------+-----+------------------+
|s_id|s_name|   s_birth|s_sex|                 A|
+----+------+----------+-----+------------------+
|  08|  王菊|1990-01-20||              null|
|  06|  吴兰|1992-03-01||              32.5|
|  04|  李云|1990-08-06||33.333333333333336|
+----+------+----------+-----+------------------+

5. List every student's ID, name, number of courses taken, and total score across all courses:
scala> studentDF.join(scoreDF.groupBy("s_id").count,
Seq("s_id"),"left_outer").join(scoreDF.groupBy("s_id").sum("s_score"),Seq("s_id"),"left_outer").show
+----+------+----------+-----+-----+------------+
|s_id|s_name|   s_birth|s_sex|count|sum(s_score)|
+----+------+----------+-----+-----+------------+
|  07|  郑竹|1989-07-01||    2|         187|
|  01|  赵雷|1990-01-01||    3|         269|
|  05|  周梅|1991-12-01||    2|         163|
|  08|  王菊|1990-01-20|| null|        null|
|  03|  孙风|1990-05-20||    3|         240|
|  02|  钱电|1990-12-21||    3|         210|
|  06|  吴兰|1992-03-01||    2|          65|
|  04|  李云|1990-08-06||    3|         100|
+----+------+----------+-----+-----+------------+

6. Count the teachers whose surname is "李":
scala> teacherDF.where("t_name like '李%'").select("t_id").count
res5: Long = 1

7. List the students who have taken a course taught by teacher "张三":
scoreDF.join(courseDF,"c_id").join(teacherDF,"t_id").filter("t_name='张三'").join(studentDF,"s_id").show

8. List the students who have never taken a course taught by teacher "张三":
scala> studentDF.join(scoreDF.join(courseDF,"c_id").join(teacherDF.where("t_name='张三'"),"t_id"),Seq("s_id"),"left_anti").show
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  08|  王菊|1990-01-20||
|  06|  吴兰|1992-03-01||
+----+------+----------+-----+

9. List the students who have taken both course "01" and course "02":
scala> studentDF.join(scoreDF.filter("c_id=1"),"s_id").join(scoreDF.filter("c_id=2"),"s_id").show
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  01|  赵雷|1990-01-01||  01|     80|  02|     90|
|  05|  周梅|1991-12-01||  01|     76|  02|     87|
|  03|  孙风|1990-05-20||  01|     80|  02|     80|
|  02|  钱电|1990-12-21||  01|     70|  02|     60|
|  04|  李云|1990-08-06||  01|     50|  02|     30|
+----+------+----------+-----+----+-------+----+-------+

10. List the students who have taken course "01" but not course "02":
scala> studentDF.join(scoreDF.where("c_id=2"),Seq("s_id"),"left_outer").as("s2").where("s2.c_id is null").join(scoreDF.where("c_id=1"),"s_id").show
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  06|  吴兰|1992-03-01||null|   null|  01|     31|
+----+------+----------+-----+----+-------+----+-------+
 
11. List the students who have not taken every course:
scala> studentDF.join(scoreDF.groupBy("s_id").count.as("s1"),Seq("s_id"),"left_outer").where("s1.count<3 or s1.count is null").show
+----+------+----------+-----+-----+
|s_id|s_name|   s_birth|s_sex|count|
+----+------+----------+-----+-----+
|  07|  郑竹|1989-07-01||    2|
|  05|  周梅|1991-12-01||    2|
|  08|  王菊|1990-01-20|| null|
|  06|  吴兰|1992-03-01||    2|
+----+------+----------+-----+-----+

12. List the students who share at least one course with student "01":
scala> studentDF.join(scoreDF,"s_id").as("a").join(scoreDF.select("c_id").where("s_id=1").as("b"),"c_id").as("c").select("s_id").distinct.where("s_id!=1").join(studentDF,"s_id").show
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  07|  郑竹|1989-07-01||
|  05|  周梅|1991-12-01||
|  03|  孙风|1990-05-20||
|  02|  钱电|1990-12-21||
|  06|  吴兰|1992-03-01||
|  04|  李云|1990-08-06||
+----+------+----------+-----+

13. List the other students whose set of courses is exactly the same as that of student "01":
scala> studentDF.join(scoreDF,"s_id").as("a").join(scoreDF.where("s_id=1").as("b"),"c_id").groupBy("a.s_id").count.where(s"count=${scoreDF.where("s_id=1").count} and a.s_id!=1").join(studentDF,"s_id").show
+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  03|    3|  孙风|1990-05-20||
|  02|    3|  钱电|1990-12-21||
|  04|    3|  李云|1990-08-06||
+----+-----+------+----------+-----+
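
The query for problem 13 counts the courses shared with student "01" and compares that count with the number of courses "01" takes. That matches here because "01" takes every course, but a student with extra courses would also pass the check in general. A sketch of a stricter comparison follows; `sameCoursesAs`, `course_key`, and `target_key` are hypothetical names, and it assumes the `s_id`, `c_id` columns shown in the tables above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, collect_set, concat_ws, sort_array}

// Hypothetical helper: s_id of every other student whose course set equals that of `sid`.
def sameCoursesAs(score: DataFrame, sid: String): DataFrame = {
  // One row per student: the sorted, de-duplicated course ids joined into a string key
  val keys = score
    .groupBy("s_id")
    .agg(concat_ws(",", sort_array(collect_set(col("c_id").cast("string")))).as("course_key"))
  // The key of the reference student, renamed so it can be joined back
  val target = keys.where(col("s_id") === sid).select(col("course_key").as("target_key"))
  keys
    .where(col("s_id") =!= sid)
    .join(target, col("course_key") === col("target_key"))
    .select("s_id")
}
```

In the session above it could be called as `sameCoursesAs(scoreDF, "01").join(studentDF, "s_id").show`.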

14. List the names of students who have not taken any course taught by teacher "张三":
scala> studentDF.join(scoreDF,"s_id").join(courseDF,"c_id").join(teacherDF.where("t_name='张三'"),"t_id").as("a").select("s_id").join(studentDF.as("b"),Seq("s_id"),"right_outer").where("a.s_id is null").select("s_name").show
+------+
|s_name|
+------+
|  王菊|
|  吴兰|
+------+

15. List the student ID, name, and average score of students who failed two or more courses:
scala> scoreDF.where("s_score<60").groupBy("s_id").count.where("count>=2").join(scoreDF,"s_id").groupBy("s_id").avg("s_score").join(studentDF,"s_id").show
+----+------------------+------+----------+-----+
|s_id|      avg(s_score)|s_name|   s_birth|s_sex|
+----+------------------+------+----------+-----+
|  06|              32.5|  吴兰|1992-03-01||
|  04|33.333333333333336|  李云|1990-08-06||
+----+------------------+------+----------+-----+

16. List the students whose score in course "01" is below 60, sorted by score in descending order:
scala> scoreDF.where("c_id=1").join(studentDF,Seq("s_id"),"right_outer").where("s_score<60 or s_score is null").orderBy($"s_score".desc).show
+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+------+----------+-----+
|  04|  01|     50|  李云|1990-08-06||
|  06|  01|     31|  吴兰|1992-03-01||
|  07|null|   null|  郑竹|1989-07-01||
|  08|null|   null|  王菊|1990-01-20||
+----+----+-------+------+----------+-----+

17. Show every student's scores in all courses together with the average score, ordered by average score from high to low:
scala> studentDF.join(scoreDF,Seq("s_id"),"left_outer").groupBy("s_id").avg("s_score").join(studentDF.join(scoreDF,"s_id"),Seq("s_id"),"left_outer").orderBy($"avg(s_score)".desc).show
+----+------------------+------+----------+-----+----+-------+
|s_id|      avg(s_score)|s_name|   s_birth|s_sex|c_id|s_score|
+----+------------------+------+----------+-----+----+-------+
|  07|              93.5|  郑竹|1989-07-01||  02|     89|
|  07|              93.5|  郑竹|1989-07-01||  03|     98|
|  01| 89.66666666666667|  赵雷|1990-01-01||  02|     90|
|  01| 89.66666666666667|  赵雷|1990-01-01||  03|     99|
|  01| 89.66666666666667|  赵雷|1990-01-01||  01|     80|
|  05|              81.5|  周梅|1991-12-01||  01|     76|
|  05|              81.5|  周梅|1991-12-01||  02|     87|
|  03|              80.0|  孙风|1990-05-20||  02|     80|
|  03|              80.0|  孙风|1990-05-20||  01|     80|
|  03|              80.0|  孙风|1990-05-20||  03|     80|
|  02|              70.0|  钱电|1990-12-21||  02|     60|
|  02|              70.0|  钱电|1990-12-21||  01|     70|
|  02|              70.0|  钱电|1990-12-21||  03|     80|
|  04|33.333333333333336|  李云|1990-08-06||  01|     50|
|  04|33.333333333333336|  李云|1990-08-06||  02|     30|
|  04|33.333333333333336|  李云|1990-08-06||  03|     20|
|  06|              32.5|  吴兰|1992-03-01||  01|     31|
|  06|              32.5|  吴兰|1992-03-01||  03|     34|
|  08|              null|  null|      null| null|null|   null|
+----+------------------+------+----------+-----+----+-------+

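18. For each course, show the highest, lowest, and average score in the following form: course ID, course name, highest score, lowest score, average score, pass rate, medium rate, good rate, excellent rate: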
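
No answer is recorded for problem 18. One possible approach, sketched below, is a single `groupBy(...).agg(...)` in which each rate is the average of a 1.0/0.0 indicator. `courseStats` and `rate` are hypothetical names, and the thresholds (pass: 60 or above, medium: 70 to 79, good: 80 to 89, excellent: 90 or above) are assumptions, since the problem statement here does not define them.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{avg, col, max, min, when}

// Share of rows in a group for which `cond` holds: the average of a 1.0/0.0 indicator.
def rate(cond: Column): Column = avg(when(cond, 1.0).otherwise(0.0))

// Hypothetical helper: per-course statistics, one row per course.
def courseStats(score: DataFrame, course: DataFrame): DataFrame = {
  val s = col("s_score")
  score.join(course, "c_id")
    .groupBy("c_id", "c_name")
    .agg(
      max(s).as("max_score"),
      min(s).as("min_score"),
      avg(s).as("avg_score"),
      rate(s >= 60).as("pass_rate"),               // assumed threshold
      rate(s >= 70 && s < 80).as("medium_rate"),   // assumed threshold
      rate(s >= 80 && s < 90).as("good_rate"),     // assumed threshold
      rate(s >= 90).as("excellent_rate"))          // assumed threshold
    .orderBy("c_id")
}
```

In the session above it could be called as `courseStats(scoreDF, courseDF).show`.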



19. Sort the scores within each course and show the rank:
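
No answer is recorded for problem 19. A minimal sketch using a window function is shown here; `rankWithinCourse` is a hypothetical name, and using `rank()` (ties share a rank, and the next rank is skipped) is an assumption about what "rank" should mean.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}

// Hypothetical helper: adds a per-course rank column, highest score first.
def rankWithinCourse(score: DataFrame): DataFrame = {
  // One window per course, ordered by score from high to low
  val byCourse = Window.partitionBy("c_id").orderBy(col("s_score").desc)
  score
    .withColumn("rank", rank().over(byCourse))
    .orderBy(col("c_id"), col("rank"))
}
```

In the session above it could be called as `rankWithinCourse(scoreDF).show`. Swapping `rank()` for `dense_rank()` would remove the gaps after ties, and `row_number()` would give every row a distinct number.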


20. Compute each student's total score and rank the students by it:
scala> studentDF.join(scoreDF,"s_id").groupBy("s_id").sum("s_score").orderBy($"sum(s_score)".desc).show
+----+------------+
|s_id|sum(s_score)|
+----+------------+
|  01|         269|
|  03|         240|
|  02|         210|
|  07|         187|
|  05|         163|
|  04|         100|
|  06|          65|
+----+------------+

21. Show the average score for each teacher and course, from high to low:
scala> scoreDF.join(courseDF,"c_id").join(teacherDF,"t_id").groupBy("t_id","c_id").avg("s_score").orderBy($"avg(s_score)".desc).show
+----+----+-----------------+
|t_id|c_id|     avg(s_score)|
+----+----+-----------------+
|  01|  02|72.66666666666667|
|  03|  03|             68.5|
|  02|  01|             64.5|
+----+----+-----------------+

22. List the students ranked 2nd to 3rd in each course, together with their score in that course:
scala> scoreDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc) rank").where("rank between 2 and 3").join(studentDF,"s_id").show
+----+----+-------+----+------+----------+-----+
|s_id|c_id|s_score|rank|s_name|   s_birth|s_sex|
+----+----+-------+----+------+----------+-----+
|  07|  03|     98|   2|  郑竹|1989-07-01||
|  07|  02|     89|   2|  郑竹|1989-07-01||
|  05|  01|     76|   3|  周梅|1991-12-01||
|  05|  02|     87|   3|  周梅|1991-12-01||
|  03|  01|     80|   2|  孙风|1990-05-20||
|  02|  03|     80|   3|  钱电|1990-12-21||
+----+----+-------+----+------+----------+-----+

23. Count the students in each score band per course: course ID, course name, [100-85], [85-70], [70-60], [0-60], and the percentage in each band:
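
No answer is recorded for problem 23. The sketch below counts each band with a conditional sum and divides by the number of score rows in the course. `scoreBands` and the output column names are hypothetical, and the band boundaries are an assumption: each band includes its lower bound and excludes its upper bound, except that the top band has no upper limit.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, count, lit, sum, when}

// Hypothetical helper: head count and percentage per score band, one row per course.
def scoreBands(score: DataFrame, course: DataFrame): DataFrame = {
  val s = col("s_score")
  // Assumed boundaries: lower bound inclusive, upper bound exclusive
  val bands: Seq[(String, Column)] = Seq(
    "85_100" -> (s >= 85),
    "70_85"  -> (s >= 70 && s < 85),
    "60_70"  -> (s >= 60 && s < 70),
    "0_60"   -> (s < 60))
  // For each band: the number of matching rows and its share of the course's rows
  val aggs = bands.flatMap { case (name, cond) =>
    val n = sum(when(cond, 1).otherwise(0))
    Seq(n.as(s"n_$name"), (n * 100.0 / count(lit(1))).as(s"pct_$name"))
  }
  score.join(course, "c_id")
    .groupBy("c_id", "c_name")
    .agg(aggs.head, aggs.tail: _*)
    .orderBy("c_id")
}
```

In the session above it could be called as `scoreBands(scoreDF, courseDF).show`.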

24. Show each student's average score and rank:
scala> scoreDF.groupBy("s_id").avg("s_score").selectExpr("*","row_number() over(order by `avg(s_score)` desc) as rank").show
+----+------------------+----+
|s_id|      avg(s_score)|rank|
+----+------------------+----+
|  07|              93.5|   1|
|  01| 89.66666666666667|   2|
|  05|              81.5|   3|
|  03|              80.0|   4|
|  02|              70.0|   5|
|  04|33.333333333333336|   6|
|  06|              32.5|   7|
+----+------------------+----+

 
25. List the top three records in each course:
scala> scoreDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc) rank").where("rank<=3").show
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
|  01|  01|     80|   1|
|  03|  01|     80|   2|
|  05|  01|     76|   3|
|  01|  03|     99|   1|
|  07|  03|     98|   2|
|  02|  03|     80|   3|
|  01|  02|     90|   1|
|  07|  02|     89|   2|
|  05|  02|     87|   3|
+----+----+-------+----+

26. Count the students enrolled in each course:
scala> scoreDF.groupBy("c_id").count.show
+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  03|    6|
|  02|    6|
+----+-----+
 
27. List the student ID and name of every student who takes exactly two courses:
scala> scoreDF.groupBy("s_id").count.where("count=2").join(studentDF,"s_id").show
+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  07|    2|  郑竹|1989-07-01||
|  05|    2|  周梅|1991-12-01||
|  06|    2|  吴兰|1992-03-01||
+----+-----+------+----------+-----+

28. Count the male and female students:
studentDF.groupBy("s_sex").count.show
+-----+-----+
|s_sex|count|
+-----+-----+
||    4|
||    4|
+-----+-----+


29. List the students whose name contains the character "风":
studentDF.where("s_name like '%风%'").show
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  03|  孙风|1990-05-20||
+----+------+----------+-----+


30. List students who share both name and sex, and count how many share each name:
studentDF.groupBy("s_name","s_sex").count.where("count>1").show
+------+-----+-----+
|s_name|s_sex|count|
+------+-----+-----+
+------+-----+-----+

31. List the students born in 1990:
studentDF.where("year(s_birth)=1990").show
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  01|  赵雷|1990-01-01||
|  02|  钱电|1990-12-21||
|  03|  孙风|1990-05-20||
|  04|  李云|1990-08-06||
|  08|  王菊|1990-01-20||
+----+------+----------+-----+

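32. Show the average score of each course, sorted by average score in descending order, and by course ID in ascending order when averages are equal:
scoreDF.groupBy("c_id").avg("s_score").orderBy($"avg(s_score)".desc,$"c_id").show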








