Big Data: Fifty MySQL Exercises Written in Spark

Fifty SQL Exercises Written in Spark

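Table names and fields

  • Student table
Student
s_id: student ID
s_name: name
s_birth: date of birth
s_sex: gender
  • Course table
Course
c_id: course ID
c_name: course name
t_id: teacher ID
  • Teacher table
Teacher
t_id: teacher ID
t_name: teacher name
  • Score table
Score
s_id: student ID
c_id: course ID
s_score: score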

Creating the tables

  • Build script
/*
SQLyog Professional v12.09 (64 bit)
MySQL - 5.7.29 : Database - school
*********************************************************************
*/


/*!40101 SET NAMES utf8 */;

/*!40101 SET SQL_MODE=''*/;

/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
CREATE DATABASE /*!32312 IF NOT EXISTS*/`school` /*!40100 DEFAULT CHARACTER SET utf8 */;

USE `school`;

/*Table structure for table `Course` */

DROP TABLE IF EXISTS `Course`;

CREATE TABLE `Course` (
  `c_id` varchar(20) NOT NULL,
  `c_name` varchar(20) NOT NULL DEFAULT '',
  `t_id` varchar(20) NOT NULL,
  PRIMARY KEY (`c_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*Data for the table `Course` */

insert  into `Course`(`c_id`,`c_name`,`t_id`) values ('01','语文','02'),('02','数学','01'),('03','英语','03');

/*Table structure for table `Score` */

DROP TABLE IF EXISTS `Score`;

CREATE TABLE `Score` (
  `s_id` varchar(20) NOT NULL,
  `c_id` varchar(20) NOT NULL,
  `s_score` int(3) DEFAULT NULL,
  PRIMARY KEY (`s_id`,`c_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*Data for the table `Score` */

insert  into `Score`(`s_id`,`c_id`,`s_score`) values ('01','01',80),('01','02',90),('01','03',99),('02','01',70),('02','02',60),('02','03',80),('03','01',80),('03','02',80),('03','03',80),('04','01',50),('04','02',30),('04','03',20),('05','01',76),('05','02',87),('06','01',31),('06','03',34),('07','02',89),('07','03',98);

/*Table structure for table `Student` */

DROP TABLE IF EXISTS `Student`;

CREATE TABLE `Student` (
  `s_id` varchar(20) NOT NULL,
  `s_name` varchar(20) NOT NULL DEFAULT '',
  `s_birth` varchar(20) NOT NULL DEFAULT '',
  `s_sex` varchar(10) NOT NULL DEFAULT '',
  PRIMARY KEY (`s_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*Data for the table `Student` */

insert  into `Student`(`s_id`,`s_name`,`s_birth`,`s_sex`) values ('01','赵雷','1990-01-01','男'),('02','钱电','1990-12-21','男'),('03','孙风','1990-05-20','男'),('04','李云','1990-08-06','男'),('05','周梅','1991-12-01','女'),('06','吴兰','1992-03-01','女'),('07','郑竹','1989-07-01','女'),('08','王菊','1990-01-20','女');

/*Table structure for table `Teacher` */

DROP TABLE IF EXISTS `Teacher`;

CREATE TABLE `Teacher` (
  `t_id` varchar(20) NOT NULL,
  `t_name` varchar(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`t_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*Data for the table `Teacher` */

insert  into `Teacher`(`t_id`,`t_name`) values ('01','张三'),('02','李四'),('03','王五');

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
  • Start the mysql client
[root@hadoop100 jars]# mysql -uroot -pok
  • Import the build script
mysql> source /opt/schoolmysql50Bak.sql
  • Check the tables
mysql> use school;
mysql> show tables;

  • Connect to MySQL from Spark (Scala)
package nj.zb.kb09

import java.util.Properties

import org.apache.spark.sql._

object ConnectMySQL {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().master("local[*]").appName("ConnectMysql").getOrCreate()

    val url = "jdbc:mysql://192.168.136.100:3306/school"
    val user = "root"
    val pwd = "ok"
    val driver = "com.mysql.jdbc.Driver"

    import spark.implicits._


    val prop = new Properties()
    prop.setProperty("user", user)
    prop.setProperty("password", pwd)
    prop.setProperty("driver", driver)

    val courseTable = "Course"
    val scoreTable = "Score"
    val studentTable = "Student"
    val teacherTable = "Teacher"

    val courseTableDF: DataFrame = spark.read.jdbc(url, courseTable, prop)
    val scoreTableDF: DataFrame = spark.read.jdbc(url, scoreTable, prop)
    val studentTableDF: DataFrame = spark.read.jdbc(url, studentTable, prop)
    val teacherTableDF: DataFrame = spark.read.jdbc(url, teacherTable, prop)
  }
}
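
To try the exercises without a running MySQL instance, the same four tables can be built in memory. This is a minimal sketch: the rows are copied from the INSERT statements in the build script, while the object name SchoolDataSketch and the helper load are hypothetical names introduced here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SchoolDataSketch {
  // Rows copied from the INSERT statements in the build script.
  val students: Seq[(String, String, String, String)] = Seq(
    ("01", "赵雷", "1990-01-01", "男"), ("02", "钱电", "1990-12-21", "男"),
    ("03", "孙风", "1990-05-20", "男"), ("04", "李云", "1990-08-06", "男"),
    ("05", "周梅", "1991-12-01", "女"), ("06", "吴兰", "1992-03-01", "女"),
    ("07", "郑竹", "1989-07-01", "女"), ("08", "王菊", "1990-01-20", "女"))
  val courses: Seq[(String, String, String)] =
    Seq(("01", "语文", "02"), ("02", "数学", "01"), ("03", "英语", "03"))
  val teachers: Seq[(String, String)] =
    Seq(("01", "张三"), ("02", "李四"), ("03", "王五"))
  val scores: Seq[(String, String, Int)] = Seq(
    ("01", "01", 80), ("01", "02", 90), ("01", "03", 99),
    ("02", "01", 70), ("02", "02", 60), ("02", "03", 80),
    ("03", "01", 80), ("03", "02", 80), ("03", "03", 80),
    ("04", "01", 50), ("04", "02", 30), ("04", "03", 20),
    ("05", "01", 76), ("05", "02", 87),
    ("06", "01", 31), ("06", "03", 34),
    ("07", "02", 89), ("07", "03", 98))

  // Returns (course, score, student, teacher) DataFrames with the MySQL column names.
  def load(spark: SparkSession): (DataFrame, DataFrame, DataFrame, DataFrame) = {
    import spark.implicits._
    (courses.toDF("c_id", "c_name", "t_id"),
     scores.toDF("s_id", "c_id", "s_score"),
     students.toDF("s_id", "s_name", "s_birth", "s_sex"),
     teachers.toDF("t_id", "t_name"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SchoolDataSketch").getOrCreate()
    val (courseTableDF, scoreTableDF, studentTableDF, teacherTableDF) = load(spark)
    scoreTableDF.show()
    spark.stop()
  }
}
```

The DataFrame names in main match the ones used in the exercises, so the exercise snippets can be pasted after the call to load.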

MySQL exercises

1. Query the information and course scores of students whose score in course "01" is higher than their score in course "02"

val frame1: DataFrame = scoreTableDF.as("s1").join(scoreTableDF.as("s2"), "s_id").filter("s1.c_id=01 and s2.c_id=02 and s1.s_score>s2.s_score").join(studentTableDF, "s_id")
frame1.show()
    
+----+----+-------+----+-------+------+----------+-----+
|s_id|c_id|s_score|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+----+-------+------+----------+-----+
|  02|  01|     70|  02|     60|    钱电|1990-12-21|    男|
|  04|  01|     50|  02|     30|    李云|1990-08-06|    男|
+----+----+-------+----+-------+------+----------+-----+
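
The same exercise can also be written as plain SQL by registering the DataFrames as temporary views. The sketch below assumes a small in-memory subset of the Score and Student rows instead of the JDBC tables; the object name Exercise1SqlSketch, the view names and the output aliases score_01 and score_02 are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Exercise1SqlSketch {
  def query(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Subset of the Score and Student rows from the build script.
    val score = Seq(
      ("01", "01", 80), ("01", "02", 90),
      ("02", "01", 70), ("02", "02", 60),
      ("04", "01", 50), ("04", "02", 30),
      ("05", "01", 76), ("05", "02", 87)).toDF("s_id", "c_id", "s_score")
    val student = Seq(("01", "赵雷"), ("02", "钱电"), ("04", "李云"), ("05", "周梅"))
      .toDF("s_id", "s_name")
    score.createOrReplaceTempView("score")
    student.createOrReplaceTempView("student")
    // Self-join Score: alias a holds course 01, alias b holds course 02.
    spark.sql(
      """SELECT st.s_id, st.s_name, a.s_score AS score_01, b.s_score AS score_02
        |FROM score a
        |JOIN score b ON a.s_id = b.s_id
        |JOIN student st ON st.s_id = a.s_id
        |WHERE a.c_id = '01' AND b.c_id = '02' AND a.s_score > b.s_score
        |ORDER BY st.s_id""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("Exercise1SqlSketch").getOrCreate()
    query(spark).show()
    spark.stop()
  }
}
```

Quoting the course IDs as strings ('01', '02') avoids relying on an implicit cast of the varchar column to a number.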

2. Query the information and course scores of students whose score in course "01" is lower than their score in course "02"

val frame2: DataFrame = scoreTableDF.as("s1").join(scoreTableDF.as("s2"), "s_id").filter("s1.c_id=01 and s2.c_id=02 and s1.s_score<s2.s_score").join(studentTableDF, "s_id")
frame2.show()
    
+----+----+-------+----+-------+------+----------+-----+
|s_id|c_id|s_score|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+----+-------+------+----------+-----+
|  01|  01|     80|  02|     90|    赵雷|1990-01-01|    男|
|  05|  01|     76|  02|     87|    周梅|1991-12-01|    女|
+----+----+-------+----+-------+------+----------+-----+

3. Query the student ID, name and average score of students whose average score is at least 60

val frame3: Dataset[Row] = scoreTableDF.as("s1").groupBy("s_id").avg("s_score").join(studentTableDF.as("s2"), "s_id").filter($"avg(s_score)" >= 60)
frame3.show()

+----+-----------------+------+----------+-----+
|s_id|     avg(s_score)|s_name|   s_birth|s_sex|
+----+-----------------+------+----------+-----+
|  07|             93.5|    郑竹|1989-07-01|    女|
|  01|89.66666666666667|    赵雷|1990-01-01|    男|
|  05|             81.5|    周梅|1991-12-01|    女|
|  03|             80.0|    孙风|1990-05-20|    男|
|  02|             70.0|    钱电|1990-12-21|    男|
+----+-----------------+------+----------+-----+

4. Query the student ID, name and average score of students whose average score is below 60 (including students with no scores)

val frame4: Dataset[Row] = studentTableDF.as("s1").join(scoreTableDF.as("s2").groupBy("s_id").avg("s_score"), Seq("s_id"), "left_outer").where($"avg(s_score)" < 60 || $"avg(s_score)".isNull)

frame4.show()

+----+------+----------+-----+------------------+
|s_id|s_name|   s_birth|s_sex|      avg(s_score)|
+----+------+----------+-----+------------------+
|  08|    王菊|1990-01-20|    女|              null|
|  06|    吴兰|1992-03-01|    女|              32.5|
|  04|    李云|1990-08-06|    男|33.333333333333336|
+----+------+----------+-----+------------------+

5. Query every student's ID, name, number of courses taken and total score across all courses

val frame5: DataFrame = studentTableDF.join(scoreTableDF.groupBy("s_id").count(), Seq("s_id"), "left_outer").join(scoreTableDF.groupBy("s_id").sum(), Seq("s_id"), "left_outer")
frame5.show()

+----+------+----------+-----+-----+------------+
|s_id|s_name|   s_birth|s_sex|count|sum(s_score)|
+----+------+----------+-----+-----+------------+
|  07|    郑竹|1989-07-01|    女|    2|         187|
|  01|    赵雷|1990-01-01|    男|    3|         269|
|  05|    周梅|1991-12-01|    女|    2|         163|
|  08|    王菊|1990-01-20|    女| null|        null|
|  03|    孙风|1990-05-20|    男|    3|         240|
|  02|    钱电|1990-12-21|    男|    3|         210|
|  06|    吴兰|1992-03-01|    女|    2|          65|
|  04|    李云|1990-08-06|    男|    3|         100|
+----+------+----------+-----+-----+------------+

6. Count the teachers whose surname is "李"

val frame6: Long = teacherTableDF.where("t_name like '李%'").count()
println(frame6)

1

7. Query the information of students who have taken a course taught by teacher "张三"

val frame7: DataFrame = scoreTableDF.join(courseTableDF, "c_id").join(teacherTableDF, "t_id").where("t_name='张三'").join(studentTableDF, "s_id")
frame7.show()

+----+----+----+-------+------+------+------+----------+-----+
|s_id|t_id|c_id|s_score|c_name|t_name|s_name|   s_birth|s_sex|
+----+----+----+-------+------+------+------+----------+-----+
|  07|  01|  02|     89|    数学|    张三|    郑竹|1989-07-01|    女|
|  01|  01|  02|     90|    数学|    张三|    赵雷|1990-01-01|    男|
|  05|  01|  02|     87|    数学|    张三|    周梅|1991-12-01|    女|
|  03|  01|  02|     80|    数学|    张三|    孙风|1990-05-20|    男|
|  02|  01|  02|     60|    数学|    张三|    钱电|1990-12-21|    男|
|  04|  01|  02|     30|    数学|    张三|    李云|1990-08-06|    男|
+----+----+----+-------+------+------+------+----------+-----+

8. Query the information of students who have not taken any course taught by teacher "张三"

// left_anti keeps only the students that have no score row for a course taught by "张三"
val frame8: DataFrame = studentTableDF.join(scoreTableDF.join(courseTableDF, "c_id").join(teacherTableDF, "t_id").where("t_name='张三'"), Seq("s_id"), "left_anti").orderBy("s_id")
frame8.show()

+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  06|    吴兰|1992-03-01|    女|
|  08|    王菊|1990-01-20|    女|
+----+------+----------+-----+

9. Query the information of students who have taken both course "01" and course "02"

val frame9: DataFrame = studentTableDF.join(scoreTableDF.filter("c_id=01"), "s_id").join(scoreTableDF.filter("c_id=02"), "s_id")
frame9.show()

+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  01|    赵雷|1990-01-01|    男|  01|     80|  02|     90|
|  05|    周梅|1991-12-01|    女|  01|     76|  02|     87|
|  03|    孙风|1990-05-20|    男|  01|     80|  02|     80|
|  02|    钱电|1990-12-21|    男|  01|     70|  02|     60|
|  04|    李云|1990-08-06|    男|  01|     50|  02|     30|
+----+------+----------+-----+----+-------+----+-------+

10. Query the information of students who have taken course "01" but not course "02"

val frame10: DataFrame = studentTableDF.join(scoreTableDF.filter("c_id=2"), Seq("s_id"), "leftouter").where("c_id is null").join(scoreTableDF.filter("c_id=1"), Seq("s_id"))
frame10.show()

+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  06|    吴兰|1992-03-01|    女|null|   null|  01|     31|
+----+------+----------+-----+----+-------+----+-------+

11. Query the information of students who have not taken every course

val frame11: DataFrame = studentTableDF.join(scoreTableDF, Seq("s_id"), "leftouter").groupBy("s_id").count().where("count!=3 ").join(studentTableDF, "s_id")
frame11.show()

+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  07|    2|    郑竹|1989-07-01|    女|
|  05|    2|    周梅|1991-12-01|    女|
|  08|    1|    王菊|1990-01-20|    女|
|  06|    2|    吴兰|1992-03-01|    女|
+----+-----+------+----------+-----+

12. Query the information of students who share at least one course with student "01"

val frame12: DataFrame = studentTableDF.join(scoreTableDF, "s_id").as("d").join(scoreTableDF.where("s_id=1"), "c_id").select("d.s_id").distinct().where("s_id!=1").join(studentTableDF, "s_id")
frame12.show()

+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  07|    郑竹|1989-07-01|    女|
|  05|    周梅|1991-12-01|    女|
|  03|    孙风|1990-05-20|    男|
|  02|    钱电|1990-12-21|    男|
|  06|    吴兰|1992-03-01|    女|
|  04|    李云|1990-08-06|    男|
+----+------+----------+-----+

13. Query the information of the other students whose courses are exactly the same as those of student "01"

val frame13: DataFrame = scoreTableDF.where("s_id=1").as("s1").join(scoreTableDF.as("s2"), "c_id").groupBy("s2.s_id").count().as("s3").where(s"count=${scoreTableDF.where("s_id=1").count()} and s_id!=1").join(studentTableDF, "s_id")
frame13.show()

+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  03|    3|    孙风|1990-05-20|    男|
|  02|    3|    钱电|1990-12-21|    男|
|  04|    3|    李云|1990-08-06|    男|
+----+-----+------+----------+-----+

14. Query the names of students who have not taken any course taught by teacher "张三"

val frame14: Dataset[Row] = studentTableDF.join(teacherTableDF.where("t_name='张三'").join(courseTableDF, "t_id").join(scoreTableDF, Seq("c_id"), "left_outer"),Seq("s_id"),"left_outer").as("s1").where("s1.t_id is null")
frame14.show()

+----+------+----------+-----+----+----+------+------+-------+
|s_id|s_name|   s_birth|s_sex|c_id|t_id|t_name|c_name|s_score|
+----+------+----------+-----+----+----+------+------+-------+
|  08|    王菊|1990-01-20|    女|null|null|  null|  null|   null|
|  06|    吴兰|1992-03-01|    女|null|null|  null|  null|   null|
+----+------+----------+-----+----+----+------+------+-------+

15. Query the student ID, name and average score of students who failed two or more courses

val frame15: DataFrame = scoreTableDF.where("s_score<60").groupBy("s_id").count().where("count>=2").join(scoreTableDF,"s_id").groupBy("s_id").avg("s_score").join(studentTableDF,"s_id")
frame15.show()

+----+------------------+------+----------+-----+
|s_id|      avg(s_score)|s_name|   s_birth|s_sex|
+----+------------------+------+----------+-----+
|  06|              32.5|    吴兰|1992-03-01|    女|
|  04|33.333333333333336|    李云|1990-08-06|    男|
+----+------------------+------+----------+-----+

16. Retrieve the information of students whose score in course "01" is below 60, sorted by score in descending order

val frame16: Dataset[Row] = scoreTableDF.where("c_id=01 and s_score<60").join(studentTableDF,"s_id").orderBy($"s_score".desc)
frame16.show()

+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+------+----------+-----+
|  04|  01|     50|    李云|1990-08-06|    男|
|  06|  01|     31|    吴兰|1992-03-01|    女|
+----+----+-------+------+----------+-----+

17. Show every student's scores in all courses together with the average score, ordered by average score from high to low

val frame17: Dataset[Row] = scoreTableDF.join(scoreTableDF.groupBy("s_id").avg("s_score"),Seq("s_id"),"left_outer").join(studentTableDF,"s_id").orderBy($"avg(s_score)".desc)
frame17.show()

+----+----+-------+------------------+------+----------+-----+
|s_id|c_id|s_score|      avg(s_score)|s_name|   s_birth|s_sex|
+----+----+-------+------------------+------+----------+-----+
|  07|  02|     89|              93.5|    郑竹|1989-07-01|    女|
|  07|  03|     98|              93.5|    郑竹|1989-07-01|    女|
|  01|  01|     80| 89.66666666666667|    赵雷|1990-01-01|    男|
|  01|  03|     99| 89.66666666666667|    赵雷|1990-01-01|    男|
|  01|  02|     90| 89.66666666666667|    赵雷|1990-01-01|    男|
|  05|  02|     87|              81.5|    周梅|1991-12-01|    女|
|  05|  01|     76|              81.5|    周梅|1991-12-01|    女|
|  03|  01|     80|              80.0|    孙风|1990-05-20|    男|
|  03|  02|     80|              80.0|    孙风|1990-05-20|    男|
|  03|  03|     80|              80.0|    孙风|1990-05-20|    男|
|  02|  03|     80|              70.0|    钱电|1990-12-21|    男|
|  02|  02|     60|              70.0|    钱电|1990-12-21|    男|
|  02|  01|     70|              70.0|    钱电|1990-12-21|    男|
|  04|  01|     50|33.333333333333336|    李云|1990-08-06|    男|
|  04|  02|     30|33.333333333333336|    李云|1990-08-06|    男|
|  04|  03|     20|33.333333333333336|    李云|1990-08-06|    男|
|  06|  01|     31|              32.5|    吴兰|1992-03-01|    女|
|  06|  03|     34|              32.5|    吴兰|1992-03-01|    女|
+----+----+-------+------------------+------+----------+-----+

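18. Query the highest, lowest and average score of each course, displayed as: course ID, course name, highest score, lowest score, average score, pass rate, medium rate, good rate, excellent rate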

val jige = scoreTableDF.rdd.map(x=>{if(x.getAs("s_score").toString.toInt > 60) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("c_id","jige")
val zhongdeng = scoreTableDF.rdd.map(x=>{if(x.getAs("s_score").toString.toInt > 70) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("c_id","zhongdeng")
val youliang = scoreTableDF.rdd.map(x=>{if(x.getAs("s_score").toString.toInt > 80) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("c_id","youliang")
val youxiu = scoreTableDF.rdd.map(x=>{if(x.getAs("s_score").toString.toInt > 90) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("c_id","youxiu")
val s1 = scoreTableDF.groupBy("c_id").agg("s_score"->"max","s_score"->"min","s_score"->"avg","s_score"->"count")
val frame18: DataFrame = s1.join(jige,"c_id").join(zhongdeng,"c_id").join(youliang,"c_id").join(youxiu,"c_id").withColumn("jgl",$"jige"/$"count(s_score)").withColumn("zdl",$"zhongdeng"/$"count(s_score)").withColumn("yll",$"youliang"/$"count(s_score)").withColumn("yxl",$"youxiu"/$"count(s_score)").drop("jige","zhongdeng","youliang","youxiu")
frame18.show()

+----+------------+------------+-----------------+--------------+------------------+------------------+------------------+------------------+
|c_id|max(s_score)|min(s_score)|     avg(s_score)|count(s_score)|               jgl|               zdl|               yll|               yxl|
+----+------------+------------+-----------------+--------------+------------------+------------------+------------------+------------------+
|  01|          80|          31|             64.5|             6|0.6666666666666666|               0.5|               0.0|               0.0|
|  03|          99|          20|             68.5|             6|0.6666666666666666|0.6666666666666666|0.3333333333333333|0.3333333333333333|
|  02|          90|          30|72.66666666666667|             6|0.6666666666666666|0.6666666666666666|               0.5|               0.0|
+----+------------+------------+-----------------+--------------+------------------+------------------+------------------+------------------+
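
The four rates can also be computed in one aggregation by averaging a 0/1 indicator per threshold, which avoids the separate RDD passes and joins. This is a sketch under assumptions: the thresholds use >= (60, 70, 80, 90), which differs from the strict > comparison in the code above, and the names CourseStatsSketch, rate, sample and the column aliases are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, max, min, when}

object CourseStatsSketch {
  // Fraction of rows with s_score >= threshold: the mean of a 0/1 indicator.
  private def rate(threshold: Int): Column =
    avg(when(col("s_score") >= threshold, 1.0).otherwise(0.0))

  // Score rows for courses 01 and 02, copied from the build script.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("01", "01", 80), ("02", "01", 70), ("03", "01", 80),
      ("04", "01", 50), ("05", "01", 76), ("06", "01", 31),
      ("01", "02", 90), ("02", "02", 60), ("03", "02", 80),
      ("04", "02", 30), ("05", "02", 87), ("07", "02", 89)
    ).toDF("s_id", "c_id", "s_score")
  }

  def stats(score: DataFrame): DataFrame =
    score.groupBy("c_id").agg(
      max("s_score").as("max_score"),
      min("s_score").as("min_score"),
      avg("s_score").as("avg_score"),
      rate(60).as("pass_rate"),
      rate(70).as("medium_rate"),
      rate(80).as("good_rate"),
      rate(90).as("excellent_rate")
    ).orderBy("c_id")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("CourseStatsSketch").getOrCreate()
    stats(sample(spark)).show()
    spark.stop()
  }
}
```

With >= 60 as the pass mark, the score of 60 in course 02 counts as a pass, so that course's pass rate is 5/6 rather than 4/6.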

19. Sort the scores within each course and display the rank

val frame19: DataFrame = scoreTableDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc)")
frame19.show()

+----+----+-------+---------------------------------------------------------------------------------------+
|s_id|c_id|s_score|row_number() OVER (PARTITION BY c_id ORDER BY s_score DESC NULLS LAST UnspecifiedFrame)|
+----+----+-------+---------------------------------------------------------------------------------------+
|  01|  01|     80|                                                                                      1|
|  03|  01|     80|                                                                                      2|
|  05|  01|     76|                                                                                      3|
|  02|  01|     70|                                                                                      4|
|  04|  01|     50|                                                                                      5|
|  06|  01|     31|                                                                                      6|
|  01|  03|     99|                                                                                      1|
|  07|  03|     98|                                                                                      2|
|  02|  03|     80|                                                                                      3|
|  03|  03|     80|                                                                                      4|
|  06|  03|     34|                                                                                      5|
|  04|  03|     20|                                                                                      6|
|  01|  02|     90|                                                                                      1|
|  07|  02|     89|                                                                                      2|
|  05|  02|     87|                                                                                      3|
|  03|  02|     80|                                                                                      4|
|  02|  02|     60|                                                                                      5|
|  04|  02|     30|                                                                                      6|
+----+----+-------+---------------------------------------------------------------------------------------+
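
row_number() gives tied scores different positions (the two scores of 80 in course 01 get 1 and 2). Whether ties should share a rank depends on how the exercise is read; the sketch below compares row_number, rank and dense_rank with the Window API on the course 01 rows. The object name RankingSketch and the helpers sample and ranked are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, rank, row_number}

object RankingSketch {
  // Score rows for course 01, copied from the build script.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("01", "01", 80), ("02", "01", 70), ("03", "01", 80),
      ("04", "01", 50), ("05", "01", 76), ("06", "01", 31)
    ).toDF("s_id", "c_id", "s_score")
  }

  def ranked(score: DataFrame): DataFrame = {
    // One window per course, highest score first.
    val byCourse = Window.partitionBy("c_id").orderBy(col("s_score").desc)
    score
      .withColumn("row_number", row_number().over(byCourse)) // ties get distinct numbers
      .withColumn("rank", rank().over(byCourse))             // ties share a rank, gaps follow
      .withColumn("dense_rank", dense_rank().over(byCourse)) // ties share a rank, no gaps
      .orderBy(col("c_id"), col("s_score").desc, col("s_id"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("RankingSketch").getOrCreate()
    ranked(sample(spark)).show()
    spark.stop()
  }
}
```

For course 01 (80, 80, 76, 70, 50, 31), rank yields 1, 1, 3, 4, 5, 6 and dense_rank yields 1, 1, 2, 3, 4, 5.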

20. Query each student's total score and rank the students by it

val frame20: DataFrame = scoreTableDF.selectExpr("*","sum(s_score) over(partition by s_id) as sum_score").dropDuplicates("s_id").selectExpr("*","row_number() over(order by sum_score desc)")
frame20.show()

+----+----+-------+---------+-----------------------------------------------------------------------+
|s_id|c_id|s_score|sum_score|row_number() OVER (ORDER BY sum_score DESC NULLS LAST UnspecifiedFrame)|
+----+----+-------+---------+-----------------------------------------------------------------------+
|  01|  01|     80|      269|                                                                      1|
|  03|  01|     80|      240|                                                                      2|
|  02|  01|     70|      210|                                                                      3|
|  07|  02|     89|      187|                                                                      4|
|  05|  01|     76|      163|                                                                      5|
|  04|  01|     50|      100|                                                                      6|
|  06|  01|     31|       65|                                                                      7|
+----+----+-------+---------+-----------------------------------------------------------------------+

21. Query the average score of each course taught by each teacher, ordered from high to low

val frame21: Dataset[Row] = scoreTableDF.join(courseTableDF,"c_id").join(teacherTableDF,"t_id").groupBy("t_id","c_id").avg("s_score").orderBy($"avg(s_score)".desc)
frame21.show()

+----+----+-----------------+
|t_id|c_id|     avg(s_score)|
+----+----+-----------------+
|  01|  02|72.66666666666667|
|  03|  03|             68.5|
|  02|  01|             64.5|
+----+----+-----------------+

22. Query the information and course score of the students ranked 2nd to 3rd in each course

val frame22: DataFrame = scoreTableDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc) num").where("num between 2 and 3").join(studentTableDF,"s_id")
frame22.show()

+----+----+-------+---+------+----------+-----+
|s_id|c_id|s_score|num|s_name|   s_birth|s_sex|
+----+----+-------+---+------+----------+-----+
|  07|  03|     98|  2|    郑竹|1989-07-01|    女|
|  07|  02|     89|  2|    郑竹|1989-07-01|    女|
|  05|  01|     76|  3|    周梅|1991-12-01|    女|
|  05|  02|     87|  3|    周梅|1991-12-01|    女|
|  03|  01|     80|  2|    孙风|1990-05-20|    男|
|  02|  03|     80|  3|    钱电|1990-12-21|    男|
+----+----+-------+---+------+----------+-----+

23. Count the students in each score band for each course: course ID, course name, [100-85], [85-70], [70-60], [0-60] and the percentage of each band

val fenduan = scoreTableDF.rdd.map(x=>{
       if(x.getAs("s_score").toString.toInt < 60) (x(1).toString,1)
       else if(x.getAs("s_score").toString.toInt < 70) (x(1).toString,2)
       else if(x.getAs("s_score").toString.toInt < 85) (x(1).toString,3)
       else (x(1).toString,4)
       }).toDF("c_id","fenduan")
val frame23: DataFrame = fenduan.groupBy("c_id").count.as("f1").join(fenduan.groupBy("c_id","fenduan").count.as("f2"),"c_id").withColumn("rate",$"f2.count"/$"f1.count").drop($"f1.count").join(courseTableDF,"c_id")
frame23.show()

+----+-------+-----+-------------------+------+----+
|c_id|fenduan|count|               rate|c_name|t_id|
+----+-------+-----+-------------------+------+----+
|  01|      3|    4| 0.6666666666666666|    语文|  02|
|  01|      1|    2| 0.3333333333333333|    语文|  02|
|  03|      3|    2| 0.3333333333333333|    英语|  03|
|  03|      1|    2| 0.3333333333333333|    英语|  03|
|  03|      4|    2| 0.3333333333333333|    英语|  03|
|  02|      2|    1|0.16666666666666666|    数学|  01|
|  02|      4|    3|                0.5|    数学|  01|
|  02|      1|    1|0.16666666666666666|    数学|  01|
|  02|      3|    1|0.16666666666666666|    数学|  01|
+----+-------+-----+-------------------+------+----+
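
The result above has one row per course and band. To get one row per course with a column per band, as the exercise describes, the band label can be pivoted. This is a sketch under assumptions: the column names s85_100, s70_85, s60_70 and s0_60 and the object name ScoreBandsSketch are hypothetical, and the band edges follow the < 60, < 70, < 85 tests in the code above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ScoreBandsSketch {
  // Hypothetical column names for the bands [85-100], [70-85), [60-70), [0-60).
  val bands: Seq[String] = Seq("s85_100", "s70_85", "s60_70", "s0_60")

  // Score rows for courses 01 and 02, copied from the build script.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("01", "01", 80), ("02", "01", 70), ("03", "01", 80),
      ("04", "01", 50), ("05", "01", 76), ("06", "01", 31),
      ("01", "02", 90), ("02", "02", 60), ("03", "02", 80),
      ("04", "02", 30), ("05", "02", 87), ("07", "02", 89)
    ).toDF("s_id", "c_id", "s_score")
  }

  def bandCounts(score: DataFrame): DataFrame = {
    val band = when(col("s_score") >= 85, bands(0))
      .when(col("s_score") >= 70, bands(1))
      .when(col("s_score") >= 60, bands(2))
      .otherwise(bands(3))
    val counts = score.withColumn("band", band)
      .groupBy("c_id")
      .pivot("band", bands) // listing the values keeps empty bands as columns
      .count()
      .na.fill(0L)          // a band with no scores comes back as null
    // Share of each band = band count / number of scores in the course.
    val total = bands.map(col).reduce(_ + _)
    bands.foldLeft(counts)((df, b) => df.withColumn(b + "_pct", col(b) / total))
      .orderBy("c_id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ScoreBandsSketch").getOrCreate()
    bandCounts(sample(spark)).show()
    spark.stop()
  }
}
```

Joining the result with courseTableDF on c_id would add the course name.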

24. Query each student's average score and rank

// Backticks refer to the column named avg(s_score); single quotes would make it a string literal and leave the rows unsorted
val frame24: DataFrame = scoreTableDF.groupBy("s_id").avg("s_score").selectExpr("*", "row_number() over(order by `avg(s_score)` desc) as num")
frame24.show()

+----+------------------+---+
|s_id|      avg(s_score)|num|
+----+------------------+---+
|  07|              93.5|  1|
|  01| 89.66666666666667|  2|
|  05|              81.5|  3|
|  03|              80.0|  4|
|  02|              70.0|  5|
|  04|33.333333333333336|  6|
|  06|              32.5|  7|
+----+------------------+---+

25. Query the top three records of each course

val frame25: Dataset[Row] = scoreTableDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc) num").where("num<=3")
frame25.show()

+----+----+-------+---+
|s_id|c_id|s_score|num|
+----+----+-------+---+
|  01|  01|     80|  1|
|  03|  01|     80|  2|
|  05|  01|     76|  3|
|  01|  03|     99|  1|
|  07|  03|     98|  2|
|  02|  03|     80|  3|
|  01|  02|     90|  1|
|  07|  02|     89|  2|
|  05|  02|     87|  3|
+----+----+-------+---+

26. Query the number of students enrolled in each course

val frame26: DataFrame = scoreTableDF.groupBy("c_id").count()
frame26.show()

+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  03|    6|
|  02|    6|
+----+-----+

27. Query the student ID and name of all students who take exactly two courses

val frame27: DataFrame = scoreTableDF.groupBy("s_id").count().where("count=2").join(studentTableDF,"s_id")
frame27.show()

+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  07|    2|    郑竹|1989-07-01|    女|
|  05|    2|    周梅|1991-12-01|    女|
|  06|    2|    吴兰|1992-03-01|    女|
+----+-----+------+----------+-----+

28. Query the number of male and female students

val frame28: DataFrame = studentTableDF.groupBy("s_sex").count()
frame28.show()

+-----+-----+
|s_sex|count|
+-----+-----+
|    男|    4|
|    女|    4|
+-----+-----+

29. Query the information of students whose name contains "风"

val frame29: Dataset[Row] = studentTableDF.where("s_name like '%风%'")
frame29.show()

+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  03|    孙风|1990-05-20|    男|
+----+------+----------+-----+

30. List the students who share the same full name and count them

val frame30: Dataset[Row] = studentTableDF.groupBy("s_name").count().where("count>1")
frame30.show()

+------+-----+
|s_name|count|
+------+-----+
+------+-----+

31. List the students born in 1990

val frame31: Dataset[Row] = studentTableDF.where("year(s_birth)=1990")
frame31.show()

+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  01|    赵雷|1990-01-01|    男|
|  02|    钱电|1990-12-21|    男|
|  03|    孙风|1990-05-20|    男|
|  04|    李云|1990-08-06|    男|
|  08|    王菊|1990-01-20|    女|
+----+------+----------+-----+

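32. Query the average score of each course, sorted by average score in descending order and, for equal averages, by course ID in ascending order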

val frame32: Dataset[Row] = scoreTableDF.groupBy("c_id").avg("s_score").orderBy($"avg(s_score)".desc,$"c_id")
frame32.show()

+----+-----------------+
|c_id|     avg(s_score)|
+----+-----------------+
|  02|72.66666666666667|
|  03|             68.5|
|  01|             64.5|
+----+-----------------+

33. Query the student ID, name and average score of all students whose average score is at least 85

val frame33: DataFrame = scoreTableDF.groupBy("s_id").avg("s_score").where("avg(s_score)>=85").join(studentTableDF,"s_id")
frame33.show()

+----+-----------------+------+----------+-----+
|s_id|     avg(s_score)|s_name|   s_birth|s_sex|
+----+-----------------+------+----------+-----+
|  07|             93.5|    郑竹|1989-07-01|    女|
|  01|89.66666666666667|    赵雷|1990-01-01|    男|
+----+-----------------+------+----------+-----+

34. Query the names and scores of students who scored below 60 in the course named "数学"

val frame34: DataFrame = scoreTableDF.where("s_score<60").join(courseTableDF,"c_id").where("c_name='数学'").join(studentTableDF,"s_id")
frame34.show()

+----+----+-------+------+----+------+----------+-----+
|s_id|c_id|s_score|c_name|t_id|s_name|   s_birth|s_sex|
+----+----+-------+------+----+------+----------+-----+
|  04|  02|     30|    数学|  01|    李云|1990-08-06|    男|
+----+----+-------+------+----+------+----------+-----+

35. Query the courses and scores of all students

val frame35: DataFrame = studentTableDF.join(scoreTableDF,Seq("s_id"),"left_outer")
frame35.show()

+----+------+----------+-----+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|
+----+------+----------+-----+----+-------+
|  07|    郑竹|1989-07-01|    女|  02|     89|
|  07|    郑竹|1989-07-01|    女|  03|     98|
|  01|    赵雷|1990-01-01|    男|  01|     80|
|  01|    赵雷|1990-01-01|    男|  02|     90|
|  01|    赵雷|1990-01-01|    男|  03|     99|
|  05|    周梅|1991-12-01|    女|  01|     76|
|  05|    周梅|1991-12-01|    女|  02|     87|
|  08|    王菊|1990-01-20|    女|null|   null|
|  03|    孙风|1990-05-20|    男|  01|     80|
|  03|    孙风|1990-05-20|    男|  02|     80|
|  03|    孙风|1990-05-20|    男|  03|     80|
|  02|    钱电|1990-12-21|    男|  01|     70|
|  02|    钱电|1990-12-21|    男|  02|     60|
|  02|    钱电|1990-12-21|    男|  03|     80|
|  06|    吴兰|1992-03-01|    女|  01|     31|
|  06|    吴兰|1992-03-01|    女|  03|     34|
|  04|    李云|1990-08-06|    男|  01|     50|
|  04|    李云|1990-08-06|    男|  02|     30|
|  04|    李云|1990-08-06|    男|  03|     20|
+----+------+----------+-----+----+-------+

36. Query the student name, course name and score for every course score above 70

val frame36: DataFrame = scoreTableDF.where("s_score>70").join(studentTableDF,"s_id").join(courseTableDF,"c_id")
frame36.show()

+----+----+-------+------+----------+-----+------+----+
|c_id|s_id|s_score|s_name|   s_birth|s_sex|c_name|t_id|
+----+----+-------+------+----------+-----+------+----+
|  01|  01|     80|    赵雷|1990-01-01|    男|    语文|  02|
|  01|  05|     76|    周梅|1991-12-01|    女|    语文|  02|
|  01|  03|     80|    孙风|1990-05-20|    男|    语文|  02|
|  03|  07|     98|    郑竹|1989-07-01|    女|    英语|  03|
|  03|  01|     99|    赵雷|1990-01-01|    男|    英语|  03|
|  03|  03|     80|    孙风|1990-05-20|    男|    英语|  03|
|  03|  02|     80|    钱电|1990-12-21|    男|    英语|  03|
|  02|  07|     89|    郑竹|1989-07-01|    女|    数学|  01|
|  02|  01|     90|    赵雷|1990-01-01|    男|    数学|  01|
|  02|  05|     87|    周梅|1991-12-01|    女|    数学|  01|
|  02|  03|     80|    孙风|1990-05-20|    男|    数学|  01|
+----+----+-------+------+----------+-----+------+----+

37. Query the students who failed a course

val frame37: DataFrame = scoreTableDF.where("s_score<60").join(studentTableDF,"s_id")
frame37.show()

+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+------+----------+-----+
|  06|  01|     31|    吴兰|1992-03-01|    女|
|  06|  03|     34|    吴兰|1992-03-01|    女|
|  04|  01|     50|    李云|1990-08-06|    男|
|  04|  02|     30|    李云|1990-08-06|    男|
|  04|  03|     20|    李云|1990-08-06|    男|
+----+----+-------+------+----------+-----+

38. Query the student ID and name of students whose score in course 01 is above 80

val frame38: DataFrame = scoreTableDF.where("c_id=01 and s_score>80").join(studentTableDF,"s_id")
frame38.show()

+----+----+-------+------+-------+-----+
|s_id|c_id|s_score|s_name|s_birth|s_sex|
+----+----+-------+------+-------+-----+
+----+----+-------+------+-------+-----+

39. Find the number of students in each course

val frame39: DataFrame = scoreTableDF.groupBy("c_id").count()
frame39.show()

+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  03|    6|
|  02|    6|
+----+-----+

40. Among the students who take a course taught by teacher "张三", query the information and score of the student with the highest score

val frame40: Dataset[Row] = scoreTableDF.join(studentTableDF,"s_id").join(courseTableDF,"c_id").join(teacherTableDF,"t_id").where("t_name='张三'").selectExpr("*","max(s_score) over() max").where("max=s_score")
frame40.show()

+----+----+----+-------+------+----------+-----+------+------+---+
|t_id|c_id|s_id|s_score|s_name|   s_birth|s_sex|c_name|t_name|max|
+----+----+----+-------+------+----------+-----+------+------+---+
|  01|  02|  01|     90|    赵雷|1990-01-01|    男|    数学|    张三| 90|
+----+----+----+-------+------+----------+-----+------+------+---+

41. Query the student ID, course ID and score of students who have the same score in different courses

val frame41: Dataset[Row] = scoreTableDF.as("s1").join(scoreTableDF.as("s2"),"s_id").where("s1.s_score=s2.s_score and s1.c_id!=s2.c_id")
frame41.show()

+----+----+-------+----+-------+
|s_id|c_id|s_score|c_id|s_score|
+----+----+-------+----+-------+
|  03|  01|     80|  02|     80|
|  03|  01|     80|  03|     80|
|  03|  02|     80|  01|     80|
|  03|  02|     80|  03|     80|
|  03|  03|     80|  01|     80|
|  03|  03|     80|  02|     80|
+----+----+-------+----+-------+
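
The result above lists every matching pair twice, once in each direction. The following is a minimal sketch of one way to keep each pair only once: compare the course IDs with `<` instead of `!=`. It assumes a local SparkSession and a small in-memory copy of a few Score rows as stand-ins for the exercise's `scoreTableDF`; the column aliases `c_id_a` and `c_id_b` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch (assumption); reuse the existing one if available.
val spark = SparkSession.builder().master("local[1]").appName("pairs-sketch").getOrCreate()
import spark.implicits._

// A few rows copied from the Score table: student 03 has 80 in all three courses.
val scores: DataFrame = Seq(
  ("03", "01", 80), ("03", "02", 80), ("03", "03", 80),
  ("01", "01", 80), ("01", "02", 90)
).toDF("s_id", "c_id", "s_score")

// s1.c_id < s2.c_id keeps (01,02) but drops the mirrored (02,01).
val pairs: DataFrame = scores.as("s1").join(scores.as("s2"), "s_id")
  .where("s1.s_score = s2.s_score and s1.c_id < s2.c_id")
  .selectExpr("s_id", "s1.c_id as c_id_a", "s2.c_id as c_id_b", "s1.s_score as s_score")

// Collect in a fixed order so the rows are easy to compare.
val pairRows = pairs.orderBy("c_id_a", "c_id_b").collect()
  .map(r => (r.getString(0), r.getString(1), r.getString(2), r.getInt(3)))
  .toList
pairRows.foreach(println)
```

With these rows, student 03 produces three pairs instead of six, and student 01 produces none.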

42. Query the top three scores in each course

val frame42: Dataset[Row] = scoreTableDF.selectExpr("*","row_number() over(partition by c_id order by s_score desc)rank").where("rank<=3")
frame42.show()

+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
|  01|  01|     80|   1|
|  03|  01|     80|   2|
|  05|  01|     76|   3|
|  01|  03|     99|   1|
|  07|  03|     98|   2|
|  02|  03|     80|   3|
|  01|  02|     90|   1|
|  07|  02|     89|   2|
|  05|  02|     87|   3|
+----+----+-------+----+
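
In course 01 two students share the top score of 80, and `row_number()` breaks that tie arbitrarily. Whether "top three" should mean three rows or three distinct score levels is a judgment call. The sketch below, which assumes a local SparkSession and an in-memory copy of the course 01 rows, puts `row_number()`, `rank()`, and `dense_rank()` side by side; the alias names `row_num`, `rnk`, and `dense_rnk` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch (assumption); reuse the existing one if available.
val spark = SparkSession.builder().master("local[1]").appName("rank-sketch").getOrCreate()
import spark.implicits._

// Course 01 scores copied from the Score table.
val course01: DataFrame = Seq(
  ("01", "01", 80), ("02", "01", 70), ("03", "01", 80),
  ("04", "01", 50), ("05", "01", 76), ("06", "01", 31)
).toDF("s_id", "c_id", "s_score")

// The three ranking functions over the same window.
val ranked: DataFrame = course01.selectExpr(
  "*",
  "row_number() over (partition by c_id order by s_score desc) as row_num",
  "rank() over (partition by c_id order by s_score desc) as rnk",
  "dense_rank() over (partition by c_id order by s_score desc) as dense_rnk"
)
ranked.orderBy("row_num").show()

// rank() leaves a gap after a tie (1, 1, 3, ...); dense_rank() does not (1, 1, 2, ...).
val top3ByRank = ranked.where("rnk <= 3").collect().map(_.getString(0)).sorted.toList
val top3ByDenseRank = ranked.where("dense_rnk <= 3").collect().map(_.getString(0)).sorted.toList
println(top3ByRank)
println(top3ByDenseRank)
```

With `rank()` the cutoff keeps students 01, 03, and 05; with `dense_rank()` it also keeps student 02, because 70 is the third distinct score.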

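43. Count the students enrolled in each course (only courses with more than 5 students). Output the course ID and the enrollment count, sorted by count in descending order and, for equal counts, by course ID in ascending order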

val frame43: Dataset[Row] = scoreTableDF.groupBy("c_id").count().where("count>5").orderBy($"count".desc, $"c_id".asc)
frame43.show()

+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  02|    6|
|  03|    6|
+----+-----+

44. Retrieve the student IDs of students enrolled in at least two courses

val frame44: DataFrame = scoreTableDF.groupBy("s_id").count().where("count>=2").drop("count").orderBy("s_id")
frame44.show()

+----+
|s_id|
+----+
|  01|
|  02|
|  03|
|  04|
|  05|
|  06|
|  07|
+----+

45. Query the information of students who are enrolled in all courses

val frame45: DataFrame = studentTableDF.join(scoreTableDF,Seq("s_id"),"left_outer").groupBy("s_id").count().where(s"count=${courseTableDF.select("c_id").count() }").join(studentTableDF,"s_id")
frame45.show()

+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  01|    3|    赵雷|1990-01-01||
|  03|    3|    孙风|1990-05-20||
|  02|    3|    钱电|1990-12-21||
|  04|    3|    李云|1990-08-06||
+----+-----+------+----------+-----+

46. Query each student's age (in full years)

val frame46: DataFrame = studentTableDF.selectExpr("*","year(current_date)-year(s_birth)")
frame46.show()

+----+------+----------+-----+--------------------------------------+
|s_id|s_name|   s_birth|s_sex|(year(current_date()) - year(s_birth))|
+----+------+----------+-----+--------------------------------------+
|  01|    赵雷|1990-01-01||                                    30|
|  02|    钱电|1990-12-21||                                    30|
|  03|    孙风|1990-05-20||                                    30|
|  04|    李云|1990-08-06||                                    30|
|  05|    周梅|1991-12-01||                                    29|
|  06|    吴兰|1992-03-01||                                    28|
|  07|    郑竹|1989-07-01||                                    31|
|  08|    王菊|1990-01-20||                                    30|
+----+------+----------+-----+--------------------------------------+
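
Subtracting the years ignores whether this year's birthday has already passed, so it can overstate the age in full years by one. A minimal sketch of a stricter calculation with `months_between` follows. It assumes a local SparkSession, two rows copied from the Student table, and a hypothetical fixed reference date of 2020-06-15 so that the result is reproducible; in a live query `current_date()` would take its place.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch (assumption); reuse the existing one if available.
val spark = SparkSession.builder().master("local[1]").appName("age-sketch").getOrCreate()
import spark.implicits._

// Two students from the Student table, birth dates kept as strings.
val students: DataFrame = Seq(
  ("01", "1990-01-01"),
  ("02", "1990-12-21")
).toDF("s_id", "s_birth")

// Hypothetical fixed reference date; swap in current_date() for a live query.
val asOf = "to_date('2020-06-15')"

val ages: DataFrame = students.selectExpr(
  "s_id",
  s"year($asOf) - year(s_birth) as year_diff",
  // Whole months elapsed, divided by 12 and rounded down, gives full years.
  s"cast(floor(months_between($asOf, s_birth) / 12) as int) as full_years"
)
ages.show()

val ageRows = ages.orderBy("s_id").collect()
  .map(r => (r.getString(0), r.getInt(1), r.getInt(2)))
  .toList
```

On the reference date, student 02 (born 1990-12-21) has a year difference of 30 but is only 29 in full years, while student 01 is 30 either way.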

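47. Query the students whose birthday falls in this week: the day before next Monday is the last day of this week, and the range starts at the current day. (What if today is already Sunday? `next_day` returns the first Monday strictly after today, so the range collapses to today alone and the query still works.)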

val frame47: Dataset[Row] = studentTableDF.where("unix_timestamp(cast(concat_ws('-',date_format(current_date(),'yyyy'),date_format(s_birth,'MM'),date_format(s_birth,'dd'))as date),'yyyy-MM-dd') between unix_timestamp(current_date()) and unix_timestamp(date_sub(next_day(current_date(),'MON'),1),'yyyy-MM-dd')")
frame47.show()

+----+------+-------+-----+
|s_id|s_name|s_birth|s_sex|
+----+------+-------+-----+
+----+------+-------+-----+
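
The week boundaries used in these birthday queries can be checked without waiting for a particular weekday. The sketch below assumes a local SparkSession and three hypothetical reference dates from June 2020 (a Wednesday, a Sunday, and a Monday), and computes the end of the current week and both ends of the following week from `next_day`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch (assumption); reuse the existing one if available.
val spark = SparkSession.builder().master("local[1]").appName("week-sketch").getOrCreate()
import spark.implicits._

// 2020-06-17 is a Wednesday, 2020-06-21 a Sunday, 2020-06-22 a Monday.
val days: DataFrame = Seq("2020-06-17", "2020-06-21", "2020-06-22").toDF("today")

// next_day returns the first Monday strictly after the given date.
val bounds: DataFrame = days.selectExpr(
  "today",
  "cast(date_sub(next_day(today, 'MON'), 1) as string) as this_week_end",
  "cast(next_day(today, 'MON') as string) as next_week_start",
  "cast(date_add(next_day(today, 'MON'), 6) as string) as next_week_end"
)
bounds.orderBy("today").show()

val boundRows = bounds.orderBy("today").collect()
  .map(r => (r.getString(0), r.getString(1), r.getString(2), r.getString(3)))
  .toList
```

On the Sunday, `this_week_end` equals `today`, so the "this week" range is a single day. Note that rebuilding the birthday in the current year, as the queries here do, does not cover a week that spans New Year, and a 29 February birthday becomes null in a non-leap year.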

48. Query the students whose birthday falls in next week: from next Monday through next Monday + 6 days

val frame48: Dataset[Row] = studentTableDF.where("unix_timestamp(cast(concat_ws('-',date_format(current_date(),'yyyy'),date_format(s_birth,'MM'),date_format(s_birth,'dd'))as date),'yyyy-MM-dd') between unix_timestamp(next_day(current_date(),'MON'),'yyyy-MM-dd') and unix_timestamp(date_add(next_day(current_date(),'MON'),6),'yyyy-MM-dd')")
frame48.show()

+----+------+-------+-----+
|s_id|s_name|s_birth|s_sex|
+----+------+-------+-----+
+----+------+-------+-----+

49. Query the students whose birthday falls in the current month

val frame49: Dataset[Row] = studentTableDF.where("month(s_birth)=month(current_date)")
frame49.show()

+----+------+-------+-----+
|s_id|s_name|s_birth|s_sex|
+----+------+-------+-----+
+----+------+-------+-----+

50. Query the students whose birthday falls in December

val frame50: Dataset[Row] = studentTableDF.where("month(s_birth)=12")
frame50.show()

+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  02|    钱电|1990-12-21||
|  05|    周梅|1991-12-01||
+----+------+----------+-----+