【面试复习】【数据库与SQL的常考知识点】

最新推荐文章于 2020-11-26 04:40:27 发布

zyuPp

最新推荐文章于 2020-11-26 04:40:27 发布

阅读量608

点赞数

分类专栏： SQL基础知识

本文链接：https://blog.csdn.net/zyuPp/article/details/105483432

版权

SQL基础知识专栏收录该内容

3 篇文章 0 订阅

订阅专栏

借鉴的文章：
数据分析面试必备

SQL语句

1、从表中选取数据

select id from table

2、多张表取多个字段（表连接）

table1有id和name和table2有id和age，取出id，name，age，以id为主键连接

select a.id,a.name,b.age from 
(select id,name from table1) as a --将select之后的内容存为临时表a
join
(select id,age from table2) as b
on a.id = b.id

3、几种JOIN的用法和区别

inner join ：找出左右表都可匹配的记录
left join：以左表为准，在右表中找可匹配字段，若无，填NULL
right join：以右表为准，去左表找可匹配字段，若无，填NULL
full outer join：全连接，包含两表连接结果，若任何一边有缺失，填NULL【mysql没有这个功能，需要左连接联合右连接来达到这个效果】
on：以某个字段连接表

4、不去重地合并两张表的数据

select * from 
(select id from table1 UNION ALL select id from table2)
t;

UNION ALL会基于列合并多张表的数据，列格式要一致。
UNION会去重并降低效率，UNION ALL会直接追加数据

5、常用函数与关键字用法

以表格为例，完成以下问题：
在这里插入图片描述

想知道去重后的用户数：

-- 统计不同id的个数
select count(distinct id) from table1;

-- 优化：
select count(*) from (select distinct id from table1) tb;

使用count distinct进行去重统计会将reducer数量强制限定为1，影响效率，所以改写为子查询会更好。这里要去了解一下什么叫reducer

统计以下男女性别的数量

select count(distinct id) from table1
group by sex;

GROUP BY就是以某个字段划分小组
聚合函数帮助我们进行基本的数据统计，如计算max、min、avg、sum等等。下面是统计年龄的max、min、avg

select max(age),min(age),avg(age) from table1
group by id;

查看A公司的性别人数统计

select sex,count(distinct id) from table1
where company='A'
group by sex;

统计各个公司男性平均年龄，且仅保留平均年龄30岁以上的公司

select company,avg(age) from table1
where sex='M'
group by company
having avg(age)>30;

HAVING与groupby搭配使用，对分组做过滤。

倒序查看年龄最老的10个人

select id,age from table1 order by age DESC
limit 10;

把数值类型变量转换成类别类型变量，CASE WHEN函数

case函数：
case when 条件1 then value1 else null end

else可以被省略，但是end不可以！！！
CAST函数，用于string/int/double类型转换

select id,
(case when CAST(salary as float)<50000 Then '0-5万'
when CAST(salary as float)>=50000 and CAST(salary as float)<100000 then '5-10万'
when CAST(salary as float) >=100000 and CAST(salary as float)<200000 then '10-20万'
when CAST(salary as float)>200000 then '20万以上'
else NULL end)
from table1;

6、字符串使用

concat可以把两个字符串连接在一起

select concat("hello","world","!") from table1;

split(str,regex)按照regex分割str，并换成数组

select split("a,b,c",",") as value_array from table1;

-- 结合array index,将原始字符串分割为3列
select value_array[0],value_array[1],value_array[2] from 
(select  split("1,2,3",",")as value_array from table1 ) t

substr(str,x,len)截取字符串从x位开始长度为len个字符，注意这里是直接从1开始数，不是0

select substr('abcde',3,2) from table1;

7.分组排序（开窗函数）

首先是按照某个字段排序，并且获得排名（查出指定条件后的进行排名）

-- 按照字段salary倒序编号
select *, row_number() over (order by salary desc) as row_num from table1;

按照不同的公司分组，然后对分组进行salary的排序，并获得排名

select *, row_number() over (partition by company order by salary desc) as rank from table1;

搭配使用：row_number() over（partition by … order by … desc）

row_number()：得到的排名是按顺序排下去，排序的值相同但排名不重复，1234
rank() ：排序的值相同，排名的值也重复相同，总数不变，即有5个排名，出现两个第三名，最后一个还是写第五名。就是1,2,3,3,5这种。
dense_rank() ：与rank差不多，但是总数会减少，即最后一个会写第四名,1,2,3,3,4

8、获得top10%

-- 获取income字段的top10%的阈值
select percentile(CAST (salary AS int),0.9)) as income_top10p_threshold from table_1;

-- 获取income字段的10个百分位点
select percentile(CAST (salary AS int),array(0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0)) as income_percentiles
from table_1;

9、时间函数

-- 转换为时间数据的格式
select to_date("1970-01-01 00:00:00") as start_time from table_1;

-- 计算数据到当前时间的天数差
 select datediff('2016-12-30','2016-12-29');
-- 得到 "1"

to_date函数可以把时间的字符串形式转化为时间类型，再进行后续的计算；

常用的日期提取函数包括 year()/month()/day()/hour()/minute()/second()

datediff是计算两个时间的日数相差

date_sub(stratdate,INTERVAL 数字 DAY) 返回开始日期startdate减少days天后的日期。

date_add(stratdate,INTERVAL 数字 DAY) 返回开始日期startdate增加days天后的日期。

INTERVAL是间隔的意思，后面数字自己填，DAY可以换成YEAR_MONTH，YEAR、second等

10、经典例题

例：有3个表S，C，SC：

S（SNO，SNAME）代表（学号，姓名）

C（CNO，CNAME，CTEACHER）代表（课号，课名，教师）

SC（SNO，CNO，SCGRADE）代表（学号，课号，成绩）

问题：

1，找出没选过“黎明”老师的所有学生姓名。

2，列出2门以上（含2门）不及格学生姓名及平均成绩。

3，既学过1号课程又学过2号课所有学生的姓名。

1. -- 考察条件筛选
select sname from s where sno not in
( select sno from sc where cno in
  (
select distinct cno from c where cteacher='黎明'
  )
);
2. -- 考察聚合函数，条件筛选
select s.sname, avg_grade from s
join
(select sno from sc where scgrade < 60 group by sno having count(*) >= 2) t1
on s.sno = t1.sno
join
(select sno, avg(scgrade) as avg_grade from sc group by sno ) t2
on s.sno = t2.sno;

3. -- 考察筛选、连接
select sname 
from s
where sno=
(
select a.sno 
from
( select sno from sc where cno = 1) a
join 
(select sno from sc where cno = 2) b
on a.sno = b.sno
)

关于select count(*)和select count(1)的区别：
一般情况下，Select Count (*)和Select Count(1)两着返回结果是一样的。

假如表没有主键(Primary key), 那么count(1)比count(*)快，如果有主键的话，那主键作为count的条件时候count(主键)最快，如果你的表只有一个字段的话,那count(*)就是最快的。count(*) 跟 count(1) 的结果一样，都包括对NULL的统计，而count(column) 是不包括NULL的统计。