今天也要加油鸭~
- 这是一份总结的学习笔记
- 路漫漫其修远兮,吾将上下而求索
- 可阅读可评论可分享可转载,希望向优秀的人学习
今天你刷SQL了吗?
害,坚持才是胜利。。。
1. 查询同一用户的最后一次的消费记录
Select *
from 表 t
where not exist
(select 1
from 表
where t.ID = ID and time >t.time)
2. 窗口函数–非常重要
窗口函数可以进行排序,生成序列号等一般的聚合函数无法实现的高级操作。窗口函数的适用范围:只能在select子句中使用。
窗口函数大体可以分为以下两种:
- 能够作为窗口函数的聚合函数(sum,avg,count,max,min)
- rank,dense_rank。row_number等专用窗口函数。
1.rank函数:计算排序时,如果存在相同位次的记录,则会跳过之后的位次。
2.dense_rank函数:同样是计算排序,即使存在相同位次的记录,也不会跳过之后的位次。
3.row_number函数:赋予唯一的连续位次。
语句:
select product_name
,product_type
,sale_price,
rank () over (partition by product_type order by sale_price) as ranking
dense_rank () over (order by sale_price) as dense_ranking,
row_number () over (order by sale_price) as row_num
万能模板:求TOPN问题,例子是求部门工资前三高
select DepartmentId,Name,Salary
from (
select *,
dense_rank() over (partition by DepartmentId
order by Salary desc) as ranking
from Employee) as a
where ranking <= 3;
计算移动平均:将表以窗口为单位进行分割,并在其中进行排序的函数。
select product_id, product_name, sale_price,
avg (sale_price) over (order by product_id
rows 2 preceding) as moving_avg
from Product
3. 求TOPN排名问题(多种思路)
(1)limit+offset
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
SET N := N-1;
RETURN (
# Write your MySQL query statement below.
SELECT
salary
FROM
employee
GROUP BY
salary
ORDER BY
salary DESC
LIMIT N, 1
);
END
(2)子查询
Select distinct e.salary
From employee e
Where (select count(distinct salary) from employee where salary >e.salary) = N
(3)连接查询
Select distinct e1.salary
From employee e1 join employee e2 on e1.salary <= e2.salary
Group by e1.salary
Having count(distinct e2.salary) = N
(4)笛卡尔积
Select distinct e1.salary
From employee e1, employee e2 where e1.salary <= e2.salary
Group by e1.salary
Having count(distinct e2.salary) = N
(5)自定义变量
Select distinct salary
From (select salary, @r:= if(@p=salary, @r, @r+1) as rank, @p:=salary
From employee ,(select @r:=0, @p:=NULL) init
Order by salary desc) t
Where rank = N
(6)窗口函数
Select distinct salary
From (select salary, dense_rank() over(order by salary desc) as rank from employee)t
Where rank = N
单表 》 连接 (尽量不用子查询、笛卡尔积)
4. SQL的数据类型
- 字符串:char、varchar、text
- 二进制串:binary、varbinary
- 布尔类型:boolean
- 数值类型:integer、smallint、bigint、decimal、numeric、float、real、double
- 时间类型:date、time、timestamp、interval
#每月的消耗总额
select year(ordertime) 年,
month(ordertime) 月,
sum(Total) 销售合计
from 订单表
group by year(ordertime),
month(ordertime)
将一表两列合并为一列:select CONVERT(char(5), id)+name from table
5. 累积型计算套路
计算不同日期下玩家玩的游戏个数(2017-06-25 共玩了 5+6+1=12 个游戏)
- MySQL 联合查询
SELECT
a1.player_id,
a1.event_date,
SUM(a2.games_played) AS games_played_so_far
FROM
Activity AS a1,
Activity AS a2
WHERE
a1.player_id = a2.player_id AND
a1.event_date >= a2.event_date
GROUP BY a1.player_id, a1.event_date
- MySQL 变量法–速度最快
select
a.player_id,
a.event_date,
case
when @prev = a.player_id then @cnt := @cnt + a.games_played
when (@prev := a.player_id) is not null then @cnt := a.games_played
end games_played_so_far
from
Activity a,
(select @cnt := 0,@prev := null
) t
order by
a.player_id,
a.event_date
- MSSQL 窗口函数–速度最慢
select
player_id,
event_date,
sum(games_played) over(partition by player_id order by event_date) games_played_so_far
from Activity
6. 中位数求法的思路
- 思路
对于一个 奇数 长度数组中的 中位数,大于这个数的数值个数等于小于这个数的数值个数。 - 算法:根据上述的定义,我们来找一下 [1, 3, 2] 中的中位数。首先 1 不是中位数,因为这个数组有三个元素,却有两个元素 (3,2) 大于 1。3 也不是中位数,因为有两个元素小于 3。对于最后一个 2 来说,大于 2 和 小于 2 的元素数量是相等的,因此 2 是当前数组的中位数。
当数组长度为偶数,且元素唯一时,中位数等于排序后 中间两个数 的平均值。对这两个数来说,大于当前数的数值个数跟小于当前数的数值个数绝对值之差为 1,恰好等于这个数出现的频率。总的来说,不管是数组长度是奇是偶,也不管元素是不是唯一,中位数出现的频率一定大于等于大于它的数和小于它的数的绝对值之差。这个规律是这道题的关键,可以通过下面这个搜索条件来过滤。
SELECT
Employee.Id, Employee.Company, Employee.Salary
FROM
Employee,
Employee alias
WHERE
Employee.Company = alias.Company
GROUP BY Employee.Company , Employee.Salary
HAVING SUM(CASE
WHEN Employee.Salary = alias.Salary THEN 1
ELSE 0
END) >= ABS(SUM(SIGN(Employee.Salary - alias.Salary)))
ORDER BY Employee.Id
#窗口函数:
SELECT Id, Company, Salary FROM
(SELECT Id, Company, Salary, COUNT(Salary) OVER (PARTITION BY Company) AS CN,
ROW_NUMBER() OVER (PARTITION BY Company ORDER BY Salary) AS RN FROM Employee) T
WHERE RN = (CN+1)/2 OR RN = (CN+2)/2
如果 n1.Number 为中位数,n1.Number(包含本身)前累计的数字应大于等于总数/2 同时n1.Number(不包含本身)前累计数字应小于等于总数/2。
SELECT
AVG(Number) median
FROM
(SELECT n1.Number FROM Numbers n1 JOIN Numbers n2 ON n1.Number>=n2.Number
GROUP BY n1.Number
HAVING
SUM(n2.Frequency)>=(SELECT SUM(Frequency) FROM Numbers)/2
AND
SUM(n2.Frequency)-AVG(n1.Frequency)<=(SELECT SUM(Frequency) FROM Numbers)/2
)s
7. 取整函数语法
- trunc(value,precision) 截取前几位。按精度(precision)截取某个数字,不进行舍入操作。
- round(value,precision) 保留小数位。根据给定的精度(precision)输入数值。
- ceil (value) 向上取整。产生大于或等于指定值(value)的最小整数。
- floor(value)与 ceil() 向下取整。产生小于或等于指定值(value)的最小整数。
- sign(value) 与绝对值函数ABS()相反。ABS()给出的是值的量而不是其符号,sign(value)则给出值的符号而不是量。
8. 找到连续区间的开始和结束数字
- MySQL 利用log_id-1不在区间内为最小起始值,log_id+1不在区间内为最大起始值
select a.log_id as start_id, b.log_id as end_id
from (select log_id from Logs where log_id-1 not in (select * from Logs)) a,
(select log_id from Logs where log_id+1 not in (select * from Logs)) b
where a.log_id <= b.log_id
group by a.log_id
- 定义两个变量@num,@id。@id用来存放log_id,@num用来标记log_id连续的记录
SELECT
min(log_id) start_id,
max(log_id) end_id
FROM
(select log_id,
case when @id = log_id -1
then @num := @num
else @num := @num + 1
end num,@id := log_id
from Logs,(select @num := 0, @id := NULL) a) x
GROUP BY num
- 利用连续排名减log_id的差分组(窗口函数)
select min(log_id) start_id, max(log_id) end_idfrom
(select log_id, row_number() over(order by log_id) - log_id as grp from logs) group by grp order by min(log_id)
9. 找出人流量的高峰期时,至少连续三行记录中的人流量不少于100。(id三种排序)
- 方法1:变量实现
(1)先给每一天打标记,是否超过100,是1否0
(2)连续同样标记的记为同一组(变量实现)
(3)按组别分类统计,筛选组记录数超过3的所有组对应的成员信息
select t4.id,t4.visit_date,t4.people
from
(select id,visit_date,people,
case when @t1=tag then @gid1 else @gid1:=@gid1+1 end gid,@t1:=tag
from
(select id,visit_date,people,if(people>=100,1,0) tag
from stadium)t5,(select @gid1:=0,@t1:=0)t6)t4
where t4.gid in (select distinct gid from
(select id,visit_date,people,
case when @t=tag then @gid else @gid:=@gid+1 end gid,@t:=tag
from
(select id,visit_date,people,if(people>=100,1,0) tag
from stadium)t1,(select @gid:=0,@t:=0)t2)t3
group by gid
having count(*)>=3)
- 方法2:
(1)筛选出所有大于100记录
(2)要求连续三天,做两次全连接,获取包含当前记录任意3天的超过100的记录
select distinct s1.
from stadium s1,stadium s2,stadium s3
where s1.people>=100 and s2.people>=100 and s3.people>=100
and (
(s1.id=s2.id+1 and s1.id=s3.id+2) or
(s1.id=s2.id+1 and s1.id=s3.id-1) or
(s1.id=s2.id-1 and s1.id=s3.id-2)
)
order by s1.id
10. 集合函数(union/union all)
集合函数使用规则:
① 每个集合要求列数量相同,列顺序相同。
② 每个集合显示的列,要求数据类型一致或者可隐式转换成同一数据类型
③ 最终集合列名称与第一个集合的列名称一致
集合函数作用:
① Union函数,合并多个结果集,且各结果集的重复行只保留一份,并按第一列升序显示
② Union all函数,合并多个结果集,并不去除重复行,也不排序
11. 返回员工最多的项目(>= all / order by …limit 1)
select project_id
from Project
group by project_id
having count(employee_id) >= all(select count(employee_id) from Project group by project_id)
select count(employee_id) from Project group by project_id= select count(*) over(partition by project_id)
12. 判断语句(case when/if)
- case … when … 判断是否是三角形
select x,y,z,
case when x+y>z and x+z>y and z+y>x then 'Yes' else 'No' end as triangle from triangle;
- if( , , )
select x,y,z,if(x+y>z and x+z>y and z+y>x,‘Yes’,‘No’) triangle from triangle;
计算资本损益是每次(’Buy’->‘Sell’)操作资本收益或损失的和
select stock_name, (sum(case when operation='Buy' then -price else price end)) as capital_gain_loss
from Stocks
group by stock_name
select stock_name, sum(if(operation='Buy',-price, price)) as capital_gain_loss
from Stocks
group by stock_name
13. 连续空余位置(2个以上)- 类比人流量最高峰
select distinct(c1.seat_id) as seat_id ---distinct很重要
from cinema c1, cinema c2
where abs(c1.seat_id - c2.seat_id) = 1 and c1.free = 1 and c2.free = 1
order by seat_id
拓展:用变量的方法,设置连续N位
select a.seat_id
from cinema a,
(select group_concat(new.seat_id) seat_ids
from (select t.seat_id, @id := @id+1 id, t.seat_id-@id con
from cinema t,(select @id := 0) init
where free = 1) new
group by new.con
having count(new.con) >= 2) b
查询FIND_IN_SET中strlist是否包含str,如果包含返回该数据,反之则返回null.
where FIND_IN_SET(a.seat_id, b.seat_ids)
14. 连接函数(concat/group_concat)
-
concat()函数
1、功能:将多个字符串连接成一个字符串。
2、语法:concat(str1, str2,…)
返回结果为连接参数产生的字符串,如果有任何一个参数为null,则返回值为null。
3、语法:concat(str1, seperator,str2,seperator,…)
返回结果为连接参数产生的字符串并且有分隔符,如果有任何一个参数为null,则返回值为null。 -
concat_ws()函数–在concat基础上指定分隔符
1、功能:和concat()一样,将多个字符串连接成一个字符串,但是可以一次性指定分隔符(concat_ws就是concat with separator)
2、语法:concat_ws(separator, str1, str2, …)
说明:第一个参数指定分隔符。需要注意的是分隔符不能为null,如果为null,则返回结果为null。 -
group_concat()函数
1、功能:将group by产生的同一个分组中的值连接起来,返回一个字符串结果。
2、语法:group_concat( [distinct] 要连接的字段 [order by 排序字段 asc/desc ] [separator ‘分隔符’] )
说明:通过使用distinct可以排除重复值;如果希望对结果中的值进行排序,可以使用order by子句;separator是一个字符串值,缺省为一个逗号
15. 查找函数(find_in_set/in/like)
FIND_IN_SET中strlist是否包含str,如果包含返回该数据,反之则返回null.
find_in_set与like的区别
like是广泛的模糊匹配,字符串中没有分隔符,find_in_set是精确匹配,字段值以英文“,”分隔,find_in_set查询的结果要小于like查询的结果。
find_in_set与in的区别
select * from table_name where ‘test’ in (list);
select * from table_name where find_in_set(‘test’,list);
所以如果list是常量,则直接用in;如果list是变量,则要使用find_in_set()函数。
16. sql中null与‘ ’的区别。
- null表示空,用is null判断
- '‘表示空字符串,用=’'判断
17. 算出每个app类别只安装了一个app的uid总数。有uid,app名称,app类别,数据百亿级别。
- 小数据量的话直接查询:
select b.apptype,count(b.uid)
from (select uid, apptype, count(appname) as num
from app_info
group by uid,apptype
having count(appname) = 1 )b
group by b.apptype
- 大数据量下(没用过hadoop不太清楚,望大家指正)
a. 原始文件可以拼接为uid-app-categroy
b. map阶段形成的<k,v>是<uid-category,1>
c. reduce阶段统计key为“uid-category”的count数量
d. 只保留count为1的数据
e. 剩下的数据量直接统计uid出现次数=category类别数的数据
18. 平均工资:部门和公司的比较
1.计算每个公司每个月的平均工资
2.计算每个部门每个月的平均工资
3.连接相同月份得到结果
注意:转换日期格式dateformat,平均值avg(),选择case when …then else end
select t1.pay_month pay_month, t2.department_id,
case when department_avg > company_avg then 'higher'
when department_avg < company_avg then 'lower'
else 'same' end comparison
from
(select avg(amount) as company_avg , date_format((pay_date),'%Y-%m') as pay_month
from salary
group by pay_month) t1,
(select department_id, avg(amount) as department_avg, date_format((pay_date),'%Y-%m') as pay_month
from salary s, employee e
where s.employee_id = e.employee_id
group by pay_month, department_id ) t2
where t1.pay_month = t2.pay_month
19. 实现数据透视表(Case when …group by/开窗函数)
- case when … group by
实现对大洲(continent)列的 透视表 操作,使得每个学生按照姓名的字母顺序依次排列在对应的大洲下面
select
max(case when continent = 'America' then name else null end) America,
max(case when continent = 'Asia' then name else null end) Asia,
max(case when continent = 'Europe' then name else null end) Europe
from
(select
name,
continent,
@rk := if(@pre = continent, @rk+1, 1) rk, @pre := continent
from student, (select @pre := null, @rk := 0) r1
order by continent, name
/替换为 row_number() over(partition by contient order by name) rk from student
)t
group by rk
- 开窗函数
现有一群人的信息,包括姓名、所在城市、年龄和工资,需查阅5000元以下的人的信息及数量。-- 聚合函数(不能得到汇总数量,因为聚合函数分组汇总了)
SELECT city, age, COUNT(*) AS num
FROM Table_A
WHERE salary < 5000
GROUP BY city, age;
- 开窗函数
SELECT city, age, COUNT(*) OVER() AS num -- OVER()中常添加选项以改变聚合运算的窗口范围
FROM Table_A
WHERE salary < 5000;
20. 交换数据
对比分析-从交换name变成交换id(修改id + 将id升序排列
条件判断语句case when ;如何判断奇数和偶数:mod函数
select (case
#当座位号是奇数且不是最后一个座位号时
when mod(id,2) = 1 and counts != id then id+1
# 当座位号是奇数并且是最后一个座位号时,座位号不变
when mod(id,2) = 1 and counts = id then id
# 当座位号是偶数时
else id-1 end) as id, student
from seat, (select count(*) as counts from seat) b
order by id asc
21. 每一个用户按日期顺序卖出的第二件商品的品牌是否是他们最喜爱的品牌。如果一个用户卖出少于两件商品,查询的结果是 no
1、orders根据seller_id自连接,根据order_id分组
2、使用sum()统计比当前订单日期更早的订单数,若为1,说明当前订单是对应用户卖出的第二单
3、根据item_id与items连接,将商品编号转化为品牌名
users左连接上述的子查询,判断是否为用户最喜爱的品牌 case when/if均可
select u.user_id seller_id ,
if(u.favorite_brand = t.item_brand ,'yes','no') as 2nd_item_fav_brand
from Users u
left join (select o1.seller_id, item_brand
from orders o1
join orders o2
on o1.seller_id = o2.seller_id
join Items i
on o1.item_id = i.item_id
group by o1.order_id
having sum(o1.order_date > o2.order_date) =1
) t
on u.user_id = t.seller_id