数据分析 知识体系 mysql篇

基本语法

select 
from 
join/left join/right join 
on
where 
group by 
having
order by  asc/desc
limit a,b(截断a行,输出b行)

增删查改

常见问题

1 各种连接的区别?

左连接(A left join B on…):两表关联,左表全部保留,右表关联不上用null表示。
右连接(A right join B on…):两表关联,右表全部保留,左表关联不上的用null表示。
内连接(A join B on…):两表关联,保留两表中交集的记录。
笛卡尔积(A,B / A join B):两表关联,把左表的列和右表的列通过笛卡尔积的形式表达出来。

2 查询顺序

from -> join -> on -> where -> group by(此处开始可以用别名) -> 聚合函数如AVG,SUM -> having -> select -> distinct -> order by -> limit

3 如何实现对一个字段升序,另一个字段降序?

字段1 asc,字段2 desc

4 SQL优化
详情见此

好题汇总

leetcode(按出现频率排列)

题目名字改成更直观的内容。

569. 员工薪水中位数

题目详情
此题中的中位数,指当行数n为偶的时候,同时输出n/2和n/2 + 1行,而不是求这两行的平均。

select id,company, salary
from 
(select id,company,salary,row_number() over(partition by company order by salary) as rn,
count(id) over (partition by company) as cnt 
from employee)t
where rn >= cnt/2 and rn <= cnt/2 +1

技巧点:
1.巧妙地将窗口函数和聚合函数结合起来。
2.求中位数的通用技巧 rn >= cnt/2 and rn <= cnt/2 +1

185. 部门工资前三高的所有员工

题目详情

select d.name as department , t.name as employee,t.salary
from 
(select name,salary,departmentid,dense_rank() over(partition by departmentid order by salary desc) as rn 
from employee) t join department d 
on d.id = t.departmentid
where rn <= 3

技巧点:
1.利用dense_rank窗口函数。

262. 每天的取消率

题目详情

解法1:

select request_at as Day,
round(sum(case when status <> 'completed' then 1 else 0 end )/count(*),2) as 'Cancellation Rate'
from trips 
where request_at between '2013-10-01' and '2013-10-03' and 
driver_id in 
(select users_id from users where banned = 'No' and Role = 'driver') and 
client_id in 
(select users_id from users where banned = 'No' and Role = 'client')
group by 1
order by 1

解题思路:
1.通过where函数筛选出时间和用户符合的记录
2.利用sum函数和case when函数的组合计算取消用户数

解法2:

SELECT
    request_at as 'Day', round(avg(Status!='completed'), 2) as 'Cancellation Rate'
FROM 
    trips t JOIN users u1 ON (t.client_id = u1.users_id AND u1.banned = 'No')
    JOIN users u2 ON (t.driver_id = u2.users_id AND u2.banned = 'No')
WHERE	
    request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY 
    request_at

解题思路:在筛选符合条件的用户上,此解法用了两次内连接,效率会比解法1的in好。在计算取消率上,此解法直接用avg函数,很巧妙。

1097. 每天的新用户数和1日留存率

题目详情

select t1.event_date as install_dt,count(t1.player_id) as installs,
round(count(t2.player_id)/count(t1.player_id),2) as Day1_retention
from 
(select player_id,event_date,row_number()over(partition by player_id order by event_date) as rn 
from activity) t1 left join activity t2 
on t1.player_id = t2.player_id and datediff(t2.event_date,t1.event_date) = 1 
where t1.rn = 1
group by 1

解题思路:
1.构造一张新表,记录每天的新用户id
2.与原表格左连接,连接方式为用户id相等和登录日期差1天。

579. 员工的累计薪水

题目详情

select id,month,salary 
from 
(select id,month,sum(salary) over(partition by id order by month rows 2 preceding) as salary,
row_number() over (partition by id order by month desc) as rn
from employee)t
where rn <> 1
order by id,month desc

技巧点:
1.窗口函数和sum函数结合
2. 聚合函数中的rows 2 preceding语法。

180. 连续出现的数字

题目详情

解法1:

select distinct num as ConsecutiveNums
from
(select num, cast(id as signed) - cast(rn as signed)
from 
(select id,num, row_number()over(partition by num) as rn 
from logs) t
group by 1,2
having count(*) >=3)t

技巧点:
1.连续问题一般可以转换为id - rn 问题
2.id - rn 为负数时,mysql会报错。因此要用cast函数将数据类型转换为有符号整数类型。

解法2:

select distinct t1.num as ConsecutiveNums 
from logs t1,logs t2,logs t3 
where t1.id = t2.id +1 and t2.id = t3.id + 1 
and t1.num = t2.num and t2.num = t3.num

解题思路:直接三表连接。

1205. 每月交易

题目详情

select month,country,
sum(case when state = 'approved' then 1 else 0 end) as approved_count,
sum(case when state = 'approved' then amount else 0 end) as approved_amount,
sum(case when state = 'chargeback' then 1 else 0 end) as chargeback_count,
sum(case when state = 'chargeback' then amount else 0 end) as chargeback_amount
from 
(select id,country,state,amount,date_format(trans_date,'%Y-%m') as month
from transactions 
where state = 'approved'
union all 
select trans_id as id,country, 'chargeback' as state,amount,
date_format(c.trans_date,'%Y-%m') as month
from chargebacks c left join transactions t 
on c.trans_id = t.id) t1
group by 1,2

解题思路:这是一道比较难理解题意的题目。
根据题意构造这样的新表:表格含批准状态交易的所有信息和退款状态交易的所有信息,利用union all 函数连接两类信息,state字段是用来区分两种状态的,相当于打了一个tag。

技巧点:
1.date_format(date,format)
2. sum函数和case语句结合,需要熟练掌握

550. 1日留存

题目详情

select round(count(distinct a.player_id)/count(t.player_id),2) as fraction
from 
(select player_id,min(event_date) as event_date 
from activity 
group by 1)t left join activity a 
on t.player_id = a.player_id and datediff(a.event_date,t.event_date) = 1

解题思路:1097题的简单版,依旧构造一张新用户表,与原表左连接,连接方式为用户id相等,登陆日期差一天。

1454. 活跃用户

题目详情

select distinct t.id,a.name
from 
(select id, date_sub(login_date,interval rn day ) as dif
from 
(select id,login_date, row_number()over(partition by id order by login_date) as rn
from 
(select id,login_date
from logins 
group by 1,2)t)t
group by 1,2
having count(*)>=5)t join accounts a 
on t.id = a.id
order by 1

解题思路:一共用了四个select。
1.第一个select目的是去重,解决用户一天登录多次的情况
2.第二第三个select是连续型问题的老操作了,利用窗口函数构造新列rn,转换为日期-rn问题
3.最后一个select是id去重,并与表accounts相连接得到name的信息

571. 给定数字的频率查询中位数

题目详情

select round(avg(number),1) as median
from 
(select number,
sum(frequency) over(order by number) as asc_number,
sum(frequency) over (order by number desc) as desc_number
from numbers)t1,
(select sum(frequency) as cnt from numbers) t2
where asc_number >= cnt/2 and desc_number >= cnt/2

技巧点:
1.利用正序累积值和倒序累积值计算中位数asc_number >= cnt/2 and desc_number >= cnt/2,很巧妙!

601. 体育馆的人流量

题目详情
与之前连续题不同的是,它要输出满足连续条件的所有记录。
解法1:

select id,visit_date,people
from 
(select id,visit_date,people,count(id) over(partition by diff) as cnt
from 
(select id,visit_date,people, cast(id as signed) - cast(rn as signed) as diff
from 
(select id,visit_date,people,row_number()over(order by id) as rn
from 
(select *
from stadium 
where people >= 100)t)t)t)t
where cnt >= 3
order by 1

解题思路:一共5个select。
1.最里面3个select是所有连续题一样的套路,计算id-rn
2.第4个select是利用窗口函数和count函数结合又生成了新的列,用以计算id-rn相同行的行数cnt
3.最后一个select是选择行数>=3的行

解法2:

select distinct t1.*
from stadium t1,stadium t2,stadium t3
where t1.people>=100 and t2.people>=100 and t3.people>=100
and ((t1.id-t2.id = 1 and t2.id-t3.id = 1) or (t2.id-t1.id = 1 and t3.id -t2.id = 1)
or ( t2.id -t1.id = 1 and t1.id -t3.id = 1))
order by t1.id 

解题思路:三表连接。由于最后选择t1的所有列,所以t1分别作连续三行的第一行,第二行,第三行。与解法1相比,解法2简洁不少。

1127. 用户购买平台

题目详情

select t1.spend_date,t1.platform,ifnull(total_amount,0) as total_amount,
ifnull(total_users,0) as total_users
from 
(select distinct spend_date,t.platform
from spending,
(select 'desktop' as platform union select 'mobile' as platform union
select 'both' as platform )t) t1 left join
(select spend_date, case when tag1 = 1 and tag2 = 0 then 'mobile'  when 
tag1 = 0 and tag2 = 1 then 'desktop' else 'both' end as platform,
sum(amount) as total_amount,count(user_id) as total_users
from 
(select spend_date,user_id,
sum(case when platform = 'mobile' then 1 else 0 end) as tag1,
sum(case when platform = 'desktop' then 1 else 0 end) as tag2,
sum(amount) as amount 
from spending 
group by 1,2)t
group by 1,2) t2
on t1.platform = t2.platform and t1.spend_date = t2.spend_date

解题思路:t2表格是主要部分。
1.通过sum函数和case when函数打了两个标签tag1,tag2,分别表示是否用mobile/desktop登陆过。之后根据标签构造t2表格
2.构造t1表格左连接t2表格,是保证没有数据的记录也有输出

1501. 可以放心投资的国家

题目详情

select c1.name as country 
from person p,country c1, calls c2
where left(phone_number,3) = 
country_code and (id = caller_id or id = callee_id)
group by 1
having avg(duration)>
(select avg(duration) from calls)

技巧点:
1.left函数截取部分字符串
2. id = caller_id or id = callee_id

1212. 查询球队积分

题目详情

select t.team_id,t.team_name,ifnull(sum(point),0) as num_points 
from 
(select host_team as team_id,case when 
host_goals = guest_goals then 1 when host_goals > guest_goals then 3 else 0 end as point
from matches
union all 
select guest_team as team_id,case when 
host_goals = guest_goals then 1 when host_goals > guest_goals then 0 else 3 end as point
from matches) k right join teams t 
on k.team_id = t.team_id 
group by 1
order by 3 desc, 1

技巧点:
1.将host和guest化列为行

常用函数

开窗函数

链接1

链接2

日期时间函数

链接1链接2

截取字符串函数

链接1链接2:lateral view explode

连接字符串函数

链接1

模糊查询和正则表达式

模糊查询正则表达式

数据类型转换函数

链接1

in和exist

链接1

with语句

链接1

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值