面试遇到的sql笔试题目,共享给大家,如果是数据分析或者数据开发岗位,对sql一般都需要掌握,以下问题,最好能再hive,mysql或者oracle场景下都 实现下,熟悉巩固各个数据库的语法。
1. A 和B 两张表,A中有B表的外键,B表中有A表中的外键,求A表中,B表中没有的数据行,B表中,A表没有的数据行。
select * from a where not exists (select 1 from b where a.主键=b.外键)
select * from b where not exists (select 1 from a where b.主键=a.外键)
当然,还可以用left outer join 来实现。如果是hive的话,没有not exist语法,可以考虑这种方式。
2. 实现前n的逻辑
员工表Employees,里面有字段 EmployeeID,EmployeeName,
销售表Sales 里面有字段EmployeeID,productID,salesamount。
例如:EmployeeID,productID,salesamount
1 1 100
1 2 80
2 1 39
2 2 50
EmployeeID,EmployeeName
1 Jack
2 Mary
问:查出把所有负责的产品销售50以上的销售员工姓名。
select a.EmployeeName from Employees as a, Sales as b
where a.EmployeeID=b.EmployeeID and not exixts
(select 1 from b where EmployeeID=b.EmployeeID and salesmount<50
3. 求同比,环比,累计
首先,理解三者概念:
- 环比就是今年第n月与第n-1月或第n+1月比;
- 同比就是今年第n月与去年第n月比。
- 累计就是累计同累加,指前几个与后几个的结合。
然后,创建测试数据
CREATE TABLE salaryByMonth
(
employeeNo varchar2(20),
yearMonth varchar2(6),
salary number
) ;
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200805', 500);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200802', 150);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200803', 200);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200804', 300);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200708', 100);
commit;
求同比,环比等
SELECT EMPLOYEENO
,YEARMONTH
,SALARY
,MIN(SALARY) KEEP(DENSE_RANK FIRST ORDER BY YEARMONTH) OVER(PARTITION BY EMPLOYEENO) FIRST_SALARY -- 基比分析 salary/first_salary
,LAG(SALARY, 1, 0) OVER(PARTITION BY EMPLOYEENO ORDER BY YEARMONTH) AS PREV_SAL -- 环比分析,与上个月份进行比较
,LAG(SALARY, 12, 0) OVER(PARTITION BY EMPLOYEENO ORDER BY YEARMONTH) AS PREV_12_SAL -- 同比分析,与上个年度相同月份进行比较
,SUM(SALARY) OVER(PARTITION BY EMPLOYEENO, SUBSTR(YEARMONTH, 1, 4) ORDER BY YEARMONTH RANGE UNBOUNDED PRECEDING) LJ --累计值
FROM SALARYBYMONTH
ORDER BY EMPLOYEENO
,YEARMONTH
在网上找到的一个案例,一个不错的实现方式,主要是用到数字虚拟辅助表的方式,
建测试表,假设是个销售数量表:
create table t (dt date,cnt number);
随便插入些测试数据,最后表的内容如下:
DT CNT
2012-02-09 15
2012-02-21 2
2012-03-23 1
2012-03-23 5
2012-04-23 2
2012-05-12 20
2012-07-01 20
2012-07-21 20
显示环比,如果某个月没有销售,就有点儿麻烦。
先想个办法,显示本年度的1至12月,有个很有技巧的技术来实现:
SQL> set pagesize 20
SQL> with m as (select '2012-'||lpad(rownum,2,'0') v from dual connect by level<=12) select * from m;
得到如下结果:
---------
2012-01
2012-02
2012-03
2012-04
2012-05
2012-06
2012-07
2012-08
2012-09
2012-10
2012-11
2012-12
接下来就容易了,两个表外连,使用lag统计函数就可以了。
下面展示一些 内联代码片
。
with m as (
select '2012-'||lpad(rownum,2,'0') v
from dual
connect by level<=12
)
select m.v "月份"
,nvl(sum(t.cnt),0) "销售数量"
,lag(nvl(sum(t.cnt),0),1) over (order by m.v) "上月销量数量"
,nvl(sum(t.cnt),0)-lag(nvl(sum(t.cnt),0),1) over (order by m.v) "环比增加量"
,round((case when nvl(sum(t.cnt),0)=0 then null else (nvl(sum(t.cnt),0)-lag(nvl(sum(t.cnt),0),1) over (order by m.v))/nvl(sum(t.cnt),0) end)*100,1) "环比增加比例(%)"
from m
left outer join t
on m.v=to_char(t.dt,'yyyy-mm')
group by m.v
order by 1
月份 销售数量 上月销量数量 环比增加量 环比增加比例(%)
2012-01 0
2012-02 17 0 17 100
2012-03 6 17 -11 -183.3
2012-04 2 6 -4 -200
2012-05 20 2 18 90
2012-06 0 20 -20
2012-07 40 0 40 100
2012-08 0 40 -40
2012-09 0 0 0
2012-10 0 0 0
2012-11 0 0 0
2012-12 0 0 0
4. 统计连续日期的库存量create view test_v as
一样的,创建测试数据
select date '2015-07-29' as 日期,'AAA' as 物料,'S' as 仓库,100 as 库存金额,50 as 库存数量 from dual union all
select date '2015-07-31' as 日期,'BBB' as 物料,'B' as 仓库,70 as 库存金额,20 as 库存数量 from dual union all
select date '2015-08-01' as 日期,'BBB' as 物料,'B' as 仓库,120 as 库存金额,60 as 库存数量 from dual union all
select date '2015-08-02' as 日期,'BBB' as 物料,'B' as 仓库,90 as 库存金额,40 as 库存数量 from dual union all
select date '2015-08-06' as 日期,'AAA' as 物料,'S' as 仓库,40 as 库存金额,10 as 库存数量 from dual union all
select date '2015-08-07' as 日期,'CCC' as 物料,'W' as 仓库,70 as 库存金额,20 as 库存数量 from dual union all
select date '2015-08-10' as 日期,'BBB' as 物料,'S' as 仓库,88 as 库存金额,22 as 库存数量 from dual
;
WITH x0 AS (
SELECT DATE '2015-07-29' AS d FROM dual),
x1 AS (
SELECT d+LEVEL-1 AS d1 FROM x0 CONNECT BY LEVEL<31
)
SELECT x.d1,
CASE WHEN v.物料 IS NULL THEN LAG(v.物料 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.物料 END AS 物料,
case WHEN v.仓库 IS NULL THEN LAG(v.仓库 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.仓库 END AS 仓库,
case WHEN v.库存金额 IS NULL THEN LAG(v.库存金额 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.库存金额 END AS 库存金额,
CASE WHEN v.库存数量 IS NULL THEN LAG(v.库存数量 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.库存数量 END AS 库存数量
FROM x1 x,test_v v
where x.d1 = v.日期(+)
ORDER BY x.d1 ASC;
drop view test_v;
select sysdate from dual;
select trunc(sysdate)+rownum-1 from dual connect by rownum <=to_date('2018-12-31','yyyy-mm-dd')-trunc(sysdate);
5. 统计一个cookie,pv数最多的前1/3的天
SELECT
cookieid,
createtime,
pv,
NTILE(3) OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn
FROM lxw1234;
--rn = 1 的记录,就是我们想要的结果
取分组内排序后最后一个值
SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1,
FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime DESC) AS last2
FROM lxw1234
ORDER BY cookieid,createtime;
结果显示:
cookieid createtime url rn last1 last2
-------------------------------------------------------------
cookie1 2015-04-10 10:00:00 url1 1 url1 url7
cookie1 2015-04-10 10:00:02 url2 2 url2 url7
cookie1 2015-04-10 10:03:04 1url3 3 1url3 url7
cookie1 2015-04-10 10:10:00 url4 4 url4 url7
cookie1 2015-04-10 10:50:01 url5 5 url5 url7
cookie1 2015-04-10 10:50:05 url6 6 url6 url7
cookie1 2015-04-10 11:00:00 url7 7 url7 url7
cookie2 2015-04-10 10:00:00 url11 1 url11 url77
cookie2 2015-04-10 10:00:02 url22 2 url22 url77
cookie2 2015-04-10 10:03:04 1url33 3 1url33 url77
cookie2 2015-04-10 10:10:00 url44 4 url44 url77
cookie2 2015-04-10 10:50:01 url55 5 url55 url77
cookie2 2015-04-10 10:50:05 url66 6 url66 url77
cookie2 2015-04-10 11:00:00 url77 7 url77 url77
select nvl(t.first_name,'total'),sum(t.salary)
from t_test1 t
group by rollup(t.first_name)
毕业4年,从应届生到BI数据分析师老油条,不定期将过去自己求职积累经验和数据分析学习相关的一些笔记分享给大家,对互联网数据分析、机器学习有兴趣的朋友也可以关注我的工重号:python数据分析和机器学习,专注BI、数据分析和机器学习的学习和实践 .