阿里云天池龙珠计划SQL练习答案

sql练习

本笔记为阿里云天池龙珠计划SQL训练营的学习内容,链接为:https://tianchi.aliyun.com/specials/promotion/aicampsql;

1.1 建表

create table Addressbook (
regist_no int not null,
name varchar(128) not null,
address varchar(256) not null,
tel_no char(10),
mail_address char(20),
primary key(regist_no)
);

1.2 添加一列

alter table Addressbook add column postal_code char(8) not null;

desc addressbook;

1.3 删除表

drop table addressbook;

1.4 再建表

create table Addressbook (
regist_no int not null,
name varchar(128) not null,
address varchar(256) not null,
tel_no char(10),
mail_address char(20),
primary key(regist_no)
);

– 2.1
select product_name
,regist_date
from product
where regist_date >‘2009-04-28’

– 2.2
/*
SELECT *
FROM product
WHERE purchase_price = NULL; 报错

SELECT *
FROM product
WHERE purchase_price <> NULL; 报错

SELECT *
FROM product
WHERE product_name > NULL; 报错
*/

– 2.3
select product_name, saleprice as sale_price, purchase price as purchase_price
from product
where saleprice - purchase price > 500;

– 2.4
select product_name, product_type, (saleprice * 0.9 - purchase_price )as profit
from product
where (saleprice * 0.9 - purchase_price ) > 100
and product_type in (‘办公用品’, ‘厨房用具’);

– 2.5
/*
SELECT product_id, SUM(product_name)
–本SELECT语句中存在错误。
FROM product
GROUP BY product_type
WHERE regist_date > ‘2009-09-01’;

字符型product_name 不能聚合
where条件要放在group by 之前
group by 后的字段product_type 和 select 后product_id 不同
*/

– 2.6
select product_type, sum(sale_price), sum(purchase_price)
from product
group by product_type
having sum(sale_price) > sum(purchase_price) * 1.5

–2.7
select product_id, product_name,product_type,sale_price,purchase_price,regist_date
from product
order by regist_date desc, sale_price;

3 多表视图

CREATE TABLE shop_product
(shop_id CHAR(4) NOT NULL,
shop_name VARCHAR(200) NOT NULL,
product_id CHAR(4) NOT NULL,
quantity INTEGER NOT NULL,
PRIMARY KEY (shop_id, product_id));

3.1

create view ViewPractices5_1 as
(
select product_name, sale_price,regist_date
from product
where sale_price >= 1000
and regist_date = ‘2009-09-20’
)

3.2

/*
INSERT INTO ViewPractice5_1 VALUES (’ 刀子 ', 300, ‘2009-11-02’);
视图中插入数据时,原表也会跟着插入数据,但是ViewPractice5_1 对应的原表有三个字段带有not null的约束字段
*/

3.3

select
product_id,
product_name,
product_type,
sale_price,
(select avg(sale_price) from product) as sale_price_all
from product;

3.4

create view AvgPriceByType as(
select
product_id,
product_name,
product_type,
sale_price,
(select avg(sale_price)
from product p2
where p1.product_type = p2.product_type
group by p1.product_type
) as avg_sale_price
from product p1);

3.5

运算或者函数中含有 NULL 时,结果全都会变为NULL ?(判断题)

3.6

/*
SELECT product_name, purchase_price
FROM product
WHERE purchase_price NOT IN (500, 2800, 5000);

执行结果只取出了purchase_price不是500, 2800, 5000的商品
不包含purchase_price为null的

SELECT product_name, purchase_price
FROM product
WHERE purchase_price NOT IN (500, 2800, 5000, NULL);

执行结果返回0条记录,因为not in 的参数中不能含有null,否则,查询结果是空
*/

3.7

select
sum(case when sale_price < 1000 and product_type in (‘T恤衫’, ‘办公用品’, ‘叉子’ , ‘擦菜板’, ‘圆珠笔’) then 1 end) as ‘low_price’,
sum(case WHEN SALE_price >=1000 and sale_price<3000 and product_type = ‘菜刀’ then 1 end) as ‘mid_price’,
sum(case when sale_price >=3100 and product_type in (‘运动T恤’, ‘高压锅’) then 1 end) as ‘high_price’
from product;

4.1 找出 product 和 product2 中售价高于 500 的商品的基本信息。

select * from product p1
where sale_price>500
union
select * from product p2
where sale_price>500;

4.2 借助对称差的实现方式, 求product和product2的交集。

/*
两个集合A,B的对称差是指那些仅属于A或仅属于B的元素构成的集合。

使用 not in 实现两个表的差集。
select * from product
where product_id not in (select product_id from product2)
union
select * from product2
where product_id not in (select product_id from product)

*/

并集-差集=交集

select * from
(
select * from product union select * from product2
) as u
where product_id not in
(
select * from product
where product_id not in (select product_id from product2)
union
select * from product2
where product_id not in (select product_id from product)
)

4.3 每类商品中售价最高的商品都在哪些商店有售 ?

– 每个类型中价格最高的
select product_type, max(sale_price)
from product2
group by product_type;

– 找出同类里的最大值商品
select product_id, product_type, product_name, sale_price
from product2 bb
where sale_price in(
select Max(sale_price)
from product2 as aa
where bb.product_type = aa.product_type
group by product_type
);

– 每类商品中售价最高的商品都在哪些商店有售
– 关联子查询替换窗口函数
select shop_id
from shop_product as sp
where sp.product_id in (
– 找出同类里的最大值商品
select product_id
from product2 bb
where sale_price in(
select Max(sale_price)
from product2 as aa
where bb.product_type = aa.product_type
group by product_type
)
);

4.4 分别使用内连结和关联子查询每一类商品中售价最高的商品

– 关联子查询 找出同类里的最大值商品
select product_id,product_type,product_name,sale_price
from product2 bb
where sale_price in(
select Max(sale_price)
from product2 as aa
where bb.product_type = aa.product_type
group by product_type
);

– 内连接 找出同类里的最大值商品
select a.product_id , a.product_type,a.product_name,a.sale_price
from product2 a
inner join
(
select product_type, max(sale_price) sale_price
from product2
group by product_type) b
on a.product_type = b.product_type and a.sale_price = b.sale_price;

4.5 用关联子查询实现:在product表中,取出 product_id, produc_name, sale_price, 同品类按照商品的售价从低到高进行排序、对所有商品售价进行累计求和。

– 不含子关联查询的情况
select product_id, product_type, product_name, sale_price
from product2 p1
ORDER BY sale_price;

– 带子关联查询, 但不处理 sale_price 相同的情况
select product_id, product_type, product_name, sale_price,
(select sum(sale_price)from product2 p2 where p1.sale_price >= p2.sale_price) 累计求和
from product2 p1
ORDER BY sale_price;

– 带子关联查询, 处理 sale_price 相同的情况
select product_id, product_type, product_name, sale_price,
(select sum(sale_price)from product2 p2
where p1.sale_price > p2.sale_price OR (p1.sale_price = p2.sale_price and p1.product_id >= p2.product_id)) 累计求和
from product2 p1
ORDER BY sale_price;

5.1

/*
<窗口函数> over ([partition by <列名>] order by <排序用列名>)
[]中的内容可以省略不写
partition by 用来分组,即选择要看哪个窗口,类似group by,但不具备group by的汇总功能,不会改变原始表中记录行数。
order by 用来排序,即决定窗口内按哪种规则(字段)来排序

*/
select product_name
,product_type
,sale_price
,rank() over (partition by product_type order by sale_price) as ranking
from product2;

/*
5.2
窗口函数种类大致分两种:
一是 sum max min等聚合函数在窗口函数中使用
二是 rank dense_rank等排序用的专用函数

rank函数,如果存在相同位次的记录,则会跳过之后的位次。 例如三个排名第一的:1,1,1,4
dense_rank函数,即使存在相同的位次,也不会跳过之后的位次。例如三个排名第一的:1,1,1,2
row_number函数,有
*/

5.2.1.专用窗口函数

select product_name, product_type, sale_price,
rank() over(order by sale_price) as ranking,
dense_rank() over (order by sale_price) as dense_ranking,
row_number() over (order by sale_price) as row_num
from product2;

5.2.2.聚合函数在窗口函数上使用

select product_id, product_name, sale_price,
sum(sale_price) over (order by product_id) as current_sum – 当前所在行的累计求和
from product2;

select product_id, product_name, sale_price,
avg(sale_price) over (order by product_id) as current_avg – 截止当前所在行的平均
from product2;

5.3 窗口函数的应用-计算移动平均

/*
上面提到聚合函数在窗口函数的使用,计算的是累计到当前行的所有的数据的聚合。
实际上还可以指定更加详细的汇总范围。该范围称为框架。

语法:
<窗口函数> over (order by <排序用列名> Rows n Preceding)
<窗口函数> over (order by <排序用列名> Rows between n Preceding and n Following)

PRECEDING(“之前”), 将框架指定为 “截止到之前 n 行”,加上自身行
FOLLOWING(“之后”), 将框架指定为 “截止到之后 n 行”,加上自身行
BETWEEN 1 PRECEDING AND 1 FOLLOWING,将框架指定为 “之前1行” + “之后1行” + “自身”
*/
select product_id, product_name,sale_price,
sum(sale_price) over(order by product_id rows 2 preceding) as moving_avg, #截止到之前2行加上自身行
sum(sale_price) over(order by product_id rows between 1 preceding and 1 following) as moving_avg2 #截止上一行、下一行、自身行
from product2;

窗口函数使用范围:1.窗口函数只能在select子句中使用;2.窗口函数over中的order by 子句并不影响最终结果的排序,只是决定窗口函数按何种顺序计算。

5.4 grouping运算符

/*
常规的group by 只能得到每个分类的小计,有时候还需要计算分类的合计,可以用rollup
/
select product_type, regist_date,
sum(sale_price) as sum_price
from product2
group by product_type, regist_date with rollup;
/

这里ROLLUP 对product_type, regist_date两列进行合计汇总。
结果实际上有三层聚合,如下图 模块3是常规的 GROUP BY 的结果,需要注意的是衣服 有个注册日期为空的,
这是本来数据就存在日期为空的,不是对衣服类别的合计;
模块2和1是 ROLLUP 带来的合计,模块2是对产品种类的合计,模块1是对全部数据的总计。
ROLLUP 可以对多列进行汇总求小计和合计。
*/

5.1请说出针对本章中使用的product(商品)表执行如下 SELECT 语句所能得到的结果。

SELECT product_id
,product_name
,sale_price
,MAX(sale_price) OVER (ORDER BY product_id) AS Current_max_price # 截至目前行的最大值
FROM product2;

5.2

/*
继续使用product表,计算出按照登记日期(regist_date)升序进行排列的各日期的销售单价(sale_price)的总额。
排序是需要将登记日期为NULL 的“运动 T 恤”记录排在第 1 位(也就是将其看作比其他日期都早
*/
SELECT product_id
,product_name
,regist_date
,sale_price
,sum(sale_price) OVER (partition by regist_date ORDER BY regist_date) AS sum_price #
FROM product2;

5.3

/*
① 窗口函数不指定PARTITION BY的效果是什么?
窗口函数不指定partition by的时候,是针对排序列进行全局排序

② 为什么说窗口函数只能在SELECT子句中使用?实际上,在ORDER BY 子句使用系统并不会报错。
主要是因为sql执行的顺序:from–where–group by–having–select–order by
如果在 where、group by、having使用窗口函数的话,就是说提前一次执行了排序,
排序之后再去除记录、汇总、汇总过滤,第一次排序的结果就是错误的,没有意义了。
而order by用在select不会出现这种问题。
*/

6.10道经典 ======================

select * from income statement limit 100;
select * from Company Operating limit 100;
select * from Market Data limit 100;

1.

select
Income Statement.TICKER_SYMBOL,
Income Statement.END_DATE,
Income Statement.T_REVENUE,
Income Statement.T_COGS,
Income Statement.N_INCOME,
Market Data.TICKER_SYMBOL,
Market Data.END_DATE,
Market Data.CLOSE_PRICE,
Company Operating.TICKER_SYMBOL,
Company Operating.INDIC_NAME_EN,Company Operating.END_DATE,Company Operating.VALUE
from Market Data
left join Income Statement
on Income Statement.TICKER_SYMBOL = Market Data.TICKER_SYMBOL
left join Company Operating
on Company Operating.TICKER_SYMBOL = Market Data.TICKER_SYMBOL
where Market Data.TICKER_SYMBOL in (‘600383’, ‘600048’);

2.

select * from winequality-red limit 100;

select *,
dense_rank() over (order by citric acid) as dense_ranking
from winequality-red
where pH=3.03;

3. 找出在2016年7月期间,发放优惠券总金额最多和发放优惠券张数最多的商家。

select * from ccf_offline_stage1_test_revised limit 500;

发放优惠券张数最多的商家

select Merchant_id 商家id,
count(Coupon_id) 发放优惠券数量
from ccf_offline_stage1_test_revised
where Date_received >=‘2016-07-01’ and Date_received <=‘2016-07-31’
group by Merchant_id
order by 发放优惠券数量 desc

发放优惠券总金额最多

select Merchant_id 商家id,
sum(substring_index(Discount_rate,’:’, -1) )as 总优惠券金额
from ccf_offline_stage1_test_revised
where Date_received >=‘2016-07-01’ and Date_received <=‘2016-07-31’
group by Merchant_id
order by 总优惠券金额 desc

4. 请计算全社会用电量:第一产业:当月值在2015年用电最高峰是发生在哪月?并且相比去年同期增长/减少了多少个百分比?

select * from macro industry limit 100;

当月值在2015年用电最高峰是发生在哪月?

select
substring_index(substring_index(period_date,’-’, 2),’-’,-1) as month,
sum(DATA_VALUE)
from macro industry
where PERIOD_DATE >=‘2015-01-01’ and PERIOD_DATE <=‘2015-12-31’
group by month order by sum(DATA_VALUE) desc;

相比去年同期增长/减少了多少个百分比?

select a.month,a.电量,b.电量, (a.电量-b.电量)/b.电量 as 同比
from
(select
substring_index(substring_index(period_date,’-’, 2),’-’,-1) as month,
sum(DATA_VALUE) as 电量
from macro industry
where PERIOD_DATE >=‘2015-12-01’ and PERIOD_DATE <=‘2015-12-31’
group by month)a
left join
(
select
substring_index(substring_index(period_date,’-’, 2),’-’,-1) as month,
sum(DATA_VALUE) as 电量
from macro industry
where PERIOD_DATE >=‘2014-12-01’ and PERIOD_DATE <=‘2014-12-31’
group by month)b
on a.month = b.month

6. 找出 pH=3.63的所有白葡萄酒,然后,对其 residual sugar 量进行英式排名(非连续的排名)

select * from winequality-white limit 100;

select *,
rank() over (order by residual sugar) as ranking
from winequality-white
where pH=3.63

7.计算截止到2018年底,市值最大的三个行业是哪些?以及这三个行业里市值最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)

select * from Market Data limit 100;

截止到2018年底,市值最大的三个行业是哪些?

select TYPE_NAME_CN,
sum(MARKET_VALUE) 市值
from Market Data
where substring_index(END_DATE,’-’,1) <= ‘2018’
group by TYPE_NAME_CN
order by 市值 desc
limit 3;

– 行业 公司 市值
select b.TYPE_NAME_CN 行业, b.ticker_symbol 公司, sum(b.MARKET_VALUE) 市值
from
(
select TYPE_NAME_CN, sum(MARKET_VALUE) 市值
from Market Data
where substring_index(END_DATE,’-’,1) <= ‘2018’
group by TYPE_NAME_CN
order by 市值 desc
limit 3
) a
inner JOIN market data b
ON a.TYPE_NAME_CN = b.TYPE_NAME_CN
group by b.ticker_symbol, b.TYPE_NAME_CN;

– 变量,行业里的前三
select r.* from
(
select zz.行业, zz.公司, zz.市值/10000,
@rank:=if(@tmp=zz.行业, @rank+1, 1) as new_rank,
@tmp:=zz.行业 as tmp
from
(
select b.TYPE_NAME_CN 行业, b.ticker_symbol 公司, sum(b.MARKET_VALUE) 市值
from
(
select TYPE_NAME_CN, sum(MARKET_VALUE) 市值
from Market Data
where substring_index(END_DATE,’-’,1) <= ‘2018’
group by TYPE_NAME_CN
order by 市值 desc
limit 3
) a
inner JOIN market data b
ON a.TYPE_NAME_CN = b.TYPE_NAME_CN
group by b.ticker_symbol, b.TYPE_NAME_CN
)zz
order by zz.行业, zz.市值 desc
)r
where r.new_rank <=3;

这三个行业里市值最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)

SELECT TYPE_NAME_CN,TICKER_SYMBOL,MARKET_VALUE,
ROW_NUMBER() OVER(PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE desc ) CompanyRanking
FROM market data;

– 窗口函数分组后找前三 窗口函数的执行顺序是在having之后
select * from (select *,
row_number() over (partition by r.行业 order by r.市值 desc) as new_rank
from (
select b.TYPE_NAME_CN 行业, b.ticker_symbol 公司, sum(b.MARKET_VALUE) 市值
from (
select TYPE_NAME_CN, sum(MARKET_VALUE) 市值
from Market Data
where substring_index(END_DATE, ‘-’, 1) <= ‘2018’
group by TYPE_NAME_CN
order by 市值 desc
limit 3
) a
inner JOIN market data b
ON a.TYPE_NAME_CN = b.TYPE_NAME_CN
group by b.ticker_symbol, b.TYPE_NAME_CN
) r
)ret
where new_rank <=3;

– 行业内查公司市值
select TYPE_NAME_CN, ticker_symbol, sum(MARKET_VALUE) from Market Data
where TYPE_NAME_CN = ‘采掘’
group by TYPE_NAME_CN, ticker_symbol
order by sum(MARKET_VALUE) desc
limit 3;

– 9. 按季度统计,白云机场旅客吞吐量最高的那一季度对应的净利润是多少?(注意,是单季度对应的净利润,非累计净利润。)
select distinct INDIC_NAME_EN from company operating ;
select * from income statement limit 100;

– 白云机场旅客吞吐量最高的那一季度 2018第一季度
select b.* from
( – 2018第一季度
select TICKER_SYMBOL, year(END_DATE) 年, quarter(END_DATE) 季度, sum(value) 总量
from company operating
where INDIC_NAME_EN = ‘Baiyun Airport:Passenger throughput’
group by TICKER_SYMBOL, 年, 季度
order by 总量 desc
limit 1
)a
left join
(
select TICKER_SYMBOL, year(END_DATE) 年, quarter(END_DATE) 季度, SUM(N_INCOME) 税后收入
from income statement
group by TICKER_SYMBOL, 年, 季度
)b
on a.TICKER_SYMBOL = b.TICKER_SYMBOL
and a.年 = b.年
and a.季度 = b.季度;

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值