本笔记为阿里云天池龙珠计划SQL训练营的学习内容,链接为:https://tianchi.aliyun.com/specials/promotion/aicampsql;
第一题:
SELECT X1.TICKER_SYMBOL, X1.END_DATE, X1.T_REVENUE, X1.T_COGS, X1.N_INCOME,
X2.END_DATE, X2.CLOSE_PRICE,
X3.INDIC_NAME_EN, X3.END_DATE,X3.VALUE
FROM (SELECT *
FROM `income statement`
WHERE TICKER_SYMBOL IN (600383, 600048)) AS X1
LEFT JOIN(SELECT TICKER_SYMBOL, END_DATE, CLOSE_PRICE
FROM `Market Data`
WHERE TICKER_SYMBOL IN (600383, 600048)) AS X2
ON X1.TICKER_SYMBOL = X2.TICKER_SYMBOL
LEFT JOIN(SELECT TICKER_SYMBOL, INDIC_NAME_EN, END_DATE, VALUE
FROM `Company Operating`
WHERE TICKER_SYMBOL IN (600383, 600048)) AS X3
ON X1.TICKER_SYMBOL = X3.TICKER_SYMBOL
第二题
SELECT * ,DENSE_RANK() OVER(order by `citric acid`) AS rank_
FROM task.`winequality-red`
WHERE PH = 3.03
第三题
SELECT Merchant_id,
COUNT(Merchant_id) AS quantity,
SUBSTRING_INDEX(Discount_rate, ':', -1) AS one_disc,
SUBSTRING_INDEX(Discount_rate, ':', -1) * COUNT(Merchant_id) AS all_mon
FROM (SELECT Merchant_id ,Discount_rate
FROM ccf_offline_stage1_test_revised
WHERE (Date_received BETWEEN '2016-7-1' AND '2016-7-31')
AND (Discount_rate NOT BETWEEN 0 AND 1)) AS X2
GROUP BY Merchant_id
ORDER BY all_mon DESC, quantity DESC
第四题
SELECT name_cn,
MAX(sum_data) AS max_value
FROM(SELECT name_cn,
PERIOD_DATE,
DATA_VALUE,
SUM(DATA_VALUE) OVER (PARTITION BY name_cn) AS sum_data
FROM task.`macro industry`
WHERE PERIOD_DATE BETWEEN '2015-1-1' AND '2015-1-31' ) AS X1
GROUP BY name_cn
ORDER BY max_value DESC
第四题答案:
SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522' -- 题干似乎没有要求
AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE
ORDER BY FianlValue DESC
LIMIT 1;
-- 去年同期就很容易了
第五题
SELECT A1.Merchant_id, used_dis/all_dis AS rate
FROM
(SELECT Merchant_id, COUNT(Date_received) OVER(PARTITION BY Merchant_id) AS used_dis
FROM task.ccf_online_stage1_train
WHERE Coupon_id IS NOT NULL
AND Date IS NULL AND Date_received BETWEEN '2016-06-01' AND '2016-06-30') AS A1
INNER JOIN
(SELECT Merchant_id, COUNT(Date_received) OVER(PARTITION BY Merchant_id) AS all_dis
FROM task.ccf_online_stage1_train
WHERE Coupon_id IS NOT NULL
AND Date_received BETWEEN '2016-06-01' AND '2016-06-30' ) AS A2
ON A1.Merchant_id = A2.Merchant_id
GROUP BY Merchant_id
第五题答案
SELECT Merchant_id,
SUM(CASE WHEN Date IS NULL AND Coupon_id IS NOT NULL THEN 1 ELSE 0 END) /
SUM(CASE WHEN Coupon_id IS NOT NULL THEN 1 ELSE 0 END) AS discard_rate
FROM ccf_online_stage1_train
WHERE Date_received BETWEEN '2016-06-01' AND '2016-06-30'
GROUP BY Merchant_id
ORDER BY discard_rate DESC
第五题答案写的思路很好,比我常规的子查询要更加高明,利用CASE 语句可以避免过多使用子查询,而且我的方法无法筛选出对于一个商家来说,所有用户都领取但没有使用的情况.
---- 暂时不知道如何解决
第六题
SELECT *, RANK() OVER(ORDER BY `residual sugar`) AS rank_
FROM task.`winequality-white`
WHERE pH = 3.63
第七题
SELECT TYPE_NAME_CN, SUM(MARKET_VALUE) AS sum_value
FROM task.`market data`
WHERE YEAR(END_DATE) = '2018-12-31'
-- 题干说截至2018年底,不知是否包括2017或以前的数据
GROUP BY TYPE_NAME_EN
ORDER BY sum_value DESC
LIMIT 3;
(2) 同一家公司的不同产品似乎不属于一家公司,排名未计算SUM之后的结果
思路:1.找到市值最大的三个行业
2.对三个行业中的公司市值排名 ROW_NUMBER
3.利用WHERE 取前三
SELECT BaseData.TYPE_NAME_CN,
BaseData.TICKER_SYMBOL
FROM (SELECT TYPE_NAME_CN, TICKER_SYMBOL, MARKET_VALUE,
ROW_NUMBER() OVER(PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE) AS CompanyRanking
FROM `market data` ) BaseData
LEFT JOIN
(SELECT TYPE_NAME_CN, SUM(MARKET_VALUE) AS sum_value
FROM task.`market data`
WHERE YEAR(END_DATE) = '2018-12-31'
GROUP BY TYPE_NAME_EN
ORDER BY sum_value DESC
LIMIT 3 ) top3Type
ON BaseData.TYPE_NAME_CN = top3Type.TYPE_NAME_CN
WHERE CompanyRanking <= 3
AND top3Type.TYPE_NAME_CN IS NOT NULL
第八题
SELECT User_id, SUM(A)
FROM
(SELECT User_id, COUNT(*) AS A
FROM ccf_offline_stage1_train
WHERE (Date IS NOT null AND Coupon_id IS NOT null) AND (LEFT(DATE,4)=2016 )
GROUP BY User_id
UNION ALL
SELECT User_id,COUNT(*) AS A
FROM ccf_online_stage1_train
WHERE (Date IS NOT null AND Coupon_id IS NOT null) AND (LEFT(DATE,4)=2016 )
GROUP BY User_id) AS X
GROUP BY User_id
ORDER BY SUM(A) DESC
对于大量数据来说,ORDER BY 似乎不可或缺,因为Mysql只会返回1000条记录,而在之后的操作也只对1000条中的数据操作,如取某一列最大值,只会取1000条数据中的最大值而不是表中的最大值
第九题
数据表中列太多,难以找到应操作的数据,查询语句本身不难,需要注意 GRUOP BY 和 ON 中多加元素时对结果的影响。
SELECT *
FROM (SELECT TICKER_SYMBOL,
YEAR(END_DATE) Year,
QUARTER(END_DATE) QUARTER,
SUM(VALUE) Amount
FROM `company operating`
WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput'
GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE)
ORDER BY SUM(VALUE) DESC
LIMIT 1 ) BaseData
LEFT JOIN -- income statement
(SELECT TICKER_SYMBOL,
YEAR(END_DATE) Year,
QUARTER(END_DATE) QUARTER,
SUM(N_INCOME) Amount
FROM `income statement`
GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE) ) Income
ON BaseData.TICKER_SYMBOL = Income.TICKER_SYMBOL
AND BaseData.Year = Income.Year
AND BaseData.QUARTER = Income.QUARTER
第十题 (注意同一家店可能会有不同满减活动,不应使用SUBSTRING_INDEX() * COUNT
SELECT Merchant_id,
SUM(SUBSTRING_INDEX(Discount_rate, ':', -1))
OVER (PARTITION BY Merchant_id) AS sum_
FROM ccf_online_stage1_train
WHERE (Discount_rate NOT BETWEEN 0 AND 1)
AND (Date BETWEEN '2016-6-1' AND '2016-6-30')
AND Coupon_id IS NOT NULL
UNION
SELECT Merchant_id,
SUM(SUBSTRING_INDEX(Discount_rate, ':', -1))
OVER (PARTITION BY Merchant_id)
FROM ccf_offline_stage1_train
WHERE (Discount_rate NOT BETWEEN 0 AND 1)
AND (Date BETWEEN '2016-6-1' AND '2016-6-30')
AND Coupon_id IS NOT NULL
ORDER BY sum_ DESC
对于新版的Mysql,有些答案已经无法运行,十道题的题干有一些也表述不清,会对做题有一些影响,这十道题有难有易,质量很高,但是由于其中一个文件达到2GB,sublime 复制粘贴会直接不响应,所以只录入部分数据,应该不会对结果有影响