本笔记为阿里云天池龙珠计划SQL训练营的学习内容,链接为:https://tianchi.aliyun.com/specials/promotion/aicampsql
一、学习知识点概要
第一题答案
SELECT MarketData.*,
OperatingData.INDIC_NAME_EN,
OperatingData.VALUE,
IncomeStatement.T_REVENUE,
IncomeStatement.T_COGS,
IncomeStatement.N_INCOME
FROM(SELECT TICKER_SYMBOL,END_DATE,
CLOSE_PRICE
FROM `market data`
WHERE TICKER_SYMBOL IN ('600383','600048')) MarketData
LEFT JOIN
(SELECT TICKER_SYMBOL,
INDIC_NAME_EN,
END_DATE,
VALUE
FROM `company operating`
WHERE TICKER_SYMBOL in ('600383','600048')) OperatingData
ON MarketData.TICKER_SYMBOL=OperatingData.TICKER_SYMBOL
AND MarketData.END_DATE=OperatingData.END_DATE
LEFT JOIN
(SELECT DISTINCT TICKER_SYMBOL,
END_DATE,
T_REVENUE,
T_COGS,
N_INCOME
FROM `income statement`
WHERE TICKER_SYMBOL IN ('600383' AND '600048')) IncomeStatement
ON MarketData.TICKER_SYMBOL=IncomeStatement.TICKER_SYMBOL
AND MarketData.END_DATE=IncomeStatement.END_DATE
ORDER BY MarketData.TICKER_SYMBOL,MarketData.END_DATE
①LEFT JOIN 后面不用加FROM,加了会出问题
② SQL查询的执行顺序:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
③DISTINCT 关键词用于返回唯一不同的值。
SQL SELECT DISTINCT 语法
SELECT DISTINCT column_name,column_name
FROM table_name;
第二题答案
SELECT pH,
`citric_acid`
FROM `winequality-red`
DENSE_RANK() OVER (ORDER BY `citric_acid`) AS rankn
WHERE pH =3.03;
二、学习内容
第三题答案
-- 发放优惠券总额最多的商家
SELECT Merchant_id,
-- SUM(SUBSTRING_INDEX(`Discount_rate`,':', 1)) AS sale_amount,
SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY discount_amount DESC
LIMIT 1;
-- 发放优惠券张数最多的商家
SELECT Merchant_id,COUNT(1) AS cnt
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY cnt DESC
LIMIT 1;
①日期查询限定格式:
WHERE column(日期) BETWEEN 'xxxx-xx-xx' AND 'yyyy-yy-yy'
②
substring_index()函数(笔记在task03SUBSTRING_INDEX – 字符串按索引截取)
substring_index()函数用来截取字符串
substring_index(str,delim,count)
str:要处理的字符串
delim:分隔符
count:计数
该函数用来获取原始字符串按照分隔符分割后,第 n 个分隔符之前(或之后)的子字符串,支持正向和反向索引,索引起始值分别为 1 和 -1。
个人见解:
例如这个句子:
SELECT SUBSTRING_INDEX('www.mysql.com.qq', '.', -3)
可以理解为分隔符“.”从右往左数第三个“.”的右边所有部分:
也就是mysql.com.qq
SELECT SUBSTRING_INDEX('www.mysql.com.qq', '.', 3)
这个句子可以理解为分隔符“.”从左往右数第三个“.”的左边所有部分
也就是www.mysql.com
③LIMIT 1的作用:
在根据非索引字段查询时,若查询结果只有一条记录,可以在sql语句中使用 " limit 1",来有效缩短查询时间。
缩短查询时间的原因:加入Limit后,查询结果能够被一级缓存保存!从底层分析的话,加入Limit 1后,匹配到一条数据后,就不会再往下查询了,所以,性能的提升和数据量的大小有很大关系。
④Substring_INDEX在Substring后面记得加“_”。
第四题
--2015用电高峰月(本人看答案后更改)
SELECT PERIOD_DATE,MAX(DATA_VALUE) FinalValue
FROM `macro industry`
WHERE INDIC_ID='2020101522'
AND YEAR(PERIOD_DATE)=2015
GROUP BY PERIOD_DATE
ORDER BY FinalValue
LIMIT 1;
--相比去年同期增长了多少个百分比
--本人做答(不会)
/*SELECT PERIOD_DATE,MAX(DATA_VALUE) FinalValue
FROM `macro industry`
WHERE INDIC_ID='2020101522'
AND YEAR(PERIOD_DATE)=2015
LEFT JOIN(
SELECT PERIOD_DATA,MAX(DATA_VALUE) FinalValue2
FROM `macro industry` AS P2
AND YEAR(PERIOD_DATE)=2014)
LEFT JOIN
(SELECT MAX(DATA_VALUE) FinalValue,
MAX(DATA_VALUE) FinalValue2,
((MAX(DATA_VALUE) FinalValue-MAX(DATA_VALUE) FinalValue2)/
MAX(DATA_VALUE) FinalValue2) AS Rate
GROUP BY PERIOD_DATE
ORDER BY FinalValue
LIMIT 1; */
--本人看答案后修改版本
SELECT Basedata.*,
(BaseData.Finalvalue-YOY.Finalvalue2)/YOY.Finalvalue2 AS YOY
FROM (SELECT PERIOD_DATE,
MAX(DATA_VALUE) AS Finalvalue
FROM `macro industry`
WHERE INDIC_ID='2020101522'
AND YEAR(PERIOD_DATE)=2015
GROUP BY PERIOD_DATE
ORDER BY Finalvalue DESC
LIMIT 1) Basedata
LEFT JOIN
(SELECT PERIOD_DATE,MAX(DATA_VALUE) AS Finalvalue2
FROM `macro industry`
WHERE INDIC_ID='2020101522'
AND YEAR(PERIOD_DATE)=2014
GROUP BY PERIOD_DATE) AS YOY
ON YEAR(Basedata.PERIOD_DATE)=YEAR(YOY.PERIOD_DATE)+1
AND MONTH(Basedata.PERIOD_DATE)=MONTH(YOY.PERIOD_DATE);
①日期函数的年份单独查询方法:
YEAR(时间列名)=年份,或具体年月日:
YEAR(时间列名)=‘xxxx-xx-xx'
②使用ORDER BY后默认的排序是升序,升序可能会将题目要求的数据颠倒,要注意这个细节,加上DESC
③AS是可以省略的
第六题
/*请使用 Wine Quality Data 数据集《winequality-white.csv》,找出 pH=3.63的
所有白葡萄酒,然后,对其 residual sugar 量进行英式排名(非连续的排名)*/
--本人做答
SELECT white.*
FROM `winequality-white` AS white
WHERE pH='3.63'
RANK() OVER (ORDER BY `residual sugar`) AS ranking
--标准答案
SELECT pH,
`residual sugar`,
RANK() OVER (ORDER BY `residual sugar`) AS rankn
FROM `winequality-white`
WHERE pH= 3.63;
①RANK() OVER (ORDER BY column名 ) AS rankn 为英式排名
第七题
/*请使用A股上市公司季度营收预测中的数据集《Market Data.xlsx》中的sheet-DATA,
计算截止到2018年底,市值最大的三个行业是哪些?以及这三个行业里市值
最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)*/
--本人作答(看答案后修改的版本)
-- 计算截止到2018年底,市值最高的三个行业业是哪些?
SELECT TYPE_NAME_CN,
SUM(MARKET_VALUE) AS Sum_value
FROM `market data`
WHERE YEAR(END_DATE) = '2018-12-31'
GROUP BY TYPE_NAME_CN
ORDER BY Sum_value DESC
LIMIT 3;
--标准答案
-- 计算截止到2018年底,市值最高的三个行业业是哪些?
SELECT TYPE_NAME_CN,
SUM(MARKET_VALUE)
FROM `market data`
WHERE YEAR(END_DATE) = '2018-12-31'
GROUP BY TYPE_NAME_CN
ORDER BY SUM(MARKET_VALUE) DESC
LIMIT 3
--这三个行业里市值最大的三个公司是哪些?
--本人作答(不会做)
SELECT TICKER_SYMBOL,TYPE_NAME_CN,
SUM(MARKET_VALUE) AS Maxium_in_3_type
FROM(
SELECT TYPE_NAME_CN,
SUM(MARKET_VALUE) AS Sum_value
FROM `market data`
WHERE YEAR(END_DATE) = '2018-12-31'
GROUP BY TYPE_NAME_CN
ORDER BY Sum_value DESC
LIMIT 3)
GROUP BY TICKER_SYMBOL
ORDER BY Maxium_in_3_type DESC
--标准答案
--这三个行业里市值最大的三个公司是哪些?
SELECT BaseData.TYPE_NAME_CN,
BaseData.TICKER_SYMBOL
FROM (SELECT TYPE_NAME_CN,
TICKER_SYMBOL,
MARKET_VALUE,
ROW_NUMBER() OVER(PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE) AS
CompanyRanking --运用窗口函数将每种行业中的市值进行排序
FROM `market data` ) BaseData
LEFT JOIN --左连结
( SELECT TYPE_NAME_CN,
SUM(MARKET_VALUE)
FROM `market data`
WHERE YEAR(END_DATE) = '2018-12-31'
GROUP BY TYPE_NAME_CN
ORDER BY SUM(MARKET_VALUE) DESC
LIMIT 3 ) top3Type --第一问中的答案,也就是三大市值总值最高的行业
ON BaseData.TYPE_NAME_CN = top3Type.TYPE_NAME_CN
WHERE CompanyRanking <= 3 --限定每个行业中的不同市值的公司个数
AND top3Type.TYPE_NAME_CN IS NOT NULL --行业名不能为空
①PARTITION BY的用法让同一商品种类中的物品能够优先比较某些方面的参数(ORDER BY),然后按照升序排列!
PARTITION BY 能够设定窗口对象范围
②LIMIT x的作用:
在根据非索引字段查询时,若查询结果只有x条记录,可以在sql语句中使用 " limit x",来有效缩短查询时间。
缩短查询时间的原因:加入Limit后,查询结果能够被一级缓存保存!从底层分析的话,加入Limit 1后,匹配到一条数据后,就不会再往下查询了,所以,性能的提升和数据量的大小有很大关系。
③日期函数的年份单独查询方法:
YEAR(时间列名)=年份,或具体年月日:
YEAR(时间列名)=‘xxxx-xx-xx'
三、学习问题与解答
第八题
--标准答案
SELECT User_id,
SUM(couponCount) couponCount
FROM (SELECT User_id,
count(*) AS couponCount
FROM `ccf_online_stage1_train`
WHERE (Date != 'null' AND Coupon_id != 'null')
AND (LEFT(DATE,4)=2016 )
GROUP BY User_id
UNION ALL
SELECT User_id,
COUNT(*) AS couponCount
FROM `ccf_offline_stage1_train`
WHERE (Date != 'null' AND Coupon_id != 'null')
AND (LEFT(DATE,4)=2016 )
GROUP BY User_id ) AS BaseData
GROUP BY User_id
ORDER BY SUM(couponCount) DESC
LIMIT 1
第九题】
不会做
本人根据答案作答:
SELECT *
FROM(
SELECT TICKER_SYMBOL,
SUM(VALUE) AS Amount,
YEAR(END_DATE) AS YEAR,
QUARTER(END_DATE) AS QUARTER
FROM `company operating`
WHERE INDIC_NAME_EN ='Baiyun Airport:Passenger throughput'
GROUP BY TICKER_SYMBOL,
YEAR,
QUARTER
ORDER BY Amount DESC) BASEDATA
LEFT JOIN
(SELECT TICKER_SYMBOL,
SUM(N_INCOME) AS Amount2,
YEAR(END_DATE) AS YEAR,
QUARTER(END_DATE) AS QUARTER
FROM `income statement`
GROUP BY TICKER_SYMBOL,
YEAR,
QUARTER) AS INCOME
ON BASEDATA.TICKER_SYMBOL=INCOME.TICKER_SYMBOL
标准答案:
SELECT *
FROM (SELECT TICKER_SYMBOL,
YEAR(END_DATE) AS Year,
QUARTER(END_DATE) AS QUARTER,
SUM(VALUE) AS Amount
FROM `company operating`
WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput'
GROUP BY TICKER_SYMBOL,
YEAR(END_DATE),
QUARTER(END_DATE)
ORDER BY SUM(VALUE) DESC
LIMIT 1 ) AS BaseData
LEFT JOIN -- income statement
(SELECT TICKER_SYMBOL,
YEAR(END_DATE) AS Year,
QUARTER(END_DATE) AS QUARTER,
SUM(N_INCOME) Amount
FROM `income statement`
GROUP BY TICKER_SYMBOL,
YEAR(END_DATE),
QUARTER(END_DATE) ) AS Income
ON BaseData.TICKER_SYMBOL = Income.TICKER_SYMBOL
第十题
--标准答案
SELECT Merchant_id,
SUM(discount_amount) discount_amount
FROM (SELECT Merchant_id,
SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount
FROM `ccf_online_stage1_train`
WHERE (Date != 'null' AND Coupon_id != 'null')
AND (LEFT(DATE,4)=2016 )
AND MID(DATE,5,2) = '06'
GROUP BY Merchant_id
UNION ALL
SELECT Merchant_id,
SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount
FROM `ccf_offline_stage1_train`
WHERE (Date != 'null' AND Coupon_id != 'null')
AND (LEFT(DATE,4)=2016 )
AND MID(DATE,5,2) = '06'
GROUP BY Merchant_id ) BaseData
GROUP BY Merchant_id
ORDER BY SUM(discount_amount) DESC
LIMIT 1
--本人看答案后修正版本
SELECT Merchant_id,SUM(discount_amount) AS discount_amount
FROM(
SELECT Merchant_id,
SUM(SUBSTRING_INDEX(Discount_rate,':',-1)) AS discount_amount
FROM ccf_offline_stage1_train
WHERE (Date !='null' AND Coupon_id !='null')
AND(LEFT(Date,4)=2016 )
AND MID(Date,5,2)=06
GROUP BY Merchant_id
UNION ALL
SELECT Merchant_id,
SUM(SUBSTRING_INDEX(Discount_rate,':',-1)) AS discount_amount
FROM ccf_online_stage1_train
WHERE (Date !='null' AND Coupon_id !='null')
AND(LEFT(Date,4)=2016 )
AND MID(Date,5,2)=06
GROUP BY Merchant_id
) AS BaseData
GROUP BY Merchant_id
ORDER BY SUM(discount_amount) DESC
LIMIT 1
四、学习思考与总结
①MID函数
此函数为截取字符串一部分。MID(column_name,start[,length])
参数 描述
column_name 必需。要提取字符的字段。
start 必需。规定开始位置(起始值是 1)。
length 可选。要返回的字符数。如果省略,则 MID() 函数返回剩余文本。
例:str=“123456” mid(str,2,1) 结果为2
②“union all”的意思为“联合所有”,是指对两个结果集进行并集操作,包括重复行,不进行排序。
③ORDER BY SUM(discount_amount) DESC
LIMIT 1
该句代码指令可以求出倒序的第一位的数据,也就是最高的那一位
(LIMIT 1)