一、请使用A股上市公司季度营收预测数据集《Income Statement.xls》和《Company Operating.xlsx》和《Market Data.xlsx》,以Market Data为主表,将三张表中的TICKER_SYMBOL为600383和600048的信息合并在一起。只需要显示以下字段。
表名 | 字段名 |
---|---|
Income Statement | TICKER_SYMBOL |
Income Statement | END_DATE |
Income Statement | T_REVENUE |
Income Statement | T_COGS |
Income Statement | N_INCOME |
Market Data | TICKER_SYMBOL |
Market Data | END_DATE_ |
Market Data | CLOSE_PRICE |
Company Operating | TICKER_SYMBOL |
Company Operating | INDIC_NAME_EN |
Company Operating | END_DATE |
Company Operating | VALUE |
答案:
SELECT A.`TICKER_SYMBOL`,A.`END_DATE`,A.`CLOSE_PRICE`,
B.`T_REVENUE`,B.`T_COGS`,B.`N_INCOME`,
C.`INDIC_NAME_EN`,C.`VALUE`
FROM `market data` AS A
INNER JOIN `income statement` AS B ON A.`TICKER_SYMBOL`= B.`TICKER_SYMBOL` AND B.`END_DATE`=A.`END_DATE`
INNER JOIN `company operating` AS C ON C.`TICKER_SYMBOL`= A.`TICKER_SYMBOL` AND C.`END_DATE`=A.`END_DATE`
WHERE A.`TICKER_SYMBOL` IN('600383', '600048')
二、请使用 Wine Quality Data 数据集《winequality-red.csv》,找出 pH=3.03的所有红葡萄酒,然后,对其 citric acid 进行中式排名(相同排名的下一个名次应该是下一个连续的整数值。换句话说,名次之间不应该有“间隔”)
SELECT PH,`citric acid`,DENSE_RANK() OVER
(ORDER BY `citric acid`) DENSE
FROM `winequality-red`
WHERE `pH`= '3.03'
三、 使用Coupon Usage Data for O2O中的数据集《ccf_offline_stage1_test_revised.csv》,试分别找出在2016年7月期间,发放优惠券总金额最多和发放优惠券张数最多的商家。只考虑满减的金额,不考虑打几折的优惠券。
#发放优惠券总金额最多
select `Merchant_id` , COUNT(*) as num FROM `ccf_offline_stage1_test_revised`
WHERE `Date_received` BETWEEN '2016-07-01' AND '2016-07-31'
group by `Merchant_id`
ORDER BY num desc LIMIT 1;
#发放优惠券张数最多的商家
SELECT `Merchant_id` , SUM( CAST(SUBSTRING_INDEX(`Discount_rate`, ':', -1) AS SIGNED)) as Discount_value FROM `ccf_offline_stage1_test_revised`
WHERE `Date_received` BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY `Merchant_id` ORDER BY Discount_value DESC
LIMIT 1;
四、请使用A股上市公司季度营收预测中的数据集《Macro&Industry.xlsx》中的sheet-INDIC_DATA,请计算全社会用电量:第一产业:当月值在2015年用电最高峰是发生在哪月?并且相比去年同期增长/减少了多少个百分比?
#当月值在2015年用电最高峰
SELECT period_date ,MAX(data_value) MAX_value from `macro industry` WHERE YEAR(period_date)=2015 AND INDIC_ID='2020101522'
GROUP BY period_date
order by MAX_value desc
LIMIT 1;
#相比去年同期增长/减少了多少个百分比
SELECT A.month,A.MAX_value NOWYEAR,B.MAX_value OLDYEAR, ((A.MAX_value-B.MAX_value)/B.MAX_value) CURRENT_VALUE FROM
(SELECT MONTH(period_date) month ,MAX(data_value) MAX_value from `macro industry` WHERE YEAR(period_date)=2015 AND INDIC_ID='2020101522'
GROUP BY period_date
order by MAX_value desc
LIMIT 1) as a
LEFT JOIN
(SELECT MONTH(period_date) month ,MAX(data_value) MAX_value from `macro industry` WHERE YEAR(period_date)=2014 AND INDIC_ID='2020101522'
GROUP BY period_date
order by MAX_value desc
LIMIT 1) as b
on a.month=b.month
五、
使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》,试统计在2016年6月期间,线上总体优惠券弃用率为多少?并找出优惠券弃用率最高的商家。
弃用率 = 被领券但未使用的优惠券张数 / 总的被领取优惠券张数
#线上总体优惠券弃用率
SELECT SUM(CASE WHEN `Date` IS null
AND `Merchant_id` IS NOT null THEN 1 ELSE 0 END) /SUM(CASE WHEN `Merchant_id` IS NOT null THEN 1 ELSE 0 END) RATE
FROM `ccf_online_stage1_train`
WHERE `Date_received` BETWEEN '2016-06-01'
and '2016-06-30'
#优惠券弃用率最高的商家
SELECT `Merchant_id` , SUM(CASE WHEN `Date` IS null
AND `Coupon_id` IS NOT null THEN 1 ELSE 0 END) /SUM(CASE WHEN `Coupon_id` IS NOT null THEN 1 ELSE 0 END) RATED
FROM `ccf_online_stage1_train`
WHERE `Date_received` BETWEEN '2016-06-01' AND '2016-06-30'
GROUP BY `Merchant_id`
ORDER BY RATED DESC
LIMIT 1