SQL进阶使用教程 Part Ⅰ

最近看到Datawhale出了套sql进阶教程, 链接地址:https://www.heywhale.com/home/competition/648d5f4a0f80cbce101aa718/content

随即试着进行练手,以下内容记录学习的内容和答题闯关过程(文中内容都是基于sqlite,在python环境中运行,部分函数,标识符与其他数据库如mysql等函数、用法可能会有一定冲突)本文包含其中关卡一和二,剩余关卡可移步至:

关卡一:SQL 基础速览:以 SQLite 为例

本章主要介绍sql的一般写法,内部逻辑等,内容较为基础,如果需要学习sql基础的同学,也可以移步至我之前写的文档进行学习:

【阿里云天池龙珠计划SQL训练营】SQL复习Task1_数据求学家的博客-CSDN博客

【阿里云天池龙珠计划SQL训练营】SQL复习Task2_数据求学家的博客-CSDN博客

【阿里云天池龙珠计划SQL训练营】SQL复习Task3_数据求学家的博客-CSDN博客

【阿里云天池龙珠计划SQL训练营】SQL复习Task4_数据求学家的博客-CSDN博客

【阿里云天池龙珠计划SQL训练营】SQL复习Task5_数据求学家的博客-CSDN博客

【阿里云天池龙珠计划SQL训练营】SQL复习Task6综合练习题_数据求学家的博客-CSDN博客

直接跳到答题区:

题目要求:使用泰坦尼克数据集,按照 10 岁为一个年龄段(0-9,10-19,20-29,30-39,40-49,50-59)且 60 岁以上(包括 60 岁)统一为一个年龄段(60+), 计算每个年龄段的用户的幸存率(幸存人数/总人数,使用 round 保留 2 位小数),将查询结果按照幸存率从高到低排序,你将回答如下几个问题:

Q1:最高的幸存率是多少?

Q2:最低的幸存率发生在哪一个组?

Q3:20-29 岁和 40-49 岁的乘客,哪组的幸存率更高?

Q4:没有提供年龄信息的乘客们的幸存率是多少?

SELECT 
(case 
when Age >=0 and Age <=9 then '0-9' 
when Age >=10 and Age <=19 then '10-19' 
when Age >=20 and Age <=29 then '20-29' 
when Age >=30 and Age <=39 then '30-39' 
when Age >=40 and Age <=49 then '40-49' 
when Age >=50 and Age <=59 then '50-59' 
when Age >=60  then '60' end),
count(*),
sum(Survived),
round(sum(Survived)*1.0/count(*),2)a
FROM example_table
group by 
(case 
when Age >=0 and Age <=9 then '0-9' 
when Age >=10 and Age <=19 then '10-19' 
when Age >=20 and Age <=29 then '20-29' 
when Age >=30 and Age <=39 then '30-39' 
when Age >=40 and Age <=49 then '40-49' 
when Age >=50 and Age <=59 then '50-59' 
when Age >=60  then '60' end)
order by a
LIMIT 20
[('60', 26, 7, 0.27),
 (None, 177, 52, 0.29),
 ('20-29', 220, 77, 0.35),
 ('40-49', 89, 34, 0.38),
 ('10-19', 102, 41, 0.4),
 ('50-59', 48, 20, 0.42),
 ('30-39', 167, 73, 0.44),
 ('0-9', 62, 38, 0.61)]

关卡二:复杂数据结构处理

本章主要介绍了用sql直接处理json等格式的数据,这其实在数据开发等实际工作过程中也是会遇到的。

1.学习内容

PassengerIdSurvivedpassengerInfoticketInfo
010{"Name": "Braund, Mr. Owen Harris", "Sex": "ma...{"Pclass": 3, "Ticket": "A/5 21171", "Fare": 7...
121{"Name": "Cumings, Mrs. John Bradley (Florence...{"Pclass": 1, "Ticket": "PC 17599", "Fare": 71...
231{"Name": "Heikkinen, Miss. Laina", "Sex": "fem...{"Pclass": 3, "Ticket": "STON/O2. 3101282", "F...
341{"Name": "Futrelle, Mrs. Jacques Heath (Lily M...{"Pclass": 1, "Ticket": "113803", "Fare": 53.1...
450{"Name": "Allen, Mr. William Henry", "Sex": "m...{"Pclass": 3, "Ticket": "373450", "Fare": 8.05...

1)json_extract(用于提取某个键的值)

SELECT 
    PassengerId
    ,json_extract(passengerInfo,'$.Age') -- 一个路径表达式
    
FROM example_table_json

(1, 22.0) 

 其中,$.Age’里面 '.Age' 代表我们希望查询这个 JSON 字符串当中名为 'Age' 的字段值

2)json_patch(可以将两个 JSON 字段进行整合,类似于 Python 当中合并两个字典)

SELECT 
    PassengerId
    ,passengerInfo 
    ,ticketInfo 
    ,json_patch(passengerInfo,ticketInfo)   -- 等价于 passengerInfo + ticketInfo
FROM example_table_json
(1,
 '{"Name": "Braund, Mr. Owen Harris", "Sex": "male", "Age": 22.0, "SibSp": 1, "Parch": 0}',
 '{"Pclass": 3, "Ticket": "A/5 21171", "Fare": 7.25, "Cabin": null, "Embarked": "S"}',
 '{"Name":"Braund, Mr. Owen Harris","Sex":"male","Age":22.0,"SibSp":1,"Parch":0,"Pclass":3,"Ticket":"A/5 21171","Fare":7.25,"Embarked":"S"}')

3) json_remove(从指定的 JSON 字段中去掉某些特定的字段,可以用于反选)

 json_remove(passengerInfo,'$.Age','$.Sex')  

4)json_insert(在 JSON 当中插入新的键值对)

    ,json_insert(ticketInfo,'$.Name',json_extract(passengerInfo,'$.Name'))   -- 将 passengerInfo 当中的 Name 字段插入
    ,json_insert(ticketInfo,'$.Ticket','长期饭票')   -- 已经存在的字段将不会被更新

5)json_replace(在 JSON 当中替换匹配的键所对应的值)

    ,json_replace(ticketInfo,'$.Name',json_extract(passengerInfo,'$.Name'))   -- 不匹配的值将不会被更新
    ,json_replace(ticketInfo,'$.Ticket','长期饭票')   -- 匹配的值会被更新

 6)json_set(json_set() = json_replace() + json_insert。即对于匹配的值进行替换,对于不匹配的值将进行插入。)

    ,json_set(ticketInfo,'$.Name',json_extract(passengerInfo,'$.Name'))   -- 不匹配的值将会做插入
    ,json_set(ticketInfo,'$.Ticket','长期饭票')   -- 匹配的值会被更新
(1,
 '{"Name": "Braund, Mr. Owen Harris", "Sex": "male", "Age": 22.0, "SibSp": 1, "Parch": 0}',
 '{"Pclass": 3, "Ticket": "A/5 21171", "Fare": 7.25, "Cabin": null, "Embarked": "S"}',
 '{"Pclass":3,"Ticket":"A/5 21171","Fare":7.25,"Cabin":null,"Embarked":"S","Name":"Braund, Mr. Owen Harris"}',
 '{"Pclass":3,"Ticket":"长期饭票","Fare":7.25,"Cabin":null,"Embarked":"S"}')

 7)数组的表示

'$.a[2]' 代表获取 a 字段下的数组中的第 3 个数值,例如 {"a":[1,3,5,7],"b":1} 将会返回数字 5,下方代码中json_group_array将普通数据变为了数组

SELECT 
    json_extract(passengerInfo,'$.Age') AS Age
    ,json_extract(json_group_array(PassengerId),'$[0]')        -- 注意这里没有 . 了 
    ,json_extract(json_group_array(PassengerId),'$[0]','$[3]') -- 查询数组中的多个值,如果数组长度不够则自动返回 NULL
FROM example_table_json
WHERE Survived==1 AND json_extract(passengerInfo,'$.Age') IS NOT NULL
      AND Age>40

GROUP BY json_extract(passengerInfo,'$.Age') 

 

[(41.0, 273, '[273,null]'),
 (42.0, 289, '[289,622]'),
 (43.0, 780, '[780,null]'),
 (44.0, 195, '[195,null]'),
 (45.0, 188, '[188,707]'),
 (47.0, 872, '[872,null]'),
 (48.0, 461, '[461,713]'),
 (49.0, 53, '[53,797]'),
 (50.0, 260, '[260,527]'),
 (51.0, 766, '[766,null]'),
 (52.0, 450, '[450,null]'),
 (53.0, 572, '[572,null]'),
 (54.0, 497, '[497,null]'),
 (55.0, 16, '[16,null]'),
 (56.0, 648, '[648,null]'),
 (58.0, 12, '[12,null]'),
 (60.0, 367, '[367,null]'),
 (62.0, 571, '[571,null]'),
 (63.0, 276, '[276,null]'),
 (80.0, 631, '[631,null]')]
json_extract(json_group_array(PassengerId),'$[#-1]')  

其中的[#-1]代表数组中的最后一个数 ,# 符号代表的是当前数组的长度

8)将普通数据变为json格式

# 这里我们使用的是 example_table 这张表,是不含 JSON 字符串格式数据的
# 通过 json_object 函数构造 json 数据结构
# 为了节省空间,我们这里只构造 passengerInfo
SELECT 
    PassengerId
    ,Survived
    /*
    json_object 的输入格式:json_object(字段名A,字段值A,字段名B,字段值B...)
    */
    ,json_object('Sex',Sex,'Age',Age,'SibSp',SibSp,'Parch',Parch) AS passengerInfo 
FROM example_table
(1, 0, '{"Sex":"male","Age":22.0,"SibSp":1,"Parch":0}')

2.答题

Q1:查询 example_table_json 中的 passengerInfo 字段时,下列哪个操作一定会改变字段值

 A . json_extract(passengerInfo,'$.Sex')

 B . json_insert(passengerInfo,'$.Sex','unknown')

 C . json_replace(passengerInfo,'$.PassengerName',json_extract(passengerInfo,'$.Name'))

 D . json_set(passengerInfo,'$.Age','2002')

Q2:参考 1)-7)中的代码,按照 ticketInfo 中的 PClass 分组并查看每个组内的幸存情况。请问下面哪个判断是正确的:

 A . Pclass 为 1的分组中的前三名乘客全部生还

 B . Pclass 为 2 的分组中,前 20 名乘客的生还率超过了 50%

 C . Pclass 为 3 的分组中,前 5 名乘客无一生还

 D . 三个分组中各自的 第 10 名乘客合计 2人生还

Q3:参考之前的学习内容,按照 passengerInfo 中的 Age 分组后(去掉 Age 为 None 的组),查看每个组的生存情况,将每个组的倒数第三位的数字设置为 -1 。请问最后一共加入了多少 -1

答案:

1.D

2.B

SELECT json_extract(ticketinfo,'$.Pclass') AS Pclass
       ,json_group_array(survived)
       ,json_extract(json_group_array(survived),'$[9]')
FROM example_table_json


GROUP BY json_extract(ticketinfo,'$.Pclass') 

'''
[(1,
  '[1,1,0,1,1,0,0,1,0,0,1,0,1,1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,0,1,1,0,0,1,0,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,0,1,0,1,1,1,1,1,1,0,1,0,1,1,1,0,1,0,1,1,0,0,0,1,1,0,0,0,1,0,1,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,0,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,0,1,0,1,0,1,1,1,1,0,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,1,0,0,0,1,1,0,1,0,1,0,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1]',
  0),
 (2,
  '[1,1,1,0,1,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,1,1,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,1,0,0,0,1,1,0,0,0,1,1,0,0,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,0,0,1,1,0,0,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1,0,0,0,1,1,1,1,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,0,1,1,1,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,1,1,1,1,0,0]',
  1),
 (3,
  '[0,1,0,0,0,1,1,0,0,0,0,0,1,1,0,1,0,1,0,1,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,1,1,1,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,0,0,1,1,0,0,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0]',
  0)]
'''

3.56

SELECT COUNT(DISTINCT AGE)
FROM 
(SELECT json_extract(passengerInfo,'$.Age') AS AGE
       ,json_extract(json_group_array(survived),'$[#-3]') A 
FROM example_table_json
WHERE json_extract(passengerInfo,'$.Age')  IS NOT NULL
GROUP BY AGE)
WHERE A !=-1

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

数据求学家

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值