Task01a:复习SQL的基本知识

需要的工具及基础学习内容

- 工具: MySQL【先过一遍书,代码准确性具体还得二期编辑,希望用MySQL】

目标

  • 复习SQL的基本知识

SQL的基本知识

  1. 查询数据。我们考虑的是从数据库中获取信息以解决业务问题,而不是将数据写入数据库。SQL语言中用于分析的部分是SELECT语句,描述了做什么来得到结果集。
  2. 数据结构。关系型数据库储存结构化的数据——定义好行和列的表
  3. 使用数据流处理数据。数据转换的图示方法。
  4. SQL自身的语法。

1. 计数与分组计数

在这里插入图片描述

-- 计数: 查询邮政编码总数
SELECT COUNT(*) AS numzip
FROM ZipCensus zc;

-- 分组计数:查询每一个州中邮政编码的个数
SELECT stab,COUNT(*) AS numzip
FROM ZipCensus zc
GROUP BY stab
ORDER BY numzip DESC;

-- 计数(去重):查询每个州中邮政编码的去重个数
SELECT zc.stab,COUNT(DISTINCT zc.zcta5) AS num
FROM ZipCensus zc
GROUP BY zc.stab
ORDER BY zc.stab;

2. UNION ALL

-- 将ZipCensus表中的地址名提取出来,并存储在一个拥有统一数据类型的单独列中
SELECT u.location,u.locationtype
FROM (
SELECT DISTINCT stab AS location,'state' AS locationtype FROM Zipcensus zc
UNION ALL
SELECT DISTINCT county AS location,'county' AS locationtype FROM Zipcensus zc
UNION ALL
SELECT DISTINCT zipname AS location,'zipname' AS locationtype FROM Zipcensus zc) u

3. CASE

CASE常用的一种情况是创建指示变量。

-- 分组计数 在每个州中,有多少邮政编码对应的人口超过10000人,且这些邮政编码对应的人口总数是多少
SELECT zc.stab,COUNT(*) AS numbigzip,SUM(totpop) AS popbigzip
FROM ZipCensus zc
WHERE totpop > 10000
GROUP BY zc.stab;

-- 条件分组计数 在每个州中,有多少邮政编码对应的人口超过10000人,又有多少超过1000,而且这些数据集的总人口数分别是多少?
SELECT zc.stab,SUM(CASE WHEN totpop >10000 THEN 1 ELSE 0 END) AS num_10000,
SUM(CASE WHEN totpop > 1000 THEN 1 ELSE 0 END) AS num_1000,
SUM(CASE WHEN totpop > 10000 THEN totpop ELSE 0 END) AS pop_10000,
SUM(CASE WHEN totpop>1000 THEN totpop ELSE 0 END) AS pop_1000
FROM Zipcensus zc
GROUP BY zc.stab;
-- CASE指示变量 根据人口总数超过10000和1000汇总邮政编码
SELECT zc.stab,
    SUM(is_pop_10000) AS num_10000,
    SUM(is_pop_1000) AS num_1000,
    SUM(is_pop_10000 * totpop) AS pop_10000,
    SUM(is_pop_1000 * totpop) AS pop_1000
FROM (SELECT zx.*,
        (CASE WHEN totpop>10000 THEN 1 ELSE 0 END) AS is_pop_10000,
        (CASE WHEN totpop>1000 THEN 1 ELSE 0 END) AS is_pop_1000
    FROM Zipcensus zc) zc
GROUP BY zc.stab;

-- CTE
WITH zc as (SELECT zx.*,
        (CASE WHEN totpop>10000 THEN 1 ELSE 0 END) AS is_pop_10000,
        (CASE WHEN totpop>1000 THEN 1 ELSE 0 END) AS is_pop_1000
    FROM Zipcensus zc)
SELECT zc.stab,
    SUM(is_pop_10000) AS num_10000,
    SUM(is_pop_1000) AS num_1000,
    SUM(is_pop_10000 * totpop) AS pop_10000,
    SUM(is_pop_1000 * totpop) AS pop_1000
FROM zc
GROUP BY zc.stab;

4. 窗口函数

-- 想得到每个邮政编码都返回州人口的总数
SELECET zc.zcta5,
    SUM(totpop) OVER (PARTITION BY zc.stab) AS stpop,
    ROW_NUMBER() OVER (PARTITION BY zc.stab ORDER BY totpop DESC) AS ZipPopRank
FROM Zipcensus zc;
5. 子查询与联接
5.1 表的联接

无论何时,在做表的联接时,问自己下列两个问题:

  1. 因为另一个表没有匹配的数据,其中一个表的数据会意外丢失吗?
  2. 因为表之间有多个匹配出现,进而导致结果集中有意外存在的冗余数据吗?
    答案要求建立在对底层数据的理解上。
5.1.1 查询联接
5.1.2 等值联接与非等值联接

内联接,只有匹配的数据行才会包含在结果集中。
在这里插入图片描述

-- 等值联接 对于每个邮政编码来说,同一个州中有多少邮政编码包含更大的人口数量?
SELECT zc1.zcta5,SUM(CASE WHEN zc1.totpop < zc2.totpop THEN 1 ELSE 0 END) AS numzip
FROM Zipcensus zc1 JOIN Zipcensus zc2
ON zc1.stab = zc2.stab
GROUP BY zc1.zcta5;

-- 非等值联接 多少订单高于客户居住的平均租金?
SELECT zc.stab,COUNT(*) AS numrows
FROM Orders o JOIN ZipCensus zc
ON o.zipcode = zc.zcta5 AND o.totalprice > zc.mediangrossrent
GROUP BY zc.stab;
5.1.3 外联接

外联接确保两个表的数据同时保留在结果集中,即使第二个表中没有匹配行。包括LEFT JOIN、RIGHT JOIN。确保主表的所有数据行都保留在结果集中,其中LEFT/RIGHT指定主表。

SELECT zc.stab,COUNT(*) AS numrows
FROM Orders o LEFT JOIN ZipCensus zc
ON o.ZipCode = zc.zcta5 AND o.TotalPrice > zc.mediangrossrent
GROUP BY zc.stab;
5.2 子查询
5.2.1 处理统计信息的子查询

在这里插入图片描述

--  在每一个州中,有多少邮政编码对应的人口密度比该州所有邮政编码对应的平均人口密度都高
SELECT zc.stab,SUM(CASE WHEN popdensity>avgpopdensity THEN 1 ELSE 0 END) AS numdenser
FROM (SELECT zc.stab,AVG(totpop/landsqmi) AS avgpopdensity FROM Zipcensus zc WHERE zc.landsqmi>0 GROUP BY zc.stab) AS zcavg JOIN
(SELECT stab,totpop/landsqmi AS popdensity FROM Zipcensus zc WHERE zc.landsqmi>0) AS zc
ON zc.stab = zcavg.stab
GROUP BY zc.stab;
5.2.2 子查询+IN

在这里插入图片描述

-- 子查询+IN 少于100个邮政编码的州辖区内所有邮政编码的列表
SELECT zc.zcta5,zc.stab
FROM Zipcensus zc
WHERE zc.stab IN (SELECT stab FROM Zipcensus GROUP BY stab HAVING COUNT(1)<100);

-- JOIN 
SELECT zc.*
FROM Zipcensus zc JOIN
    (SELECT stab,COUNT(1) AS numstates FROM Zipcensus GROUP BY stab HAVING COUNT(1)<100) zipstate
ON zc.stab = zipstate.stab;
5.2.3 NOT IN

在这里插入图片描述

-- Orders表中哪些邮政编码不存在于Zipcensus中
SELECT o.Zipcode,COUNT(1) AS numorders
FROM Orders AS o 
WHERE o.Zipcode NOT IN (SELECT zc.zcta5 FROM Zipcensus AS zc)
GROUP BY o.Zipcode;

-- JOIN
SELECT o.Zipcode,COUNT(1) AS numorders
FROM Orders AS o LEFT JOIN Zipcensus AS zc
ON o.Zipcode = zc.zcta5
WHERE zc.zcta5 IS NULL
GROUP BY o.Zipcode
ORDER BY Numorders DESC;
5.2.4 关联子查询

在这里插入图片描述

-- 每一个州中,哪个邮政编码对应的人口最大?是多少?
SELECT zc.stab,zc.zcta5,zc.totpop
FROM Zipcensus zc
WHERE zc.totpop = (SELECT MAX(zcinner.totpop) FROM Zipcensus AS zcinner WHERE zcinner.stab = zc.stab)
ORDER BY zc.stab; 

-- JOIN
SELECT zc.stab,zc.zcta5,zc.totpop
FROM Zipcensus AS zc JOIN
    (SELECT zc.stab,MAX(zc.totpop) AS maxpop FROM Zipcensus AS zc GROUP BY stab) zipmax
ON zc.stab = zipmax.stab AND zc.totpop = zipmax.maxpop
ORDER BY zc.stab;

预告

掌握统计学的核心概念及SQL实现。具体内容包括:计数、分组求和、描述

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值