这是我自己学习SQL的笔记,教程是来自B站的视频,主讲人是在Youtube非常火的MOSH
1.1 What is SQL
A DATABASE is a collection of data stored in a format that can easily be accessed.
1.2 TYPE OF DATABASE
Relational database & No-SQL
Relational Database Management System: MySQL, SQL Server, Oracle
No-SQL: no relationship between tables
NoSQL system don’t understand SQL
2.1 Create the databases
-- USE sql_store; 使用哪个数据库
SELECT * FROM customers
WHERE customer_id =1
ORDER BY first_name;
2.2 THE SELECT Statement
SELECT
last_name,
first_name,
points,
(points + 10) * 100 AS discount_factor
-- 遵循数学表达式的优先性
-- AS "discount facotr" 如果字母间有空格必须加上""
FROM customers;
2.3 THE SELECT CLAUSE
SELECT DISTINCT state
-- DISTINCT 表示在state中不显示重复的
FROM customers;
EXERCISE
SELECT
name,
unit_price,
unit_price * 1.1 AS new_price
FROM customers;
2.4 THE WHERE CLAUSE
SELECT *
FROM customers
WHERE birth_date > '1990-01-01';
-- state!= 'VA'
-- points > 3000
-- </>=/<=/=/!=
EXERCISE
-- 查看今年的订单
SELECT *
FROM orders
WHERE order_date >= '2019-01-01';
2.5 THE AND/OR/NOT
SELECT *
FROM customers
-- WHERE birth_date > '1990-01-01' OR (points > 1000 AND state = 'VA')
-- OR points > 1000
WHERE NOT (birth_date > '1990-01-01' OR points > 1000);
EXERCISE
-- exercise
SELECT *
FROM order_items
WHERE order_id = 6 AND (unit_price * quantity) >30;
2.6 THE IN OPERATOR
SELECT *
FROM customers
-- 在SQL语言中我们不能将字符穿和布尔表达式进行结合
-- OR 'GA' OR 'FL'
WHERE state = 'VA' OR state = 'GA' OR state = 'FL';
-- WHERE state NOT IN ('VA','FL','GA')
EXERCISE
-- EXERCISE
SELECT *
FROM products
WHERE quantity_in_stock IN (49,38,72);
2.7 THE BETWEEN OPERATOR
SELECT *
FROM customers
-- WHERE points>= 1000 AND points<= 3000
-- 用between语法来表示
WHERE points BETWEEN 1000 AND 3000;
EXERCISE
SELECT *
FROM customers
WHERE birth_date BETWEEN '1990-01-01'AND '2000-01-01';
2.8 THE LIKE OPERATOR
SELECT *
FROM customers
WHERE last_name LIKE -- "%b%" -- %来表示任意字符
-- 名字中有
"_____y" -- 名字中第_____个字符是y
EXERCISE
-- exercise address匹配
SELECT *
FROM customers
WHERE address LIKE '%TRAIL%' -- BTWEEN IN 是精准定位而like表示模糊定位
OR address LIKE '%avenue%'
-- EXERCISE
SELECT *
FROM customers
WHERE phone LIKE '%9';
2.9 THE REGEXP OPERATOR
SELECT *
FROM customers
WHERE last_name LIKE '%field%';
SELECT *
FROM customers
WHERE last_name REGEXP 'field$|mac|rose';
– ^表示在字符串开头
– $ 表示在字符串结尾
– | 可以同时匹配两个字符串,是或的意思
SELECT *
FROM customers
WHERE last_name REGEXP '[gim]e';
Output: ge/ie/me
SELECT *
FROM customers
WHERE last_name REGEXP '[a-h]e';
Output: ae/be/…/he
EXERCISE
FIRSTNAME中带ELKA或AMBUR
-- EXERCISE REGEXP1
SELECT *
FROM customers
WHERE FIRST_name REGEXP 'ELKA|AMBUR';
LASTNAME中以EY或ON结尾
-- EXERCISE REGEXP2
SELECT *
FROM customers
WHERE last_name REGEXP 'EY$|ON$';
lastname中以my开头并且里面带se
-- EXERCISE REGEXP3
SELECT *
FROM customers
WHERE last_name REGEXP '^MY|SE';
名字中带BR或BU
-- EXERCISE REGEXP4
SELECT *
FROM customers
WHERE last_name REGEXP 'B[RU]';
正则表达式
模式 | 描述 |
---|---|
^ | 匹配输入字符串的开始位置。如果设置了 RegExp 对象的 Multiline 属性,^ 也匹配 ‘\n’ 或 ‘\r’ 之后的位置。 |
$ | 匹配输入字符串的结束位置。如果设置了RegExp 对象的 Multiline 属性,$ 也匹配 ‘\n’ 或 ‘\r’ 之前的位置。 |
. | 匹配除 “\n” 之外的任何单个字符。要匹配包括 ‘\n’ 在内的任何字符,请使用像 ‘[.\n]’ 的模式。 |
[…] | 字符集合。匹配所包含的任意一个字符。例如, ‘[abc]’ 可以匹配 “plain” 中的 ‘a’。 |
[^…] | 负值字符集合。匹配未包含的任意字符。例如, ‘[^abc]’ 可以匹配 “plain” 中的’p’。 |
p1|p2|p3 | 匹配 p1 或 p2 或 p3。例如,‘z|food’ 能匹配 “z” 或 “food”。’(z|f)ood’ 则匹配 “zood” 或 “food”。 |
* | 匹配前面的子表达式零次或多次。例如,zo* 能匹配 “z” 以及 “zoo”。* 等价于{0,}。 |
+ | 匹配前面的子表达式一次或多次。例如,‘zo+’ 能匹配 “zo” 以及 “zoo”,但不能匹配 “z”。+ 等价于 {1,}。 |
{n} | n 是一个非负整数。匹配确定的 n 次。例如,‘o{2}’ 不能匹配 “Bob” 中的 ‘o’,但是能匹配 “food” 中的两个 o。 |
{n,m} | m 和 n 均为非负整数,其中n <= m。最少匹配 n 次且最多匹配 m 次。 |
\ | 将下一个字符标记为或特殊字符、或原义字符、或向后引用、或八进制转义符。例如, ‘n’ 匹配字符 ‘n’。’\n’ 匹配换行符。序列 ‘\’ 匹配 “”,而 ‘(’ 则匹配 “(”。 |
2.10 THE IS NULL OPERATOR
SELECT *
FROM customers
WHERE phone IS NOT NULL;
EXERCISE
-- EXERCISE
SELECT *
FROM orders
WHERE shipped_date IS NULL;
2.11 THE ORDER BY CLAUSE
MYSQL中可以用任何列排列数据,不管那列是不是在select子句中
SELECT first_name, last_name
FROM customers
ORDER BY birth_date;
排序可以根据自己起的列排序
SELECT first_name, last_name, 10 AS points
FROM customers
ORDER BY points, first_name;
1–represents first_name
2–represents last_name
最好避免使用
SELECT first_name, last_name, 10 AS points
FROM customers
ORDER BY 1,2;
exercise
order_id=2 并且根据每一项目的总价格进行排序
SELECT *
FROM order_items
WHERE order_id =2
ORDER BY quantity * unit_price DESC;
SELECT
order_id,
product_id,
quantity,
unit_price,
(quantity * unit_price) AS Total_price
FROM order_items
WHERE order_id =2
ORDER BY Total_price DESC;
2.12 THE LIMIT CLAUSE
SELECT *
FROM customers
LIMIT 3
OFFSET
– page 1: 1-3
–page 2: 4-6
–page 3: 7-9
如果想直接获取第三页的数据,需要跳过前面的6条记录并选择3条记录
SELECT *
FROM customers
LIMIT 6,3;
LIMIT 6,3 --跳过前面的6条记录,并选择6条记录后的3条记录
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-MWpalSSO-1610167958984)(C:\Users\Changnie\AppData\Roaming\Typora\typora-user-images\image-20201213173050617.png)]
EXERCISE
查找积分最多的前三名用户
-- Get the top three loyal customers
SELECT *
FROM customers
ORDER BY points DESC
LIMIT 3;
3.1 INNER JOINS
SELECT *
FROM orders
INNER JOIN customers
ON customers.customer_id = orders.customer_id;
SELECT order_id, first_name, last_name
FROM orders
INNER JOIN customers
ON customers.customer_id = orders.customer_id;
Error Code: 1052. Column ‘customer_id’ in field list is ambiguous 0.000 sec
SELECT order_id, customer_id, first_name, last_name
FROM orders
INNER JOIN customers
ON customers.customer_id = orders.customer_id;
Reason: orders & customers表格都有customer_id 这个列,MySQL无法确定我们想从哪张表选取这列
How to solve it?
SELECT order_id, orders.customer_id, first_name, last_name
FROM orders
INNER JOIN customers
ON customers.customer_id = orders.customer_id;
在customer_id前面标注是哪个表中的columns
用o代替orders,用c代替customers
SELECT order_id, o.customer_id, first_name, last_name
FROM orders o
INNER JOIN customers c
ON c.customer_id = o.customer_id;
EXERCISE
SELECT order_id,p.product_id,name,quantity, o.unit_price
FROM order_items o
JOIN products p
ON o.product_id = p.product_id;
Output
3.2 Joining Across Databases
将分散在多个数据库的表中的列合并起来
尝试将sql_inventory中的products与sql_store order_items链接到一起
SELECT *
FROM order_items o
JOIN sql_inventory.products p
ON o.product_id = p.product_id;
因为是USE sql_store所以sql_inventory.products p
Another Situation(USE sql_inventory)
USE sql_inventory;
SELECT *
FROM products p
JOIN sql_store.order_items oi
ON p.product_id= oi.product_id;
3.3 Self Joins
使表自己和自己进行连结,查看每个员工和他的manager
USE sql_hr;
SELECT *
FROM employees e
JOIN employees m -- m for manager
ON e.reports_to = m.employee_id;
Output
employee_id | first_name | last_name | job_title | salary | reports_to | office_id | employee_id | first_name | last_name | job_title | salary | reports_to | office_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
33391 | D’arcy | Nortunen | Account Executive | 62871 | 37270 | 1 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
37851 | Sayer | Matterson | Statistician III | 98926 | 37270 | 1 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
40448 | Mindy | Crissil | Staff Scientist | 94860 | 37270 | 1 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
56274 | Keriann | Alloisi | VP Marketing | 110150 | 37270 | 1 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
63196 | Alaster | Scutchin | Assistant Professor | 32179 | 37270 | 2 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
67009 | North | de Clerc | VP Product Management | 114257 | 37270 | 2 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
67370 | Elladine | Rising | Social Worker | 96767 | 37270 | 2 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
68249 | Nisse | Voysey | Financial Advisor | 52832 | 37270 | 2 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
72540 | Guthrey | Iacopetti | Office Assistant I | 117690 | 37270 | 3 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
72913 | Kass | Hefferan | Computer Systems Analyst IV | 96401 | 37270 | 3 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
75900 | Virge | Goodrum | Information Systems Manager | 54578 | 37270 | 3 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
76196 | Mirilla | Janowski | Cost Accountant | 119241 | 37270 | 3 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
80529 | Lynde | Aronson | Junior Executive | 77182 | 37270 | 4 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
80679 | Mildrid | Sokale | Geologist II | 67987 | 37270 | 4 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
84791 | Hazel | Tarbert | General Manager | 93760 | 37270 | 4 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
95213 | Cole | Kesterton | Pharmacist | 86119 | 37270 | 4 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
96513 | Theresa | Binney | Food Chemist | 47354 | 37270 | 5 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
98374 | Estrellita | Daleman | Staff Accountant IV | 70187 | 37270 | 5 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
115357 | Ivy | Fearey | Structural Engineer | 92710 | 37270 | 5 | 37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 |
在已连接的表中选取特定列
SELECT
e.employee_id,
e.first_name,
m.first_name AS manager
FROM employees e
JOIN employees m -- m for manager
ON e.reports_to = m.employee_id;
3.4 Joining Multiple Tables
对多张表进行🔗
USE sql_store;
SELECT
o.order_id,
o.order_date,
c.first_name,
c.last_name,
os.name AS status
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
JOIN order_statuses os
ON o.status = os.order_status_id;
Output
EXERCISE
USE sql_invoicing;
SELECT
p.date,
p.amount,
c.name,
c.address,
c.city,
c.phone,
pm.name as payment_method
FROM payments p
JOIN clients c
ON p.client_id = c.client_id
JOIN payment_methods pm
ON p.payment_method = pm.payment_method_id;
Output
3.5 Compound Join Conditions
使用两个都含有重复值的columns一起唯一识别每一个订单项目
在表设计过程中,order_id & product_id被设计成复合主键
USE sql_store;
SELECT *
FROM order_items oi
JOIN order_item_notes oin
ON oi.order_id = oin.order_id
AND oi.product_id = oin.product_id;
3.6 隐式连接语法Implicit Join Syntax
SELECT *
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id;
-- IMPLICIT JOIN SYNTAX(NOT SUGGEST)
SELECT *
FROM orders o,customers c
WHERE o.customer_id = c.customer_id;
尽量使用显示连接法
3.7 Outer Joins
LEFT JOIN
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id;
当使用left join时,所有左表的记录都会被返回,不管条件正确还是错误
Output
RIGH JOIN
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM customers c
RIGHT JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY c.customer_id;
Output
DIFFERENCE BETWEEN LEFT JOIN & RIGHT JOIN
LEFT JOIN: 以左表为目标进行索引
RIGHT JOIN: 以右表为目标进行索引
EXERCISE
对product和order进行外连接,要求:即使product_id = 7美欧quantity也要返回查询
SELECT
p.product_id,
p.name,
oi.quantity
FROM products p
LEFT JOIN order_items oi
ON p.product_id = oi.product_id;
3.8 OUTER JOIN BETWEEN MULTIPLE TABLES
多张表的外连接
SELECT
c.customer_id,
c.first_name,
o.order_id,
sh.name AS shipper
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
LEFT JOIN shippers sh
ON o.shipper_id = sh.shipper_id
ORDER BY c.customer_id;
customers
orders
shipper
Output
EXERCISE
因为customers和orders只有customer_id&first_name两列相同,所以可以INNER JOIN
SELECT
o.order_date,
o.order_id,
c.first_name,
sh.name AS shipper,
os.order_status_id AS status
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
LEFT JOIN shippers sh
ON o.shipper_id = sh.shipper_id
LEFT JOIN order_statuses os
ON sh.shipper_id = os.order_status_id;
3.9 Self Outer Joins
表子链接:要求能看到他们的manager同时也显示manager
employee_id | first_name | last_name | job_title | salary | reports_to | office_id |
---|---|---|---|---|---|---|
33391 | D’arcy | Nortunen | Account Executive | 62871 | 37270 | 1 |
37270 | Yovonnda | Magrannell | Executive Secretary | 63996 | 10 | |
37851 | Sayer | Matterson | Statistician III | 98926 | 37270 | 1 |
40448 | Mindy | Crissil | Staff Scientist | 94860 | 37270 | 1 |
56274 | Keriann | Alloisi | VP Marketing | 110150 | 37270 | 1 |
63196 | Alaster | Scutchin | Assistant Professor | 32179 | 37270 | 2 |
67009 | North | de Clerc | VP Product Management | 114257 | 37270 | 2 |
67370 | Elladine | Rising | Social Worker | 96767 | 37270 | 2 |
68249 | Nisse | Voysey | Financial Advisor | 52832 | 37270 | 2 |
72540 | Guthrey | Iacopetti | Office Assistant I | 117690 | 37270 | 3 |
72913 | Kass | Hefferan | Computer Systems Analyst IV | 96401 | 37270 | 3 |
75900 | Virge | Goodrum | Information Systems Manager | 54578 | 37270 | 3 |
76196 | Mirilla | Janowski | Cost Accountant | 119241 | 37270 | 3 |
80529 | Lynde | Aronson | Junior Executive | 77182 | 37270 | 4 |
80679 | Mildrid | Sokale | Geologist II | 67987 | 37270 | 4 |
84791 | Hazel | Tarbert | General Manager | 93760 | 37270 | 4 |
95213 | Cole | Kesterton | Pharmacist | 86119 | 37270 | 4 |
96513 | Theresa | Binney | Food Chemist | 47354 | 37270 | 5 |
98374 | Estrellita | Daleman | Staff Accountant IV | 70187 | 37270 | 5 |
115357 | Ivy | Fearey | Structural Engineer | 92710 | 37270 | 5 |
SELECT
e.employee_id,
e.first_name,
m.first_name AS manager
FROM employees e
LEFT JOIN employees m
ON e.reports_to = m.employee_id;
Output
3.10 The Using Clause
如果两个内连接的表中,有列名称是完全一样的,可以用一个更简洁的USING子句替换ON子句
SELECT
o.order_id,
c.first_name
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id;
SELECT
o.order_id,
c.first_name,
sh.name AS shipper
FROM orders o
JOIN customers c
USING (customer_id)
USING key word只能在不同表中的列名字完全一样的情况下使用
双主键表格
SELECT *
FROM order_items oi
JOIN order_item_notes oin
on oi.order_id = oin.order_id
AND oi.product_id = oin.product_id;
USING语句优化后
SELECT *
FROM order_items oi
JOIN order_item_notes oin
USING (order_id,product_id);
EXERCISE
SELECT
p.date,
c.name AS client,
p.amount,
pm.name AS payment_method
FROM clients c
JOIN payments p
USING client_id
JOIN payment_methods pm
on p.payment_method = pm.payment_method_id;
3.11 Natural Joins
使用自然连接,不具体打上列名,数据引擎会自动基于共同的列连接
不建议使用,因为有时候会产生意外结果
SELECT
o.order_id,
c.first_name
FROM orders o
NATURAL JOIN customers c
3.12 Cross Joins
对于cross join连接来说,其实使用的就是笛卡尔连接;
如果不带WHERE条件子句,它将会返回被连接的两个表的笛卡尔积,返回结果的行数等于两个表行数的乘积;
Cross Joins是不需要ON和USING这种关键字的。
显式语法
SELECT
c.first_name AS customer,
p.name AS product
FROM customers c
CROSS JOIN products p
ORDER BY c.first_name
Example
有一个型号表,比如小/中/大,还有一个颜色表,比如红/蓝/绿,想要将所有的型号和所有的颜色组合,此时就可以用交叉连接
隐式语法
SELECT
c.first_name AS customer,
p.name AS product
FROM customers c, product p
ORDER BY c.first_name;
EXERCISE
Do a cross join between shippers and products
explicit syntax
SELECT *
FROM customers c
CROSS JOIN products p
ORDER BY c.first_name
implicit syntax
SELECT *
FROM customers c, products p
ORDER BY c.first_name;
3.13 Unions
将两个不同的查询语句进行联结
SELECT
order_id,
order_date,
'Activity' AS status
FROM orders
WHERE order_date >= '2019-01-01'
UNION
SELECT
order_id,
order_date,
'Archived' AS status
FROM orders
WHERE order_date < '2019-01-01';
SELECT first_name
FROM customers
UNION
SELECT name
FROM shippers;
Error Code: 1222. The used SELECT statements have a different number of columns
SELECT first_name,last_name
FROM customers
UNION
SELECT name
FROM shippers;
EXERCISE
将积分为2k以下的顾客挑出来为Bronze, >2K的顾客挑出来成为Silver
SELECT
customer_id,
first_name,
points,
'Bronze' AS type
FROM customers
WHERE points < 2000
UNION
SELECT
customer_id,
first_name,
points,
'Silver' AS type
FROM customers
WHERE points > 2000;
4.1 COLUMN ATTRIBUTION
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-YBOhSDUQ-1610167958986)(C:\Users\Changnie\AppData\Roaming\Typora\typora-user-images\image-20210103220415085.png)]
VARCHAR(50)–可变字符缩写,最多可以放50个字符variable不会自动填充
CHAR(50)–当输入字符不够50时,MySQL会再插入45个空格符填满
PK–Primary key 这列的值唯一标识了每一位顾客
NN–Not Null
AI–自动递增,通常被用在主键列;每次我们在表中插入一条新记录,我们让MySQL或者数据库引擎在列中插入一个值,其实就是在最后一行加入一个顾客id,就会在新纪录的同时将顾客id增加1
4.2 Inserting a Single Row
INSERT INTO customers (
first_name,
last_name,
birth_date,
address,
city,
state)
VALUES (
'John',
'Smith',
NULL,
'ADDRESS',
'city',
'CA')
我们用默认关键字,让MySQL生成一个顾客id的唯一值
在SQL里字符串和日期值都要带上引号
可以以任何的顺序去排列,只要两个()中的值对应
4.2 Inserting Multiple Rows
INSERT INTO shippers(name)
VALUES ('Shipper1'),
('Shipper2'),
('Shipper3')
Output
EXERCISE
–Insert three rows in the products table
INSERT INTO products(
name,
quantity_in_stock,
unit_price
)
VALUES(
'zy',
18,
27
)
4.4 Inserting Hierarchical Rows
如何往多张表插入数据
一个订单可以有多个产品,orders表是描述有什么订单,order_items表是描述订单有哪些产品
怎么插入一笔订单以及它对应的所有项目
INSERT INTO orders(customer_id,order_date,status)
VALUES (1,'2019-01-02',1)
5.1 AGGREGATE FUNCTIONS
MySQL中内置的一些函数:
MAX()
MIN()
AVG()
SUM()
COUNT()
SELECT
MAX(invoice_total) AS highest,
MIN(invoice_total) AS lowest,
AVG(invoice_total) AS average,
SUM(invoice_total) AS total,
COUNT(invoice_total) AS number_of_invoices,
COUNT(payment_date) AS count_of_payments,
-- COUNT()只会返回非空payment_dates的记录数
COUNT(*) AS total_records
-- 得到表格中所有记录条目,不管是不是空值
FROM invoices;
排除重复的记录,需要在函数前面+DISTINCT
SELECT
MAX(invoice_total) AS highest,
MIN(invoice_total) AS lowest,
AVG(invoice_total) AS average,
SUM(invoice_total) AS total,
COUNT(invoice_total) AS number_of_invoices,
COUNT(client_id) AS total_number_clients,
COUNT(DISTINCT client_id) AS number_of_client
FROM invoices;
EXERCISE
SELECT
'First half of 2019' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
-- ()里面的必须是表格中有的,不能是自己起的
FROM invoices i
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-06-30'
UNION
SELECT
'First half of 2019' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices i
WHERE invoice_date BETWEEN '2019-07-01' AND '2019-12-31'
UNION
SELECT
'First half of 2019' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices i
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-12-31'
5.2 THE GROUP BY CLAUSE
对一列或者多列进行数据分组
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
Output
默认状态下,数据是按照group by子句中指定的列排序的
使用order by进行排列
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
ORDER BY total_sales ASC
使用where增加筛选
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
WHERE invoice_date > '2019-07-01'
GROUP BY client_id
ORDER BY total_sales ASC;
group by子句永远在from和where子句之后,在order by之前
补充:SQL语句查询顺序
(8) SELECT (9)DISTINCT<Select_list>
(1) FROM <left_table> (3) <join_type>JOIN<right_table>
(2) ON<join_condition>
(4) WHERE<where_condition>
(5) GROUP BY<group_by_list>
(6) WITH {CUBE|ROLLUP}
(7) HAVING<having_condtion>
(10) ORDER BY<order_by_list>
(11) LIMIT<limit_number>
我们可以看出,SELECT子句是必选的,其它子句如WHERE子句、GROUP BY子句等是可选的。
一个SELECT语句中,子句的顺序是固定的。必须严格按照上述的顺序书写。
所有的查询语句都是从FROM开始执行的,在执行过程中,每个步骤都会为下一个步骤生成一个虚拟表,这个虚拟表将作为下一个执行步骤的输入。
- 先对FROM子句中的两个表执行一个笛卡尔乘,此时生成虚拟表 virtual table 1(选择相对小的表做基础表)
- 然后是应用ON条件筛选器,将ON中的逻辑表达式将应用到 virtual table 1中的各个行,筛选出满足ON中的逻辑表达式的行,生成虚拟表 virtual table 2
- 根据连接方式进行进一步的操作。如果是OUTER JOIN,那么这一步就将添加外部行
- LEFT OUTER JOIN就把左表在第二步中筛选掉的行添加进来
- RIGHT OUTER JOIN就将右表在第二步中筛选掉的行添加进来 这样生成虚拟表 virtual table 3
如果 FROM子句中的表数目大于2,那么就将virtual table 3和第三个表连接从而计算笛卡尔积,生成虚拟表,该过程就是一个重复1-3的步骤,最终得到一个新的虚拟表virtual table 3
- 应用WHERE筛选器,对上一步生产的virtual table 3用WHERE筛选器筛选,生成虚拟表virtual table 4
在这有个比较重要的细节需要提一下,如果我们有一个condition需要去筛选,应该在在ON条件筛选器还是用WHERE筛选器指定condition逻辑表达式呢?
ON和WHERE的最大区别在于,如果在ON应用逻辑表达式那么在第三步OUTER JOIN中还可以把移除的行再次添加回来,而WHERE的移除的不可挽回的 - GROUP BY子句将具有相同属性的row组合成为一组,得到虚拟表virtual table 5
如果应用了GROUP BY,那么后面的所有步骤都只能得到的virtual table 5的列或者是聚合函数,并且分组的时候是将列中唯一的值分成一组,同时只为每一组返回一行记录,这一点请牢记。 - 应用CUBE或者ROLLUP选项,为virtual table 5生成超组,生成virtual table 6. 这个暂时还没了解,先放到这里吧。
- 应用HAVING筛选器,生成virtual table 7;HAVING筛选器是唯一一个用来筛选组的筛选器
- 处理SELECT子句。将virtual table 7中的并且在Select_list中的列筛选出来,生成virtual table 8
- 应用DISTINCT子句,virtual table 8中移除相同的行,生成virtual table 9
事实上如果应用了GROUP BY子句,那么DISTINCT是多余的,原因同样在于,分组的时候是将列中唯一的值分成一组,同时只为每一组返回一行记录,那么所以的记录都将是不相同的。 - 应用ORDER BY子句。按照order_by_condition排序virtual table 10,此时返回的一个游标,而不是虚拟表。SQL是基于集合的,集合不会预先对行进行排序,它只是成员的逻辑集合,成员的顺序是无关紧要的。对表进行排序的查询可以返回一个对象,这个对象包含特定的物理顺序的逻辑组织。这个对象就叫游标。正因为返回值是游标,那么使用ORDER BY子句查询不能应用于表达式。
分组数据以查看每个state和city的总销量
SELECT
state,
city,
SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients s USING(client_id)
GROUP BY state,city;
Output
EXERCISE
SELECT
date,
pm.name,
SUM(amount) AS total_payments
FROM payments p
LEFT JOIN payment_methods pm
ON p.payment_method = pm.payment_method_id
GROUP BY date,payment_method
ORDER BY date ASC
Output
5.3 THE HAVING CLAUSE
HAVING–对分组后的数据进行筛选
–为什么不用WHERE
–因为WHERE必须在group by之前,但是此时还没有对数据进行分组
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
Output
Error Code: 1054. Unknown column ‘invoice_sales’ in ‘where clause’ 0.016 sec
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
WHERE invoice_sales > 500
GROUP BY client_id
5.4 THE ROLLUP OPERATOR
rollup运算符只能应用于聚合值的列 (仅在MySQL中存在)
SELECT
state,
city,
SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients c USING (client_id)
GROUP BY state,city WITH ROLLUP
Output
ROLLUP运算符对每一组的总计做了计算,当进行分列分组并运用rollup运算符,会看到每个组及整个结果集的汇总值
EXERCISE
SELECT
pm.name,
SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
ON p.payment_method = pm.payment_method_id
GROUP BY payment_method WITH ROLLUP
ERROR: 使用rollup运算符的时候,不能在group by子句中使用列别名,必须输入列的实际名称
SELECT
pm.name,
SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
ON p.payment_method = pm.payment_method_id
GROUP BY pm.name WITH ROLLUP
Output
6.1 WRITING COMPLEX QUERIES
子查询是在另一个SQL语句中的选择语句
重置数据库删除之前添加的记录
6.2 SUBQUERIES
– Find products that are more
– expensive than Lettuce (id = 3)
SELECT *
FROM products
WHERE unit_price > (
SELECT unit_price
FROM products
WHERE product_id = 3
)
获取id=3的价格
WHERE unit_price > (
SELECT unit_price
FROM products
WHERE product_id = 3
)
也可以在FROM子句或者SELECT子句中编写子查询
EXERCISE
–在sql_hr数据库中,找到收入高于平均收入的演员
SELECT *
FROM employees
WHERE salary > (
SELECT
avg(salary)
FROM employees
)
6.3 THE IN OPERATOR
在store数据库中查找没有被订购过的产品
SELECT *
FROM products
WHERE product_id NOT IN(
SELECT DISTINCT product_id
FROM order_items
)
EXERCISE
–Find clients without invoices
SELECT *
FROM clients
WHERE client_id NOT IN (
SELECT DISTINCT client_id
FROM invoices
)
6.4 SUBQUERIES VS JOINS
–Find clients without invoices
1.Subqueries
SELECT *
FROM clients
WHERE client_id NOT IN (
SELECT DISTINCT client_id
FROM invoices
)
2.Joins
SELECT *
FROM clients
LEFT JOIN invoices USING (client_id)
WHERE invoice_id IS NULL
EXERCISE
– Find customers who have ordered lettuce (id = 3)
– Select customer_id, first_name, last_name
Output
1.Subqueries
SELECT
customer_id,
first_name,
last_name
FROM customers
WHERE customer_id IN (
SELECT o.customer_id
FROM order_items oi
JOIN orders o USING (order_id)
WHERE product_id = 3
)
2.Joins
SELECT
DISTINCT customer_id,
first_name,
last_name
FROM customers
LEFT JOIN orders USING (customer_id)
LEFT JOIN order_items USING (order_id)
WHERE product_id =3
6.5 THE ALL KEYWORD
The All Keyword
SELECT *
FROM invoices
WHERE invoice_total > ALL(150,130,167,...)
如果发票总额>ALL括号里的所有值,那么那行的值才会被返回;如果不添加ALL,MySQL无法比对多个值
– Select invoices larger than all invoices of
– Client 3
SELECT *
FROM invoices
WHERE invoice_total> (
SELECT MAX(invoice_total)
FROM invoices
WHERE client_id = 3
)
6.6 THE ANY KEYWORD
使用这些关键字可以得到发票总额高于这段子查询返回的任意值的行
– Select clients with at least two invoices
SELECT *
FROM clients
WHERE client_id IN(
SELECT client_id
FROM invoices
GROUP BY client_id
HAVING COUNT(*)>=2
)
–Use ANY keyword
SELECT *
FROM clients
WHERE client_id = ANY(
SELECT client_id
FROM invoices
GROUP BY client_id
HAVING COUNT(*)>=2
)
如果客户id等于这段查询返回值里的任何一个,那位客户就会被返回到最终结果
6.7 CORRELATED SUBQUERIES
– Select employees whose salary is above the average in their office
Solutions:
– for each employee
– calculate the avg salary for employee.office
– return the employee if salary > avg
SELECT *
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE office_id = e.office_id
)
内外查询都用到了employees,为了获取在同一部门的员工,给office_id列加了一个表名的前缀
EXERCISE
–GET invoices that are larger than the client’s average invoice amount
SELECT invoice_id
FROM invoices i
WHERE invoice_total > (
SELECT AVG(invoice_total)
FROM invoices
WHERE client_id = i.client_id
)
6.8 THE EXISTS OPERATOR
– Select clients that have an invoice
使用exists运算符来查看发票里是否存在符合这个条件的行
SELECT *
FROM clients c
WHERE EXISTS (
SELECT client_id
FROM invoices
WHERE client_id = c.client_id
)
当我们使用EXISTS运算符,子查询并没有给外查询返回一个结果,它会返回一个指令说明这个子查询中是否有符合这个搜索条件的行;只要它找到这个表中有一条匹配这个条件的记录,它就会返回TRUE给EXISTS运算符,然后这个EXISTS运算符就会在最终结果里添加当前记录也就是当前客户
EXERCISE
–Find product which has never been purchased
SELECT *
FROM products p
WHERE NOT EXISTS (
SELECT product_id
FROM order_items
WHERE product_id = p.product_id
)
6.9 SUBQUERIES IN THE SELECTIONN CLAUSE
SELECT
invoice_id,
invoice_total,
(SELECT AVG(invoice_total)
FROM invoices) AS invoice_average,
invoice_total - (SELECT invoice_average) AS difference
FROM invoices
如果只用AVG(invoice_total)则只会显示一行
EXERCISE
SELECT
c.client_id,
name,
(SELECT SUM(invoice_total)
FROM invoices
WHERE client_id = c.client_id) AS total_sales,
(SELECT AVG(invoice_total) FROM invoices) AS average,
(SELECT total_sales - average) AS difference
FROM clients c
6.10 Subqueries in the FROM Clause
将上节课中EXERCISE中的表格当成是新表格
FROM (
SELECT
c.client_id,
name,
(SELECT SUM(invoice_total)
FROM invoices
WHERE client_id = c.client_id) AS total_sales,
(SELECT AVG(invoice_total) FROM invoices) AS average,
(SELECT total_sales - average) AS difference
FROM clients c
)AS total_summary
WHERE total_sales is NOT NULL
使用FROM语句可以让查询变得更加复杂,我们可以将其作为视图存储在数据库中,然后我们可以把这个视图取名为销售信息汇总,它会大大简化我们的查询
7.1 NUMERIC FUNCTIONS
ROUND(3.145,1) --保留1位小数
TRUNCATE(5.7345,4) --用来截断数字
CEILING(5.2) --返回大于或等于这个数字的最小整数6
FLOOR(5.2) --返回小于或等于这个数字的最大整数5
ABS(-5.5) --用来计算绝对值
RAND() --用来生成介于0-1之间的随机值
further information: search on web
mysql numeric functions
7.2 String Functions
SELECT LENGTH('sky'),
UPPER('sky'),
LOWER('SKY'),
LTRIM(' sky'), --移除字符串左侧的空白字符或其他预定义字符
RTRIM('sky '), --移除字符串右侧的空白字符或其他预定义字符
TRIM(' sky ') --它会删除所有前导或者尾随空格
LEFT('Kingdergarten', 4), --提取出前4个字符
RIGHT('Kingdergarten', 6), --提取出后6个字符
substring('Kingdergarten',3,5), --从第3个位置往后取5个字符ngder
LOCATE('n','Kingdergarten'), --查找kindergarten里n的位置/searching不区分大小写
--如果查找不存在的字符,返回值为0(与其他编程语言返回值为-1不同)
REPLACE('Kingdergarten','garten','garden'), --kindergarden
CONCAT('first','last') --合并两个字符firstlast
further information: search on web
mysql string functions
7.3 Date Functions
SELECT NOW(),CURDATE(),CURTIME()
SELECT YEAR(NOW()),MONTH(NOW()),DAY(NOW()),HOUR(NOW()),minute(NOW())
SELECT MONTHNAME(NOW()),DAYNAME(NOW())
SELECT EXTRACT(DAY FROM NOW())
EXERCISE
获取当前年份的订单
SELECT *
FROM orders
WHERE YEAR(order_date) = YEAR(NOW())
7.4 Formatting Dates and Times
SELECT DATE_FORMAT(NOW(),'%M %D %Y'),
DATE_FORMAT(NOW(),'%m %d %y'),
TIME_FORMAT(NOW(), '%H:%I %PM')
Further information: mysql date format string
7.5 Calculating Dates and Times
在日期基础上增加一天或者一个小时或者计算两个日期的间隔
SELECT DATE_ADD(NOW(),INTERVAL 1 DAY),
DATE_ADD(NOW(),INTERVAL -1 DAY),
DATE_ADD(NOW(),INTERVAL 1 YEAR)
计算两个日期的间隔DATEDIFF
SELECT DATEDIFF('2019-01-05','2019-01-01')
注意:这个函数只返回天数的间隔,而不是小时或分钟
返回从零点计算的秒数TIME_TO_SEC
SELECT TIME_TO_SEC('09:00'),
TIME_TO_SEC('09:00') - TIME_TO_SEC('09:02')
7.6 The IFNULL and COALESCE Functions
IFNULL()
SELECT
order_id,
IFNULL(shipper_id,'Not assigned') AS shipper
FROM orders
如果shipper_id为null则返回‘Not assigned’后面的字符
COALESCE()
SELECT
order_id,
COALESCE(shipper_id,comments,'Not assigned') AS shipper
FROM orders
DIFFERENCE
IFNULL函数里可以用其他内容替换空值;而coalesce函数里我们提供一堆值,这个函数会返回这堆值中的第一个非空值
EXERCISE
将first_name 和 last_name 进行拼接
SELECT
CONCAT(first_name,' ',last_name) AS customer,
IFNULL(phone,'Unknown')
FROM customers
7.7 The IF Function
有时会需要测试条件,并根据条件的成立与否返回不同值
IF(expression,first,second)
如果这个表达式被判定为真,这个函数会返回first,否则就会返回second;这可以是任何值,可以是字符串,可以是空值,可以是数字,可以是日期
SELECT
order_id,
order_date,
IF(
YEAR(order_date) = YEAR(NOW()),
'Active',
'Archived'
) AS status
FROM orders
EXERCISE
SELECT
product_id,
name,
COUNT(*) AS orders,
IF(
COUNT(*)>1,
'Many times',
'Once'
) AS 'frequency'
FROM products
JOIN order_items USING (product_id)
GROUP BY product_id
7.8 The CASE Operator
在IF()函数中,只有两种情况,但是在CASE()函数中,可以有无数种情况
SELECT
order_id,
order_date,
CASE
WHEN YEAR(order_date) = YEAR(NOW()) THEN 'Active'
WHEN YEAR(order_date) = YEAR(NOW()) -1 THEN 'Last Year'
WHEN YEAR(order_date) < YEAR(NOW()) -1 THEN 'Archived'
ELSE 'Future'
END AS category
FROM orders
EXERCISE
根据customer的分数对他们进行等级划分
SELECT
CONCAT(first_name,' ',last_name),
points,
CASE
WHEN points > 3000 THEN 'Gold'
WHEN points>2000& points <3000 THEN 'Silver'
WHEN points <2000 THEN 'Bronze'
END AS 'category'
FROM customers
der_id,
IFNULL(shipper_id,‘Not assigned’) AS shipper
FROM orders
如果shipper_id为null则返回‘Not assigned’后面的字符
**COALESCE()**
```sql
SELECT
order_id,
COALESCE(shipper_id,comments,'Not assigned') AS shipper
FROM orders
DIFFERENCE
IFNULL函数里可以用其他内容替换空值;而coalesce函数里我们提供一堆值,这个函数会返回这堆值中的第一个非空值
EXERCISE
将first_name 和 last_name 进行拼接
SELECT
CONCAT(first_name,' ',last_name) AS customer,
IFNULL(phone,'Unknown')
FROM customers
7.7 The IF Function
有时会需要测试条件,并根据条件的成立与否返回不同值
IF(expression,first,second)
如果这个表达式被判定为真,这个函数会返回first,否则就会返回second;这可以是任何值,可以是字符串,可以是空值,可以是数字,可以是日期
SELECT
order_id,
order_date,
IF(
YEAR(order_date) = YEAR(NOW()),
'Active',
'Archived'
) AS status
FROM orders
EXERCISE
SELECT
product_id,
name,
COUNT(*) AS orders,
IF(
COUNT(*)>1,
'Many times',
'Once'
) AS 'frequency'
FROM products
JOIN order_items USING (product_id)
GROUP BY product_id
7.8 The CASE Operator
在IF()函数中,只有两种情况,但是在CASE()函数中,可以有无数种情况
SELECT
order_id,
order_date,
CASE
WHEN YEAR(order_date) = YEAR(NOW()) THEN 'Active'
WHEN YEAR(order_date) = YEAR(NOW()) -1 THEN 'Last Year'
WHEN YEAR(order_date) < YEAR(NOW()) -1 THEN 'Archived'
ELSE 'Future'
END AS category
FROM orders
EXERCISE
根据customer的分数对他们进行等级划分
SELECT
CONCAT(first_name,' ',last_name),
points,
CASE
WHEN points > 3000 THEN 'Gold'
WHEN points>2000& points <3000 THEN 'Silver'
WHEN points <2000 THEN 'Bronze'
END AS 'category'
FROM customers