07-Hive--Advanced Topics, Part 2

YuPangZa

Last modified 2023-09-07 17:31:41

Column: Big Data. Tags: hive, learning, hadoop

First published 2023-08-30 11:44:24

Original link: https://blog.csdn.net/qq_43819048/article/details/132576751

I. Hive functions (continued)

1. The window function OVER

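Let's start with a requirement: list each employee's details together with the average salary of their department. How would you do this in MySQL?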

-- Approach 1: join with a per-department aggregate
SELECT emp.*, avg_sal
FROM emp
	JOIN (
		SELECT deptno
			, round(AVG(ifnull(sal, 0))) AS avg_sal
		FROM emp
		GROUP BY deptno
	) t
	ON emp.deptno = t.deptno
ORDER BY deptno;

-- Approach 2: correlated subquery
select A.*, (select avg(ifnull(sal,0)) from emp B where B.deptno = A.deptno) as avg_sal from emp A;

Whenever a question asks for both row-level details and aggregate values, reach for a window function!

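Querying details and computing statistics are normally conflicting operations. Window functions let you do both in the same query. With no arguments in the window, the statistic covers the entire data set. With arguments, it is computed separately for each group defined by the one or more columns you specify.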

An example:

Data file order.txt:

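name,orderdate,cost (customer name, order date, order amount; this header is for reference only and is not part of the file)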
saml,2018-01-01,10
saml,2018-01-08,55
tony,2018-01-07,50
saml,2018-01-05,46
tony,2018-01-04,29
tony,2018-01-02,15
saml,2018-02-03,23
mart,2018-04-13,94
saml,2018-04-06,42
mart,2018-04-11,75
mart,2018-04-09,68
mart,2018-04-08,62
neil,2018-05-10,12
neil,2018-06-12,80

1. Create the order table:
create table if not exists t_order
(
    name      string,
    orderdate string,
    cost      int
)  row format delimited fields terminated by ',';
2. Load the data:
load data local inpath "/home/hivedata/order.txt" into table t_order;

Metric 1: list every order together with the total number of orders.

Without a window function:
select *,(select count(1) from t_order) as `order_count` from t_order ;

With a window function:
select *, count(*) over() from t_order;

OVER is usually not used on its own. It is combined with another function such as count. How large is the window in over()? An empty over() covers the whole data set.

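A window is simply a range. To compute, say, the male-to-female ratio, you first need to know the range: one class or the whole school. The class and the school are the windows.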

A window function produces a value for every row.
If OVER has no arguments, the window defaults to the entire result set.

Metric 2: purchase details of customers who bought something in January 2018, plus the total number of those orders.

select *,count(*) over()
from t_order
where substr(orderdate,1,7) = '2018-01';

Metric 3: purchase details of customers who bought something in January 2018, plus the total number of distinct customers.

select *,count(distinct name) over()
from t_order
where substr(orderdate,1,7) = '2018-01';

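Is there another way to write it?
Wrong: with GROUP BY, the SELECT list may contain only grouping columns and aggregate functions, so select * is rejected: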
select *,count(distinct name) over()
from t_order
where substr(orderdate,1,7) = '2018-01' group by name;

The DISTRIBUTE BY clause:

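It groups rows inside the OVER window, so the statistic is computed per group of the given column(s). The window for a row is all records in the same group.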


Syntax:
over(distribute by colname[,colname.....])

Metric 4: view customers' purchase details and the monthly purchase total.

Wrong: detail columns cannot be combined with an aggregate function, because the aggregate collapses many rows into one.
select *,sum(cost) from t_order ;
This works instead:
select *,(select sum(cost) from t_order) from t_order ;

If you want to keep the first form, use a window:
select *,sum(cost) over() from t_order ;
How it works: take the first row (saml,2018-01-01,10) and compute sum over its window, which is the whole data set.
Then take the second row and compute sum over the whole data set again, and so on.

saml    2018-01-01      10      661
saml    2018-01-08      55      661
tony    2018-01-07      50      661
saml    2018-01-05      46      661
tony    2018-01-04      29      661
tony    2018-01-02      15      661
saml    2018-02-03      23      661
mart    2018-04-13      94      661
saml    2018-04-06      42      661
mart    2018-04-11      75      661
mart    2018-04-09      68      661
mart    2018-04-08      62      661
neil    2018-05-10      12      661
neil    2018-06-12      80      661
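The row-by-row description above can be mirrored in a few lines of plain Java. This is only an illustrative sketch, not how Hive executes the query. The class OverWholeSet and its method are hypothetical names, and COSTS is the cost column of order.txt in file order. Every row has the same window (the whole data set), so the sum only needs to be computed once and then attached to each row. That produces the same output as recomputing it per row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of sum(cost) over() on the t_order sample data.
class OverWholeSet {
    // The cost column of order.txt, in file order.
    static final int[] COSTS = {10, 55, 50, 46, 29, 15, 23, 94, 42, 75, 68, 62, 12, 80};

    // Returns one value per input row: the sum over the single window (the whole data set).
    static List<Integer> sumOverAll(int[] costs) {
        int total = 0;
        for (int c : costs) {
            total += c;
        }
        List<Integer> perRow = new ArrayList<>();
        for (int i = 0; i < costs.length; i++) {
            perRow.add(total); // each row keeps its detail and gets the same window total
        }
        return perRow;
    }

    public static void main(String[] args) {
        List<Integer> sums = sumOverAll(COSTS);
        for (int i = 0; i < COSTS.length; i++) {
            System.out.println(COSTS[i] + "\t" + sums.get(i));
        }
    }
}
```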

Now back to our requirement:
select *,sum(cost) over(distribute by substr(orderdate,1,7) ) from t_order ;

t_order.name    t_order.orderdate       t_order.cost    sum_window_0
saml    2018-01-01      10      205
saml    2018-01-08      55      205
tony    2018-01-07      50      205
saml    2018-01-05      46      205
tony    2018-01-04      29      205
tony    2018-01-02      15      205
saml    2018-02-03      23      23
mart    2018-04-13      94      341
saml    2018-04-06      42      341
mart    2018-04-11      75      341
mart    2018-04-09      68      341
mart    2018-04-08      62      341
neil    2018-05-10      12      12
neil    2018-06-12      80      80
Time taken: 2.128 seconds, Fetched: 14 row(s)
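Here is a plain-Java sketch of what distribute by substr(orderdate,1,7) does. It groups the rows by their yyyy-MM prefix, sums each group once, and then gives every row the total of its own group. The class name MonthlyWindowSum is hypothetical, and the arrays hold order.txt in file order. Hive's substr is 1-based, so substr(orderdate,1,7) corresponds to Java's substring(0, 7).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of sum(cost) over(distribute by substr(orderdate,1,7)) on t_order.
class MonthlyWindowSum {
    static final String[] DATES = {
        "2018-01-01", "2018-01-08", "2018-01-07", "2018-01-05", "2018-01-04", "2018-01-02", "2018-02-03",
        "2018-04-13", "2018-04-06", "2018-04-11", "2018-04-09", "2018-04-08", "2018-05-10", "2018-06-12"};
    static final int[] COSTS = {10, 55, 50, 46, 29, 15, 23, 94, 42, 75, 68, 62, 12, 80};

    // Step 1: aggregate each partition once (key = "yyyy-MM", like substr(orderdate,1,7)).
    static Map<String, Integer> partitionTotals(String[] dates, int[] costs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (int i = 0; i < dates.length; i++) {
            totals.merge(dates[i].substring(0, 7), costs[i], Integer::sum);
        }
        return totals;
    }

    // Step 2: every row receives the total of the partition it belongs to.
    static int[] windowValues(String[] dates, int[] costs) {
        Map<String, Integer> totals = partitionTotals(dates, costs);
        int[] out = new int[dates.length];
        for (int i = 0; i < dates.length; i++) {
            out[i] = totals.get(dates[i].substring(0, 7));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = windowValues(DATES, COSTS);
        for (int i = 0; i < DATES.length; i++) {
            System.out.println(DATES[i] + "\t" + COSTS[i] + "\t" + values[i]);
        }
    }
}
```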

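Inside over() you can also write over(distribute by name,month(orderdate) ) using the month function. Be careful, though: month() returns only the month number without the year, so the same month in different years is counted together.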

Metric 5: view customers' purchase details and each customer's monthly purchase total.

select *,sum(cost) over(distribute by name,month(orderdate) ) from t_order ;

mart    2018-04-13      94      299
mart    2018-04-08      62      299
mart    2018-04-09      68      299
mart    2018-04-11      75      299
neil    2018-05-10      12      12
neil    2018-06-12      80      80
saml    2018-01-01      10      111
saml    2018-01-05      46      111
saml    2018-01-08      55      111
saml    2018-02-03      23      23
saml    2018-04-06      42      42
tony    2018-01-04      29      94
tony    2018-01-07      50      94
tony    2018-01-02      15      94

The SORT BY clause

SORT BY sorts the rows within each window. Note that once sorting is used, the window grows row by row within the group.

Syntax: over([distribute by colname] [sort by colname [desc|asc]])

Requirement 6: view customers' purchase details and each customer's monthly purchase total, sorted by date in descending order.

select *,sum(cost) over(distribute by name,month(orderdate) sort by orderdate desc ) from t_order ;

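After SORT BY, the rows are sorted by the given column and the result accumulates row by row within the group. Put plainly: before, every row showed the final total. Now the total builds up gradually, which you can see in the output.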
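To see why the values grow, here is a hypothetical plain-Java sketch of sum(cost) over(distribute by name, month(orderdate) sort by orderdate desc). Rows are grouped by customer and month and sorted newest first inside each group. Each row then receives the sum from the first row of its group up to itself. One assumption to keep in mind: when an ordering is present, Hive's default frame is RANGE-based, so rows with the same orderdate would share a value. The sample data has no such ties within a group, so a simple row-by-row running total gives the same result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sum(cost) over(distribute by name, month(orderdate) sort by orderdate desc).
class RunningWindowSum {
    static final String[][] ROWS = { // name, orderdate, cost (order.txt)
        {"saml", "2018-01-01", "10"}, {"saml", "2018-01-08", "55"}, {"tony", "2018-01-07", "50"},
        {"saml", "2018-01-05", "46"}, {"tony", "2018-01-04", "29"}, {"tony", "2018-01-02", "15"},
        {"saml", "2018-02-03", "23"}, {"mart", "2018-04-13", "94"}, {"saml", "2018-04-06", "42"},
        {"mart", "2018-04-11", "75"}, {"mart", "2018-04-09", "68"}, {"mart", "2018-04-08", "62"},
        {"neil", "2018-05-10", "12"}, {"neil", "2018-06-12", "80"}};

    // Returns "name|orderdate" -> running total within that row's partition.
    static Map<String, Integer> runningTotals(String[][] rows) {
        // 1) Partition by name + month number only (like month(), so different years would merge).
        Map<String, List<String[]>> partitions = new HashMap<>();
        for (String[] r : rows) {
            String key = r[0] + "|" + r[1].substring(5, 7);
            partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        Map<String, Integer> result = new HashMap<>();
        for (List<String[]> part : partitions.values()) {
            // 2) Sort inside the partition by orderdate, descending (ISO dates sort correctly as text).
            part.sort(Comparator.comparing((String[] r) -> r[1]).reversed());
            // 3) The frame grows one row at a time: first row of the partition .. current row.
            int running = 0;
            for (String[] r : part) {
                running += Integer.parseInt(r[2]);
                result.put(r[0] + "|" + r[1], running);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        runningTotals(ROWS).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```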

Note: the PARTITION BY + ORDER BY combination can replace DISTRIBUTE BY + SORT BY:

select *,sum(cost) over(partition by name,month(orderdate) order by orderdate desc ) from t_order ;

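Note: the window can also contain only an ordering. The partition is then the whole table, and the sum accumulates over all rows in that order: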

select *,sum(cost) over(order by orderdate desc ) from t_order ;

The window (frame) clause

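To divide the window more finely, use the window (frame) clause. The common keywords are:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: no bound
UNBOUNDED PRECEDING: from the first row of the partition
UNBOUNDED FOLLOWING: to the last row of the partition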

Walk through this query:
select name,orderdate,cost,
       sum(cost) over() as sample1, -- sum of all rows
       sum(cost) over(partition by name) as sample2,-- grouped by name, sum of the whole group
       sum(cost) over(partition by name order by orderdate) as sample3,-- grouped by name, running total within the group
       sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )  as sample4 ,-- same as sample3: from the first row to the current row
       sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING   and current row) as sample5, -- previous row plus current row
       sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING   AND 1 FOLLOWING  ) as sample6,-- previous row, current row and next row
       sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 -- current row and all following rows
from t_order;

Result:

name    orderdate       cost    sample1 sample2 sample3 sample4 sample5 sample6 sample7
mart    2018-04-08      62      661     299     62      62      62      130     299
mart    2018-04-09      68      661     299     130     130     130     205     237
mart    2018-04-11      75      661     299     205     205     143     237     169
mart    2018-04-13      94      661     299     299     299     169     169     94
.....
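The frame keywords can be modeled as a pair of offsets around the current row. The sketch below is plain Java, not Hive code, and FrameSum, frameSums and the UNBOUNDED sentinel are hypothetical names. It works on one partition that is already sorted, here mart's costs ordered by orderdate. With offsets (2, 0) it also computes the "rows between 2 preceding and current row" sum used later in this section.

```java
import java.util.Arrays;

// Hypothetical sketch of ROWS BETWEEN ... frames over one already-sorted partition.
class FrameSum {
    // Stands in for UNBOUNDED PRECEDING / UNBOUNDED FOLLOWING.
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // For every row i, sums rows [i - preceding, i + following], clipped to the partition.
    // An offset of 0 means CURRENT ROW on that side.
    static int[] frameSums(int[] sortedCosts, int preceding, int following) {
        int n = sortedCosts.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int start = preceding == UNBOUNDED ? 0 : Math.max(0, i - preceding);
            int end = following == UNBOUNDED ? n - 1 : Math.min(n - 1, i + following);
            int sum = 0;
            for (int j = start; j <= end; j++) {
                sum += sortedCosts[j];
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] mart = {62, 68, 75, 94};     // mart's costs ordered by orderdate
        int[] saml = {10, 46, 55, 23, 42}; // saml's costs ordered by orderdate
        System.out.println("sample4 " + Arrays.toString(frameSums(mart, UNBOUNDED, 0)));
        System.out.println("sample5 " + Arrays.toString(frameSums(mart, 1, 0)));
        System.out.println("sample6 " + Arrays.toString(frameSums(mart, 1, 1)));
        System.out.println("sample7 " + Arrays.toString(frameSums(mart, 0, UNBOUNDED)));
        System.out.println("last 3  " + Arrays.toString(frameSums(saml, 2, 0)));
    }
}
```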

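Requirement 7: view the running purchase total of all customers up to each order (without ORDER BY the row order, and therefore the running total, would not be deterministic)

select *,sum(cost) over(order by orderdate rows between UNBOUNDED PRECEDING and current row)  
   from t_order;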

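Requirement 8: for each order, the customer's total over the three most recent purchases (the current one and the two before it)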

select *,sum(cost) over(partition by name order by orderdate   rows between 2 PRECEDING and current row)  from t_order;

t_order.name    t_order.orderdate       t_order.cost    sum_window_0
mart    2018-04-08      62      62
mart    2018-04-09      68      130
mart    2018-04-11      75      205
mart    2018-04-13      94      237
neil    2018-05-10      12      12
neil    2018-06-12      80      92
saml    2018-01-01      10      10
saml    2018-01-05      46      56
saml    2018-01-08      55      111
saml    2018-02-03      23      124
saml    2018-04-06      42      120
tony    2018-01-02      15      15
tony    2018-01-04      29      44
tony    2018-01-07      50      94

As you can see, the result lists every order of every customer, and each row sums the current row and the two rows before it.

Exercise: what if we read the requirement as "for each customer, the total of their three most recent purchases so far"?

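1) To compute each customer's three most recent purchases with only one result row per customer, use row_number. This function numbers the rows. We then filter on that number and aggregate only the rows that pass the filter.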

PS: because we sort in descending order, the orders numbered 1 to 3 are the customer's three most recent purchases.

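------ One problem remains: what about customers with only two orders? With the code below, such a customer is totaled over just those two orders. What if we want to exclude customers with fewer than three orders?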

SELECT name, SUM(cost) AS total_cost
FROM (
  SELECT name, cost,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY orderdate DESC) AS row_num
  FROM t_order
) subquery
WHERE row_num <= 3
GROUP BY name;


2) Solving the remaining problem

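This can be solved. In the subquery, add one more column that counts how many orders each customer has, and only aggregate customers with at least three orders: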

SELECT name, SUM(cost) AS total_cost
FROM (
  SELECT name, cost,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY orderdate DESC) AS row_num,
         COUNT(*) OVER (PARTITION BY name) AS num_records
  FROM t_order
) subquery
WHERE row_num <= 3 AND num_records >= 3
GROUP BY name;
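For comparison, here is the same logic as a hypothetical plain-Java sketch. RecentThreeTotal is not part of any library. The code groups orders by customer and skips customers with fewer than three orders (num_records >= 3). It then sorts each customer's orders newest first and adds up the first three (row_num <= 3). Dates are ISO strings, so sorting them as text also sorts them chronologically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the row_number() + count(*) over(partition by name) query.
class RecentThreeTotal {
    static final String[][] ROWS = { // name, orderdate, cost (order.txt)
        {"saml", "2018-01-01", "10"}, {"saml", "2018-01-08", "55"}, {"tony", "2018-01-07", "50"},
        {"saml", "2018-01-05", "46"}, {"tony", "2018-01-04", "29"}, {"tony", "2018-01-02", "15"},
        {"saml", "2018-02-03", "23"}, {"mart", "2018-04-13", "94"}, {"saml", "2018-04-06", "42"},
        {"mart", "2018-04-11", "75"}, {"mart", "2018-04-09", "68"}, {"mart", "2018-04-08", "62"},
        {"neil", "2018-05-10", "12"}, {"neil", "2018-06-12", "80"}};

    static Map<String, Integer> lastThreeTotals(String[][] rows) {
        // partition by name
        Map<String, List<String[]>> byName = new TreeMap<>();
        for (String[] r : rows) {
            byName.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> e : byName.entrySet()) {
            List<String[]> orders = e.getValue();
            if (orders.size() < 3) {
                continue; // num_records >= 3
            }
            // order by orderdate desc: the newest order gets row_num = 1
            orders.sort(Comparator.comparing((String[] r) -> r[1]).reversed());
            int sum = 0;
            for (int rowNum = 1; rowNum <= 3; rowNum++) { // row_num <= 3
                sum += Integer.parseInt(orders.get(rowNum - 1)[2]);
            }
            totals.put(e.getKey(), sum); // one result row per name (group by name)
        }
        return totals;
    }

    public static void main(String[] args) {
        lastThreeTotals(ROWS).forEach((name, total) -> System.out.println(name + "\t" + total));
    }
}
```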

Problem solved.

2. Sequence functions

1) ntile

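ntile is a powerful Hive analytic function. It splits an ordered data set evenly into the specified number (num) of buckets and assigns each row its bucket number. If the rows cannot be split evenly, the lower-numbered buckets receive the extra rows, so bucket sizes differ by at most 1.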
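The bucket rule just described can be written out as a small sketch: the rows are split evenly, and any extra rows go to the lowest-numbered buckets. It is plain Java with a hypothetical class name. It assumes the rows are already in the desired order and only computes which bucket each position gets.

```java
import java.util.Arrays;

// Hypothetical sketch of ntile's bucket assignment for rows that are already ordered.
class Ntile {
    // Returns the 1-based bucket number for each of rowCount ordered rows.
    static int[] ntile(int rowCount, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        int[] out = new int[rowCount];
        int base = rowCount / buckets;  // every bucket gets at least this many rows
        int extra = rowCount % buckets; // the first `extra` buckets get one row more
        int row = 0;
        for (int b = 1; b <= buckets && row < rowCount; b++) {
            int size = base + (b <= extra ? 1 : 0);
            for (int k = 0; k < size; k++) {
                out[row++] = b;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ntile(5, 3))); // prints [1, 1, 2, 2, 3]
        System.out.println(Arrays.toString(ntile(4, 3))); // prints [1, 1, 2, 3]
        System.out.println(Arrays.toString(ntile(2, 3))); // prints [1, 2]
    }
}
```

These bucket sizes match the per-customer results in the output that follows (5, 4 and 2 orders split into 3 buckets). The rows receiving each number can differ there because no ORDER BY is given.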

-- SQL:
select name,orderdate,cost,
ntile(3) over(partition by name) -- group by name and split each group into 3 buckets (no ORDER BY, so which row lands in which bucket is not deterministic)
from t_order;

-- Result:
mart    2018-04-13      94      1
mart    2018-04-08      62      1
mart    2018-04-09      68      2
mart    2018-04-11      75      3
neil    2018-06-12      80      1
neil    2018-05-10      12      2
saml    2018-02-03      23      1
saml    2018-04-06      42      1
saml    2018-01-05      46      2
saml    2018-01-08      55      2
saml    2018-01-01      10      3
tony    2018-01-02      15      1
tony    2018-01-04      29      2
tony    2018-01-07      50      3
Time taken: 2.192 seconds, Fetched: 14 row(s)

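Requirement: from all purchase records, get the last 50% of each person's records. With ntile(2), bucket 2 holds the later half. When a person has an odd number of records, bucket 2 gets the smaller half.

A first attempt, which does not work:
select name,orderdate,cost,
ntile(2) over(partition by name order by orderdate ) as xuhao
from t_order where xuhao = 2;

Error: a column alias cannot be used in the WHERE clause. An ordinary expression could simply be repeated in WHERE, but a window function is not allowed there either, so wrap the query in a subquery: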
select name,orderdate,cost from (
select name,orderdate,cost,
ntile(2) over(partition by name order by orderdate ) as xuhao
from t_order ) t where t.xuhao=2;

2) LAG and LEAD

lag returns the value from the nth row before the current row.

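Syntax: lag(colName,n[,default value]): takes the value n rows before the current row. If no such row exists, the default value is returned (NULL if no default is given).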

lead returns the value from the nth row after the current row. Syntax: lead(colName,n[,default value]).
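Here is a minimal plain-Java model of lag and lead over one partition that is already sorted by orderdate. The class and method names are hypothetical. When the requested row would fall outside the partition, the default value is returned. This is why each customer's first row shows NULL (or 1990-01-01 when a default is given) in the outputs that follow.

```java
import java.util.Arrays;

// Hypothetical sketch of lag/lead over one partition sorted by orderdate.
class LagLead {
    // lag(col, n, default): the value n rows before the current row, or the default if there is none.
    static String[] lag(String[] sorted, int n, String defaultValue) {
        String[] out = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = i - n >= 0 ? sorted[i - n] : defaultValue;
        }
        return out;
    }

    // lead(col, n, default): the value n rows after the current row, or the default if there is none.
    static String[] lead(String[] sorted, int n, String defaultValue) {
        String[] out = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = i + n < sorted.length ? sorted[i + n] : defaultValue;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tony = {"2018-01-02", "2018-01-04", "2018-01-07"}; // tony's order dates, sorted
        System.out.println(Arrays.toString(lag(tony, 1, null)));         // prints [null, 2018-01-02, 2018-01-04]
        System.out.println(Arrays.toString(lag(tony, 1, "1990-01-01"))); // prints [1990-01-01, 2018-01-02, 2018-01-04]
        System.out.println(Arrays.toString(lead(tony, 1, null)));        // prints [2018-01-04, 2018-01-07, null]
    }
}
```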

Requirement: find each customer's previous purchase date

select * ,lag(orderdate,1) over( partition by name order by orderdate ) from t_order;

mart    2018-04-08      62      NULL
mart    2018-04-09      68      2018-04-08
mart    2018-04-11      75      2018-04-09
mart    2018-04-13      94      2018-04-11
neil    2018-05-10      12      NULL
neil    2018-06-12      80      2018-05-10
saml    2018-01-01      10      NULL
saml    2018-01-05      46      2018-01-01
saml    2018-01-08      55      2018-01-05
saml    2018-02-03      23      2018-01-08
saml    2018-04-06      42      2018-02-03
tony    2018-01-02      15      NULL
tony    2018-01-04      29      2018-01-02
tony    2018-01-07      50      2018-01-04

select * ,lag(orderdate,1,'1990-01-01') over( partition by name order by orderdate ) from t_order;

mart    2018-04-08      62      1990-01-01
mart    2018-04-09      68      2018-04-08
mart    2018-04-11      75      2018-04-09
mart    2018-04-13      94      2018-04-11
neil    2018-05-10      12      1990-01-01
neil    2018-06-12      80      2018-05-10
saml    2018-01-01      10      1990-01-01
saml    2018-01-05      46      2018-01-01
saml    2018-01-08      55      2018-01-05
saml    2018-02-03      23      2018-01-08
saml    2018-04-06      42      2018-02-03
tony    2018-01-02      15      1990-01-01
tony    2018-01-04      29      2018-01-02
tony    2018-01-07      50      2018-01-04

Requirement: find users who clicked 100 times within 5 minutes

dt,id,url
2019-08-22 19:00:01,1,www.baidu.com
2019-08-22 19:01:01,1,www.baidu.com
2019-08-22 19:02:01,1,www.baidu.com
2019-08-22 19:03:01,1,www.baidu.com

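A runnable version (tablename is a placeholder for the click-log table):
select distinct id
from (
  select id, dt,
         lag(dt, 99) over(partition by id order by dt) as prev_dt -- time of the click 99 clicks earlier
  from tablename
) t
where unix_timestamp(dt) - unix_timestamp(prev_dt) < 300; -- 100 clicks in under 5 minutes (300 seconds)

Idea: partition by id and sort by click time. For each click, fetch the time of the click 99 positions earlier (the current click plus those 99 makes 100 clicks). Subtract that earlier time from the current time. If the difference is less than 5 minutes, the user made 100 clicks within 5 minutes and must be returned.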
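Here is the same check for a single user as a hypothetical plain-Java sketch. Timestamps are assumed to be epoch seconds. Note that 100 clicks means the current click plus the 99 before it, so the look-back offset is clicks - 1. A user qualifies if any click and the click that many positions earlier are less than 300 seconds apart. This models the lag idea and is not Hive code.

```java
import java.util.Arrays;

// Hypothetical sketch: does one user have `clicks` clicks within `windowSeconds`?
class BurstDetector {
    static boolean hasBurst(long[] clickEpochSeconds, int clicks, long windowSeconds) {
        if (clicks < 1) {
            throw new IllegalArgumentException("clicks must be at least 1");
        }
        long[] t = clickEpochSeconds.clone();
        Arrays.sort(t);          // order by dt
        int offset = clicks - 1; // current click + (clicks - 1) earlier ones = `clicks` clicks
        for (int i = offset; i < t.length; i++) {
            // t[i - offset] plays the role of lag(dt, clicks - 1)
            if (t[i] - t[i - offset] < windowSeconds) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] fast = new long[100];
        long[] slow = new long[100];
        for (int i = 0; i < 100; i++) {
            fast[i] = i;      // synthetic data: one click per second
            slow[i] = 5L * i; // synthetic data: one click every 5 seconds
        }
        System.out.println(hasBurst(fast, 100, 300)); // prints true (first to 100th click: 99 s)
        System.out.println(hasBurst(slow, 100, 300)); // prints false (first to 100th click: 495 s)
    }
}
```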

3) FIRST_VALUE and LAST_VALUE

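first_value: within the sorted partition, the first value up to the current row.
last_value: within the sorted partition, the last value up to the current row. With the default frame this is the current row itself, as the output below shows. To get the last value of the whole partition, add rows between unbounded preceding and unbounded following.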

select name,orderdate,cost,
   first_value(orderdate) over(partition by name order by orderdate) as time1,
   last_value(orderdate) over(partition by name order by orderdate) as time2
from t_order;

name    orderdate       cost    time1   time2
mart    2018-04-08      62      2018-04-08      2018-04-08
mart    2018-04-09      68      2018-04-08      2018-04-09
mart    2018-04-11      75      2018-04-08      2018-04-11
mart    2018-04-13      94      2018-04-08      2018-04-13
neil    2018-05-10      12      2018-05-10      2018-05-10
neil    2018-06-12      80      2018-05-10      2018-06-12
saml    2018-01-01      10      2018-01-01      2018-01-01
saml    2018-01-05      46      2018-01-01      2018-01-05
saml    2018-01-08      55      2018-01-01      2018-01-08
saml    2018-02-03      23      2018-01-01      2018-02-03
saml    2018-04-06      42      2018-01-01      2018-04-06
tony    2018-01-02      15      2018-01-02      2018-01-02
tony    2018-01-04      29      2018-01-02      2018-01-04
tony    2018-01-07      50      2018-01-02      2018-01-07
Time taken: 2.053 seconds, Fetched: 14 row(s)
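The difference between the default frame and a whole-partition frame shows up clearly in a small plain-Java sketch. The names are hypothetical, the code works on one partition already sorted by orderdate, and it assumes no duplicate dates within that partition. first_value always takes the partition's first row. last_value with the default frame stops at the current row. last_value over rows between unbounded preceding and unbounded following takes the partition's last row.

```java
import java.util.Arrays;

// Hypothetical sketch of first_value / last_value over one partition sorted by orderdate.
class FirstLastValue {
    // Default frame (first row .. current row): always the partition's first row.
    static String[] firstValue(String[] sorted) {
        String[] out = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = sorted[0];
        }
        return out;
    }

    // Default frame ends at the current row, so last_value is the current row's own value.
    static String[] lastValueDefaultFrame(String[] sorted) {
        String[] out = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = sorted[i];
        }
        return out;
    }

    // rows between unbounded preceding and unbounded following: the frame is the whole partition.
    static String[] lastValueWholePartition(String[] sorted) {
        String[] out = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = sorted[sorted.length - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] mart = {"2018-04-08", "2018-04-09", "2018-04-11", "2018-04-13"};
        System.out.println(Arrays.toString(firstValue(mart)));
        System.out.println(Arrays.toString(lastValueDefaultFrame(mart)));
        System.out.println(Arrays.toString(lastValueWholePartition(mart)));
    }
}
```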

3. Ranking functions

row_number() rank() dense_rank()

1. row_number()

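row_number numbers the records in the group in sort order, starting from 1. Its values never repeat. When sort values are equal, the order among the tied rows is not guaranteed and need not follow the order of the records in the table.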

Example:
98		1
97		2
97		3
96		4
95		5
95		6

No shared ranks: the numbers increase one by one.

2. rank()

Ranks the rows within the group. Tied rows share a rank and leave gaps in the ranking.

Example:
98		1
97		2
97		2
96		4
95		5
95		5
94		7
Ties share a rank, and the next rank skips ahead.

3. dense_rank()

Ranks the rows within the group. Tied rows share a rank without leaving gaps.

Example:
98		1
97		2
97		2
96		3
95		4
95		4
94		5
Ties share a rank, and ranks increase without gaps.
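The three numbering rules can be stated compactly in plain Java. This sketch assumes the scores of one group are already sorted in descending order, and Ranking and its methods are hypothetical names. row_number counts positions. rank jumps to the row's position after a tie. dense_rank advances by one only when the score changes.

```java
import java.util.Arrays;

// Hypothetical sketch of row_number, rank and dense_rank over one group sorted descending.
class Ranking {
    static int[] rowNumber(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < sortedDesc.length; i++) {
            out[i] = i + 1; // position in the group, never repeats
        }
        return out;
    }

    static int[] rank(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < sortedDesc.length; i++) {
            // a tie keeps the previous rank; otherwise the rank is the position, leaving gaps
            out[i] = (i > 0 && sortedDesc[i] == sortedDesc[i - 1]) ? out[i - 1] : i + 1;
        }
        return out;
    }

    static int[] denseRank(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < sortedDesc.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                // a tie keeps the previous rank; a new score advances by exactly one
                out[i] = sortedDesc[i] == sortedDesc[i - 1] ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] scores = {98, 97, 97, 96, 95, 95, 94};
        System.out.println(Arrays.toString(rowNumber(scores))); // prints [1, 2, 3, 4, 5, 6, 7]
        System.out.println(Arrays.toString(rank(scores)));      // prints [1, 2, 2, 4, 5, 5, 7]
        System.out.println(Arrays.toString(denseRank(scores))); // prints [1, 2, 2, 3, 4, 4, 5]
    }
}
```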

4. Worked example

Data

1 gp1808 80
2 gp1808 92
3 gp1808 84
4 gp1808 86
5 gp1808 88
6 gp1808 70
7 gp1808 98
8 gp1808 84
9 gp1808 86
10 gp1807 90
11 gp1807 92
12 gp1807 84
13 gp1807 86
14 gp1807 88
15 gp1807 80
16 gp1807 92
17 gp1807 84
18 gp1807 86
19 gp1805 80
20 gp1805 92
21 gp1805 94
22 gp1805 86
23 gp1805 88
24 gp1805 80
25 gp1805 92
26 gp1805 94
27 gp1805 86

Create the table and load the data

create table if not exists stu_score(
userid int,
classno string,
score int
)
row format delimited 
fields terminated by ' ';

load data local inpath '/home/hivedata/stu_score.txt' overwrite into table stu_score;

Requirement 1: rank the exam scores within each class in descending order

select *,dense_rank() over(partition by classno order by  score desc) from stu_score;

select *,dense_rank() over(order by score desc) `grade_rank`  from stu_score; -- ranking across the whole grade

Requirement 2: get the rankings for each exam with all three ranking functions

select *,
-- row_number(): no ties, equal scores are numbered one after another
row_number() over(distribute by classno sort by score desc) rn1,
-- rank(): ties share a rank, leaving gaps
rank() over(distribute by classno sort by score desc) rn2,
-- dense_rank(): ties share a rank, no gaps
dense_rank() over(distribute by classno sort by score desc) rn3
from stu_score;
Result:

26      gp1805  94      1       1       1
21      gp1805  94      2       1       1
25      gp1805  92      3       3       2
20      gp1805  92      4       3       2
23      gp1805  88      5       5       3
27      gp1805  86      6       6       4
22      gp1805  86      7       6       4
24      gp1805  80      8       8       5
19      gp1805  80      9       8       5
11      gp1807  92      1       1       1
16      gp1807  92      2       1       1
10      gp1807  90      3       3       2
14      gp1807  88      4       4       3
13      gp1807  86      5       5       4
18      gp1807  86      6       5       4
12      gp1807  84      7       7       5
17      gp1807  84      8       7       5
15      gp1807  80      9       9       6
7       gp1808  98      1       1       1
2       gp1808  92      2       2       2
5       gp1808  88      3       3       3
9       gp1808  86      4       4       4
4       gp1808  86      5       4       4
8       gp1808  84      6       6       5
3       gp1808  84      7       6       5
1       gp1808  80      8       8       6
6       gp1808  70      9       9       7

Requirement 3: find the top three in each class

select * from (
select * ,dense_rank() over(partition by classno order by score desc) as paiming from stu_score) t  where  paiming <=3;

4. Exercise

孙悟空	Chinese	87
孙悟空	Math	95
孙悟空	English	68
大海	Chinese	94
大海	Math	56
大海	English	84
宋宋	Chinese	64
宋宋	Math	86
宋宋	English	84
婷婷	Chinese	65
婷婷	Math	85
婷婷	English	78

create table score(
name string,
subject string, 
score int) 
row format delimited fields terminated by "\t";

load data local inpath '/home/hivedata/test_e.txt' into table score;

1. Rank the scores within each subject

select * ,dense_rank() over(partition by  subject order by  score desc) from score;

2. Find the top three students in each subject

select * from (
select * ,dense_rank() over(partition by  subject order by  score desc) paiming from score ) t where paiming < 4;

5. User-defined functions

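Hive's built-in functions cannot cover every business need. Hive offers many extension points, such as user-defined functions, SerDes, and input/output formats. User-defined functions fall into three categories:
1) UDF (user-defined function): one row in, one value out (the most common kind), like abs().
2) UDAF (user-defined aggregation function): many rows in, one value out, like count, sum, max.
3) UDTF (user-defined table-generating function): one row in, many rows out, like explode (typically used with LATERAL VIEW).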

1. Example: converting letters to uppercase

Create a Maven project: MyFunction

Add the following Maven dependencies to pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.1</version>
    </dependency>
</dependencies>

Extend the class org.apache.hadoop.hive.ql.udf.generic.GenericUDF and override its abstract methods.

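Requirement: write a user-defined function that converts letters to uppercase. The class is named LowerString, but it calls toUpperCase and is registered as myUpper.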

package com.bigdata;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

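// Despite its name, this class converts to uppercase; the name is kept because
// the create temporary function statements refer to com.bigdata.LowerString.
public class LowerString extends GenericUDF {
    // Initialization, called once before evaluate():
    // throw an exception if the number of arguments is not exactly 1
    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 1) {
            // wrong number of arguments
            throw new UDFArgumentException("Expected exactly 1 argument");
        }
        // declare the return type: a Java String
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    // The actual logic, called once per row
    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {

        // get the argument; it is null when the column value is NULL,
        // so check before calling toString() to avoid a NullPointerException
        Object arg = deferredObjects[0].get();
        if (arg == null) {
            return "";
        }
        String inputString = arg.toString();
        if (inputString.isEmpty()) {
            return "";
        }
        // "abc" -> "ABC"
        return inputString.toUpperCase();
    }

    // Description of the function (shown, for example, in EXPLAIN output)
    @Override
    public String getDisplayString(String[] strings) {
        return "Converts letters to uppercase";
    }
}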

After writing the class, run the Maven package phase to build a jar.

Put the jar into Hive's lib directory.

Ways to load the function:

Method 1: load with commands (effective only for the current session)

1. Put MyFunction-1.0-SNAPSHOT.jar into the /opt/installs/hive/lib/ directory.
2. After packaging the UDF and uploading it to the server, add the jar to Hive's classpath:
	hive> add jar /opt/installs/hive/lib/MyFunction-1.0-SNAPSHOT.jar;
3. Create a temporary function name for the UDF:
	hive> create temporary function myUpper as 'com.bigdata.LowerString';
4. Check that the function was created:
	hive> show functions;
5. Test the function in Hive:
	hive> select myUpper('yunhe');
6. To drop the function (first make sure nothing is still using it):
	hive> drop temporary function if exists myupper;

Method 2:


1. Upload the UDF jar to the server.

2. Write an init file containing the statements that register the function, and have Hive load this file at startup:
[root@yunhe01 ~]# vi $HIVE_HOME/conf/hive-init
The file contains:
add jar /opt/installs/hive/lib/MyFunction-1.0-SNAPSHOT.jar;
create temporary function myUpper as 'com.bigdata.LowerString';

3. Start Hive with:
[root@yunhe01 ~]# hive -i $HIVE_HOME/conf/hive-init

Method 3:

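Add the following to the .hiverc file (the Hive CLI reads $HOME/.hiverc at startup when it is not started with -i):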
add jar /opt/installs/hive/lib/MyFunction-1.0-SNAPSHOT.jar;
create temporary function myUpper as 'com.bigdata.LowerString';

The function is then available every time Hive starts.

II. Using WITH ... AS

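WITH ... AS (also called subquery factoring, or a common table expression) defines a named subquery. It is often described as a way to improve performance: the data is queried once and later queries reuse it. Note, however, that by default Hive inlines the CTE wherever it is referenced. It is only computed once and reused when CTE materialization is enabled (for example via hive.optimize.cte.materialize.threshold).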

with t as (
select *, 
row_number() over(partition by id order by salary desc) 
ranking from tmp_learning_mary)
select * from t where ranking = 1;

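WITH ... AS is similar to a view or a temporary table: it stores a piece of SQL under an alias. The difference is that WITH ... AS exists only for a single statement and must be used together with another SQL query.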

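Its biggest benefit is readability. When the same subquery is needed several times, defining it once in WITH greatly simplifies the SQL ("define once, use many times"). Whether it also reduces the amount of data read depends on whether Hive materializes the CTE. Usage notes:
1. A WITH clause must be defined before the SELECT that references it, and it must be followed by a SELECT query; otherwise an error is raised.
2. Do not put a semicolon after the WITH clause. The WITH keyword can be used only once at the same level. Several subqueries are separated by commas, and the closing parenthesis of the last subquery is followed directly by the main query, with no comma.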

create table a as
with t1 as (select * from firstTable),
t2 as (select * from secondTable),
t3 as (select * from thirdTable)
select * from t1,t2,t3;

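3. A subquery defined earlier in the WITH clause can be used in later ones, but a WITH subquery cannot contain a nested WITH clause.

with t1 as (select * from firstTable),
t2 as (select t1.id from t1)    -- the second subquery t2 uses the first one, t1
select * from t2;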
