扁担学习MySQL

最新推荐文章于 2024-07-25 11:32:53 发布

扁担来喽

最新推荐文章于 2024-07-25 11:32:53 发布

阅读量1.7k

点赞数 28

文章标签： mysql 数据库

本文链接：https://blog.csdn.net/hxhxhzjz/article/details/135371440

版权

扁担学习MySQL

一段完整的SQL执行的顺序

from(包括from中的子句)

join

where

group by（group by中可能使用 select的别名，其后的语句中都可以使用）

avg/sum等聚合函数

having (可以使用聚合函数可以使用别名)

select

distinct

order by (可以使用别名聚合函数)

limit

常用命令

##### show databases;
##### use 数据库名;
##### show tables;
##### show tables from 数据库名;
##### create table 表名() charset utf8;
##### desc 表名;
##### show create table 表名;
##### show full columns from 表名;

# DML 数据库操作语言
update 表名 set 字段 = 值,... where  条件;# 按照某种过滤方式进行修改值
delete from 表名 where 条件; #按照某种条件 删除应的行数据
alter table 表名 modify 字段名 字段类型 字段约束; #修改字段
alter table 表名 change 旧字段 新字段 字段类型 字段约束; #改变字段名等
alter table 表名 add 字段名 字段类型; # 添加字段
alter table 表名 drop column 字段名; # 删除某列

☆☆☆ insert into 表名 values(字段值); #插入信息

☆☆☆
#一条较为完整的SQL查询语句
select 
	表名1.字段1 as 字段1,
	表名2.字段2 as 字段2,
	...
from 
	表名1 
join
	表名2 
on
	表连接条件
where 
	条件 
group by 
	字段1, # 要注明字段来自那张表 [ 表名.字段名 ] 不能使用select 的别名
	... 
with rollup # 上卷 对group by分组之后的某些关键数据 进行汇总 eg：sum() [ hive 中与之相同效果的是 with cube ]
			# 对 group by 的第一个字段进行分组求和
having 
	条件  # 可以使用聚合函数，可以使用 select 的别名
order by 
	字段1, # 要注明字段来自那张表 [ 表名.字段名 ] 可以使用 select 的别名 可以使用 聚合函数
	...
limit ; # limit (page -1 ) * maxSize , maxSize

从 java 集合的角度去理解
select 相当于 打印
from tableName 相当于对 tableName中的每一条记录去遍历 tableName 相当于 map 集合 
where 条件 相当于 if判断语句 只有当 tableName中的某条记录 在这个条件下 返回的是 true 才将该条记录 留存下来 交给 select打印
case [字段] when then else end 相当于 java 中的 switch
case when [字段]表达式 then else end 相当于 java 中的 if...else if...else

sql脚本的基本写法

drop database if exists 数据库名; #可不写
create database 数据库名; #可不写
use 数据库名;
drop table if exists 表名;
create table 表名(
	name int comment '名字', # comment 后面的注释要用 单引号 <''>
    gender char(2)
)charset utf8; # 设置存储的编码集

desc 表名; # 查看表结构 脚本不写
show create table 表名; # 查看建表语句 脚本不写
show full columns from 表名; # 查看表的列的所有信息

select * from 表名;

字段约束

主键约束

关键字 **primary key**  使用该关键字的字段 称为该表的主键， 主键 自动拥有 非空约束和位于约束
mysql 会将主键单独存储，并将其排序，建立索引，在查询的时候，使用二分查找法

自增长约束

关键字 **auto_increment** 使用该关键字后悔自动 +1增长
通常使用在 主键上

非空约束

关键字 **not null** 使用之后，该字段不能为空

默认约束

关键字 **default** 使用 default 默认值

唯一约束

**unique** 使用该关键字后，该字段的值不能重复

外键约束

字符串相关的函数

group_count()

分组拼接函数

group_count(字段名 order by 排序字段 separator '分隔符') 默认 ','

# 对部门分组，将每个部门的员工号连接在一起
select
	dept_no,
	group_concat(emp_no)
from dept_emp
group by
	dept_no

concat(str1,str2,…)

字符串拼接函数

select concat('hello','world'); # 'helloworld'

concat_ws(separator,str1,str2,…)

用指定字符连接字符串，忽略null值

select concat_ws('-','hello','world'); # 'hello-world'

length(str)

返回字节的长度

select length('hello'); # 5
select length('张三'); # 6 utf8 一个汉字三个字符

char_length()

返回字符串的长度 **汉字 **也是一个长度

select char_length('张三'); # 2

upper(str)

将字符串转化为大写

select upper('hello'); # 'HELLO'

lower(str)

将字符串转换为小写

select lower('HLEEO'); # 'hello'

substring(str,start,length)

返回字符串的子串

select substring('helloworld',6,5);# 'world' (从6开始数5个字符)

substring_index(str,separator,count)

利用separator将str分割，count如果为正，则返回至最后的内容，如果为负，从右开始

select substring_index('a,b,c,d',',',3); # 'a,b,c'
select substring_index('a,b,c,d',',',-2); # 'c,d'

replace(str,search_str,replace_str)

替换字符串中的指定字符

select repalce('helloworld','world','join'); # 'hellojoin'

trim([{both | leading | traling} [remstr] from] str)

去除字符串中的空格或指定字符

select trim(' hello'); # 'hello'
SELECT TRIM('  bar   '); # 'bar'
SELECT TRIM(LEADING 'x' FROM 'xxxbarxxx');  # 'barxxx'
SELECT TRIM(BOTH 'x' FROM 'xxxbarxxx'); # 'bar'
SELECT TRIM(TRAILING 'xyz' FROM 'barxxyz');  # 'barx'

ltrim()

去除左边的空格

select ltrim(' hello '); # 'hello '

rtrim()

去除右边的空格

select rtrim(' hello '); # ' hello'

instr(str,substr)

返回子串在字符串中首次出现的位置

select instr('helloworld','world'); # 6

lpad(str,len,padstr)

左填充函数，str被操作字符串， len ：如果str长度没有达到len 则使用padstr填充到len长度如果str的长度超过了len 则截取str到len长度

select lpad('hello',2,'sss');# "he"
select lpad('hello',6,'sss');# "shello"

rpad(str,len,padstr)

作用与 lpad() 相同，不同的是从右边开始填充

数学函数

round()

# 四舍五入函数

ceiling()

# 向上取整函数

floor()

时间函数

datediff(date1,date2)

计算两个日期的差值

select datediff(now(),'2023-12-03');

date_add(date,interval expr unit)

对日期进行加减操作

# expr 加正数 减负数
# unit day year month
select date_add(now(),interval 1 day); # 当前日期加一天
select date_add(now(),interval -1 day); # 当前日期减一天

date_format(date,格式) 将日期转换成字符

str_to_date 将字符转换成日期

流程控制函数

if()

处理双分支 可以嵌套
eg:
if(num < 10,'缺货','有货') 
判断条件 为 true  返回 '缺货'
判断条件 为 false 返回 '有货'

ifnull()

判断某个字段是否为 null
可以嵌套
ifnull(num,defalut)
判断  若 num 不为 null 则返回 num 本身的值
	 若 num 为 null 则返回 设置的 default

case 语句

处理多分支 可适用于 行列转换
case when 情况1 then else
	 when 情况2 then else
	 ...
	 end
可与聚合等函数一起使用
eg:
count(case when note = 'novel' then 1 else 0) as novel_num

开窗函数

over() 开窗函数，基本语法：分析函数 over(patition by 列名 order by 列名 rows between 开始位置 and 结束位置)

一般情况下，分析函数不可或缺， rows between and 可以忽略 partition by / order by 可以都有或只需要其中一个

partition by 进行分组 order by 组内排序

如何限制窗口的范围

1. partition by 
使用 partition by 对数据进行分组，每个独立的分组将有自己的窗口，仅在同一个分组内的数据参与窗口函数的计算
2. order by
使用 order by 指定窗口内数据的排序方式 窗口函数将根据指定的排序规则对窗口内的数据进行处理
3. rows between 
使用 rows between 指定窗口的边界， 可以指定边界为当前行之前( preceding ) 或 当前行之后 ( following )
或者当前行之前和之后的一定范围内
4. range between 
使用 range between 指定窗口的边界 类似于 rows between 不同之处在于 range 根据值的大小来定义边界，而不是行数

rows between 或 range between 常用的限制边界的
1. unbounded preceding 从当前行到之前的第一行的无限远
2. N preceding 从当前行之前的n行
3. current row 表示当前行
4. N pollowing 从当前行之后的n行
5. unbounded following 从当前行之后一直到最后一行的无限远

eg:
rows between 2 preceding and current row 窗口范围包括当前行及前面两行
range between 2 preceding and current row 窗口范围包括当前行及前面两行 对 order by 中的值的大小来定义
	eg: order by month range between 2 preceding and current row
		若 month = 7 , 则找她之前行月份为 6、5的行 若没有就算了

排名类

rank()

rank() 排序时若存在相同的位次 则会跳过之后的位次 eg： 1,2,2,4,4,6

dense_rank()

dense_rank() 排序时若存在相同的位子 不跳过 eg：1,2,2,3,4,4,5

row_number()

连续的位次不跳过 eg：1,2,3,4,5,6

分布类

percent_rank()

# 等级百分比函数 计算分区内的等级值百分比
# 计算方式 (rank - 1) / (tows - 1)
# rank 的值是使用rank函数产生的序号
# rows 的值为当前窗口(分区)的记录数(行数)
### 在使用时 partition by order by 不能省略
用途：可以用来计算某个系列产品的百分比排名

eg:

有如下表 goods，计算女装/女士精品下的各产品的百分比排名

select
	percent_rank() over(partition by category_id order by price) as pr,
	category_id,
	category,
	name,
	price,
	stock
from goods
where category = '女装/女士精品'

cume_dist()

# 主要用于查询小于或等于某个值的比例 结合 round 让结果更直观
# count / rows
# count 为小于或等于当前值的个数(本身也算一个)
# rows 当前分区的记录数(行数)

eg: 查询goods表中小于或等于当前价格的比例

select
	round(cume_dist() over(partition by category_id order by price),2) as cd,
	category_id,
	category,
	name,
	price,
	stock
from goods

first_value(字段名)

获取当前分区的第一个值
eg: first_value(num) over(partition by id order by month) as min_month
获取当前id第一次登陆的月份 的num

last_value(字段名) 获取当前分区的之后一个值

ntile(分几块的数字)

将数据分桶 
ntile(5) over(partition by order by) as n

where n = 2
取 20% - 40% 之间的数据

前后类

lead(字段,偏移量,超出窗口默认值)

向下统计

# 返回当前行下 偏移量的值
eg lead(date,1) 分区内下一行的date值

eg: 计算goods中每个品类中产品与下一个产品之间的价格差(price 升序)

select
	id, 
        category, 
        name, 
        price,
		after_price,
		price - after_price as sub_price
from(
	select
        id, 
        category, 
        name, 
        price,
        lead(price) over(partition by category order by price) as after_price
    from goods
) as t1

#计算留存率 （前后日期相差一天）
lead(date) over(partition by 字段... order by 字段...)

#方法一 使用lead() 开窗函数
#计算留存率
select
	round(count(l2.user_id)/count(lo.user_id),3) as p 
from(
	-- 全部用户
	select
		user_id
	from login
	group by
		user_id
) as lo
left join (
	-- 第二天依旧登陆的用户
	select
		user_id
	from (
		select 
			user_id,
			datediff(lead(date) over(partition by user_id order by date),date) as ld
		from 
			login
	) as ld1
	where ld1.ld = 1
	group by
		user_id
) as l2
on lo.user_id = l2.user_id;

# 方法二
select
	round(count(new_table.user_id) / (select count(distinct user_id) from login),3) as p
from(
	# 找出在第二天登陆的用户
		select 
			user_id,
			date
		from login
		where (user_id,date) in (
						# 若留存下来新用户后一天登陆的时间
							select
								user_id,
								date_add(min(date),interval 1 day) as date
							from login
							group by 
								user_id
							)
			) as new_table;

lag(字段，偏移量，超出窗口默认值)

向上统计

# 返回当前行上 偏移量的值
eg: lag(date,1) 分区内上一行的date值

聚合函数类

sum() 
count()
min()
max()
avg()
除了count(*|1) 其余都会忽略null
sum() avg() 用于处理数值型数据
max() min() count() 可以用于处理任何类型的数据
eg:sum(distinct 字段) 统计去重后的结果

sum()

# 使用 sum(字段名) over(partition by 分区字段) 会对分区内的字段累加 相当于 group by sum
# 使用 sum(字段名) over(partition by 分区字段 order by 排序字段)
	# 会在分区内 逐行累加  -- 若 排序字段 有一样的值 ，相同值的两行记录 sum 的结果相同

eg: 依次累加某品类各产品的总价格

select 
	id, 
	category, 
	name, 
	price,
	sum(price) over(partition by category order by id) as sum_price
from 
	goods

# 若按照价格排序 当同一分区内有相同价格 则 相同价格 的行的sum值相同
select 
	id, 
	category, 
	name, 
	price,
	sum(price) over(partition by category order by price) as sum_price
from 
	goods

min() max()

# min/max(字段名) over(partition by 分区字段) 相当于 group by min/max
# min/max(字段名) over(partition by 分区字段 order by 排序字段) 
# min/max(字段名) over(order by 排序字段)
	# 返回首行到当前行的min/max

select 
	id, 
	category, 
	name, 
	price,
	max(price) over(partition by category order by id) as max_price
from 
	goods

avg()

# avg(字段名) over(partition by 分区字段) 相当于 group by avg
# avg(字段名) over(partition by 分区字段 order by 排序字段) 
# avg(字段名) over(order by 排序字段)
	# 返回首行到当前行的avg

select 
	id, 
	category, 
	name, 
	price,
	avg(price) over(partition by category order by id) as avg_price
from 
	goods

count()

# count(字段名) over(partition by 分区字段) 相当于 group by count
# count(字段名) over(partition by 分区字段 order by 排序字段)
# count(字段名) over(order by 排序字段)
	# 会在分区内 逐行统计  -- 若 排序字段 有一样的值 ，相同值的两行记录 count 的结果相同

select 
	id, 
	category, 
	name, 
	price,
	count(price) over(partition by category order by price) as count_price
from 
	goods

where子查询需要注意的问题

在进行WHERE子查询时，有几个问题需要注意：

子查询的效率问题：子查询可能会影响查询的性能。确保子查询的逻辑和语法是正确的，并且尽量避免在主查询中多次执行子查询，以减少查询的复杂度。

子查询的返回结果：子查询应该返回一个值或结果集，并确保与主查询的条件匹配。否则，可能会导致语法错误或逻辑错误。

子查询的返回行数：子查询返回的行数可能会影响主查询的结果。如果子查询返回多行，确保在主查询中使用适当的操作符（如IN、ANY、ALL）来处理这些行。

子查询中的相关列：如果在子查询中引用了主查询中的列或表，确保它们在作用域内，并正确地引用它们。使用别名可以帮助区分不同的列。

子查询的位置：根据查询的需要，将子查询放置在适当的位置。子查询可以出现在SELECT、WHERE、FROM子句中，根据业务需求进行调整。

子查询的嵌套：子查询可以嵌套在其他子查询中。确保嵌套查询的逻辑是正确的，并且理解嵌套查询的执行顺序。

子查询的结果集大小：如果子查询返回的结果集较大，可能会导致查询性能下降。考虑对子查询进行优化，例如使用JOIN操作代替子查询。

总之，当使用WHERE子查询时，需要注意语法和逻辑的正确性，确保返回结果与主查询条件匹配，并尽量优化查询性能

三种删除方式

drop

drop table name;
删除表，删除内容、定义，释放空间，删除表的结构被依赖的约束，触发器，索引，
依赖该表的存储过程、函数将被保留，
状态被修改为 invalib 不能回滚

truncate

truncate table name;
删除表数据，清空表结构，释放空间
按照原本的表结构创建一张新的数据表
不能删除行数据，只能清空整个表
不能回滚

delete

delete from table name where:
删除某行数据，该删除操作 作为事务记录 在日志中保存 可以回滚

truncate 和 delete from table name 只删除数据 保留表结构 (truncate 将原先的表删除，新创建了一张新表)
truncate 删除带自增长的列的表的时候，之后在插入数据就会从起始值开始
delete删除带自增长的列的表后，如果在插入数据会从上一次的断点开始

count(1)/count(*)/ count(字段)

count(1) 和 count(*) 不会去除 null
count(字段) 去除 null

where 和 having 的区别

相同点: where 和 having 都可以实现对查询结果进行过滤
where 在 group by 之前对文件进行过滤,作用的是原始表
having 在 group by 之后对文件进行过滤，作用的是 分组之后的表
having 只能使用 group by 中的字段 或者聚合函数 可以使用 select 中的别名

事务

事务的四大特性
原子性:
一致性:
隔离性:
持久性:

索引

创建索引
创建单索引：
	create index 索引名 on 表名(索引字段);
创建组合索引:
	create index 索引名 on 表名(索引字段1,索引字段2...);

删除索引
	drop index 索引名 on 表名;
组合索引：左优先级
	在单使用时，组合索引最左边的字段会 使用索引，其他字段不会使用索引

存储引擎

# SQL语句在MySQL中的执行流程：
	SQL语句 --> 查询缓存 --> 解析器 --> 优化器 --> 执行器 

# 优化器 是在表里面有多个索引的时候，决定使用那个索引，或者 在一个语句有多表关联(join) 的时候，决定各表的连接顺序
#	还有表达式简化、子查询转为连接、外连接转为内连接等
![在这里插入图片描述](https://img-blog.csdnimg.cn/direct/b5bb556bb68645b39b95fe59b417d584.png#pic_center)


##### Innodb

```mysql
# 在 mysql 5.0 版本之后，创建表默认使用 Innodb 引擎
 支持外键 支持事务
 将表存储为两个文件
	.frm : 描述表结构文件、字段长度等
    .ibd 文件(存储数据信息 和 索引信息)
采用的是B+tree

MyISAM

# mysql 5.0版本之前 创建表默认 使用 MyISAM 引擎
 不支持事务
 将表存储为三个文件
 	(5.7) .frm ：描述表结构文件、字段长度等
 	(8.0) .xxx.sdi 描述表结构文件、字段长度
 	
 	.MYD (MYData) 数据信息文件 存储数据信息
 	.MYI (MYIndex) 存放索引信息文件
采用的是B_tree

memory

不会将表存放在 磁盘上，而是将表存放到内存中 效率更快 但 安全性差(断电即丢)

MySQL架构

执行流程
	客户端 向 mysql 服务器发送请求
		SQL --> 查询缓存 -->命中 返回结果
						--> 未命中--> 进入解析器 --> 查询优化器 --> 查询优化器
																   ||
			 返回结果、放入缓存结果<--查询结果<--执行查询引擎<-- 执行计划<--||
由于 查询缓存 的命中率极低
在 mysql 8.0 默认关闭查询缓存

mysql架构图

在这里插入图片描述

视图

# 视图 是依据 真实表为模板创建的，是虚表,本身不具备数据 
# 视图的创建和删除只影响视图本身，不影响对应的基表。但是当对视图中的数据进行增加、删除和修改操作时，数据表中的数据会相应地发生变化，反之亦然

# 创建视图
CREATE VIEW 视图名称
AS 查询语句

不可更新的视图

# mysql要使视图可更新，视图中的行和底层基本表中的行之间必须存在 一对一 的关系。
# 另外当视图定义出现如下情况时，视图不支持更新操作：
	# 在定义视图的时候指定了“ALGORITHM = TEMPTABLE”，视图将不支持INSERT和DELETE操作
	# 视图中不包含基表中所有被定义为非空又未指定默认值的列，视图将不支持INSERT操作
	# 在定义视图的 SELECT 语句中使用了 JOIN联合查询 ，视图将不支持INSERT和DELETE操作
	# 在定义视图的SELECT语句后的字段列表中使用了 数学表达式 或 子查询 ，视图将不支持INSERT，也 不支持UPDATE使用了数学表达式、子查询的字段值
	# 在定义视图的SELECT语句后的字段列表中使用 DISTINCT 、 聚合函数 、 GROUP BY 、 HAVING 、 UNION 等，视图将不支持INSERT、UPDATE、DELETE
	# 在定义视图的SELECT语句中包含了子查询，而子查询中引用了FROM后面的表，视图将不支持 INSERT、UPDATE、DELETE
	# 视图定义基于一个 不可更新视图 
	# 常量视图

[注] 虽说可以更新视图，但 视图作为 虚拟表 ，只要是为了方便查询 ，不建议更新视图的数据

对视图数据的更改，都是通过对 真实表中的数据进行操作 来完成的

修改视图

ALTER VIEW 视图名称
AS
查询语句

删除视图

# 删除的只是视图的定义，并不会删除 真实表中的数据
DROP VIEW IF EXISTS 视图名称1,视图名称2,视图名称3,...;

视图的优点

1. 操作简单
	将经常使用的查询操作定义为视图，可以使开发人员不需要关心表结构，表关联，字需要简单操作视图即可
2. 减少数据冗余
	视图是虚表，存储的是查询语句，在使用的时候，通过定义视图的查询语句来获取结果集，视图本身并不存储数据，不会占用数据存储的资源，减少数据冗余
3. 数据安全
	只对外暴露一些安全的，不敏感的信息

视图的不足

不好维护
    如果实际数据表的结构变更了，我们就需要及时对相关的视图进行相应的维护。特别是嵌套的视图（就是在视图的基础上创建视图），维护会变得比较复杂， 可读性不好 ，容易变成系统的潜在隐患。因为创建视图的 SQL 查询可能会对字段重命名，也可能包含复杂的逻辑，这些都会增加维护的成本。

存储过程

写法
delimiter //
create procedure 方法名(参数列表)
begin
...
end //

call 方法名(参数列表)

存储过程的参数类型 in、out、inout
in、out、inout 可以在一个存储过程中带多个

变量

# 全局变量
SET GLOBAL 变量名=变量值;
#为某个会话变量赋值
SET SESSION 变量名=变量值;
# 存储过程中 局部变变量
DECLARE 变量名1 变量数据类型 [DEFAULT 变量默认值]; -- 只能在 begin end 中使用，且必须在第一行

例题

每日新用户的次日留存率

# 使用 窗口函数 
# min(date) over(partition by user_id) as min_date 找出 用户登录的最小日期 即为注册日期
# lead(date) over(partition by user_id order by date) as next_date 找出 用户下一次的登陆日期 
# 新用户 date = min_date and min_date != next_date -- 去除新注册同一天登陆多次的用户
# 在次日登陆的新用户 date = min_date and datediff(next_date,min_date) = 1
select
	t1.date,
	round(ifnull(
    		sum(case when date = min_date and datediff(next_date,min_date) = 1 then 1 else 0 end)
    	/ sum(case when date = min_date and min_date != next_date then 1 else 0 end),0),3) as p
from(
    select
        date,
        min(date) over(partition by user_id) as min_date,
        lead(date) over(partition by user_id order by date) as next_date
    from login
) as t1
group by
	t1.date
order by
	t1.date

每日登录新用户个数

# 方法一 
# 找出 新用户的注册日期 即登陆的最小日期 
# 当 用户的登陆日期 与 注册日期相同时 即为新用户
# 统计 每日 date = min_date 的总数
select
	t1.date,
	sum(case when datediff(date,min_date) = 0 then 1 else 0 end) as new_count
from(
    select
        date,
        min(date) over(partition by user_id) as min_date
    from
        login
)as t1
group by
	t1.date

# 方法二
# 使用 row_number() over() 找到用户最早的登陆日期
# 统计 每日 rn = 1 的总数
select
	t1.date,
	sum(case when t1.rn = 1 then 1 else 0 end) as new_count
from(
    select
        date,
        row_number() over(partition by user_id order by date) as rn
    from 
        login
) as t1
group by
	t1.date

新登录用户的次日成功的留存率

# 留存率(新注册用户次日也登陆)
select
	round(count(new_table.user_id) / (select count(distinct user_id) from login)),3) as p 
from(
    # 查看 那些用户在注册后一天登录
    select
        user_id
    from 
        login
    where (user_id,date) in(
            # 用户注册日期后一天
            select
                user_id,
                date_add(min(date),interval 1 day) as next_date
            from
                login
    )
) as new_table

查找人流量大于或等于100且至少连续三天的信息

# id岁日期的增加而增加
select
	id,
	visit_date,
	people
from(
    select
        id,
        visit_date,
        people,
        # 统计相同差值出现的次数,即连续的天数
        count(dif) over(partition by dif) as cnt
    from(
        select
            id,
            visit_date,
            people,
            # 连续天数的id和排行差是一样的
            id - row_number() orver(order by id) as dif
        from stadium
        where people >= 100
    ) as t1
) as t2
where t1.cnt >= 3

查找连续登陆5 天的用户，若登陆天数相差一天也算是连续登陆

##  找到用户当前登陆日期以及下一次登陆的日期 相减 得到的差值 ，用sum求差值进行求和 若差值<= 2 就不进行累加，反之则 +1 
select
	id
from(
    select
        id,
        sum(if(diff<= 2,0,1)) over(partition by id order by date) as cnt
    from(
        select
            id,
            date,
            ### lag取下一行的日期
            datediff(date,lag(date,1,date) over(partition by id order by date)) as diff
        from login
        ) as t1
    )as t2
group by
	id,
	cnt
having count(*) >= 5;

/*

-- 需求1,要取出按照时间轴顺序，发生了状态变化的数据，如：
+------+-----------+-----------+-------------+
| mid  | pre_rate  | new_rate  | newrate_dt  |
+------+-----------+-----------+-------------+
| 100  | 0.2       | 0.1       | 2021-03-02  |
| 100  | 0.1       | 0.2       | 2021-03-05  |
| 100  | 0.2       | 0.3       | 2021-03-07  |
| 100  | 0.3       | 0.1       | 2021-03-09  |
+------+-----------+-----------+-------------+
*/

select * from merchant_fee;
select mid,
       pre_rate,
       new_rate,
       newrate_dt
from (
         select mid,
                lag(rate, 1, rate) over (partition by mid order by dt) as pre_rate,
                rate                                                   as new_rate,
                dt                                                     as newrate_dt
         from merchant_fee
     ) as t1
where pre_rate != new_rate;

/*
-- 需求2,将如上数据表,显示成三个列
开始时间         结束时间        费率
2021-02-02    2021-03-01          0.2
2021-03-02    2021-03-04          0.1
2021-03-05    2021-03-06          0.2
2021-03-07    2021-03-08          0.3
2021-03-09    9999-12-31          0.1
*/
select start_time,
       nvl(date_sub(lead(start_time,1,null) over(partition by mid order by start_time),1),'9999-12-31') as end_time,
       rate
from (
         select mid,
                rate,
                min(dt) as start_time
         from (
                  select mid,
                         rate,
                         dt,
                         sum(if(diff_rate = 0, 0, 1)) over (partition by mid order by dt) as cnt
                  from (
                           select mid,
                                  rate,
                                  dt,
                                  rate - lag(rate, 1, rate) over (partition by mid order by dt) as diff_rate
                           from merchant_fee
                       ) as t1
              ) as t2
         group by mid,
                  rate,
                  cnt
         order by start_time
     )as t3;

/*
– 需求2,将如上数据表,显示成三个列开始时间结束时间费率 2021-02-02 2021-03-01 0.2 2021-03-02 2021-03-04 0.1
2021-03-05 2021-03-06 0.2 2021-03-07 2021-03-08
0.3 2021-03-09 9999-12-31 0.1
*/ select start_time,
nvl(date_sub(lead(start_time,1,null) over(partition by mid order by start_time),1),‘9999-12-31’) as end_time,
rate from (
select mid,
rate,
min(dt) as start_time
from (
select mid,
rate,
dt,
sum(if(diff_rate = 0, 0, 1)) over (partition by mid order by dt) as cnt
from (
select mid,
rate,
dt,
rate - lag(rate, 1, rate) over (partition by mid order by dt) as diff_rate
from merchant_fee
) as t1
) as t2
group by mid,
rate,
cnt
order by start_time
)as t3;