SQL语法

weixin_44133552

已于 2022-08-25 18:20:44 修改

阅读量1.1k

点赞数 1

文章标签： sql 数据分析

于 2022-04-24 14:41:47 首次发布

本文链接：https://blog.csdn.net/weixin_44133552/article/details/124119299

版权

SQL查询语句大全

关键字总览
常用的函数
编码原则和规范
SQL优化
案例

关键字总览

1.什么是DQL？

数据库语言主要分为查询语言DQL、数据操纵语言DML、数据定义语言DDL、数据控制语言DCL
查询语言DQL：
SELECT <字段名表> FROM <表或视图名> WHERE <查询条件>
数据操纵语言DML：
1)插入：INSERT
insert into tmp (medid,searchid) values (12345,54321);
2)更新：UPDATE
update tmp set searchid=’1’ where medid=’12345’;
3)删除：DELETE
delete from tmp where medid=’12345’;
数据定义语言DDL
数据定义语言DDL用来创建数据库中的各种对象—–表、视图、索引、同义词、聚簇等如：
CREATE TABLE/VIEW/INDEX/SYN/CLUSTER(表视图索引同义词簇)
数据控制语言DCL
数据控制语言DCL用来授予或回收访问数据库的某种特权，并控制数据库操纵事务发生的时间及效果，对数据库实行监视等。如：
1) GRANT：授权。
2) REVOKE：撤销授权。

2.DQL表达式有哪些

子句	描述	子句	描述
select	要返回的列和表达式	like	模糊匹配
from	从中检索数据的表	between and	连续数值范围限定
left join	返回左表所有行	where	后面增加筛选条件
right join	返回右表所有行	逻辑运算符	and、or、not
inner join	返回两表交集	比较运算符	=、<>、<、>、！=
join	笛卡尔积，不需要on主键	in	非连续值筛选
fulljoin	所有集合，包含重复数据	空值筛选	is null、is ‘’、is not null
on	关联条件，用join是才用	if(expr,1,0)	条件判断
case when	条件判断	group by	分组函数
having	对分组后的结果进行条件过滤，与group by	union	表纵向拼接
order by	排序函数desc倒序	limit	限制输出行数

在这里插入图片描述

3.编写&执行顺序

编写顺序
在这里插入图片描述
执行顺序

常用的函数

数据去重
方法一：使用distinct，后面全部条件相同才会去重
方法二：group by 去重，使用分组后，字段需要添加聚合函数max（）min（）等
方法三：窗口函数row_number()，使用distinct去重

聚合函数
聚合函数有count（）、max()、min()、sum()、avg(),如果select 后字段里没有非聚合字段，可以不使用group by

范围查询
连续值使用between num1 and num2 包含两个边界数值
非连续使用in （‘a’,‘b’,‘v’）

Json数据处理
mysql获取key中的value

SELECT JSON_EXTRACT(`测试表`.jason, '$.EngSpd') FROM 测试表 WHERE`测试表`.jason LIKE '%EngSpd%' AND `测试表`.jason !='NULL'

mysql获取json数组中的value

SELECT `测试表`.jason,JSON_EXTRACT(`测试表`.jason, '$.EngSpd') ,JSON_EXTRACT(`测试表`.jason, '$.EngSpd[2]') FROM 测试表 WHERE`测试表`.jason LIKE '%EngSpd%' AND `测试表`.jason !='NULL'
key值后加[1]位置即可

Hive获取key中的value

get_json_object(ten.data_args, '$.value') !='-1.0'
获取复杂嵌套json，一层一层提取即可
get_json_object(get_json_object(get_json_object(get_json_object(data_args,'$.nlu_result'),'$.sys_light_string'),'$.nlu_result[0]'),'$.domain'))

窗口函数
窗口函数

条件筛选

select a.StudentID,
 (case a.IsUse 
 when '0' then '未在线' 
 when '1' then '在线' 
 else '未上传'
 end) as 在线情况
from StudentBindPaperTypeEntity as a
必须要包含end

CASE WHEN minute(time_stamp) >=30 THEN hour(time_stamp) + 1
                    WHEN minute(time_stamp) <30 THEN hour(time_stamp)
               END AS hr

#case要判断的字段，when直接比较值
case mod(row_number() over(partition by uid, date_sub(sign_dt, interval rn day) order by sign_dt), 7)          when 3 then 3 -- 2+1 额外领 2            when 0 then 7 -- 6+1 额外领 6          else 1      end) as daily_coin

if(EXPR,A,B)--符合条件为A，不符合为B

格式转换
hive时间戳转化为北京时间，需要注意转换成北京东八时区，特别是计算早高峰晚高峰避免错误

from_utc_timestamp(from_unixtime(cast(cast(data_stamp_time as BIGINT)/1000 as BIGINT)),'PRC')acquisition_time1

mysql任意时间格式转换

to_timestamp("substring"(report_tid, 0, 13), 'YYMMDDHH24MISS')

mysql date_format(),后面自定义日期格式

 select      concat(year(submit_time),substring(submit_time,6,2)) as month
	    ,round(count(distinct uid,date_format(submit_time,'%y%m%d'))/count(distinct uid),2) as avg_active_days
                       ,count(distinct uid) as  mau
            from        exam_record
            where       submit_time is not null
                        and year(submit_time)>2020
            group by    month

hive 取当前日期月份最后一天，便于计算当月天数

SELECT vin
     ,last_day(from_utc_timestamp(from_unixtime(cast(cast(data_stamp_time AS BIGINT)/1000 AS BIGINT)),'PRC'))
FROM hdfs.ods_hu_track_hi
where dt='2022-02-16'

hive字符串转自定义日期格式

from_unixtime( unix_timestamp(substr(tid,1,12),'yyMMddHHmmss'),'yyyy-MM-dd HH:mm:ss')

日期格式处理
求日期差 DATEDIFF(max_time,start_time)<=29 2021-10-01 10:00:00-2021-10-01 10:00:20
日期取月份

日期取间隔
使用date_sub函数（Subtract减去，还有add增加）

date_sub('2019-07-27', interval 30 day) 取前三十天
date_sub('2019-07-27', 2)日期减少2
hive> select date_sub('2019-06-25',-2);2019-06-27

mysql 日期求差和秒求差

date_add((select max(end_time) from tb_user_video_log),interval -29 day)  计算小于29天
timestampdiff(second,start_time,end_time)  计算日期的秒数差

四舍五入
保留小数

 round(cast(amap_lat AS FLOAT),2)amap_lat

向下取整

select floor(103.56) from dual ---103

类型转换

cast(str as int/float)  --str如果是带小数点，则转成float，否则报错

字符串截取函数

select Left(‘Welcome to China!’,7)
返回：Welcome
select Right(‘Welcome to China!’,6)
返回：China!
select SubString(‘Welcome to China!’,9,2)
返回：to

注意：substring（‘字段’，起始，长度）最后一位为长度
另外substring_index函数也可以字符串切割，包含三个参数（‘字段’，‘切割方式’，从哪个位置切割）
从哪个位置切割1正向，2是前两个，-1是反向
如何取中间，使用两个sunbtring_index方法

count（）
count（1）和count（*）会记录null记录数，count（列名）会过滤掉空值

传参
可以使用’${ }'进行传参，比如日期范围

WHERE dt>'${dt}'
and   dt<'${dt}'

union&union all
union主要实现表与表之间的纵向连接，需要保证字段数量一致，区别如下：
union all执行速度大于union
union all不会对重复数据去重，union会去除重复数据

字段合并函数
实现横向多个字段合并concat(str1,str2,)
实现纵向多行某个字段合并GROUP_CANCAT(字段 SEPARATOR ‘，’)，必须与group by搭配
hive 默认间隔是‘，’直接distinct 即可

limit任意行取值

limit 2，5 --取值2到5行，最好进行排序
limit 99999 offset 0;  取99999条，从第0行开始

join
无论是left join还是inner join 如果一对多，左表会生成多行，而不是左表行数不变
join分为等值join和不等值join
不等值join可以使用
在这里插入图片描述
或是模糊查询

hivesql建议使用locate函数，用户模糊查询
字符串比较
字符串日期和数据可以直接比较，因为比较的是每位的ascII码

group by 1
group by 1代表select 后面第一列进行分组

regexp正则表达式

app_name regexp 'wey|haval|tank|ora|Poer'  --正则表达式或关系，类似 app_name in （'wey','haval'）

trunc
trunc（number，x）对数字进行截取

select TRUNC(15.79,1) from dual;TRUNC(15.79,1)
--------------
15.7

trunc （date，‘DD’）DD代表今天0点 d代表本周第一天，mm代表本月第一天，hh代表小时
trunc使用

GROUP_CONCAT函数+SPLIT_PART函数
实现多行json的拼接，SPLIT_PART进行数据的切分SPLIT_PART（col,‘sep’,N）设置分隔符和切割第几个

lateral view+explode
实现列转多行，将原本一行的数据拆分成多行的虚拟表，再与原表进行笛卡尔积，从而形成明细表
split将一行按照分隔符拆分成数组

select col_A,col_B,tmp_table.tmp_col 
from test_table 
lateral view explode(split(col_C,'分隔符')) tmp_table as tmp_col
where partition_name='xxx';

编码原则和规范

阿里规范
编码原则：
1.执行速度优先
2.SQL关键字、保留字采用全大写或小写
3.增加适当的注释
4.四个空格为一个缩进量，缩进为倍数
5.对应的括号在同一列的位置上
6.禁止select*操作，指定列名
编码规范
1.代码头部增加主题、功能描述、作者和日期等信息

**所属主题
**功能描述
**创建者
**创建日期
**修改日志
**修改人

2.select、from、groupby、having、orderby在同一行
3.select第一行带字段，后面依次换行，逗号在前，as并排
4.算数和逻辑运算符前面加空格
5.case语句 when并排，必须增加else
6.子查询规范，后面可以用a,b,c,d,e或s1,s2,s3,s4,括号在同一列，子查询缩进一个空格
7.每条语句进行注释，跟在字段后面或者在前面
在这里插入图片描述

SQL优化

待定

案例

计算连续两天签到的数量

    select count(distinct user_id)
    from (
        select  user_id,mid_date,count(1)  as cnt
        from    (
                select *,date_sub(dt,num) as mid_date--日期减去排序号，相同则代表连续
                from    
                        (
                        select user_id,dt,row_number() over (partition by user_id order by dt) as num  --分用户增加排序数字
                        from   dwd_point_detail_di
                        where  dt<='${dt}' 
                        and port='HJ0002'
                        )a
                )b
        group by user_id,mid_date)c
        where cnt=2--只有连续天数数量为2的保留

计算每天的活跃留存

select x.dt
,count(distinct x.device_id) dau
,count(distinct case when datediff(x.dt2,x.dt)=1 
then x.device_id2 end) day__2
,count(distinct case when datediff(x.dt2,x.dt)=2
then x.device_id2 end) day__3
,count(distinct case when datediff(x.dt2,x.dt)=3
then x.device_id2 end) day__4
,count(distinct case when datediff(x.dt2,x.dt)=4
then x.device_id2 end) day__5
,count(distinct case when datediff(x.dt2,x.dt)=5
then x.device_id2 end) day__6
,count(distinct case when datediff(x.dt2,x.dt)=6
then x.device_id2 end) day__7
from(
    select k1.dt,k1.device_id,k2.dt as dt2,k2.device_id as device_id2
    from 
        (--统计日期当天的用户id
        SELECT distinct left(cast(from_utc_timestamp(from_unixtime(cast(left(client_time,10) as BIGINT)),'PRC')as string),7) as 'dt',device_id  FROM *
        where app_name like '%${brand}%'
        and dt between '${start}' and '${end}'
        and left(cast(from_utc_timestamp(from_unixtime(cast(left(client_time,10) as BIGINT)),'PRC')as string),10) between '${start}' and '${end}'
        )k1
    left join
        (--之后又来的用户id
        SELECT distinct left(cast(from_utc_timestamp(from_unixtime(cast(left(client_time,10) as BIGINT)),'PRC')as string),10) as 'dt',device_id  FROM hdfs.ods_mb_track 
        where app_name like '%${brand}%' 
        and dt between '${start}' and '${end}'
        and left(cast(from_utc_timestamp(from_unixtime(cast(left(client_time,10) as BIGINT)),'PRC')as string),10) between '${start}' and '${end}'
        )k2 on k1.dt<k2.dt and k1.device_id = k2.device_id --匹配条件：前后id相同，dt晚于统计日dt
    )x
 group by x.dt order by x.dt;