SQL小结

最新推荐文章于 2024-09-27 21:41:13 发布

xsc320643

最新推荐文章于 2024-09-27 21:41:13 发布

阅读量75

点赞数

文章标签： sql 数据库

本文链接：https://blog.csdn.net/xsc320643/article/details/131963010

版权

Sql的执行顺序：

from->where->group by->select->order by

Sql中distinct的使用，起到去重的作用；

Sql中有关group by分组的使用，核心就是分组后使用聚集函数，目前sql支持5种聚集函数，分别是count，sum，avg，min，max。需要注意的是select中除了聚集函数必须在group by中出现 having用来过滤分组，和where的区别就是where是用来过滤行。

union的使用需要两边的字段类型和个数相同，列名可以不同。

有关sql在线练习 http://sqlfiddle.com/#!9/5b40e4e/5

例子：

-- borrowed from https://stackoverflow.com/q/7745609/808921

CREATE TABLE IF NOT EXISTS `docs1` (

`id` int(6) unsigned NOT NULL,

`rev` int(3) unsigned NOT NULL,

`content` varchar(200) NOT NULL,

`content1` varchar(200) NOT NULL,

PRIMARY KEY (`id`,`rev`)

) DEFAULT CHARSET=utf8;

INSERT INTO `docs1` (`id`, `rev`, `content`,`content1`) VALUES

('1', '1', 'One hundred angels can dance on the head of a pin',''),

('2', '2', 'The earth is flat and rests on a bull\'s horn',''),

('3', '3', 'The earth is like a ball.','');

CREATE TABLE IF NOT EXISTS `docs2` (

`id` int(6) unsigned NOT NULL,

`rev` int(3) unsigned NOT NULL,

`content` varchar(200) NOT NULL,

PRIMARY KEY (`id`,`rev`)

) DEFAULT CHARSET=utf8;

INSERT INTO `docs2` (`id`, `rev`, `content`) VALUES

('1', '1', 'The earth is flat12'),

('1', '2', 'One hundred angels can dance on the head of a pin1212'),

('2', '3', 'The earth is flat and rests on a bull\'s horn1212'),

('3', '4', 'The s on a bull\'s horn1212'),

('4', '5', 'The earth is like a ball12121212.');

CREATE TABLE IF NOT EXISTS `table1` (

`mobile` varchar(200) NOT NULL,

`mac` varchar(200) NOT NULL,

PRIMARY KEY (`mobile`)

) DEFAULT CHARSET=utf8;

INSERT INTO `table1` (`mobile`, `mac`) VALUES

('mobile_1', 'mac_1'),

('mobile_2', 'mac_2'),

('mobile_3', 'mac_3');

CREATE TABLE IF NOT EXISTS `temp2` (

`mac` varchar(200) NOT NULL,

`id` varchar(200) NOT NULL,

`time` varchar(200) NOT NULL

) DEFAULT CHARSET=utf8;

INSERT INTO `temp2` (`mac`, `id`,`time`) VALUES

('Mac_1', '1','T1'),

('Mac_1', '2','T1'),

('Mac_1', '3','T1'),

('Mac_1', '5','T1');

CREATE TABLE IF NOT EXISTS `table2` (

`mac` varchar(200) NOT NULL,

`id` varchar(200) NOT NULL,

`time` varchar(200) NOT NULL

) DEFAULT CHARSET=utf8;

INSERT INTO `table2` (`mac`, `id`,`time`) VALUES

('Mac_1', '1','T1'),

('Mac_2', '1','T1'),

('Mac_3', '1','T2'),

('Mac_1', '2','T1'),

('Mac_2', '2','T2'),

('Mac_3', '2','T1'),

('Mac_1', '3','T1'),

('Mac_2', '3','T1'),

('Mac_3', '3','T2'),

('Mac_4', '4','T2'),

('Mac_1', '5','T1'),

('Mac_2', '5','T1');

------------------

table 1

A	B	C
A1	B1	C1
A2	B2	C2
A3	B3	C3

table 2

B	C	D
B1	C5	D1
B2	C6	D2

有关关联，核心点是首先进行笛卡尔积，列出所有表的字段

select * from table1 a join table2 b on a.B=b.B

得出的结果：

A	B	C	B1	C1	D
A1	B1	C1	B1	C5	D1
A2	B2	C2	B2	C6	D2

采用左连接是保留左侧表的内容

select * from table1 a left join table2 b on a.B=b.B

得出的结果：

A	B	C	B1	C1	D
A1	B1	C1	B1	C5	D1
A2	B2	C2	B2	C6	D2
A3	B3	C3	null	null	null

有关union

经常出现的语句的错误就是直接进行两表union，正确写法：

select B,C from table1 union select B,C from table2;

得出的结果：

B	C
B1	C1
B2	C2
B3	C3
B1	C5
B2	C6

基于上面的结果可以进行聚合；

select a.B,count(a.B) from

(select B,C from table1 union select B,C from table2

) a group by a.B;

5、形成一个小结，join就是表基于条件进行横向扩展；union是表结构不变纵向扩展。

同时常见的有聚合统计，利用groupby取统计值，orderby去极值；

条件中考虑批量用in，但如果in使用的list过大性能上比不过join。

6、Mysql中WhereIn和Join的性能比对

在mysql中使用whereIn和Join表性能区别，在查询多表的关系的时候，存在两种查询方法。

一种是找出对应关系的ID，然后根据对应关系的ID的集合，到目标表中查询出结果。

另外一种，是联立两张表格，进行join查询。之前项目中选择一个使用了where in 的条件，但是where in 的 list 如果太大，就会导致查询异常很慢，通常如果list在1000以内，都能在100ms内完成检索，但是list大于5000的时候，基本上查询时间接近1s秒钟。 (这一切都是建立在使用索引，并且是聚集索引的情况下)

如果直接使用join来处理两张表，那么只要筛选完条件，查询基本上都能够限制在100ms内完成。

使用where in的时候，能够使用的索引，但是，由于查询的where in 条件，每一个条件都需要扫描表，所以相对来说要慢很多。

******************************************************

实战中的例子解析：

Wifi伴随人员发现

1、通过A类人员手机号，找到对应的mac；

2、查询特定时间范围内（A类人员发病时间-14天至确诊时间+确诊时间-发病时间）所有出现的围栏编号和采集时间，假设每条数据形成一个轨迹，形成N条记录；

3、根据每个轨迹点，计算特定时间内（假设前后5分钟）该场所出现的所有手机mac，共计可以形成N组手机mac数据集合；

4、针对3中形成的mac数据集合，首先进行数据去重，再次分别对N组mac数据（譬如N1{mac1_1，mac1_2，mac2_1,mac2_2...},...共计N组），采取数据聚合，获取每组mac出现次数，根据次数采取降序排列；

5、针对4步中数据聚合后的数据，获取固定阈值以上的mac数据（这里的阈值可以取固定值，譬如3或者N/10以及N/100的较大值）；

6、根据5中获取的mac值，找到对应的手机号；

7、通过手机号，找出B类人员其他档案信息；

解答：如何通过表格sql的方式进行：提前需要准备几张表，一张是手机号和mac的表table1；

Mac和电围采集设备的表table2；

table1:

mobile	mac
Mobile_1	Mac_1
Mobile_2	Mac_2
Mobile_3	Mac_3
Mobile_4	Mac_4

table2:

mac	id	time	备注
Mac_1	1	T1	T1代表满足采集时间范围内，T2代表不满足采集时间范围内；
Mac_2	1	T1
Mac_3	1	T2
Mac_1	2	T1
Mac_2	2	T2
Mac_3	2	T1
Mac_1	3	T1
Mac_2	3	T1
Mac_3	3	T2
Mac_4	4	T2	代表mac_1没有出现的id
Mac_1	5	T1
Mac_2	5	T1

步骤一：通过A类人员手机号Mobile_1,找到对应的mac，得到中间表temp1:

select * from table1 a where a.mobile in (‘Mobile_1’);

temp1:

mobile	mac
Mobile_1	Mac_1

步骤二：找出Mac_1出现的采集设备id满足时间的所有mac地址信息；

首先找到mac为Mac_1的所有种子数据temp2：

select * from table2 a where a.mac in (‘Mac_1’) ;

temp2:

mac	id	time
Mac_1	1	T1
Mac_1	2	T1
Mac_1	3	T1
Mac_1	5	T1

其次利用temp2中获取的id，得到Mac1出现同时时间满足条件范围T1的所有mac地址信息；

一种筛选的写法：

select * from table2 a where a.id in (select * from table2 a where a.mac in (‘Mac_1’) and a.time in (‘T1’) and a.mac not in (‘Mac_1’);

另外一种关联的写法：

select a.* from table2 a join (select * from table2 a where a.mac in('Mac_1'))b

on (a.id=b.id and a.time in ('T1') and a.mac not in ('Mac_1'));//这里是否会首先产生笛卡尔积，影响性能？

得到步骤二中的结果表temp3：

mac	id	time
Mac_2	1	T1
Mac_3	2	T1
Mac_2	3	T1
Mac_2	5	T1

步骤三，针对temp3，通过mac进行统计，获取次数大于等于3次的mac

select a.mac,count(*) count from temp3 a group by a.mac having count>2;

mac	count
Mac_2	3

%%%%%%%%%%%%%%%

有关sql算子的总结：

编号	算子	备注
	数据关联join
	分组函数group by， partition by	注意两者的区别，group by保留一行，partition by保留行数另外增加一个字段。
	窗口函数rank() over(partition by 某字段 order by 某字段)