测试环境是mysql,但实际上跟数据库环境没有关系,是逻辑上的问题。
1、建表及测试数据构造
2、查询数据
2.1、用ip关联取两者差集
2.2、用ip关联取两者交集
2.3、总体计算identify数据
3、说明
原始数据的查看:
1、建表及测试数据构造
drop tables if exists xxxxx_iqy0708;
create table xxxxx_iqy0708(ip varchar(100));
drop table if exists xxxxx_chusou0708;
create table xxxxx_chusou0708(appkey varchar(100), ip varchar(100), identify varchar(100));
insert into xxxxx_iqy0708
VALUES('xx1.xx2.xx3.xx4'),('xx2.xx2.xx3.xx4'),('xx3.xx2.xx3.xx4');
insert into xxxxx_chusou0708(appkey,ip,identify)
VALUES
('CSAndroid','xx1.xx2.xx3.xx4','id1'),
('CSAndroid','xx2.xx2.xx3.xx4','id2'),
('CSAndroid','xx5.xx2.xx3.xx4','id8'),
('CSAndroid','xx6.xx2.xx3.xx4','id3'),
('CSIos','xx1.xx2.xx3.xx4','id2'),
('CSIos','xx2.xx2.xx3.xx4','id1'),
('CSIos','xx3.xx2.xx3.xx4','id3'),
('CSIos','xx2.xx2.xx3.xx4','id3'),
('CSIos','xx5.xx2.xx3.xx4','id3');
2、查询数据
2.1、用ip关联取两者差集
select appkey,count(distinct identify)
from xxxxx_chusou0708 a1
left join xxxxx_iqy0708 a2 on a1.ip=a2.ip
where a2.ip is null
group by appkey;
--注:根据ip关联,则其对应的identify可能已经被关系上了,但也可能未被关联上;就是说一个identify对应多个ipappkey count(distinct identify)
CSAndroid 2
CSIos 1
2.2、用ip关联取两者交集
select appkey,count(distinct identify)
from xxxxx_chusou0708 a1
inner join xxxxx_iqy0708 a2 on a1.ip=a2.ip
group by appkey;
-- ip关联,对应的是关联上的identify;同上,就是说一个identify对应多个ipappkey count(distinct identify)
CSAndroid 2
CSIos 3
2.3、总体计算identify数据
select appkey, count(distinct identify)
from xxxxx_chusou0708
group by appkey;
-- 所有的去重,理论上,这个会小于等于2.1+2.3之和。appkey count(distinct identify)
CSAndroid 4
CSIos 3
3、说明
原始数据的查看:
select * from xxxxx_chusou0708;
select * from xxxxx_iqy0708;
如果严格关联,2.3=2.1+2.2;但本文所述情况例外,有时挺绕的,不仔细想想,搞不明白。