最近爬某数据,为了效率用了线程,但是偶尔连接缺失,所以爬下来的结果中会有缺失的数据
这里希望找出缺失的序号,再次爬数据`
例如表名为target_table,希望查缺失的列叫_seq
select * from (
select x._seq - y._seq as shift,x._seq,y._seq from
(select 1+ROW_NUMBER() over(order by _seq desc ) as _id_,_seq from target_table order by _seq desc ) as x,
(select ROW_NUMBER() over(order by _seq desc ) as _id_,_seq from target_table order by _seq desc ) as y
where x._id_ = y._id_ ) as z
where shift <> 1`
逻辑就是先把需要找的序列_seq排序,同时加上一个row_number作为索引,然后同样生成另一个表,row_number + 1
然后把两张表用row_number拼接,然后用where筛选 _seq相减不等于1的值
效率应该还能接受,测试数据64w条,跑了3s多