Find sequence gaps using Oracle's analytic functions

refer:

http://www.builderau.com.au/program/oracle/soa/Find-sequence-gaps-using-Oracle-s-analytic-functions/0,339028441,339282747,00.htm

By Bob Watkins, TechRepublic | 2007/10/11 15:42:02

Sequences of values occur frequently in database tables. Whether generated naturally as part of the data, or assigned by an Oracle sequence object, the intent is to number the rows with a unique, ever-increasing number. Gaps in this numbering may occur for various reasons.

If a stored procedure selects from a sequence into a local variable, but never uses that number, it is lost. It cannot be returned to the sequence. The result is that the sequence of numbers will have a gap in it. The relational database model doesn't care about this, but sometimes people do, and those people may want to know which numbers are missing.

The "obvious" way to do this is with PL/SQL: loop through a sorted cursor and, using a local variable for storage, compare each value to the one in the row preceding it. The problem with this solution is that it is inefficient and quite slow. A less obvious approach is to do a self-join on the table, in essence matching each row with the one preceding it based on some criteria. This can be tricky to code.

Oracle's analytic functions provide a much cleaner way to do gap checking. These allow you to see the value in an upcoming row (LEAD) or previous row (LAG), while still using the full set-oriented processing of SQL.

Here is the format of these functions:

{LEAD | LAG} (value_expression, offset, default) OVER ([PARTITION BY expr] ORDER BY expr)

The value_expression is typically a column whose value you want to retrieve. The offset parameter is how many rows ahead or behind you want to read, and the default parameter is the value to return if either end of the partition is reached (and therefore there is no matching row).

The scenario in Listing A is that a series of data points was collected automatically from sensors on the factory floor. The sensor device automatically numbers each measurement, and we want to find if any were missed.

Listing A

SQL> CREATE TABLE sensor_data 2 (sensor_id VARCHAR2(6), 3 measurement_id NUMBER(9), 4 measurement_value NUMBER(6,3), 5 measurement_datetime DATE 6 ); Table created. SQL> INSERT INTO sensor_data 2 VALUES ('UNIT1',1,1.234,SYSDATE); 1 row created. SQL> INSERT INTO sensor_data 2 VALUES ('UNIT1',2,1.240,SYSDATE); 1 row created. SQL> INSERT INTO sensor_data 2 VALUES ('UNIT1',4,1.237,SYSDATE); 1 row created. SQL> INSERT INTO sensor_data 2 VALUES ('UNIT1',5,1.240,SYSDATE); 1 row created. SQL> INSERT INTO sensor_data 2 VALUES ('UNIT1',7,1.235,SYSDATE); 1 row created. SQL> COMMIT; Commit complete. SQL> SELECT * FROM sensor_data; SENSOR MEASUREMENT_ID MEASUREMENT_VALUE MEASUREME ------ -------------- ----------------- --------- UNIT1 1 1.234 20-JUN-06 UNIT1 2 1.24 20-JUN-06 UNIT1 4 1.237 20-JUN-06 UNIT1 5 1.24 20-JUN-06 UNIT1 7 1.235 20-JUN-06

In Listing B, we use the LAG function with an offset of 1 to match up each row with the one immediately before it, ordering by measurement_id. The first row will not match anything, so a default value of 0 is returned instead. Since a gap can't occur on the first row, we eliminate it by requiring a prior measurement > 0.

Listing B

SQL> WITH aquery AS 2 (SELECT measurement_idafter_gap, 3 LAG(measurement_id,1,0) OVER (ORDER BY measurement_id) before_gap 4 FROM sensor_data) 5 SELECT 6 before_gap, after_gap 7 FROM 8 aquery 9 WHERE 10 before_gap != 0 11 AND 12 after_gap - before_gap > 1 13 ORDER BY 14 before_gap; BEFORE_GAP AFTER_GAP ---------- ---------- 2 4 5 7

What remains is a set of virtual rows that show the previous row's measurement_id ("before_gap") and the current row's measurement_id ("after_gap"). If the difference between the two is greater than 1, a gap has occurred.

Also, notice that the analytical subquery is named "aquery" by the WITH clause at the beginning of the query. We can then refer to the columns in "aquery" in the main SELECT statement's WHERE clauses.