Islands
3 October, 2007-SQL
The task for today is to get existing continuous ranges – islands.
Setup script
USE tempdb; CREATE TABLE T1 ( a INT ); BEGIN TRAN SET NOCOUNT ON; DECLARE @i INT; SET @i = 1; WHILE ( @i <= 1000 ) BEGIN INSERT t1 VALUES ( @i ); SET @i = @i + 1; END; DELETE t1 WHERE a in ( 4, 5, 7, 20, 21 ); SET NOCOUNT OFF; COMMIT;
Output
There are 4 ranges: 1-3, 6, 8-19, 22-1000.
Solution
We want to assign a unique value for each range so that we can separate islands. How to generate this unique value? What’s the difference between two equally rising sequences? It’s constant 🙂 so let’s use row_number.
SELECT a, a - ROW_NUMBER() OVER ( ORDER BY a ) AS R FROM t1;
You now know what to do next. Aggregate on the newly created value and than simply select MIN and MAX which are the start and the end of a range respectively.
SELECT MIN(a) AS 'start', MAX(a) AS 'end' FROM ( SELECT a, a - ROW_NUMBER() OVER ( ORDER BY a ) AS R FROM t1 )t GROUP BY R
Plan
The plan for this query is pretty linear which is good, one table scan.
SELECT MIN(a) AS 'start', MAX(a) AS 'end' FROM ( SELECT a, a - ROW_NUMBER() OVER ( ORDER BY a ) AS R FROM t1 ) t GROUP BY R |--Hash Match(Aggregate, HASH: ([Expr1005]), RESIDUAL: ([Expr1005] = [Expr1005]) DEFINE: ([Expr1006]=MIN([tempdb].[dbo].[T1].[a]), [Expr1007]=MAX([tempdb].[dbo].[T1].[a]))) |--Compute Scalar(DEFINE: ([Expr1005]=CONVERT_IMPLICIT(bigint,[tempdb].[dbo].[T1].[a],0)-[Expr1004])) |--Sequence Project(DEFINE: ([Expr1004]=row_number)) |--Compute Scalar(DEFINE: ([Expr1011]=(1))) |--Segment |--Sort(ORDER BY: ([tempdb].[dbo].[T1].[a] ASC)) |--Table Scan(OBJECT: ([tempdb].[dbo].[T1]))