Przygody z oprogramowaniem
  • Start
  • Szkolenia
    • Szkolenia otwarte
    • Katalog szkoleń
  • Usługi
    • Konsulting
    • Mentoring
    • Research & Development
  • Blog
  • Wiedza
    • Strefa wiedzy
    • BFsharp
    • SaaS
  • Klienci
  • Kontakt
0

Conditional Aggregation

23 March, 2007-SQL

Problem

  • We want to count rows based on two distinct criteria.
  • We want to calculate 2 or more aggreagates over overlapping sets.

Setup script

USE tempdb;
CREATE TABLE t1
(
     Id INT IDENTITY PRIMARY KEY,
     Date datetime,
     Number int,
     Flag tinyint,
);

BEGIN TRAN;
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 0;
WHILE ( @i 

Sample 1

Return row count where Number3 = 1 and row count where Number3 = 2.
The simpliest solution is to write

SELECT COUNT(*) FROM t1 WHERE Flag = 1;
SELECT COUNT(*) FROM t1 WHERE Flag = 2;

However, this approach scans the base table 2x times, yielding excessive IO – 2x 324 page reads on my machine.

Sample 2

Sum Number column for the following time spans: after April, 2008; after April 2009; after April 2010.

SELECT SUM(Number) FROM t1 WHERE Date > '2008-04-30';
SELECT SUM(Number) FROM t1 WHERE Date > '2009-04-30';
SELECT SUM(Number) FROM t1 WHERE Date > '2010-04-30';

The optimalizer used index seek this time, however still the same data are scaned multiple times (193 + 118 + 44 page reads).

Solution

Our aim is to scan the required rows only once, regardless of how many aggregates we want to calculate.
What would happen if we query the data and conditionaly include them in aggregates?

Sample 1 – solution

SELECT COUNT(CASE WHEN Flag = 1 THEN 1 END) as 'Flag1',
     COUNT(CASE WHEN Flag = 2 THEN 1 END) as 'Flag2'
FROM t1

Only one table scan – 324 page reads.

Sample 2 – solution

SELECT SUM(CASE WHEN Date > '2008-04-30' THEN Number END),
     SUM(CASE WHEN Date > '2009-04-30' THEN Number END),
     SUM(CASE WHEN Date > '2010-04-30' THEN Number END)
FROM t1;

Now, data is scaned once with 274 page reads. However, we can optimalize it further. All we need to do is to restrict the set to the rows needed by the largest date range.

SELECT SUM(CASE WHEN Date > '2008-04-30' THEN Number END),
     SUM(CASE WHEN Date > '2009-04-30' THEN Number END),
     SUM(CASE WHEN Date > '2010-04-30' THEN Number END)
FROM t1
WHERE Date > '2008-04-30';

Great – 193 page reads.

Sumary

Conditional aggregation can be used to:

  • optimalize overlapping aggregates which normally would require multiple data passes.
  • non-overlaping aggregates grouped by a column with low selectivity.
  • create custom aggregates.

Kategorie

Architecture BFsharp Blog Business Framework C# CqRS DDD Debugging DSL EntityFramework Formula JavaScript Linq NHibernate SaaS Silverlight SQL Visual Studio WPF Wzorce

O mnie


RSS Feed

© macmichal.pl 2011 Wszystkie prawa zastrzeżone