Windowed Functions in SQL Server

February 6, 2022 ยท View on GitHub

Pre-requisite: you are expected to be familiar with SQL, including GROUP BY and Aggregates. There is a CTE.

Part 1: Over and Over

We all know that this will fail, and why...

Select top 3
	VehicleCode,
	COUNT(*) as [COUNT]
from Reporting.Vehicles

Error Column 'Reporting.Vehicles.VehicleCode' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

You can't mix aggregates and non aggregates, unless you use GROUP BY. Or can you!?

Watch what happens when we include an OVER() clause...

Select top 3
       VehicleCode,
       COUNT(*) OVER() as [COUNT]
from Reporting.Vehicles
VehicleCodeCOUNT
ABC_VH0118214
ABC_VH0128214
ABC_VH0138214

Often the only place we're familiar with OVER() from is with the ROW_NUMBER() function. Honestly, most of us treat ROW_NUMBER() as a little bit of magic text that we copy and paste from stackoverflow each time we want to use it.

Let's follow the same process, but instead of the count function we'll start with a ROW_NUMBER()...

Select top 3
       VehicleCode,
       ROW_NUMBER() as ROW_NUMBER
from Reporting.Vehicles

Error The function 'ROW_NUMBER' must have an OVER clause.

Now we're told that The function 'ROW_NUMBER' must have an OVER clause.

So let's include an OVER clause...

Select top 3
       VehicleCode,
       ROW_NUMBER() OVER() as ROW_NUMBER
from Reporting.Vehicles

Error The function 'ROW_NUMBER' must have an OVER clause with ORDER BY.

Now we're heckled with "The function 'ROW_NUMBER' must have an OVER clause with ORDER BY" and I don't know they didn't say that in the first place?

(And you noticed that the COUNT function above did not require an ORDER BY clause. That's because ROW_NUMBER and COUNT operate under different rules.)

So we'll add that in...

Select top 3
	   VehicleCode,
	   ROW_NUMBER() OVER(order by VehicleCode) as ROW_NUMBER
from Reporting.Vehicles
VehicleCodeROW_NUMBER
ABC_VH0111
ABC_VH0122
ABC_VH0133

And voila! We have our ROW_NUMBER... but let's learn more about what's happening here.

First we need to see exactly what the rules are.

Can you guess what will happen if we order by DESC ?

Select top 3
	   VehicleCode,
	   ROW_NUMBER() OVER(ORDER BY VehicleCode desc) as ROW_NUMBER
from Reporting.Vehicles

Will we now get 3,2,1 ?

VehicleCodeROW_NUMBER
YAX_VH0111
WYX_VH0122
WNX_VH0133

No, we still got 1,2,3 -- but the top 3 Vehicles we were shown were from the end of the list of Vehicles, in descending order.

This takes a little bit of thinking. Since there's no sort order on the outer query, the ordering created by the rest of the query is apparent. And it shows us a little bit about the way in which the ROW_NUMBER was applied. They started applying from the back of the list, i.e. starting at 1.

What if we order the outer query by VehicleCode ascending... will we now get 3,2,1 ?

Select top 3
	   VehicleCode
	   ,ROW_NUMBER() OVER(ORDER BY VehicleCode desc) as ROW_NUMBER
from Reporting.Vehicles
order by VehicleCode asc
VehicleCodeROW_NUMBER
ABC_VH0118214
ABC_VH0128213
ABC_VH0138212

Can we order by an integer?

Select top 3
	   VehicleCode
	   ,ROW_NUMBER() OVER(ORDER BY 1 asc) as ROW_NUMBER
from Reporting.Vehicles
order by 1 asc

Error Windowed functions and NEXT VALUE FOR functions do not support integer indices as ORDER BY clause expressions.

Can we order by a Windowed Function?

Select top 3
	   VehicleCode,
	   ROW_NUMBER() OVER(ORDER BY VehicleCode asc) as ROW_NUMBER
from Reporting.Vehicles
order by ROW_NUMBER() OVER(ORDER BY VehicleCode desc) asc
VehicleCodeROW_NUMBER
YAX_VH0018214
WYX_VH0018213
WNX_VH0018212

Can we put a windowed function into the OVER clause???

Select top 3
	VehicleCode,
	ROW_NUMBER() OVER(ORDER BY
		ROW_NUMBER() OVER(ORDER BY VehicleCode desc) asc
		) as ROW_NUMBER
from Reporting.Vehicles

Error Windowed functions cannot be used in the context of another windowed function or aggregate.

Can we put a windowed function in a WHERE clause?

Select top 3
	   VehicleCode,
	   ROW_NUMBER() OVER(ORDER BY VehicleCode asc) as ROW_NUMBER
from Reporting.Vehicles
where ROW_NUMBER() OVER(ORDER BY VehicleCode asc) > 4

Error Windowed functions can only appear in the SELECT or ORDER BY clauses.

At this stage we've learnt all the rules.

We can follow this old skating maxim:

FIRST: Learn all the rules

NEXT: Break all the rules

Here's how we CAN put a windowed-function into a WHERE clause...

;with TopVehicles AS
(Select
	  VehicleCode,
	  ROW_NUMBER() OVER(ORDER BY VehicleCode asc) as ROWNUM
from Reporting.Vehicles)
select top 3 * from TopVehicles
where ROWNUM > 4

We use a Common-Table-Expression (CTE) to get one extra level of indirection... now we can filter by our windowed-function.

VehicleCodeROWNUM
ABC_VH0015
ABC_VH0026
ABC_VH002R7

In fact this is a handy general technique.

You can also apply a windowed function column to a windowed function, if you first wrap it in a CTE.

This technique creates a kind of CTE game of Pass-the-Parcel. You may need to wrap up many levels of CTE to get the result you need.

cte pass the parcel

PART 2: Other Functions

We've finished exploring COUNT and Row_Number functions for now.

A quick look at other functions.

I said above that "ROW_NUMBER and COUNT operate under different rules" -- that's because they are different types of functions.

Y'see, with over() there are two types of functions you can use:

  1. ALL of the aggregates (e.g. COUNT, SUM, AVG) which you know from GROUPing
  2. "Ranking functions"

What's an example of a Ranking function?

An obvious example would be the RANK function itself, which we'll get to in a moment.

But ROW_NUMBER is also a ranking function. It's just a very unfair ranking

Select top 5
	   VehicleCode,
	   ReliabilityFactor,
	   RANK() OVER(ORDER BY ReliabilityFactor desc) as RANK_ReliabilityFactor,
	   ROW_NUMBER() OVER(ORDER BY ReliabilityFactor desc) as ROW_NUM
from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
VehicleCodeReliabilityFactorRANK_ReliabilityFactorROW_NUM
ARQ_VH1293.511
MJQ_VH1663.512
IBV_VH1673.4352500438690233
JMX_VH1493.4000000953674344
MJQ_VH0063.4000000953674345

The different ranking functions each use a different 'strategy' for numbering.

In the plain old RANK strategy, if two rows are "tied" then they get the same number. And a subsequent number is skipped.

Rank is like the olympics: two people can share a gold medal, but then no one gets the silver.

A different strategy is DENSE_RANK:

Select top 5
	   VehicleCode,
	   ReliabilityFactor,
	   COUNT(*) over () as [Count],
	   RANK() OVER(ORDER BY ReliabilityFactor desc) as RANK_ReliabilityFactor,
	DENSE_RANK() OVER(ORDER BY ReliabilityFactor desc) as DENSE_RANK_ReliabilityFactor,
	   ROW_NUMBER() OVER(ORDER BY ReliabilityFactor desc) as ROW_NUMBER_ReliabilityFactor
from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
VehicleCodeReliabilityFactorCountRANK_ReliabilityFactorDENSE_RANK_ReliabilityFactorROW_NUMBER_ReliabilityFactor
ARQ_VH1293.53085111
MJQ_VH1663.53085112
IBV_VH1673.435250043869023085323
JMX_VH1493.400000095367433085434
MJQ_VH0063.400000095367433085435

With dense rank, if two people tie for first they both get a gold medal. And the next person gets a silver.

This way you know every type of number will be handed out, at least once, even though ties are allowed.

So it's more fair than 'row number' but has different numeric properties. So it can be useful depending on how you want to join it to other things.

PART 3: Aggregates

The famous MAX and MIN

Select top 5
	   VehicleCode,
	   ReliabilityFactor,
	   ROW_NUMBER() OVER(ORDER BY ReliabilityFactor desc) as ROW_NUMBER_ReliabilityFactor,
	   MAX(ReliabilityFactor) OVER() as MAX_RF,
	   MIN(ReliabilityFactor) OVER() as MIN_RF
from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
VehicleCodeReliabilityFactorROW_NUMBER_ReliabilityFactorMAX_RFMIN_RF
ARQ_VH1293.513.50
MJQ_VH1663.523.50
IBV_VH1673.4352500438690233.50
JMX_VH1493.4000000953674343.50
MJQ_VH0063.4000000953674353.50

What if we specify an order by!?

Select top 5
	VehicleCode,
	ReliabilityFactor,
	ROW_NUMBER() OVER(ORDER BY ReliabilityFactor desc) as ROW_NUMBER_ReliabilityFactor,
	--MAX(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc) as MAX_RF,
	MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc) as MIN_RF
from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
VehicleCodeReliabilityFactorROW_NUMBER_ReliabilityFactorMIN_RF
ARQ_VH1293.513.5
MJQ_VH1663.523.5
IBV_VH1673.4352500438690233.43525004386902
JMX_VH1493.4000000953674343.40000009536743
MJQ_VH0063.4000000953674353.40000009536743

PART 4: RANGES/PRECEDING/FOLLOWING

Select top 7
	   VehicleCode,
	   ReliabilityFactor,
	   -- A:
	   COUNT(*) OVER() as [COUNT_OVER],
	   -- B:
	   COUNT(*) OVER(ORDER BY ReliabilityFactor desc  ) as [COUNT_ORDER_BY_RF],
	   -- C:
	   COUNT(*) OVER(ORDER BY ReliabilityFactor desc RANGE
					 UNBOUNDED PRECEDING) as [COUNT_RF_RANGE],
	   -- D:
	   COUNT(*) OVER(ORDER BY ReliabilityFactor desc RANGE
					 BETWEEN UNBOUNDED PRECEDING
						   AND CURRENT ROW) as [COUNT_RF_RANGE_BETWEEN],
	   -- E:
	   COUNT(*) OVER(ORDER BY ReliabilityFactor desc ROWS
					 BETWEEN UNBOUNDED PRECEDING
						   AND CURRENT ROW) as [COUNT_RF_ROWS]

from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
VehicleCodeReliabilityFactorCOUNT_OVERCOUNT_ORDER_BY_RFCOUNT_RF_RANGECOUNT_RF_RANGE_BETWEENCOUNT_RF_ROWS
ARQ_VH1293.530852221
MJQ_VH1663.530852222
IBV_VH1673.4352500438690230853333
JMX_VH1493.4000000953674330857774
MJQ_VH0063.4000000953674330857775
MJQ_VH1743.4000000953674330857776
MJQ_VH2043.4000000953674330857777

Here is a fairly exhaustive set of clauses...

Select top 7
	   VehicleCode,
	   ReliabilityFactor,
	   --A:
	   MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc) as MIN_RF,
	   --B:
			  MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc RANGE
					 UNBOUNDED PRECEDING) as MIN_RF,
	   --B1:
	   MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc RANGE
					 BETWEEN UNBOUNDED PRECEDING
						   AND CURRENT ROW) as MIN_RF,
	   --C:
	   MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc  ROWS
					 BETWEEN UNBOUNDED PRECEDING
						   AND UNBOUNDED FOLLOWING) as MIN_RF_UNBOUNDED,
	   --D:
	   MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc  RANGE
					 BETWEEN UNBOUNDED PRECEDING
						   AND UNBOUNDED FOLLOWING) as MIN_RF_RANGE_UNBOUNDED,
	   --E:
	   MIN(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc  RANGE
					 UNBOUNDED PRECEDING) as MIN_RF_RANGE_UNBOUNDED_PREC,
	   --F:
	   Max(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc
	   rows BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as Max_RF_rows_AHEAD,
	   --G:
	   Max(ReliabilityFactor) OVER(ORDER BY ReliabilityFactor desc
	   rows BETWEEN 2 PRECEDING AND 2 FOLLOWING) as Max_RF_rows_neighbours

from Reporting.Vehicles
where ReliabilityFactor is not null and ReliabilityFactor <= 3.50
order by ReliabilityFactor desc
VehicleCodeReliabilityFactorMIN_RFMIN_RF1MIN_RF2MIN_RF_UNBOUNDEDMIN_RF_RANGE_UNBOUNDEDMIN_RF_RANGE_UNBOUNDED_PRECMax_RF_rows_AHEADMax_RF_rows_neighbours
ARQ_VH1293.53.53.53.5003.53.53.5
MJQ_VH1663.53.53.53.5003.53.53.5
IBV_VH1673.435250043869023.435250043869023.435250043869023.43525004386902003.435250043869023.435250043869023.5
JMX_VH1493.400000095367433.400000095367433.400000095367433.40000009536743003.400000095367433.400000095367433.5
MJQ_VH0063.400000095367433.400000095367433.400000095367433.40000009536743003.400000095367433.400000095367433.43525004386902
MJQ_VH1743.400000095367433.400000095367433.400000095367433.40000009536743003.400000095367433.400000095367433.40000009536743
MJQ_VH2043.400000095367433.400000095367433.400000095367433.40000009536743003.400000095367433.400000095367433.40000009536743

PART 5: PARTITIONS

We will use COUNT to demonstrate partitions.

First, a COUNT with no partition...

The COUNT is the total count...

Select top 9
	   VehicleCode,
	   Fleet,
	   COUNT(*) OVER() as [COUNT]
from Reporting.Vehicles
where not Fleet is null
order by Fleet, Vehiclecode
VehicleCodeFleetCOUNT
PLF_VH112096FQ3898191
CHP_VH1510H3A82P28191
CHP_VH1590H3A82P28191
PEF_VH0070C1FT1H28191
BXR_VH1881Y0PY47X8191
BXR_VH1901Y0PY47X8191
BXR_VH1991Y0PY47X8191
BXR_VH2001Y0PY47X8191
BXR_VH2011Y0PY47X8191

Let's add some partition... This time count will show us how many rows in each Fleet:

Select top 9
	   VehicleCode,
	   Fleet,
	   ROW_NUMBER() OVER(PARTITION BY Fleet order by VehicleCode) as [ROW_IN_Fleet],
	   COUNT(*) OVER(PARTITION BY Fleet) as [COUNT_IN_Fleet],
	   -- Friendly Name...
	   'Vehicle ' + Cast(ROW_NUMBER() OVER(PARTITION BY Fleet order by VehicleCode) as Varchar(10))  + ' of ' + Cast(COUNT(*) OVER(PARTITION BY Fleet) as Varchar(10)) + ' on ' + Fleet as [Friendly_Name]
from Reporting.Vehicles
where not Fleet is null
order by Fleet, Vehiclecode

|VehicleCode|Fleet|ROW_IN_Fleet|COUNT_IN_Fleet|Friendly_Name| |------|------|------| |PLF_VH112|096FQ389|1|1|Vehicle 1 of 1 on 096FQ389| |CHP_VH151|0H3A82P2|1|2|Vehicle 1 of 2 on 0H3A82P2| |CHP_VH159|0H3A82P2|2|2|Vehicle 2 of 2 on 0H3A82P2| |PEF_VH007|0C1FT1H2|1|1|Vehicle 1 of 1 on 0C1FT1H2| |BXR_VH188|1Y0PY47X|1|5|Vehicle 1 of 5 on 1Y0PY47X| |BXR_VH190|1Y0PY47X|2|5|Vehicle 2 of 5 on 1Y0PY47X| |BXR_VH199|1Y0PY47X|3|5|Vehicle 3 of 5 on 1Y0PY47X| |BXR_VH200|1Y0PY47X|4|5|Vehicle 4 of 5 on 1Y0PY47X| |BXR_VH201|1Y0PY47X|5|5|Vehicle 5 of 5 on 1Y0PY47X|

PART 6: LEAD/LAG

Select top 9
	   VehicleCode,
	   COUNT(*) OVER(PARTITION BY Fleet) as [COUNT_IN_Fleet],
	   Fleet,
	   LEAD(Fleet) OVER(order by Fleet, VehicleCode) as [NEXT_Fleet],
	   LAG(Fleet) OVER(order by Fleet, VehicleCode) as [PREVIOUS_Fleet],
	   LEAD(Fleet,2) OVER(order by Fleet, VehicleCode) as [NEXT_NEXT_Fleet],
	   LAG(Fleet,2) OVER(order by Fleet, VehicleCode) as [PREVIOUS_PREVIOUS_Fleet],
	   LEAD(Fleet,8172) OVER(order by Fleet, VehicleCode) as [NEXT_Fleet_8172],
	   FIRST_VALUE(VehicleCode) OVER(partition by Fleet order by Vehiclecode) as First_Vehicle_in_Fleet,
	   LAST_VALUE(VehicleCode) OVER(partition by Fleet order by Vehiclecode
		   RANGE BETWEEN
		   CURRENT ROW AND UNBOUNDED FOLLOWING) as Last_Vehicle_in_Fleet
from Reporting.Vehicles
where not Fleet is null
order by Fleet, Vehiclecode
VehicleCodeCOUNT_IN_FleetFleetNEXT_FleetPREVIOUS_FleetNEXT_NEXT_FleetPREVIOUS_PREVIOUS_FleetNEXT_Fleet_8172First_Vehicle_in_FleetLast_Vehicle_in_Fleet
PLF_VH1121096FQ3890H3A82P20H3A82P2APYX07PLF_VH112PLF_VH112
CHP_VH15120H3A82P20H3A82P2096FQ3890C1FT1H2APYX07CHP_VH151CHP_VH159
CHP_VH15920H3A82P20C1FT1H20H3A82P21Y0PY47X096FQ389APYX07CHP_VH151CHP_VH159
PEF_VH00710C1FT1H21Y0PY47X0H3A82P21Y0PY47X0H3A82P2APYX07PEF_VH007PEF_VH007
BXR_VH18851Y0PY47X1Y0PY47X0C1FT1H21Y0PY47X0H3A82P2APYX07BXR_VH188BXR_VH201
BXR_VH19051Y0PY47X1Y0PY47X1Y0PY47X1Y0PY47X0C1FT1H2APYX07BXR_VH188BXR_VH201
BXR_VH19951Y0PY47X1Y0PY47X1Y0PY47X1Y0PY47X1Y0PY47XAPYX07BXR_VH188BXR_VH201
BXR_VH20051Y0PY47X1Y0PY47X1Y0PY47X101DY4731Y0PY47XARG447BXR_VH188BXR_VH201
BXR_VH20151Y0PY47X101DY4731Y0PY47X101DY4731Y0PY47XARG447BXR_VH188BXR_VH201

Background reading

See also