Healthcare Analytics

Use ROW_NUMBER() and not Min/Max functions

Written by Jean-Luc Coquerel | Sep 12, 2013 9:43:00 PM

For those of you who are SQL experts, this might sound like a trivial suggestion, but if you are not already familiar with the SQL function ROW_NUMBER(), keep reading. In fact look it up.

The syntax is ROW_NUMBER() OVER(PARTITION BY [name of field you want to group by] ORDER BY [name of field you want to sort by]).

This function assigns a number to each record returned in a query based on a grouping (or partition) and sort order that you specify. So, if you want the first or last record of something (first order date, last encounter, latest documented flowsheet record, etc...), this function will make your code much faster and accurate than trying to get to the same answer with the combination of Min and Max.

Here is why: min and max are aggregate functions (just like a sum and count). They operate implicitly at the level of grouping where they are applied. Using them can easily result in some convoluted queries (worse: inaccurate in some instances). Also these aggregate functions tend to be more taxing on performance. Row_Number() on the other hand is not an aggregate function. It simply assigns a rank to records and works much faster. The only caveat is that you have to first assign the ranking before you can use it. So, it is leveraged primarily with a subquery.