What if you would like a smoother version of that line - a moving average, if you will (red line). This stabilises the results, like looking at climate vs weather.
Oracle APEX Line Chart - 2 series |
SELECT null lnk ,diff label ,count(*) AS qty ,round(avg(count(*)) over (order by diff rows between 3 preceding and 3 following) ,2) AS moving_avg ...
The third column uses an analytical function to calculate the average of the 6 surrounding counts at any given point on the x-axis. The "rows between" syntax is often left to the default, which is everything up to the current value (see post comments), based on the order provided. In this case it explicitly specifies the 'window' of rows to average to be only nearby data points.
Note this report looks a lot cleaner with display setting 'Marker' using 'None', as opposed to 'Circle'.
Nice, simple way of providing the user information that can be easier to read & interpret.
Thanks for promoting analytic functions, Scott, always great to see :-)
ReplyDeleteI have to add a small nitpick, though. You state:
The "rows between" syntax is often left to the default, which is everything up to the current row, based on the order provided.
That is not quite correct - it would be more correct to say:
The "rows between" syntax is often left to the default, which is everything up to the current VALUE, based on the order provided.
Because the default, when you have an ORDER BY clause and omit the windowing clause, is: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
And in my opinion that is the "wrong" default, because 90% of the time what you really want is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
The difference is what happens when there are rows with the same value (ties) in the ORDER BY column. The RANGE default includes in the window all rows with the same value or less, which might include one or more following rows in case of ties.
If you leave it at default RANGE BETWEEN ..., you risk not only results that may not be what you want (in case of ties), you also risk superfluous sorting operations. I've blogged about that at http://www.kibeha.dk/2013/02/rows-versus-default-range-in-analytic.html.
Sorry about if it sounds like a rant - it is just a pet peeve with me and I grab every opportunity to try and tell developers of analytic functions, that my rule of thumb for analytics with ORDER BY clause is never to rely on default, but explicitly write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (typically >90% of the time) or RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (typically <10% of the time.)
That way you have also self-documented your code. If you leave it to default, 3 months after writing the code you'll be in doubt whether this was one of the few cases where you actually meant RANGE or if it was a mistake and you actually meant ROWS.
Thanks for listening :-)
/Kim
That's great feedback, Kim. I think Connor's picked me up on these semantics before, and it reminds me of the UNION ALL 'default', just a little ... deeper. Great post.
ReplyDelete