Friday, 23 June 2017

Removing Outliers using stddev()

A colleague asked me about removing outliers from reports and I knew just the function to give a go.

Some time ago I had to prove that some data we had conformed to a normal distribution. Remember those from high school?
A normal distrubition (bell curve)
That request was about 2007 and I ended up using stddev() and lpad() to produce a vertical shaped bell curve in SQL*Plus. That was pretty cool.

As for removing outliers, we can use this tool to exclude a small percentage of whatever data you have, so that it doesn't skew averages.

If you want to look at the code in context, I've created a test case on livesql.oracle.com

But as an inline reference, first I created a simple table, and populated it with a small set of random data between known range.
(I would like to add this is exactly what Tim Hall recently described here in regard to learning and asking questions.)

Importantly, I also inserted an outlier that was an order of magnitude higher than the rest.
My aim was to ignore this outlier in my SQL that calculated average.
create table outlier_demo  (a number, val number);

insert into outlier_demo  
select rownum -- we don't care  
  , round(dbms_random.value(20,50),1) -- some random numbers between 20 and 50  
from dual   
-- let's create 49 rows  
connect by level < 50

insert into outlier_demo values (0,500)

select * from outlier_demo;

A       VAL
---     ----
1       27.9
2       42.5
3       49.7
4       22.3
5       42.3
6       20.2
7       33.9
8       46.8
...
48      45.7
49      21.8
0       500
The average is higher than expected because of that outlier. I've included a column that calculates the average without the outlier, albiet hardcoded. Notice how close it is to the median? When considering usage statistics from APEX, I've often paid more attention to the median (most rows) than the average (affected by outliers).
select count(*) c   
   -- what's our average?   
  ,round(avg(val)) the_avg   
   -- what would the average be without the known outlier?   
  ,round(avg(case when val < 500 then val end)) no_outlier   
  ,round(stddev(val)) std   
   -- what's the median  
  ,round(median(val)) the_median  
from outlier_demo 

C    THE_AVG NO_OUTLIER    STD   THE_MEDIAN
---  ------- ----------  -----   ----------
50        46         37     66           36
We can take the same function and get the result for every row, that's one way we can use it to filter outlier data.
select a, val  
   -- what is 2 standard deviations away  
   -- provide result for each row  
   ,round(2*stddev(val) over (order by null)) local_std  
from outlier_demo
Re-written as a from clause, we can now filter any rows that have a value higher than x standard deviations from the norm.
select count(*) c   
  ,round(avg(val)) the_avg   
  ,round(avg(case when val < 500 then val end)) no_outlier   
  ,round(stddev(val)) new_stddev  
  ,round(avg(local_std)) filter  
from    
  (select a, val   
   -- what is 2 standard deviations away   
   -- provide result for each row   
   ,round(2*stddev(val) over (order by null)) local_std   
   from outlier_demo   
  )   
-- only average those values within two standard deviations   
where val < local_std;

C   THE_AVG NO_OUTLIER  NEW_STDDEV  FILTER
--  ------- ----------  ----------  ------
49       37        37           9      132
This could be re-written in a form that may be more recognisable to some.
select count(*) c
  ,avg(val) the_avg
  ,avg(case when val < 500 then val end) no_outlier
  ,stddev(val) std
from outlier_demo
where val < (select 2*stddev(val) from outlier_demo);
There are probably a number of variations on how to apply this solution, and possibly a few alternative methods.
It will probably depend on the SQL in your particular report.

This is not to say you should be ignoring outliers completely, but stddev() can help identify them where necessary. And it scales.

Thank Kate for this post.

ps - If you want to dive deeper, Jonathan Gennick (my patient book editor) has two in depth posts on the topic, with some diagrams to help understand the mathematical concepts.

Related post:
Filtering outliers from APEX workspace activity logs

Wednesday, 21 June 2017

Annual Oracle release cycle coming?

Those of you on Twitter at the end of May, and certainly those at DOAG might have seen this announcement.

I'm no DBA, but that first sentence in the picture has the potential to be rather game changing.
Annual Feature Release of Oracle Database Software
More information is to come (from official channels), as so eloquently put in a comment on this only other post I've encountered on the topic, but I've been pondering this glimpse of change with some colleagues, here in the suburbs of our isolated Australian city.

We're navigating our way through a few upgrades now, and I've seen the same patterns before. Without a clear idea on event what month a major release may arrive, it can be difficult to engineer time into the schedule for an upgrade. It's like launching rockets, miss a window and you may have to wait months for another opportunity.

Other software vendors have set release schedules, and it an annual release cycle will no doubt influence the operations of Oracle consumers (and conferences) around the world, I think for the better.

And they probably won't need to worry about how to market the superstitiously tainted #13.

Wednesday, 14 June 2017

Presentation: Dynamic Actions, Javascript, & CSS for APEX Beginners

My first webinar didn't do so well, perhaps my new home will have suitable internet capability, in this "innovate nation" of ours...

My second one on my Boggex game went ok, though the room the locals were in was like an sauna.

My third attempt had it's own set of issues.
1) Ubuntu as an OS doesn't seem to enable broadcasting via GoTo Meeting.
2) The wireless policy in the Oracle building is effective, but not great for hastily setting up a second laptop
3) The navigation keys on Lino's keyboard was just different enough to be disruptive
4) While I use on Chrome's save password feature, I'm pretty confident with most of my APEX passwords. It's the username I was stuffing up on the foreign PC.
5) The think tank at HP thought it would be a good idea to map Aeroplane Mode to F12, so when I proceeded to display the Chrome browser console...

So balance was restored after the first 10 or 15 minutes, which rattled my first attempt at presenting using just an APEX app - no powerpoint/prezi.

Though I think the concept was fine. It's just like training people, jumping between Page Designer and runtime.

I'd give it another go, and it's less presentation prep. I learned a bit from the first session, and explored some layout options while I was at it.

I still baulk a little at the difference between talking with a live crowd, and a community of ears at the other end of my microphone. I wonder if (finally) creating a few more videos will fix that? Once my ultrabook is back from computer hospital, perhaps. In the new house?

As for the topic, I explored dynamic actions, JavaScript and CSS for those trying to understand the fundamentals. It's a favourite topic of mine and I think there are a few basic patterns that get me a long way in creating interactive applications.

No doubt I'll refine these example further, and how to communicate them more concisely. Stay tuned.

I've made the app available here. Please feel free to provide any feedback, either here or through the app.
https://apex.oracle.com/pls/apex/f?p=27882
My favourite page is P50 specifically on Dynamic Actions, probably the most versatile utility within APEX.
If you scroll down you'll see the relevant code I used to sustain an interactive page that doesn't need to submit.
Also look out for things like conditional buttons, conditional colours (foreground and background), APEX behaviour considerations, among others.

If you're in Australia and you'd like training on this topic, I could demonstrate cool stuff for hours ;p
I also aim to get some organisation around github sorted when I get my laptop back, but I've at least put the export here so you can check it out directly.
https://github.com/swesley/apex-beginner-css-jquery

I hope you learn something new.

Scott

I'm still here.

It has been some time since my last confession.
It's been a while, but sometimes you get busy, ya know?

And a different sort of busy, the sort that left me wanting to escape to a different place. I spent a bit of time playing an old favourite, Civilisation.

I really like to write, consolidate ideas, and share insights in this forum, but it's rare that I have such a gap.

I also miss it.

I always have plenty of unfinished drafts, but I also usually have a few recent experiences that are like an itch waiting to come out.

There are also a few other factors. My laptop is just within warranty and I'm finally getting a line of pink pixels fixed. I'm also gonna wipe Ubuntu off it and go back to Windows.

I feel my OS sojourn was successful in that I realised in a career of learning such as development, some of us need to learn to swim in a pool, not the ocean. Not the best analogy, but close enough.

My workplace is always (usually) Windows, any exploration on my laptop can be within a VM from the comfort of Windows. I've also got accumulated decades worth of keyboard shortcuts, and the muscle memory alone keeps me to the original style keyboard, not the weird ones that Macs use.

And it turns out we're also gearing up to move house, well, start by putting our place on the market. That sounds like fun, right?

You'll find me escaping occasionally, back here, writing with passion about what I find to be an interesting career. That, and reading/listening to stuff about science. There is some amazing stuff going on in this world!