Pages

Wednesday, 8 August 2012

Fun with SQL analytical functions

I had an interesting SQL problem at work recently, and I came up with a solution that I'm not sure is completely ideal - so I thought I'd attempt to replicate it here.

Some of the complexity is lost while I obscure and simplify the problem (it involved a hierarchical query), but I think the key elements remain.

Consider a table of codes with an order sequence.
create table my_values (code varchar2(10), order_seq number(5));
insert into my_values values ('A', 10);
insert into my_values values ('A', 20);
insert into my_values values ('A', 30);
insert into my_values values ('B', 40);
insert into my_values values ('B', 50);
insert into my_values values ('C', 60);
insert into my_values values ('C', 70);
insert into my_values values ('C', 80);
insert into my_values values ('D', 90);
My requirement was to see the next and previous code. For instance, when listing B records, I wanted to see A and C in the same row - the solution screamed analytical functions so I started with my favourite:
select code, order_seq
      ,row_number() over (partition by code order by order_seq) rn
from my_values
order by order_seq;

CODE       ORDER_SEQ RN
---------- --------- --
A                 10  1 
A                 20  2 
A                 30  3 
B                 40  1 
B                 50  2 
C                 60  1 
C                 70  2 
C                 80  3 
D                 90  1
I struck the results down to just the first row for each code, incorporating lag/lead to get the info I needed.
select s.code
      ,lag(s.code) over (order by s.order_seq) my_lag
      ,lead(s.code) over (order by s.order_seq) my_lead
from 
(
  select code, order_seq
        ,row_number() over (partition by code 
                            order by order_seq) rn
  from my_values
  order by order_seq
) s
where s.rn = 1
order by order_seq;

CODE       MY_LAG     MY_LEAD  
---------- ---------- ----------
A                     B          
B          A          C          
C          B          D          
D          C                    
Finally, to combine my results I created an in-line view with a subquery factoring statement.
with sub as (
  select s.code
        ,lag(s.code) over (order by s.order_seq) my_lag
        ,lead(s.code) over (order by s.order_seq) my_lead
  from 
  (
    select code, order_seq
          ,row_number() over (partition by code 
                              order by order_seq) rn
    from my_values
    order by order_seq
  ) s
  where s.rn = 1)
select m.*, sub.my_lag, sub.my_lead
from my_values m, sub
where m.code = sub.code;

CODE       ORDER_SEQ MY_LAG     MY_LEAD  
---------- --------- ---------- ----------
A                 10            B          
A                 20            B          
A                 30            B          
B                 40 A          C          
B                 50 A          C          
C                 60 B          D          
C                 70 B          D          
C                 80 B          D          
D                 90 C                     
So my question is - any ideas for a simpler solution?

2 comments:

  1. Not really simpler, just another way

    select
    code
    ,order_seq
    ,last_value(preceding ignore nulls) over(order by order_seq) preceding
    ,last_value(following ignore nulls) over(order by order_seq desc) following
    from (
    select t.*
    , nullif(lead(code) over(order by order_seq),code) following
    , nullif(lag(code) over(order by order_seq),code) preceding
    from my_values t
    ) t
    order by order_seq

    Best regards

    Maxim

    ReplyDelete
  2. The Maxims solution may be improved to do only two window sorts:

    select
    code
    ,order_seq
    ,last_value(preceding ignore nulls) over(order by order_seq) preceding
    ,first_value(following ignore nulls) over(order by order_seq rows between current row and unbounded following) following
    from (
    select t.*
    , nullif(lead(code) over(order by order_seq),code) following
    , nullif(lag(code) over(order by order_seq),code) preceding
    from my_values t
    ) t
    order by order_seq
    ;

    And with a dirty interpretation of the test data it is possible with only one window sort:

    select t.*
    , nullif(first_value(code) over(order by ascii(code) range 1 preceding),code) pr
    , nullif(last_value(code) over(order by ascii(code) range between current row and 1 following),code) fo
    from my_values t
    ;

    ReplyDelete