The scenario is thus - I have an amount in a table for various dates throughout the year, for a particular person.
NAME AMT DT ------ ---------- ----------- Scott 117 11-jan-2009 Scott 250 17-feb-2009 Scott 300 14-apr-2009 Scott 50 06-jun-2009 Wade 1231 17-mar-2009 Wade 2321 22-apr-2009 Wade 3122 30-sep-2009 Wade 59 31-oct-2009However in my report I wish all months of the year displayed for the person, regardless of whether amounts exist. Let's add a cumulative figure in there for good measure. Included below this is the SQL for my test case.
NAME MTH AMT CUM_AMT ------ --------- ---------- ---------- Scott January 117 117 Scott February 250 367 Scott March 0 367 Scott April 300 667 Scott May 0 667 Scott June 50 717 Scott July 0 717 Scott August 0 717 Scott September 0 717 Scott October 0 717 Scott November 0 717 Scott December 0 717 Wade January 0 0 Wade February 0 0 Wade March 1231 1231 Wade April 2321 3552 Wade May 0 3552 Wade June 0 3552 Wade July 0 3552 Wade August 0 3552 Wade September 3122 6674 Wade October 59 6733 Wade November 0 6733 Wade December 0 6733 DROP TABLE customer; CREATE TABLE customer ( name VARCHAR2(10) ,amt NUMBER ,dt DATE ); insert into customer values('Scott',117, to_date('11-01-2009','DD-MM-YYYY')); insert into customer values('Scott',250, to_date('17-02-2009','DD-MM-YYYY')); insert into customer values('Scott',300, to_date('14-04-2009','DD-MM-YYYY')); insert into customer values('Scott',50, to_date('06-06-2009','DD-MM-YYYY')); insert into customer values('Wade',1231, to_date('17-03-2009','DD-MM-YYYY')); insert into customer values('Wade',2321, to_date('22-04-2009','DD-MM-YYYY')); insert into customer values('Wade',3122, to_date('30-09-2009','DD-MM-YYYY')); insert into customer values('Wade',59, to_date('31-10-2009','DD-MM-YYYY'));This is a pre-10g solution, using an efficient method on the DUAL table to conjure records. I also use analytics to determine the cumulative amount. There is however a cartesion join and two full table scans on the customer table.
SELECT date_fill.name ,TO_CHAR(real_dt,'Month') mth ,NVL(amt,0) amt ,NVL(SUM(amt) OVER (PARTITION BY date_fill.name ORDER BY real_dt ),0) cum_amt FROM (SELECT name, TRUNC(dt,'mm') dt, SUM(amt) amt FROM customer GROUP BY name, TRUNC(dt,'mm') ) actual_data -- Actual data ,(SELECT name, real_dt FROM (SELECT DISTINCT name FROM customer) ,(WITH mths AS (SELECT TRUNC(SYSDATE,'YYYY') real_dt FROM DUAL CONNECT BY LEVEL <= 12) SELECT ADD_MONTHS(real_dt,ROWNUM-1) real_dt FROM mths) ) date_fill -- Distinct list with conjured dates -- Outer join actual data with full date list WHERE date_fill.real_dt = actual_data.dt(+) AND date_fill.name = actual_data.name(+) ORDER BY date_fill.name, date_fill.real_dt; ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 74 | 11 (28)| | 1 | WINDOW SORT | | 2 | 74 | 11 (28)| |* 2 | HASH JOIN OUTER | | 2 | 74 | 11 (28)| | 3 | VIEW | | 2 | 24 | 6 (17)| | 4 | MERGE JOIN CARTESIAN | | 2 | 24 | 6 (17)| | 5 | VIEW | | 1 | 6 | 2 (0)| | 6 | COUNT | | | | | | 7 | VIEW | | 1 | 6 | 2 (0)| | 8 | CONNECT BY WITHOUT FILTERING| | | | | | 9 | FAST DUAL | | 1 | | 2 (0)| | 10 | BUFFER SORT | | 2 | 12 | 6 (17)| | 11 | VIEW | | 2 | 12 | 4 (25)| | 12 | HASH UNIQUE | | 2 | 12 | 4 (25)| | 13 | TABLE ACCESS FULL | CUSTOMER | 8 | 48 | 3 (0)| | 14 | VIEW | | 8 | 200 | 4 (25)| | 15 | HASH GROUP BY | | 8 | 136 | 4 (25)| | 16 | TABLE ACCESS FULL | CUSTOMER | 8 | 136 | 3 (0)| -------------------------------------------------------------------------------------Here is what some would consider a more elegant solution using the model clause. The explain plan is far neater, and the performance was also enhanced - and there was no joins & half as many logical reads. The cumulative column was calculated using a nifty rule computation.
SELECT name, TO_CHAR(dt,'DD-MM-YYYY') dt, amt, cum_amt -- Model results FROM ( SELECT name, TRUNC(dt, 'MM') dt, SUM(amt) amt FROM customer GROUP BY name, TRUNC(dt, 'MM') ) MODEL PARTITION BY (name) DIMENSION BY (dt) MEASURES (amt, cast(NULL AS NUMBER) cum_amt) -- Define calculated col IGNORE NAV RULES SEQUENTIAL ORDER( -- Conjure dates amt[FOR dt FROM TO_DATE('01-01-2009', 'DD-MM-YYYY') TO TO_DATE('01-12-2009', 'DD-MM-YYYY') INCREMENT NUMTOYMINTERVAL(1, 'MONTH') ] = amt[CV(dt)] -- Apply amt for given date, if found ,cum_amt[ANY] = SUM(amt)[dt <= CV(dt)] -- Calculate cumulative ) ORDER BY name, dt; ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 8 | 136 | 5 (40)| | 1 | SORT ORDER BY | | 8 | 136 | 5 (40)| | 2 | SQL MODEL ORDERED | | 8 | 136 | 5 (40)| | 3 | HASH GROUP BY | | 8 | 136 | 5 (40)| | 4 | TABLE ACCESS FULL | CUSTOMER | 8 | 136 | 3 (0)| | 5 | WINDOW (IN SQL MODEL) SORT| | | | | ------------------------------------------------------------------------------The following solution is the one I stumbled upon while playing with ANSI joins. I wanted to see how it compared with the others. We still have a join, but that's not necessarily an issue. It's certainly tidier looking than the first solution, and we don't need to
SELECT c.name, TO_CHAR(real_dt,'Month') mth, NVL(amt,0) amt ,NVL(SUM(amt) OVER (PARTITION BY c.name ORDER BY real_dt ),0) cum_amt FROM customer c PARTITION BY (name) RIGHT OUTER JOIN (WITH mths AS (SELECT TRUNC(SYSDATE,'YYYY') real_dt FROM DUAL CONNECT BY LEVEL <= 12) SELECT ADD_MONTHS(real_dt,ROWNUM-1) real_dt FROM mths) mths ON (real_dt = TRUNC(c.dt,'mm')) ORDER BY c.name, real_dt; ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 130 | 8 (13)| | 1 | WINDOW SORT | | 2 | 130 | 8 (13)| | 2 | VIEW | | 2 | 130 | 8 (13)| | 3 | NESTED LOOPS PARTITION OUTER | | 2 | 46 | 8 (13)| | 4 | BUFFER SORT | | | | | | 5 | VIEW | | 1 | 6 | 2 (0)| | 6 | COUNT | | | | | | 7 | VIEW | | 1 | 6 | 2 (0)| | 8 | CONNECT BY WITHOUT FILTERING| | | | | | 9 | FAST DUAL | | 1 | | 2 (0)| |* 10 | FILTER | | | | | | 11 | SORT PARTITION JOIN | | 1 | 17 | 4 (25)| | 12 | TABLE ACCESS FULL | CUSTOMER | 1 | 17 | 3 (0)| -------------------------------------------------------------------------------------I executed these 3 statements 1000 times on my laptop on 10gR2 (similar results in 11gR1) to see how throughput compares. I consistently found the following performance:
6 secs -- Original .006 secs per iteration 4.6 secs -- Model clause .0046 secs per iteration .35 secs -- Partitioned outer join .00035 secs per iterationPersonally I was a little surprised at the dramatic timing difference with the partitioned outer join over the original. I thought perhaps that would have been more comparable to the model clause solution. Something to bear in mind in future development.
If I needed to write this for a future project I'd probably take a closer look at the relevant data set using Tom Kyte's runstats package. This used to be located here - htpp://asktom.oracle.com/tkyte/runstats.html, but the link is currently broken. Possibly related to his recent move. However here is one of many of his forum entries addressing runstats. This package is better than just measuring throughput, it will let you know how your application may scale by paying attention to latches. A DBA's perspective can be found here.
Learned something from this post. Thanks for sharing it.
ReplyDeleteregards