Pages

Friday, 8 December 2017

Friday Fun SQL Lesson - union all

Our office kitchen is unavailable this Friday, so the call was put out for pub lunch.

After a couple of replies I decided to enter my reply, in full nerd mode.
select * from people_coming_to_lunch;

People
--------
Kate
Scott
Karolina

3 rows selected.
And of course one of the other SQL geeks (name redacted) replied to extend the data set.
select * from people_coming_to_lunch
union 
select 'Shanequa'
from dual;
And I couldn't help myself. I had to play the performance card and suggest that UNION ALL would be the appropriate usage, and should be the default you type out. Always. Until you decide otherwise.

That's because sorts are expensive. And a UNION will need sorting to check for duplicates.

That sort of all the rows isn't necessary
And it will sort the data set even if there is a unique key on the data.
create table people_coming_to_lunch (people varchar2(30));
insert into people_coming_to_lunch values ('Scott');
insert into people_coming_to_lunch values ('Kate');
insert into people_coming_to_lunch values ('Karolina');

create unique index lunch_people on people_coming_to_lunch(people);

select * from people_coming_to_lunch
union all
select 'Shanequa' from dual
By using UNION ALL instead of UNION, you're telling the database not to even bother sorting the set to eliminate any potential duplicates, since your advanced human brain knows there will be no duplicates.

With only a few rows, the timing of sheer throughput is barely noticable.
iterations:1000
     0.30 secs (.0003 secs per iteration)  -- UNION
     0.25 secs (.00025 secs per iteration) -- UNION ALL
 
iterations:10000
     1.72 secs (.000172 secs per iteration)
     1.09 secs (.000109 secs per iteration)
 
iterations:50000
    10.94 secs (.0002188 secs per iteration)
     8.48 secs (.0001696 secs per iteration)
So I turned it up a notch and added about 5000 rows to the table.
insert into people_coming_to_lunch
select table_name from all_tables;
 
5000 rows inserted
Here's the explain plan without the sort.

That's one less chunk of 5000 rows to process

Now the differences in performance stand out.
iterations:1000
     6.79 secs (.00679 secs per iteration) -- UNION
     2.85 secs (.00285 secs per iteration) -- UNION ALL
 
iterations:5000
    42.91 secs (.008582 secs per iteration)
    19.89 secs (.003978 secs per iteration)
 
iterations:5000
    31.70 secs (.00634 secs per iteration)
    22.83 secs (.004566 secs per iteration)
 
iterations:5000
    30.75 secs (.00615 secs per iteration)
    16.76 secs (.003352 secs per iteration)
Upto twice as long for the same statement?
No thanks, not when I could just type 4 extra characters to get an easy performance win.

Turns out this topic formed my first technical post. Back in 2009, after almost 10 years of using SQL, that was the first thing I blogged about. How about that.

No comments:

Post a Comment