Friday, 5 February 2010

Please check your basic syntax and expected behaviour.

Today a colleague of mine found some indexes on a system he was working on that didn't look quite right. He checked out the definition and found something a little surprising!

create index my_index on my_table ('my_column');

The indexes had been defined incorrectly, and hence have never been utilised. For those minds that aren't working on Fridays, the index column should not be defined in quotes.

This issue can be simply demonstrated with a small test. Below I create a table, populated it with a bunch of rows, create this bad version of the index and see if it's utilised.

I've trimmed some of the output for brevity, but before my two select statements, I utilised the SQL*Plus command:
SET AUTOTRACE ON EXPLAIN

10g>  drop table chk_syntax;

Table dropped.

10g>  create table chk_syntax
  2    (col1 number(18)
  3    ,col2 varchar2(200));

Table created.

10g>
10g>  insert into chk_syntax
  2  select rownum, 'col'||rownum
  3  from dual
  4  connect by level <= 10000;

10000 rows created.

10g>
10g>  create index bad_syntax_i on chk_syntax ('col2');

Index created.

10g>
10g>  select * from chk_syntax where col2 = 'col2';

      COL1 COL2
---------- ----------------------------------------------------------------------
         2 col2

1 row selected.

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |   115 |     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CHK_SYNTAX |     1 |   115 |     8   (0)| 00:00:01 |
--------------------------------------------------------------------------------

10g>
10g>  create index good_syntax_i on chk_syntax (col2);

Index created.

10g>
10g>  select * from chk_syntax where col2 = 'col2';

      COL1 COL2
---------- ----------------------------------------------------------------------
         2 col2

1 row selected.

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |   115 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHK_SYNTAX    |     1 |   115 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | GOOD_SYNTAX_I |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COL2"='col2')

10g>
10g>  select * from user_ind_columns where table_name = 'CHK_SYNTAX';

INDEX_NAME                     TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ -------------------
BAD_SYNTAX_I                   CHK_SYNTAX                     SYS_NC00003$
GOOD_SYNTAX_I                  CHK_SYNTAX                     COL2

2 rows selected.

10g>
10g>  select * from user_ind_expressions where table_name = 'CHK_SYNTAX';

INDEX_NAME                     TABLE_NAME                     COLUMN_EXPRESSION
------------------------------ ------------------------------ -------------------
BAD_SYNTAX_I                   CHK_SYNTAX                     'col2'

1 row selected.

The bad index isn't utilised, but the good one is referenced by the optimiser.

Second to this, a look at the data dictionary view user_ind_columns shows the bad index doesn't refer to an actual column - only a generated one.

This is true of function-based indexes, which is what this bad index has done. When I looked in user_ind_expressions I see the index defined as a literal string.

This is where you normally see function-based indexes such as:
CREATED INDEX surname_fn_i ON person (UPPER(surname));

This also reminds me a common open question I ask during training sessions.

What does this return:
SELECT '6 * 6' as "Area" FROM dual;

Often when writing queries, insert statements, data definition - syntax can be valid and safely executed... but it doesn't make it right.

Please check your syntax, and confirm expected behaviour.

10g> SELECT 6 * 6 as "Area" FROM dual;

Area
----------
36

1 row selected.

3 comments:

Jeffrey Kemp said...

Wow, I would have thought Oracle would disallow indexes on literal values in the first place - the only possible, theoretical use for such an index that I can think of is that if the index were compressed, it might provide the fastest way to count all the rows in the table - will have to test that and see.

Jeffrey Kemp said...
This comment has been removed by the author.
Jeffrey Kemp said...

Well, that's interesting. If you often need to get a full count of a large table, having an index like this could give appreciable performance gains, although there's not much difference than using a unique index on a not-null column (tested on Oracle 9i):

drop table mybigtable;
create table mybigtable as
select rownum as ID,
RPAD('x',4000,'x') as PADDING
from dual connect by level <= 100000;
alter table mybigtable
modify (ID not null);
create unique index myidindex
on mybigtable (ID);
create index mysmallindex
on mybigtable ('x') compress;
begin
dbms_stats.gather_table_stats(USER,'MYBIGTABLE',cascade=>TRUE);
end;

--FTS = 26 seconds
select /*+FULL(mybigtable)*/ count(*) from mybigtable;

--FFS, index on id = 31 msecs
select /*+INDEX_FFS(mybigtable myidindex)*/ count(*) from mybigtable;

--FFS, index on 'x' = 31 msecs
select /*+INDEX_FFS(mybigtable mysmallindex)*/ count(*) from mybigtable;