Monday, September 10, 2012

Which index to use,if one covers another?


This is from an interview question I received.
A table has column A with two indexes.
indexA(column A) 
indexB(column A + multiple columns)

I think this will depend on how the query is written and how other indexes are defined on the table. to verify that, I did experiments.

0. Preparation

create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),RANK() over (order by a.object_id),DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create index ix_a on test(a)

create index ix_a_b on test(a,b)
go
dbcc show_statistics (test,ix_a)
dbcc show_statistics (test,ix_a_b)
go

Then check the execution plan of queries under different situations.

1. select * from test
This ends up as table scan as expected.
With clustered PK: clustered index scan because table is now the index.

2. select * from test where id<100
Since no index is created on id yet, so table scan is expected. and it is
With clustered PK: clustered index seek instead of table scnan.

3.select * from test where a<100
Since a is in both where and select list, index seek on ix_a_b is expected. it chooses wider index because the covering column b is in the select list.
With PK: same ,  but of cause now it has a key lookup instead of rid look up
4. select a from test where a<100
    select a from test where a=100
Here has an interesting observation. In this test, the optimizer chooses ix_a_b over ix_a.
With PK: same

In my another testing, when the table is very wide and no other indexes, it chose ix_a over ix_a_b.
But when a clustered PK was created, the optimizer chose ix_a_b over ix_a.

4.1 Update the statistics and recheck the plan.
update statistics test
it still chooses the ix_a_b

5 Comparison when cost is the same

5.1 with covering index, it chooses wider index.
select a from test where a=100
select a from test with(index(ix_a)) where a=100
it chooses ix_a_b over ix_a

5.2 with uncovering index, in chooses which ever created the first.
select a,c from test where a=100
(select a,c from test with(index(ix_a)) where a=100 to check the cost is the same)
drop index test.ix_a
drop index test.ix_a_b
--change the creation order
create index ix_a_b on test(a,b)
create index ix_a on test(a)
select a,c from test where a=100

6. select a,b from test where a=100
As expected, it chooses ix_a_b since this provides covering on select list.
 With PK: same

Next, Let's experiment how the indexes are used in joining.

7.select a.a from test a join test b on a.a=b.a
The optimizer is smart enough to choose ix_a for both tables

With PK: same

8.select a.a,a.b from test a join test b on a.a=b.a
The optimizer is smart enough to choose ix_a_b for a and ix_a for b.
 With PK: same

9 select a.a,a.b,a.c from test a join test b on a.a=b.a
This time, table scan for a, ix_a for b. As Expected.

With PK:
table scan becomes index scan
Next, let's take a look at how other indexes effect the optimizer's decisions.

10. build PK on id
Alter table test add constraint pk_test_id primary key clustered (id)

The go back to check the execution plans for the situations having been discussed.

Conclusions:



The optimizer choose indexes based on how it caclucate the cost. it calculate the cost
depending on how the query is written, how wide the table is, how wide the index is and
what other indexes/key are defined on the table. Basically, it depends on how the cost is
calculated and a cost is from several aspects such as disk IO and CPU usage.

In a wide table with huge number of records, if these are the only two indexes available, for
the query like “select a from test where a=100”, the optimizer will use ix_a. For a query like
“select a,other columns in ix_a_b from test where a=100”, the optimizer will use ix_a_b since
it provides more cover of the columns in the select list.

In a narrow table, if the cost is similiar, the optimizer chooses wider index over narrower index.
E.G. choose ix_a_b over ix_a on “select a from test where a=100”.

No comments: