Friday, November 02, 2012

Index Scan

Generally, when query optimizer chooses index scan, it thinks that is a most efficient way to execute the query based on its understanding on the context when the query is executed. It can be because of how the query is written, how the indexes are built on the underlying tables and how the statistics have been updated. 

A non-selective queries likely to have index scan or table scan. If that is the nature of the query, index or table scan is inevitable, and sometimes, if the table is small, that is actually a more efficient way to go. For many other situations, for the unwanted index or table scan, there are means to change the query, change the indexes on the tables to turn them to index seek, which in most of cases are more efficient than index or table scan.

A query can not seek on an index if the query does not filter on index's left most indexed column(index key). This is because SQL Server only maintains the key distribution for an index's left most column. Many times, even the index key appears in the criteria, but it is used in a function or it is implicitly converted to another data type, the optimizer ends up index or table scan because the operation makes it non-deterministic.

Note: The order of columns in where clause is irrelevant.

Let's do some experiments to discover some of the situations that will end up with index scan. Suggestions on how to make them more efficient are also given.

Preparation

Let's create a table with 3 columns, prepare some data, and build an index on first two columns.

create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),
RANK() over (order by a.object_id),
DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create index ix_a_b on test(a,b)
go

Non-clustered Index scan

1. Fields in criteria and select list are all covered by underlying non-clustered index, but the operator is open or result is most of the table content.

    select a,b from test where a not in (100,101)
    -- this can be tuned by specifying a <100 or a>101
    select a,b from test
    -- this is not likely to be further tuned.

2. Fields in criteria and select list are all covered by underlying non-clustered index, but there's a function call or operation on the column used in the criteria.

    select a,b from test where a+1 =1000 -- this can be tuned by using a=999
    select a,b from test where abs(a) =1000 -- this can be tuned by using  a =1000 or a=-1000
----glad to see the implicit conversion between varchar and nvarchar no longer ends up with index or table scan in SQL Server 2012.

3. Criteria contains first indexed column, the operator is enclosing, but the column is operated by a function or operation.
    select a,b,c from test where abs(a) =1000 -- this can be tuned by using  a =1000 or a=-1000

Clustered Index scan

Recreate the table. Intention here is to make our experiment as simple as possible.

drop table test
go
create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),
RANK() over (order by a.object_id),
DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create clustered index ix_a on test(a)


1. For a query to select all rows from a table, if the table has a clustered index, it will end up with clustered index scan, which is essentially a table scan.

    select a,b,c from test

2.  Fields in criteria and select list are all covered by underlying non-clustered index, but the operator is open or result is most of the table content.

    select a from test where a not in (999,1000 )
--optimizer is smart enough to use index seek if there is only one value in the parentheses.

3. Fields in select list are not covered by any indexes created on the table and column used as predicate is not straight. This becomes table scan/clustered index scan.

     select a,b,c from test where a+1 =1000

4. Criteria contains only non-indexed columns, optimizer has no good predicate to use. it ends up table scan/clustered index scan.

     select * from test where c =1000

The situation that will cause index scan is very complex and may change over the time when statistics on column and indexes are not accurately updated. Please do not take the situations here as granted, you will have to check the execution plan to understand why it ends up with certain operations in your very specific execution context.

No comments: