Monday, September 10, 2012

Which index to use,if one covers another?


This is from an interview question I received.
A table has column A with two indexes.
indexA(column A) 
indexB(column A + multiple columns)

I think this will depend on how the query is written and how other indexes are defined on the table. to verify that, I did experiments.

0. Preparation

create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),RANK() over (order by a.object_id),DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create index ix_a on test(a)

create index ix_a_b on test(a,b)
go
dbcc show_statistics (test,ix_a)
dbcc show_statistics (test,ix_a_b)
go

Then check the execution plan of queries under different situations.

1. select * from test
This ends up as table scan as expected.
With clustered PK: clustered index scan because table is now the index.

2. select * from test where id<100
Since no index is created on id yet, so table scan is expected. and it is
With clustered PK: clustered index seek instead of table scnan.

3.select * from test where a<100
Since a is in both where and select list, index seek on ix_a_b is expected. it chooses wider index because the covering column b is in the select list.
With PK: same ,  but of cause now it has a key lookup instead of rid look up
4. select a from test where a<100
    select a from test where a=100
Here has an interesting observation. In this test, the optimizer chooses ix_a_b over ix_a.
With PK: same

In my another testing, when the table is very wide and no other indexes, it chose ix_a over ix_a_b.
But when a clustered PK was created, the optimizer chose ix_a_b over ix_a.

4.1 Update the statistics and recheck the plan.
update statistics test
it still chooses the ix_a_b

5 Comparison when cost is the same

5.1 with covering index, it chooses wider index.
select a from test where a=100
select a from test with(index(ix_a)) where a=100
it chooses ix_a_b over ix_a

5.2 with uncovering index, in chooses which ever created the first.
select a,c from test where a=100
(select a,c from test with(index(ix_a)) where a=100 to check the cost is the same)
drop index test.ix_a
drop index test.ix_a_b
--change the creation order
create index ix_a_b on test(a,b)
create index ix_a on test(a)
select a,c from test where a=100

6. select a,b from test where a=100
As expected, it chooses ix_a_b since this provides covering on select list.
 With PK: same

Next, Let's experiment how the indexes are used in joining.

7.select a.a from test a join test b on a.a=b.a
The optimizer is smart enough to choose ix_a for both tables

With PK: same

8.select a.a,a.b from test a join test b on a.a=b.a
The optimizer is smart enough to choose ix_a_b for a and ix_a for b.
 With PK: same

9 select a.a,a.b,a.c from test a join test b on a.a=b.a
This time, table scan for a, ix_a for b. As Expected.

With PK:
table scan becomes index scan
Next, let's take a look at how other indexes effect the optimizer's decisions.

10. build PK on id
Alter table test add constraint pk_test_id primary key clustered (id)

The go back to check the execution plans for the situations having been discussed.

Conclusions:



The optimizer choose indexes based on how it caclucate the cost. it calculate the cost
depending on how the query is written, how wide the table is, how wide the index is and
what other indexes/key are defined on the table. Basically, it depends on how the cost is
calculated and a cost is from several aspects such as disk IO and CPU usage.

In a wide table with huge number of records, if these are the only two indexes available, for
the query like “select a from test where a=100”, the optimizer will use ix_a. For a query like
“select a,other columns in ix_a_b from test where a=100”, the optimizer will use ix_a_b since
it provides more cover of the columns in the select list.

In a narrow table, if the cost is similiar, the optimizer chooses wider index over narrower index.
E.G. choose ix_a_b over ix_a on “select a from test where a=100”.

Tuesday, September 04, 2012

Table Valued Function and Inline Table Valued Function

Inline user defined table valued function is a subset of user defined table valued function. It can be used to achieve the functionality of parameterized views.

Simply, inline table valued function RETURNS TABLE and usually contains only one SELECT statement. Table valued function returns a table data type,   and it can contain additional statements that allow more powerful logic than is possible in views.

Example 1: Inline table valued function

CREATE FUNCTION Sales.ufn_CustomerNamesInRegion
                 ( @Region nvarchar(50) )
RETURNS table
AS
RETURN (
        SELECT DISTINCT s.Name AS Store, a.City
        FROM Sales.Store AS s
        INNER JOIN Person.BusinessEntityAddress AS bea 
            ON bea.BusinessEntityID = s.BusinessEntityID 
        INNER JOIN Person.Address AS a 
            ON a.AddressID = bea.AddressID
        INNER JOIN Person.StateProvince AS sp 
            ON sp.StateProvinceID = a.StateProvinceID
        WHERE sp.Name = @Region
       );
GO
 
Example 2: Table valued function

In a table-valued user-defined function:

   1.The RETURNS clause defines a local return variable name for the table returned by the function. The RETURNS clause also defines the format of the table. The scope of the local return variable name is local within the function.

   2. The Transact-SQL statements in the function body build and insert rows into the return variable defined by the RETURNS clause.

    3.When a RETURN statement is executed, the rows inserted into the variable are returned as the tabular output of the function. The RETURN statement cannot have an argument.

    4. No Transact-SQL statements in a table-valued function can return a result set directly to a user. The only information the function can return to the user is the table returned by the function.