Thursday, November 29, 2012

Observing Locking Behaviour With Extended Events (SQL Server 2008 +)

Tracking Extended Events is similar to SQL Trace, but it runs at a much lower level, so it can capture and reveal far more detailed events. For example, it can be used to troubleshoot excessive CPU usage and deadlocks, and to examine how locks are used inside SQL Server.

The code samples listed here are typical steps of using extended events.

--Assuming I have a database called TestDB and its ID is 19
--You can use dynamic SQL to avoid doing this step
select DB_ID('TestDB')
-- Perform cleanup.
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='ObservingLocking') 
    DROP EVENT SESSION ObservingLocking ON SERVER 
GO

-- create extended event
CREATE EVENT SESSION ObservingLocking
ON SERVER
ADD EVENT sqlserver.lock_acquired(where database_id = 19),
ADD EVENT sqlserver.lock_released(where database_id = 19 )
ADD TARGET package0.asynchronous_file_target(set filename = 'c:\temp\ObservingLocking.log')
WITH
(
    MAX_MEMORY = 4096KB,
    EVENT_RETENTION_MODE = NO_EVENT_LOSS ,
    MAX_DISPATCH_LATENCY = 1 SECONDS,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = ON,
    STARTUP_STATE = OFF
);
go

--start the event capturing 
ALTER EVENT SESSION ObservingLocking ON SERVER STATE = START
-- experiments workload here
--begin transaction
--xxxxxxxx
--rollback

--stop event capturing
ALTER EVENT SESSION ObservingLocking ON SERVER STATE = STOP
go

--remove extended event
DROP EVENT SESSION ObservingLocking ON SERVER
GO


--Read the captured event tracing data. I usually remove the trace files before a new test
declare @Temp table (ID int identity(1,1), EventData xml)
insert into @Temp(EventData)
select  cast(event_data as xml) event_data
 from sys.fn_xe_file_target_read_file ('c:\temp\ObservingLocking*.log', 'c:\temp\ObservingLocking*.xem', null, null) a;

 with x as (
        select
                a.ID, x.value('../@name', 'varchar(128)') EventName, x.value('../@timestamp', 'datetime') EventTime,
                x.value('@name','varchar(128)') [Col], case when isnull(rtrim(x.value('(./text())[1]', 'varchar(128)')), '') = '' then x.value('(.)[1]', 'varchar(128)') else x.value('(./text())[1]', 'varchar(128)') end [Value]
        from @Temp a
            cross apply EventData.nodes('//data') ed(x)
    )
select *
from(
    select ID, EventName, resource_type ResourceType,
            mode Mode, resource_0 Res0, resource_1 Res1,
            resource_2 Res2, lockspace_workspace_id, lockspace_sub_id,
            lockspace_nest_id, owner_type OwnerType,transaction_id TransID,
            database_id DBID, EventTime
    from x
    pivot( MAX([Value]) for Col in ([resource_type], [mode], [owner_type], [transaction_id], [database_id], [lockspace_workspace_id], [lockspace_sub_id], [lockspace_nest_id], [resource_0], [resource_1], [resource_2])) pvt
    ) y
where ResourceType not in ('METADATA') and Mode not in ('NL')
go

Then let's do some experiments in a new Query window.

--create test table in TestDB
create table test(id int identity,c1 varchar(10),c2 int);
--insert some data
insert into test (c1,c2)values('Tired',1),('Exciting',2),('Happy',3);
--go back to another query window and start event capturing
--issue a simple select statement
select c1,c2 from test
--go back to another window and stop the tracing
--now load the log and examine it

How to understand the result?

It shows that SQL Server first acquired a shared lock on the database, then released it; next it acquired an IS lock on object 2105058535 (the test table in this example), then an IS lock on a page of the test table, which was then released; finally the shared lock on the database was acquired and released a couple more times.

Wait, why is there no expected S lock on the rows being retrieved?

Theoretically, when SQL Server reads with a SELECT statement, it first applies an IS lock on the table, then on the page, then issues an S lock on the record it is about to read, releases that S lock, and puts an S lock on the next record; it does so until all the records are accessed, then releases the IS lock on the page and finally on the table. An exception here: if none of the records in a page have been changed since the last time the page was written to disk, SQL Server applies only the IS lock on the page without taking S locks on the records. This reduces the overhead of acquiring and releasing locks.
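The row-level S locks can also be observed without Extended Events. A sketch: the REPEATABLEREAD hint makes the session hold its S locks until the end of the transaction, so they show up when we query sys.dm_tran_locks.

```sql
--Hold S locks with REPEATABLEREAD so they are visible in sys.dm_tran_locks
BEGIN TRAN
SELECT c1, c2 FROM test WITH (REPEATABLEREAD);
--inspect the locks held by this session before they are released
SELECT resource_type, request_mode, request_status, resource_description
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
ROLLBACK
```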


Note: an IS lock never goes down to the ROW level; it follows a path of database, then table, then page.
It signals the intention to take S locks at a lower level.

Credit for this post goes to MVP John Huang, as well as MSDN.
http://msdn.microsoft.com/en-us/library/ff878413.aspx
http://msdn.microsoft.com/en-us/library/bb630340%28v=sql.105%29.aspx

Wednesday, November 28, 2012

Components of SQL Server Engine

Four major components:

Protocol layer: interprets between the client and the relational engine
Relational engine/Query Processor: parses, compiles, and optimizes queries
Storage engine: accesses data
SQLOS: thread management, deadlock detection, buffer management



Wednesday, November 21, 2012

Examine The Datetime Values

The accuracy of DATETIME in SQL Server is 1/300 of a second (about 0.00333 seconds). The milliseconds part of a value is converted to the closest multiple of 0.00333, which is then displayed rounded to a precision of 1/1000 (.000, .003, .007).

0.00333 * 2 = 0.00666
0.00333 * 3 = 0.00999

DECLARE @testDT DATETIME
SET @testDT ='2012-11-11 00:01:00.003'
--this returns  '2012-11-11 00:01:00.003'
SELECT @testDT
SET  @testDT ='2012-11-11 00:01:00.004'
--This returns  '2012-11-11 00:01:00.003'. Because 0.004 is closer to 0.00333 than 0.00666
SELECT @testDT
SET  @testDT ='2012-11-11 00:01:00.005'
--This returns  '2012-11-11 00:01:00.007'. Because 0.005 is closer to 0.00666 than to 0.00333
--0.00666-0.005=0.00166; 0.005-0.00333=0.00167
SELECT @testDT
SET  @testDT ='2012-11-11 00:01:00.006'
--SET  @testDT ='2012-11-11 00:01:00.007' 
--SET  @testDT ='2012-11-11 00:01:00.008'  
--above values will all be converted to 0.007 because they are closer to 0.00666 than 0.00999
SELECT @testDT
SET  @testDT ='2012-11-11 00:01:00.009' 
--This returns  '2012-11-11 00:01:00.010'
SELECT @testDT


DBCC IND

Another copy and paste article mainly originated from SQL Server Storage Engine Blog  and SQL Server 2008 internals.

Syntax

DBCC IND ( { 'dbname' | dbid }, { 'objname' | objid },
{ nonclustered indid | 1 | 0 | -1 | -2 } [, partition_number] )

The DBCC IND command is used to see which pages hold which data, as well as which pages precede and follow various pages at all levels of the index. Basically, it lets you examine the composition of an index.

The first parameter is the database name or the database ID. The second parameter is an object name or object ID within the database; the object can be either a table or an indexed view. The third parameter is a specific nonclustered index ID (2-250 or 256-1005) or the values 1, 0, –1, or –2. The values for this parameter have the following meanings:
■ 0 Displays information for in-row data pages and in-row IAM pages of the specified object.
■ 1 Displays information for all pages, including IAM pages, data pages, and any existing LOB pages or row-overflow pages of the requested object. If the requested object has a clustered index, the index pages are included.
■ –1 Displays information for all IAMs, data pages, and index pages for all indexes on the specified object. This includes LOB and row-overflow data.
■ –2 Displays information for all IAMs for the specified object.
■ Nonclustered index ID Displays information for all IAMs, data pages, and index pages for one index. This includes LOB and row-overflow data that might be part of the index’s included columns.

Column Descriptions for DBCC IND Output
  • PageFID - the file ID of the page
  • PagePID - the page number in the file
  • IAMFID - the file ID of the IAM page that maps this page (this will be NULL for IAM pages themselves as they're not self-referential)
  • IAMPID - the page number in the file of the IAM page that maps this page
  • ObjectID - the ID of the object this page is part of
  • IndexID - the ID of the index this page is part of
  • PartitionNumber - the partition number (as defined by the partitioning scheme for the index) of the partition this page is part of
  • PartitionID - the internal ID of the partition this page is part of
  • PageType - the page type. Some common ones are:
    • 1 - data page
    • 2 - index page
    • 3 and 4 - text pages
    • 8 - GAM page
    • 9 - SGAM page
    • 10 - IAM page
    • 11 - PFS page
  • IndexLevel - what level the page is at in the index (if at all). Remember that index levels go from 0 at the leaf to N at the root page (except in clustered indexes in SQL Server 2000 and 7.0 - where there's a 0 at the leaf level (data pages) and a 0 at the next level up (first level of index pages))
  • NextPageFID and NextPagePID - the page ID of the next page in the doubly-linked list of pages at this level of the index
  • PrevPageFID and PrevPagePID - the page ID of the previous page in the doubly-linked list of pages at this level of the index

Create a table to hold the result dumped from DBCC IND. This technique is also often used to take a snapshot of database status.

CREATE TABLE IndexPages
(PageFID tinyint,
PagePID int,
IAMFID tinyint,
IAMPID int,
ObjectID int,
IndexID tinyint,
PartitionNumber tinyint,
PartitionID bigint,
iam_chain_type varchar(30),
PageType tinyint,
IndexLevel tinyint,
NextPageFID tinyint,
NextPagePID int,
PrevPageFID tinyint,
PrevPagePID int,
Primary Key (PageFID, PagePID));

TRUNCATE TABLE IndexPages;

INSERT INTO IndexPages
EXEC ('DBCC IND (AdventureWorks2008, [Sales.SalesOrderDetail], -1)');

SELECT IndexLevel
, PageFID
, PagePID
, PrevPageFID
, PrevPagePID
, NextPageFID
, NextPagePID
FROM IndexPages
ORDER BY IndexLevel DESC, PrevPagePID;

Note: use sys.dm_db_index_physical_stats to check the index depth and the page count at each level.

SELECT index_depth AS D
, index_level AS L
, record_count AS 'Count'
, page_count AS PgCnt
, avg_page_space_used_in_percent AS 'PgPercentFull'
, min_record_size_in_bytes AS 'MinLen'
, max_record_size_in_bytes AS 'MaxLen'
, avg_record_size_in_bytes AS 'AvgLen'
FROM sys.dm_db_index_physical_stats
(DB_ID ('test')
, OBJECT_ID ('test')
, 1, NULL, 'DETAILED');

Tuesday, November 20, 2012

DBCC PAGE

This is copied from http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/06/10/625659.aspx

dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])


The printopt parameter has the following meanings:

  • 0 - print just the page header
  • 1 - page header plus per-row hex dumps and a dump of the page slot array (unless its a page that doesn't have one, like allocation bitmaps)
  • 2 - page header plus whole page hex dump
  • 3 - page header plus detailed per-row interpretation

The per-row interpretation works for all page types, including allocation bitmaps.
By default, the output is sent to the errorlog. If you want the output to come back to your current connection, turn on trace flag 3604.

DBCC traceon(3604)
DBCC PAGE (master, 1, 1, 0);

Monday, November 19, 2012

SQL Server Table and Index Structure

Most of this post is a collection of material from MSDN.

SQL Server uses a B+ tree structure for its indexes. The tree stays balanced, so every leaf page is the same number of levels away from the root.

In a B-tree index, each page contains pointers to the previous and next pages at the same index level. In SQL Server, every page has a 96-byte header; this is the same for both index pages and data pages. In the data part of an index page, each index row points to a page at the next level down, toward the leaf.

Another type of index is the bitmap index, which is not supported in SQL Server. It is designed for columns with many duplicate values and suits analytic workloads; its overhead on DML is heavier than that of a B-tree index.

Table:

A table is contained in one or more partitions and each partition contains data rows in either a heap or a clustered index structure. The pages of the heap or clustered index are managed in one or more allocation units, depending on the column types in the data rows.

A partition is a user-defined unit of data organization. When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database.

To view the partitions used by a table or index, use the sys.partitions (Transact-SQL) catalog view.
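For example, a quick sketch against the test table used elsewhere in this post (substitute any table name of yours):

```sql
--List the partitions and row counts for a table
SELECT partition_number, index_id, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.test');
```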



Allocation Unit:
IN_ROW_DATA, ROW_OVERFLOW_DATA, LOB_DATA



HEAP: a table without a clustered index

index_id=0 in sys.partitions.

The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

The column first_iam_page in the sys.system_internals_allocation_units system view points to the first IAM page in the chain of IAM(Index Allocation Map) pages that manage the space allocated to the heap in a specific partition. SQL Server uses the IAM pages to move through the heap. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.
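A sketch of reading first_iam_page for a heap. The join through container_id assumes in-row allocation units, and dbo.myHeap is a hypothetical table name:

```sql
--Find the first IAM page for each allocation unit of a heap
SELECT au.type_desc, au.first_iam_page
FROM sys.system_internals_allocation_units au
JOIN sys.partitions p ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.myHeap')
  AND p.index_id = 0;  --index_id 0 = heap
```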


Clustered Index Structure:

 index_id=1 in sys.partitions table


Non-Clustered index Structure:

index_id>1 in sys.partitions

Each index row contains the nonclustered key value, a row locator (RID or Clustered index Keys) and any included, or nonkey, columns.
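The index_id values above can be checked with a simple query (sketch):

```sql
--Classify each structure in the current database by index_id
SELECT OBJECT_NAME(object_id) AS TableName,
       index_id,
       CASE WHEN index_id = 0 THEN 'Heap'
            WHEN index_id = 1 THEN 'Clustered index'
            ELSE 'Nonclustered index' END AS Structure
FROM sys.partitions;
```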


Friday, November 16, 2012

Why Clustered Index Must Be Unique?

It is so that nonclustered index entries can point to exactly one specific row.

Some people may argue they can create a clustered index on a non-unique column, without specifying the UNIQUE keyword. That's true. In that case, SQL Server adds a hidden uniquifier column to the rows when necessary.
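A minimal sketch of that case:

```sql
--A clustered index without the UNIQUE keyword accepts duplicate keys;
--SQL Server silently adds a hidden 4-byte uniquifier to the duplicates
CREATE TABLE t1 (c1 int NOT NULL);
CREATE CLUSTERED INDEX cix_t1 ON t1 (c1);
INSERT INTO t1 VALUES (1), (1), (2);  --duplicates are allowed
```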

Because clustered index keys are included in every nonclustered index, they are the most duplicated and redundant data in a table. The choice of clustered index keys should therefore be narrow, so they don't take too much storage.

Clustered index keys should also be static, so they don't cause data relocation and page splits, along with updates to all the nonclustered indexes.

Unique, narrow and static are also the characteristics of a typical PK definition.

Duplicate Index

An index is a duplicate if its tree structure and usage are the same as another index on the same table.

sp_helpindex can't tell you whether indexes are really the same in tree structure and usage.

(

Clustered index is the data.

A non-clustered index is duplicated data. This duplicated data helps SQL Server find the real data efficiently.

)

Non-Clustered index Structure:
. Key
. A leaf level entry(actual data stored in index+lookup values+included columns)

Here lookup values are either clustered index key if table has clustered index or RID if table is a heap. It's used to look for actual data row.

A RID is an 8-byte structure consisting of 2:4:2 bytes which breakdown into 2 for the FileID, 4 for the PageID and 2 for the slot number.

For a non-unique non-clustered index, the clustered index keys are stored in both its tree and leaf nodes.
For a unique non-clustered index, the clustered index keys are stored only in its leaf nodes.

(A nonunique nonclustered needs to have the lookup value pushed up into the tree (for navigation). A unique nonclustered index does not.)

The sequence of included columns does not matter to the usage of the index, so differences in the sequence are ignored.

Credit to Kimberly. Understanding Duplicate Indexes


Tuesday, November 13, 2012

Identifying Unused Indexes in Database

Some indexes are never used, and some become useless (are no longer chosen by the optimizer) once new indexes are created. In SQL Server 2005 and above, we have a simple way to identify indexes that are no longer useful, thanks to the very useful DMV sys.dm_db_index_usage_stats.

Limitation: SQL Server has to have been running for a while, so the database has experienced most of its regular workload.

SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName ,
OBJECT_NAME(i.object_id) AS TableName ,
i.name ,
ius.user_seeks ,
ius.user_scans ,
ius.user_lookups ,
ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i ON i.index_id = ius.index_id
AND i.object_id = ius.object_id
WHERE ius.database_id = DB_ID()
AND i.is_unique_constraint = 0 -- no unique indexes
AND i.is_primary_key = 0
AND i.is_disabled = 0
AND i.type > 1 -- don't consider heaps/clustered index
AND (
    ( ius.user_seeks + ius.user_scans +ius.user_lookups ) < ius.user_updates
    OR
    ( ius.user_seeks = 0 AND ius.user_scans = 0)
)

Monday, November 12, 2012

Left most index key

It should be the most selective (have the most distinct values), because this reduces the number of database pages the database engine must read while traversing the index in order to satisfy the query.

Keep in mind that SQL Server's index has an entry for each of the row in underlying table, no matter they are unique or duplicate.

About included columns


Included columns can be used only with non-clustered indexes.

The key columns are stored at all levels of an index, while included columns are stored only at the leaf level of an index.

The typical use of included columns is to create a covering index: one that contains all the columns required by a query, either as key columns or as included columns. It's a way to remove a bookmark (key lookup)/RID lookup from the execution plan, decreasing the I/O required to return the data.
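A sketch of a covering index; dbo.Orders and its columns are hypothetical names:

```sql
--A covering index for queries that filter on CustomerID
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);

--Covered: all three columns come from the index, no key lookup needed
SELECT CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;
```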

Pros:

-- The non-key columns do not count towards the limitation of 900 bytes key size or 16-columns.

--The non-key columns can use data types not allowed by index key columns;
(all data types except the varbinary (max) columns that have the FILESTREAM attribute, the legacy text, ntext, and image are supported.)

-- Major solution to expensive bookmark look up

Cons:

-- Using included columns results in higher disk space usage to store the index
-- An increase in I/O demands
-- Lower buffer cache efficiency
-- Reduced performance of data modification operations

Doomed(uncommittable) Transaction

Doomed transactions were introduced with the TRY/CATCH implementation in SQL Server 2005. When one occurs, the request cannot commit the transaction or roll back to a savepoint; it can only request a full rollback of the transaction.


XACT_STATE() can be used to detect the situation, and it's suggested to always check its value in the CATCH block.

XACT_STATE():
1: there is an active user transaction. @@TRANCOUNT > 0
0: there is no active user transaction. @@TRANCOUNT = 0
-1: the transaction is doomed and needs to be rolled back

BEGIN TRAN
BEGIN TRY
--or use raiserror function
select convert(int,'abc')
commit tran
END TRY
BEGIN CATCH
if XACT_STATE()=-1
BEGIN
    print 'uncommittable transaction, roll it back'
    ROLLBACK
END
END CATCH

Sunday, November 11, 2012

Some Facts on Error message in SQL Server

@@ERROR function returns the most recent error code. 0 means no error.
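For example, a sketch showing that @@ERROR is reset by every statement, so it must be read immediately after the statement of interest:

```sql
--@@ERROR must be captured right after the failing statement
SELECT CONVERT(int, 'abc');       --fails with a conversion error
SELECT @@ERROR AS LastErrorCode;  --non-zero: the error code of the failed SELECT
SELECT @@ERROR AS ResetToZero;    --0: the previous SELECT succeeded
```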

Error severity levels break down into these categories:
1 - 10: warning
11- 16: errors can be corrected by user
17 - 19: more serious exceptions, such as out of memory.
20 -25: fatal connection and server level exceptions.

Who follows that? Not even Microsoft does, so take it as informational.

RAISERROR without specifying a message id uses 50000 as its message id.

The severity can be overridden by specifying a number other than -1; -1 means to use the severity defined for the message in sys.messages.

When creating custom messages, the message number ranges from 50001 to 2147483647. A message is added with the sp_addmessage procedure. Message text and severity can be updated with the same procedure by passing @replace='Replace'. sp_altermessage can be used to change other aspects of a message, and sp_dropmessage drops a user-defined message. Apparently, the first two procedures are not well designed; bear with them and remember them.
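A sketch of the procedures mentioned above; the message number 50001 and the text are arbitrary:

```sql
--Add, raise, update and drop a user-defined message
EXEC sp_addmessage @msgnum = 50001, @severity = 16,
     @msgtext = N'Custom error in %s';
RAISERROR (50001, 16, 1, N'myProc');
--update the text of the same message
EXEC sp_addmessage @msgnum = 50001, @severity = 16,
     @msgtext = N'Reworded custom error in %s', @replace = 'Replace';
EXEC sp_dropmessage @msgnum = 50001;
```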

Error can be logged to SQL Server's error log by using WITH LOG if invoked by sysadmin or user having ALTER TRACE permission.

Transaction, Batch and Exception in SQL Server


Transactions and batches can include each other. An exception can stop a batch but usually never rolls back a transaction. For an exception with severity 20 or above, since SQL Server tries to close the connection or terminate the process, the transaction will be rolled back.
A transaction has the ACID properties: it guarantees that the statements included in it execute atomically. A batch is a way to submit a bunch of statements in one effort; SQL Server compiles one query plan per batch.
1. Transaction can have multiple batches

BEGIN TRAN
DML1
DML2
GO
DML3
GO
....
GO
END TRAN

Example:

begin tran;
create table t(i1 int,c1 varchar(10));
select @@TRANCOUNT;
go
insert into t values(1,'a'),(2,'b');
go
select @@TRANCOUNT;
select * from t;
rollback;
select @@TRANCOUNT;
select * from t;
2. A batch can be composed of several transactions. In the following example, two transactions are submitted to the database in one batch.
BEGIN TRAN
DML1
END TRAN
BEGIN TRAN
DML2
END TRAN
GO

Example:

begin tran;
create table t(i1 int,c1 varchar(10));
select @@TRANCOUNT as numTrans_1;
commit;

begin tran;
insert into t values(1,'a'),(2,'b');
select @@TRANCOUNT  as numTrans_1;
commit;

select @@TRANCOUNT  as numTrans_0;
select * from t;
drop table t;
go

3. Exception and Batch

3.1 Statement level exception

SQL Server continues executing the rest of a batch when a statement-level exception happens. The statement itself fails, of course.
select power(3, 32)
print 'this still runs'
--print is better than select in this test case since it's printed together with error message
go

3.2 Batch level exception

SQL Server stops running the rest of the statements in a batch when a batch-level exception happens. The exception also bubbles up through each level of the calling stack, aborting all of them. The connection to SQL Server remains alive.
select convert(tinyint, 'aaa')
select 'this will not execute'
go

--following code demonstrate exception bubbling up
create procedure bubbleUp
as
begin
select convert(tinyint,'aaa')
end
go

exec bubbleUp
select 'this will not return'
go
3.3 Connection level exception

The connection to SQL Server is closed when this happens. It is usually caused by internal server errors; I haven't tried that hard to make it happen. :)

Exception with error level 20 or above will make SQL Server to terminate the process.

Note: my connection stayed alive even after SQL Server said it was terminating the process. MS really knows how to give us a hard time with its error-handling mechanism; for instance, you can never get a clear understanding of the state value returned in an error message.
--
select top 1 * from sys.objects
raiserror('this test will break the connection.',20,1) with log;
select top 1 * from sys.objects
--result
(1 row(s) affected)
Msg 2745, Level 16, State 2, Line 2
Process ID 51 has raised user error 50000, severity 20. SQL Server is terminating this process.
Msg 50000, Level 20, State 1, Line 2
this test will break the connection.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command.  The results, if any, should be discarded.

3.4 Parsing and scope-resolution level exception

They act like batch-level exceptions when they happen in the same scope as the rest of the statements in a batch; they act like statement-level exceptions when they happen in a lower-level scope, such as inside a procedure or dynamic SQL called by the batch.
--direct in a batch
selectxyz from table
select 'this will not run'
go

--in lower scope of a batch, such as in another statement
exec('selectxyz from table')
select 'this will return'

--scope resolution
--this will be created because syntax is correct

create procedure scopeResolutionException
as
begin
select xxx from nonExistingTable
end
go

--this will fail because querying non-existing table
exec scopeResolutionException
select 'this will execute'
--also clean it up
drop procedure scopeResolutionException
go

-- scope resolution error, whole batch is aborted
select xxx from nonExistingTable
select 'this will not execute'
go

NOTE: when a statement is submitted to SQL Server, parsing and compilation happen together. For a stored unit, SQL Server parses the syntax but does not compile an execution plan at the same time; it uses late binding and creates the execution plan when the stored unit is executed.
4. XACT_ABORT

Understanding the different levels of exceptions in SQL Server is necessary, since we can't handle them well without knowing them first. But figuring out which exception belongs to which type is tedious and cumbersome. People who understand this well don't bother to handle each situation separately, and people who don't understand it have still written successful programs. Why?

SET XACT_ABORT ON;

This is our star. It aborts both batch and transaction when any level of exception occurs. Aborting transaction here means rolling back the transaction. It turns all the complex situations into one simple scenario: stop running rest of code, rollback transaction and bubble up exception to caller(if there's no TRY/CATCH).

To make a full use of it, it's better to use explicit transaction to organize related operations into one atomic transaction. Then most of time, you just need to remember to have the transaction committed.

CREATE PROCEDURE ppp
AS
BEGIN
SET XACT_ABORT ON;
BEGIN TRAN
.....
COMMIT TRAN
END

For the situation where TRY/CATCH is used in the code, please see this post: http://mashijie.blogspot.ca/2012/11/doomeduncommittable-transaction.html

Thursday, November 08, 2012

Building a String From Multiple Records

In SQL Server, there are multiple ways to build up a string from values distributed across multiple records.

Preparation

Create table #m (id int identity,c1 varchar(200))
GO
create table #c (id int,c1 varchar(20))
GO
insert into #m(c1) values(1)
 insert into #c(id, c1) values
(1,'1,2'),
(1,'3,4,5'),
(1,'6,7,8')
GO

Method 1, using variable and SELECT statement.

Declare @v1 varchar(200)
SET @v1=''
SELECT @v1 = @v1 + c1 +';'
FROM #c
PRINT @v1

Method 2, a little bit XML and string operation, if you are still using SQL Server 2000.

The good thing about FOR XML is that it returns the result as one XML document, which can be treated as a string. Sometimes it's handy and useful when you want to display master records together with results from child tables.

This is also an example on when you want to use sub-query in your statement.
(
Why the hell do they invent so many names for a query used inside a query (an embedded query)?

SELECT list --> Sub-query
FROM --> Derived table
WHERE --> Correlated sub-query.
)

select m.id,
replace(replace(replace(
(select c1 from #c c where c.id=m.id FOR XML AUTO),
'"/><c c1="',';'),
'<c c1="',''),
'"/>','') as ChildValues
from #m m
where m.id=1
--FOR XML AUTO emits <c c1="..."/> elements; the three replaces turn the
--element boundaries into ';' separators and strip the remaining tags


When this technique is used in update statement, it sometimes has advantages over the other ways such as using cursor and loop because of its conciseness.
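For completeness, the pattern most people use on SQL Server 2005+ combines FOR XML PATH('') with STUFF to drop the leading separator, which avoids the tag-stripping replaces altogether. A sketch against the same #c table:

```sql
--Method 3 (sketch): FOR XML PATH('') plus STUFF
SELECT STUFF(
    (SELECT ';' + c1 FROM #c WHERE id = 1 FOR XML PATH('')),
    1, 1, '') AS ConcatenatedValues;
```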

Friends using Oracle? Use the LISTAGG function to produce a single-row output from multiple rows.

Tuesday, November 06, 2012

Methods of Importing Data into SQL Server Database

This is a brief overview of the options you can use to import/export data into/from a SQL Server database. These are the methods shipped with SQL Server.

1. Export/Import Data Wizard
2. SSIS (a package can be created from 1)
3. BCP

BCP can be invoked through xp_cmdshell if it has to be used in a SQL script.

Here is its syntax and usage: http://msdn.microsoft.com/en-us/library/ms162802.aspx
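A sketch of invoking bcp from T-SQL through xp_cmdshell; the paths and table name are placeholders, and xp_cmdshell must be enabled on the server:

```sql
--Export a table with bcp from inside a SQL script
EXEC master..xp_cmdshell
  'bcp dbs.dbo.location out c:\temp\location.dat -T -c';
```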

--generate format file(non-xml or xml format file.)
--e.g.non-xml

bcp dbs.dbo.location format nul -T -c  -f Currency.fmt

Trick: if you are exporting a query and need a format file for it, you can either
generate one from the table first and then modify it manually, or create a view in the database and generate the format file from the view.
--export
--import using format file

4. BULK INSERT

Another version of BCP. It is a T-SQL Command, Can be used directly in SQL script.

BULK INSERT dbo.location
FROM 'C:\locationData.txt'
WITH ( FIELDTERMINATOR =',', ROWTERMINATOR = '\n' )

5. OPENROWSET

It's a T-SQL command. It can be used to import data from various data sources, such as spreadsheet, etc.

It's often used with INSERT INTO statement.

INSERT INTO dbo.Location
SELECT * FROM OPENROWSET(BULK 'c:\csvfile.txt', FORMATFILE = 'c:\csvfile.fmt') AS rows

I have an example of loading an XML file into the database for further transformation.

create table #tmpXML (xml315 XML);
EXEC('INSERT INTO #tmpXML(xml315)
 SELECT CAST(xContent AS XML)
 FROM OPENROWSET(
   BULK '''+@XMLFilePath+''',
   SINGLE_BLOB) AS xfile(xContent)'
)
INSERT INTO [dbo].[ShipmentStatusMessage]
      ([MessageSender]
      ,[MessageRecipient]
      ,[MessageDate]
      ,[MessageTime]
      ,[MessageID]
      ,[Filename]
      ,[X-12]
      ,[VesselName]
      ,[VesselCode]
      ,[VesselCodeQualifier]
      ,[VoyageNumber]
      ,[OperatingCarrier])
 SELECT
 r.value('./MessageSender[1]', 'NVARCHAR(15)'),
 r.value('./MessageRecipient[1]', 'NVARCHAR(15)'),
 r.value('./MessageDate[1]', 'NVARCHAR(8)'),
 r.value('./MessageTime[1]', 'NVARCHAR(8)'),
 r.value('./MessageID[1]', 'NVARCHAR(20)'),
 r.value('./Filename[1]', 'NVARCHAR(50)'),
 r.value('./X-12[1]', 'NVARCHAR(2000)'),
 r.query('./ShipmentStatus/VesselInformation/VesselName').value('.', 'NVARCHAR(28)'),
 r.query('ShipmentStatus/VesselInformation/VesselCode').value('.', 'NVARCHAR(8)'),
 r.query('ShipmentStatus/VesselInformation/VesselCodeQualifier').value('.', 'NVARCHAR(1)'),
 r.query('ShipmentStatus/VesselInformation/VoyageNumber').value('.', 'NVARCHAR(10)'),
 r.query('ShipmentStatus/VesselInformation/OperatingCarrier').value('.', 'NVARCHAR(10)')
 from #tmpXML
 cross apply xml315.nodes('//ShipmentStatusMessage') as T(r)

6. OPENDATASOURCE
Similar to OPENROWSET. I have seen people treat it as a table and do inserts, updates and deletes on it.
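A sketch; the server name and database are hypothetical, and Ad Hoc Distributed Queries must be enabled:

```sql
--Four-part name through OPENDATASOURCE
SELECT *
FROM OPENDATASOURCE('SQLNCLI',
     'Data Source=OtherServer;Integrated Security=SSPI')
     .AdventureWorks.HumanResources.Employee;
```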

7. OPENQUERY

Another T-SQL command you can use directly in SQL Server. To use it, a linked server
has to be created first. A linked server is a static data source, compared to the ones created on the fly with OPENDATASOURCE or OPENROWSET.

8. LINKED Server

You can issue queries against a linked server straightforwardly.

Assuming you have created a linked server to another SQL Server database and you called it abc123, in OPENQUERY, you can use it this way:

select * from OPENQUERY(abc123, 'select * from aTable')

Or, you can query linked server directly like this:

select * from abc123..aTable

Friday, November 02, 2012

Index Scan

Generally, when the query optimizer chooses an index scan, it considers that the most efficient way to execute the query, based on its understanding of the context at execution time. That can be due to how the query is written, how the indexes are built on the underlying tables, and how recently the statistics have been updated.

Non-selective queries are likely to get an index scan or a table scan. If that is the nature of the query, the scan is inevitable, and sometimes, when the table is small, it is actually the more efficient choice. In many other situations, for an unwanted index or table scan, there are ways to change the query or the indexes on the tables to turn it into an index seek, which in most cases is more efficient than an index or table scan.

A query cannot seek on an index if it does not filter on the index's leftmost indexed column (index key). This is because SQL Server only maintains the key distribution histogram for an index's leftmost column. Many times, even when the index key appears in the criteria, if it is wrapped in a function or implicitly converted to another data type, the optimizer ends up with an index or table scan, because the operation makes the predicate non-sargable.

Note: The order of columns in where clause is irrelevant.

Let's do some experiments to discover some of the situations that will end up with index scan. Suggestions on how to make them more efficient are also given.

Preparation

Let's create a table with 3 columns, prepare some data, and build an index on first two columns.

create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),
RANK() over (order by a.object_id),
DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create index ix_a_b on test(a,b)
go

Non-clustered Index scan

1. Fields in the criteria and select list are all covered by the underlying non-clustered index, but the operator is open-ended or the result covers most of the table's content.

    select a,b from test where a not in (100,101)
    -- this can be tuned by specifying a <100 or a>101
    select a,b from test
    -- this is not likely to be further tuned.

2. Fields in the criteria and select list are all covered by the underlying non-clustered index, but there's a function call or operation on the column used in the criteria.

    select a,b from test where a+1 =1000 -- this can be tuned by using a=999
    select a,b from test where abs(a) =1000 -- this can be tuned by using  a =1000 or a=-1000
----glad to see the implicit conversion between varchar and nvarchar no longer ends up with index or table scan in SQL Server 2012.

3. Criteria contains first indexed column, the operator is enclosing, but the column is operated by a function or operation.
    select a,b,c from test where abs(a) =1000 -- this can be tuned by using  a =1000 or a=-1000

Clustered Index scan

Recreate the table. Intention here is to make our experiment as simple as possible.

drop table test
go
create table test(id int identity,a int, b int, c int)
go
insert into test(a,b,c)
select ROW_NUMBER() over (order by a.object_id),
RANK() over (order by a.object_id),
DENSE_RANK() over (order by a.object_id)
from sys.columns a cross join sys.columns b
go
create clustered index ix_a on test(a)


1. For a query that selects all rows from a table, if the table has a clustered index, it ends up with a clustered index scan, which is essentially a table scan.

    select a,b,c from test

2. Fields in the criteria and select list are all covered by the underlying clustered index, but the operator is open-ended or the result covers most of the table's content.

    select a from test where a not in (999,1000 )
--optimizer is smart enough to use index seek if there is only one value in the parentheses.

3. Fields in the select list are not covered by any index created on the table, and the column used as the predicate is not referenced directly. This becomes a table scan/clustered index scan.

     select a,b,c from test where a+1 =1000

4. The criteria contain only non-indexed columns, so the optimizer has no good predicate to use. It ends up with a table scan/clustered index scan.

     select * from test where c =1000

The situations that cause index scans are complex and may change over time as the statistics on columns and indexes drift out of date. Please do not take the situations here for granted; you will have to check the execution plan to understand why it ends up with certain operations in your specific execution context.