Wednesday, October 31, 2012

Define Default Parameter in User Defined Function

I have a user defined function like this.

CREATE FUNCTION [dbo].[fn_FormatCSVString]
(
    @inString varchar(200), @inFixedLength tinyint=2
)
RETURNS VARCHAR(300)
AS
BEGIN

    DECLARE @csvResult VARCHAR(300), @position smallint, @strLen smallint
    SET @strLen = LEN(@inString)
    SET @position=1
    SET @csvResult=''
    IF @strLen<=@inFixedLength OR @inFixedLength<=0
    BEGIN
        RETURN @inString
    END
    WHILE @position<=@strLen
    BEGIN
        SET @csvResult = @csvResult + SUBSTRING(@inString,@position,@inFixedLength)+','
        SET @position = @position+ @inFixedLength
    END

    RETURN SUBSTRING(@csvResult,1,LEN(@csvResult)-1)

END

Can you call it with the second parameter omitted?
    SELECT [dbo].[fn_FormatCSVString]('abc123')

No, SQL Server won't allow you to do that. You have to either pass the DEFAULT keyword or provide an actual value for that parameter.
    SELECT [dbo].[fn_FormatCSVString]('abc123',default)
    SELECT [dbo].[fn_FormatCSVString]('abc123',3)

I do not know why Microsoft handles this differently from stored procedures; I guess it is just something you have to remember. It can also be a tricky database interview question to see whether a candidate is detail oriented.
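
For contrast, a stored procedure with a default really can be called with the parameter omitted. A quick sketch (usp_FormatDemo is a made-up wrapper around the function above):

create procedure usp_FormatDemo @inString varchar(200), @inFixedLength tinyint=2
as
begin
    select dbo.fn_FormatCSVString(@inString, @inFixedLength)
end
go
exec usp_FormatDemo 'abc123'       --works: @inFixedLength falls back to 2
exec usp_FormatDemo 'abc123', 3    --works: explicit value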

Monday, October 29, 2012

Rows in sysindexes is not accurate

(
Update, Jan 23, 2013:
In Oracle, you use this query to find the recorded number of rows in a table; note that num_rows comes from optimizer statistics, so it carries the same accuracy caveat.
select owner,table_name,num_rows from all_tables where table_name=upper('xxx')
)

This is what I learned today.

The rows column in sysindexes is not accurate and thus can't be used to determine the number of rows in the associated table. The row_count column in the newer sys.dm_db_partition_stats is not reliable either. That makes the result from sp_spaceused inaccurate as well, since it relies on those system views.


So the conclusion is that these counts should not be used in application code for business needs, but they are still good for a DBA doing system maintenance, or for quickly getting a rough row count on a large table to answer an on-demand request.
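
If application code truly needs an exact number, the reliable (though potentially slower) route is a plain count:

select count_big(*) from dbo.tablename  --always accurate, at the cost of scanning the table or an index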

Here is a script you can use in SQL Server 2005 and above to list the indexes on a table along with their row counts. The sum of row_count for the heap or clustered index (index_id 0 or 1) is the number reported by sp_spaceused.

select i.name,s.partition_id, i.type_desc,s.row_count
from sys.indexes i join sys.dm_db_partition_stats s
on i.object_id=s.object_id and i.index_id=s.index_id
where i.object_id=object_id('dbo.tablename')

The sysindexes view is deprecated, so try not to use it in new work.

To get a more accurate result, DBCC UPDATEUSAGE WITH COUNT_ROWS can be executed first, and sp_spaceused can be run afterward to get the row counts as of that moment.

Note that exec sp_spaceused @updateusage = 'true' equals running DBCC UPDATEUSAGE (without COUNT_ROWS) before running sp_spaceused: it corrects page and space usage, but it won't correct the row counts in sysindexes or sys.dm_db_partition_stats.
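
Putting the two together, a minimal sketch (YourDatabase and dbo.tablename are placeholders):

DBCC UPDATEUSAGE ('YourDatabase', 'dbo.tablename') WITH COUNT_ROWS  --corrects page and row counts
EXEC sp_spaceused 'dbo.tablename'  --now reports row counts accurate as of this moment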

Why can't SQL Server guarantee accurate allocation information and row counts? According to one article, the reason is to reduce database blocking. If table space allocation and row count information were maintained accurately on every INSERT and DELETE, when an index is dropped, or when a large bulk copy operation is performed, database concurrency could suffer: users in a high-transaction environment would wait for their transactions to complete while the space information was maintained in real time. I will take that explanation for now.

Wednesday, October 24, 2012

Reduce Lock Contention in SQL Server

This is adapted from http://support.microsoft.com/kb/75722.

Locking in SQL Server helps ensure consistency when reading and writing to the database. There is always a tradeoff in any relational database system between concurrency and consistency. It is always important to maintain consistency, or accuracy, of the data. However, the highest levels of consistency can result in less concurrency, or worse performance, if the proper steps are not taken.

Sometimes, database performance tuning is the art of choosing the right tradeoffs.

The following methods can be used to reduce lock contention and increase overall throughput: 

1. Avoid situations in which many processes attempt updates or inserts on the same data page. For example, in version 6.x and earlier, if there is no clustered index on a table, or if the clustered index consists of a nonrandom value, such as an ever-increasing key, all inserts go to the last page of the table. This particular hotspot situation can be avoided by creating a clustered index on a value that ensures each user and/or process inserts to a different page in the table.

2. Avoid transactions that include user interaction. Because locks are held for the duration of the transaction, a single user can degrade the entire system's performance.

3. Keep transactions that modify data as short as possible. The longer the transaction, the longer the exclusive or update locks are held. This blocks other activity and can lead to an increased number of deadlock situations.

4. Keep transactions in one batch. Unanticipated network problems may delay transactions from completing and thus releasing locks.

5. Avoid pessimistic locking hints such as holdlock whenever possible. They can cause processes to wait even on shared locks.

6. Optimistic concurrency control can be specified in read-only environments.

7. Avoid expensive calculations while locks are held.

8. Design the code in phases so that locks on shared resources are held for the shortest possible period. This is extremely useful in ETL's staging concept; see the sketch below.
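
A minimal sketch of that phased pattern, assuming hypothetical tables dbo.SourceData and dbo.SharedTarget:

--phase 1: expensive transformation work against a private staging table;
--no locks are taken on the shared target during this phase
select s.OrderID, s.Amount * 1.08 as AmountWithTax
into #staging
from dbo.SourceData s

--phase 2: touch the shared table inside one short transaction
begin tran
insert into dbo.SharedTarget (OrderID, AmountWithTax)
select OrderID, AmountWithTax from #staging
commit tran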

Friday, October 19, 2012

Forcing Recompilation

Ways to force stored procedure recompilation (examples follow the list):

sp_recompile
Execute with recompile
Create procedure with recompile
Remove plan from cache: dbcc freeproccache
Create temporary table inside stored procedure
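
What those options look like in T-SQL (dbo.MyProc and dbo.MyProc2 are hypothetical procedures):

--1. mark the procedure so its plan is recompiled on the next execution
exec sp_recompile 'dbo.MyProc'

--2. recompile for this one execution only
exec dbo.MyProc with recompile

--3. recompile on every execution
create procedure dbo.MyProc2 with recompile
as
begin
    select 1
end
go

--4. clear the entire plan cache (server-wide; use with care in production)
dbcc freeproccache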

Enforcing automatic procedure recompilation

Mixing DDL and DML causes a procedure to be recompiled each time it reaches the first DML statement after a DDL statement. A common case is creating temporary tables at the beginning of a procedure. Remember to put all creation statements in one place so that recompilation happens only once; if DDL and DML are interleaved several times, the procedure will be recompiled several times. A sketch of the recommended shape follows.
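
Here all temp-table DDL is grouped up front (usp_StageExample is a made-up name):

create procedure usp_StageExample
as
begin
    --all DDL first, so the first DML statement after it triggers only one recompilation
    create table #t1 (id int)
    create table #t2 (id int)

    --DML only from here on
    insert into #t1 (id) values (1)
    insert into #t2 (id) select id from #t1
    select * from #t2
end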

Other situations that cause a procedure to be recompiled include:
- table structure changes
- index changes, even adding an irrelevant index on a related table
- the number of changed rows exceeding the recompilation threshold

NoLock was blocked

The NOLOCK hint in SQL Server is equivalent to the READ UNCOMMITTED transaction isolation level. It will not be blocked by other locks, even exclusive locks, except for SCH-M (schema modification) locks, because it still takes an S lock on the database and a SCH-S (schema stability) lock on the table.

Today, when I issued a query like "select * from test nolock", it was blocked by an update statement from another connection. I was so confused that I began to doubt whether my knowledge was solid. :(
Then after a while I realized that the NOLOCK table hint does not require the WITH keyword, but it does need to be enclosed in parentheses. In my statement, nolock was being treated as a table alias. After correcting it to "select * from test (nolock)", it executed as expected.
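
A minimal reproduction, assuming a table dbo.test with a column id:

--session 1: hold exclusive locks
begin tran
update dbo.test set id = id

--session 2, while session 1 is still open:
select * from dbo.test nolock          --BLOCKED: nolock is parsed as a table alias here
select * from dbo.test (nolock)        --returns immediately (dirty read)
select * from dbo.test with (nolock)   --same effect, the preferred syntax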

A subtle mistake to me today.

Database file system, a practice

A record of my experiment. I will make it more complete later.

1. Table design
A file system entity is either a file or a directory. A file belongs to a directory; the root directory belongs to itself.
1.1 ERD

1.2 Creation and Initialization
CREATE TABLE [dbo].[DBFile](
    [ID] [bigint] identity(0,1) NOT NULL,
    [Name] [varchar](250) NOT NULL,
    [Type] [smallint] NOT NULL,
    [Parent] [bigint] NULL,
    [Content] [varbinary](max) NULL,
 CONSTRAINT [PK_DBFile] PRIMARY KEY CLUSTERED
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[DBFile]  WITH CHECK ADD  CONSTRAINT [FK_DBFile_DBFile] FOREIGN KEY([Parent])
REFERENCES [dbo].[DBFile] ([ID])
GO

ALTER TABLE [dbo].[DBFile] CHECK CONSTRAINT [FK_DBFile_DBFile]
GO

--initialize the root record
set IDENTITY_INSERT dbfile on
insert into dbfile (id,[Name],parent,type) values(0,'/',0,0)
set IDENTITY_INSERT dbfile off

2. Interfaces and implementation
2.1 Create new file
create procedure usp_DBFile_Ins @parent bigint=0, @fileName varchar(200), @isDirectory bit=0, @fileId bigint output
as
begin
insert into dbfile ([Name],parent,type) values(@fileName,@parent,case when @isDirectory=0 then 1 else 0 end)
set @fileId=SCOPE_IDENTITY() --safer than @@identity, which a trigger on dbfile could skew
end
2.2 Update file's content
create procedure usp_DBFile_upd @fileId bigint, @content varbinary(max)
as
begin
update DBFile set Content=@content where ID=@fileId
if @@ROWCOUNT=0
begin
    RAISERROR ('Specified file ID does not exist.',1,1)
    return -1
end
return 0

end
2.3 Type file content
create procedure usp_TypeFile @fileId bigint
as
begin
select ID fileId,name fileName,convert(varchar(max),content) from DBFile where ID=@fileId and [TYPE] = 1
if @@ROWCOUNT<1
    RAISERROR ('Specified ID is not a file.',1,1)
end

2.4 List directory/file

create procedure usp_ListDir @fileID bigint
as
begin
WITH dirFiles
as
(
--first level directory or file
select ID,name,case when [TYPE]=1 then 'File' else 'Directory' end FileType from DBFile where ID=@fileID
union all
select af.ID,af.name,case when af.[TYPE]=1 then 'File' else 'Directory' end FileType
from DBFile af join dirFiles df on af.Parent=df.ID
where af.ID>0
)
select * from dirFiles
end

2.5 Delete file
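
Not implemented yet; a sketch of what it might look like, following the pattern of the procedures above (usp_DBFile_Del is a name I made up; the self-referencing foreign key already blocks deleting a non-empty directory):

create procedure usp_DBFile_Del @fileId bigint
as
begin
if @fileId=0
begin
    RAISERROR ('The root directory cannot be deleted.',1,1)
    return -1
end
--FK_DBFile_DBFile rejects this delete if the target is a directory that still has children
delete from DBFile where ID=@fileId
if @@ROWCOUNT=0
begin
    RAISERROR ('Specified file ID does not exist.',1,1)
    return -1
end
return 0
end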

3. Test code

exec usp_DBFile_Ins 0,'C:',1
exec usp_DBFile_Ins 0,'D:',1
exec usp_DBFile_Ins 1,'testFile.txt',0

select * from dbfile

declare @content varbinary(100)
set @content=convert(varbinary(100),'test content')
exec usp_DBFile_upd 3,@content
exec usp_TypeFile 3

select convert(varchar(100),convert(varbinary(max),'test content'))

exec usp_ListDir 0

go
declare @fileId bigint
declare @content varbinary(100)
set @content=convert(varbinary(100),'test content. this file is on drive d.')
exec usp_DBFile_Ins 2,'testFileOnD.txt',0,@fileId output
select @fileId
exec usp_DBFile_upd @fileId,@content
exec usp_TypeFile @fileId
exec usp_ListDir 2

Monitor General Health of SQL Server Database

1. Check the slow queries

A query might be slow because it is blocked by other processes, but this is still a good place to start for queries that run slowly.

select getdate() runtime, st.text, qs.plan_handle, qs.last_execution_time,
    qs.last_elapsed_time/1000000.00/60 Duration_minutes
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
order by last_elapsed_time desc

Take a look at the slowest queries and try to optimize them. Their execution plans can be retrieved from sys.dm_exec_query_plan(plan_handle); use cross apply to add the query plan to the result, as in the sketch below.
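
For example, a variant of the query above that pulls the plan in directly (same DMVs, one more cross apply):

select st.text, qp.query_plan, qs.last_execution_time,
    qs.last_elapsed_time/1000000.00/60 Duration_minutes
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp
order by last_elapsed_time desc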
Or just pick one query and examine it individually:
 
- get the plan_handle of the query whose plan is going to be examined.
- retrieve the plan
  select * from  sys.dm_exec_query_plan ( plan_handle )
- click on the link to get graphic execution plan for easy reading.

Note:
For one execution of a procedure, each statement ends up as a separate entry in sys.dm_exec_query_stats.
The query can be changed to work on the query plan for that procedure; then you can focus on the statement that takes most of the execution time.

For example, change the above query further to work on one specific query plan.

select st.text,qs.plan_handle,qs.last_execution_time,qs.last_elapsed_time/1000000.00/60 Duration_minutes
from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(QS.sql_handle) as ST
where qs.plan_handle=0x05000900C0DF911440A3FAAF000000000000000000000000
order by last_execution_time asc,last_elapsed_time desc

Checking its execution plan, you will see that query 18 is the most costly one, which also corresponds to the execution duration in the query statistics. Then you can be sure it is the performance bottleneck in that procedure; the rest of the work is to focus on it and tune it.

2. Find idle sessions that have open transactions.
An idle session is one that has no request currently running.

SELECT s.* FROM sys.dm_exec_sessions AS s
WHERE
--has open transaction
EXISTS ( SELECT * FROM sys.dm_tran_session_transactions AS t WHERE t.session_id = s.session_id )
--no request
AND NOT EXISTS ( SELECT * FROM sys.dm_exec_requests AS r WHERE r.session_id = s.session_id ); 

3. Check the index usages

3.1 Check how the indexes are used

select db_name(iu.database_id)DBName,OBJECT_NAME(iu.object_id) ObjectName,i.name IndexName,iu.*
from sys.dm_db_index_usage_stats iu join sys.indexes i on iu.object_id=i.object_id and iu.index_id=i.index_id
join sys.objects o on iu.object_id=o.object_id
where iu.database_id=9 and o.type='U'  --database_id 9 is just an example
...

3.2 Check size and fragmentation information: sys.dm_db_index_physical_stats
--the following statement checks the indexes on object 1464392286 in the database with id 9
select * from sys.dm_db_index_physical_stats(9,1464392286,null,null,null)
  
Reducing Fragmentation in an Index(http://msdn.microsoft.com/en-us/library/ms188917.aspx)

When an index is fragmented in a way that the fragmentation is affecting query performance, there are three choices for reducing fragmentation:

  • Drop and re-create the clustered index.
    Re-creating a clustered index redistributes the data and results in full data pages. The level of fullness can be configured by using the FILLFACTOR option in CREATE INDEX. The drawbacks in this method are that the index is offline during the drop and re-create cycle, and that the operation is atomic. If the index creation is interrupted, the index is not re-created. For more information, see CREATE INDEX (Transact-SQL).
  • Use ALTER INDEX REORGANIZE, the replacement for DBCC INDEXDEFRAG, to reorder the leaf level pages of the index in a logical order. Because this is an online operation, the index is available while the statement is running. The operation can also be interrupted without losing work already completed. The drawback in this method is that it does not do as good a job of reorganizing the data as an index rebuild operation, and it does not update statistics.
  • Use ALTER INDEX REBUILD, the replacement for DBCC DBREINDEX, to rebuild the index online or offline. For more information, see ALTER INDEX (Transact-SQL).
Fragmentation alone is not a sufficient reason to reorganize or rebuild an index. The main effect of fragmentation is that it slows down page read-ahead throughput during index scans. This causes slower response times. If the query workload on a fragmented table or index does not involve scans, because the workload is primarily singleton lookups, removing fragmentation may have no effect.
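
The corresponding statements, sketched against the placeholder dbo.tablename (the same documentation suggests reorganizing for roughly 5-30% fragmentation and rebuilding above that):

--light fragmentation: reorganize (online, can be interrupted without losing work)
ALTER INDEX ALL ON dbo.tablename REORGANIZE;

--heavy fragmentation: rebuild (ONLINE = ON requires Enterprise Edition)
ALTER INDEX ALL ON dbo.tablename REBUILD WITH (ONLINE = ON);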

Reducing Fragmentation in a Heap

To reduce the extent fragmentation of a heap, create a clustered index on the table and then drop it. This redistributes the data while the clustered index is created and makes the layout as optimal as possible.

4. Identify missing, duplicate, or unused indexes
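
For missing indexes, one starting point is the missing-index DMVs; a sketch that weights seek counts by estimated impact:

select d.statement as table_name, d.equality_columns, d.inequality_columns,
    d.included_columns, gs.user_seeks, gs.avg_user_impact
from sys.dm_db_missing_index_details d
join sys.dm_db_missing_index_groups g on d.index_handle=g.index_handle
join sys.dm_db_missing_index_group_stats gs on g.index_group_handle=gs.group_handle
order by gs.user_seeks*gs.avg_user_impact desc

Unused indexes can be spotted with the usage query in 3.1: look for indexes with many user_updates but few or no user_seeks, user_scans, and user_lookups.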

5. Monitor remaining disk space

6. Monitor performance

7. Check database integrity
DBCC CHECKDB
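
For example (YourDatabase is a placeholder):

--report only errors, not informational messages
DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS;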

8. Remove older data from msdb
sp_delete_backuphistory
sp_purge_jobhistory
sp_maintplan_delete_log
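
A sketch of a cleanup run; the one-year cutoff is just an example, and sp_maintplan_delete_log takes a plan id, a subplan id, and an oldest time:

declare @cutoff datetime
set @cutoff = dateadd(year, -1, getdate())
exec msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff  --backup and restore history
exec msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff      --SQL Agent job history
exec msdb.dbo.sp_maintplan_delete_log null, null, @cutoff     --maintenance plan logs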

and much more......