Wednesday, February 25, 2015

SQL Server Knowledge Refresh - Tuning

As a contractor, one recurring problem is refreshing my knowledge of certain areas in order to prepare for new job applications. I want to make an effort to prepare an outline of the major knowledge points to ease my future life.

Most of them are pasted from Internet sources. My motto is copy it, repeat it and spread it.

Clear the caches
--do not do this in a production environment
checkpoint;
dbcc dropcleanbuffers;
dbcc freeproccache;
--alternatives: dbcc freesystemcache('ALL'); dbcc flushprocindb(<db_id>);

You can also free just a single plan if you have its handle.
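A sketch of how, assuming you can find the statement in the plan cache by its text:

--find the plan handle for a cached statement
select cp.plan_handle, st.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like '%some query text%';
--then free just that plan:
--dbcc freeproccache(<plan_handle from above>);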

Enable some features/monitors (to establish a measurement baseline)

set statistics time on;   --or: set statistics io on;
set showplan_text on;     --or: set showplan_xml on; or use the graphical execution plan in Management Studio
--clear wait stats
dbcc sqlperf('sys.dm_os_wait_stats', clear);

DMVs to Use

-system info: dm_os_performance_counters, dm_os_wait_stats
-query info: dm_exec_requests; dm_exec_sql_text
-index/file info: dm_db_index_usage_stats; dm_io_virtual_file_stats

select ER.session_id, ER.status, ER.cpu_time, ST.text
from sys.dm_exec_requests ER
cross apply sys.dm_exec_sql_text(ER.sql_handle) ST
where ER.session_id = @@SPID;

Given no context, what to look for in an execution plan (these operators are not necessarily bad or wrong; sometimes they are the right choice, but they are a place to start checking):
scans: table scan, index scan
lookups: key lookup, RID lookup
parallelism
estimated vs actual rows accessed: a big gap suggests stale statistics or a poor estimate
physical reads: lack of memory or of a suitable index
hash join, group by, having etc. will cause the query to use tempdb
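The tempdb point can be checked from the session itself; a small sketch using a standard DMV:

--pages allocated in tempdb by this session
select session_id,
       internal_objects_alloc_page_count,  --work tables, hash/sort spills
       user_objects_alloc_page_count       --#temp tables and table variables
from sys.dm_db_session_space_usage
where session_id = @@SPID;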

Some patterns that make for bad queries

Default Cursor
The default cursor is how most people use cursors in SQL Server: it is global and updateable, even though in most cases the writer has no intention of using it that way. It is resource-heavy and slow.

It can be made lighter and faster by explicitly declaring it local and read-only (fast_forward means forward-only, read-only):

declare c cursor local fast_forward
for select name from atable;

Cursors are expensive in SQL Server; the penalty is not as severe in Oracle.
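A minimal sketch of the full pattern around that declaration (the variable and the per-row work are placeholders):

declare c cursor local fast_forward
for select name from atable;

declare @name sysname;
open c;
fetch next from c into @name;
while @@fetch_status = 0
begin
    print @name;               --do per-row work here
    fetch next from c into @name;
end
close c;
deallocate c;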

Not In vs Left Join
NOT EXISTS and EXCEPT are better, but EXCEPT adds a distinct sort operator just like UNION does, so duplicates are removed.

NOT IN also has to consider nullable columns: if the subquery can return a NULL, NOT IN matches nothing at all.

An RDBMS is good at finding relations, but not good at finding the absence of a relation, because it has to go through all the relationships to decide there is none.
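A small illustration of the NULL pitfall, with hypothetical tables a and b:

--if b.id contains a NULL, this returns no rows at all
select a.id from a where a.id not in (select b.id from b);

--NOT EXISTS handles NULLs as expected and usually optimizes to an anti semi join
select a.id from a where not exists (select 1 from b where b.id = a.id);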


Where In vs Where Exists

Among IN, EXISTS, JOIN, and APPLY subqueries, prefer EXISTS over JOIN: it does not care about the other table's details, and it short-circuits, stopping as soon as a match is found instead of scanning the rest of the data set. Its worst case is the same as using a JOIN.
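A sketch with hypothetical customers/orders tables:

--EXISTS stops at the first matching order (semi join); a JOIN would also multiply rows
select c.name
from customers c
where exists (select 1 from orders o where o.customer_id = c.id);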

Reduce Recompile

reasons to recompile:
--expected
create proc ... with recompile
exec proc ... with recompile
sp_recompile xx
plan was aged out of memory

--unexpected
big changes since the last execution: schema change, newer index statistics, etc.
interleaved DDL and DML
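The expected ones can be triggered deliberately; a sketch with a hypothetical procedure:

exec sp_recompile N'dbo.SomeProc';         --marks the proc, recompiles on next execution
exec dbo.SomeProc @p = 1 with recompile;   --one-off recompile of this call
--or per statement inside the proc: ... option (recompile)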

sp_executesql vs exec()

sp_executesql is a good way to compose dynamic SQL because 1. it can be parameterized, and 2. a recompile can be forced (OPTION (RECOMPILE)) when you need the right execution plan chosen each time the parameter values change.

(Dynamic SQL follows the same rules as normal SQL.)
declare @sql nvarchar(max), @xxIn int = 1, @xxxIn date = '2015-02-25';
set @sql = N'select * from atable where xx = @xxIn and xxx = @xxxIn';
exec sp_executesql @sql, N'@xxIn int, @xxxIn date', @xxIn, @xxxIn;

Parameterization is better than concatenation.
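For contrast, a sketch of the concatenation style with exec(), reusing the variables above: every distinct literal value compiles a new plan, and it is open to SQL injection.

set @sql = N'select * from atable where xx = ' + cast(@xxIn as nvarchar(12));
exec (@sql);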

Comma-delimited parameters

String splitting is expensive; table-valued parameters are the preferred approach.
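A minimal sketch of a table-valued parameter; the type, procedure, and items table are hypothetical names:

create type dbo.IntList as table (value int not null primary key);
go
create procedure dbo.GetItems @ids dbo.IntList readonly
as
select i.* from dbo.items i join @ids x on x.value = i.id;
go
declare @ids dbo.IntList;
insert @ids values (1), (2), (3);
exec dbo.GetItems @ids;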

Implicit conversion is a performance killer.
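A typical case, assuming a hypothetical table t whose column n is varchar and indexed: the nvarchar literal has higher type precedence, so the column is converted on every row and the index cannot be seeked.

select * from t where n = N'abc';  --implicit convert on the column, index scan
select * from t where n = 'abc';   --matching type, index seek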

Temp table vs table variable
The latter has less locking and logging overhead, but otherwise a temp table is better because it has statistics, can be indexed, and supports parallelism.

Both are stored in tempdb.
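Side by side, as a small sketch:

create table #t (id int primary key, name sysname);   --has statistics, more indexes can be added
declare @t table (id int primary key, name sysname);  --lighter locking/logging, but no statistics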

UDFs

Scalar UDFs are invoked row by row and hide their cost from the optimizer; moving them to inline form (inline table-valued functions) improves performance.
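A sketch of the inline form, with hypothetical names:

create function dbo.FullName(@first nvarchar(50), @last nvarchar(50))
returns table
as return (select @first + N' ' + @last as full_name);
go
select fn.full_name
from dbo.people p
cross apply dbo.FullName(p.first_name, p.last_name) fn;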


Checking Execution plan

Whether these are bad depends on context, but they are keywords worth paying attention to.
a slow plan usually has:
lookup
  RID lookup: bookmark lookup into a heap (no clustered index)
  key lookup: bookmark lookup into the clustered index
meaning at least one needed column is not in the index; you are in trouble when this happens for a lot of rows
spool
  a cache in the query processor. it is there because the table lacks adequate indexes or uniqueness information. it is implemented as a hidden work table in tempdb (not good)
  table spool
    lazy spool
    eager spool
  index spool (eager spool)

sort
  order by, merge join, stream aggregation, windowing...
  not linear: cost grows as O(n log n)
  do you really need that sort? maybe do it on the client.
hash
  hash match join
  hash match aggregate
  good for OLAP, bad for OLTP
nested loops
scan
  index scan
  table scan
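To find plans worth checking in the first place, a query like this against the plan cache can help (sorting by total worker time is just one reasonable choice):

select top (20) qs.total_worker_time, qs.execution_count, st.text, qp.query_plan
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
cross apply sys.dm_exec_query_plan(qs.plan_handle) qp
order by qs.total_worker_time desc;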

Sunday, February 22, 2015

SQL Server In-Memory OLTP Garbage Collection

1. An in-memory database can be bound to a resource pool to cap the memory it can use. Say you limit the resource pool to 20% of memory; the associated in-memory database then won't grow over that limit. Operations on the db might fail, but they won't bring down the entire instance.
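A sketch of that binding via Resource Governor; the pool and database names are hypothetical:

create resource pool imoltp_pool with (max_memory_percent = 20);
alter resource governor reconfigure;
exec sp_xtp_bind_db_resource_pool @database_name = N'InMemDb', @pool_name = N'imoltp_pool';
--the database must be taken offline and back online for the binding to take effect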

2. Garbage collection happens when retrieving data and when SQL Server faces memory pressure. I feel this is similar to Oracle's row versioning management, in the sense that the updated data block contains all the information, but I need to check the book before saying that for sure.

3. Garbage collection uses the same notion of generations as the JVM's GC. In SQL Server there are 16 generations/queues; the first 15 queues contain 16 transaction ids each (the last queue piles up whatever newer transaction ids keep coming?). By this design, SQL Server can deal with 16 transactions at one time in GC; after g0 is done, g1 shifts down to g0.

4. There are DMVs to monitor GC activity:
  • sys.dm_xtp_gc_stats
  • sys.dm_xtp_gc_queue_stats
  • sys.dm_db_xtp_gc_cycle_stats – Database Level 
5. Perfmon can also monitor GC statistics.
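Querying the DMVs listed above is straightforward:

select * from sys.dm_xtp_gc_stats;           --instance-wide GC activity
select * from sys.dm_xtp_gc_queue_stats;     --per-queue backlog
select * from sys.dm_db_xtp_gc_cycle_stats;  --database level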

Friday, February 13, 2015

Read a Thin Book - Cloud Architecture Patterns - 5 - Eventual Consistency

CAP Theorem

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

Consistency: everyone gets the same data
Availability: everyone has ongoing access to the system
Partition tolerance: the system continues to operate despite the failure of part of the system

Guaranteeing consistency is easy when data is on a single node, but once the data is distributed, partition tolerance needs to be considered. What happens if our nodes cannot communicate with each other due to failure, or simply cannot do so fast enough to meet our performance and scalability targets? Different tradeoffs are possible. One popular choice for cloud-native applications is to favor partition tolerance and availability and give up the guarantee of immediate consistency.

Eventual Consistency

Choose between displaying stale data and scaling more efficiently.

Applications that do not guarantee immediate consistency are said to be eventually consistent.

eventual consistency can be a powerful feature that enables better scalability. Eventual consistency is not a deficiency or design flaw. When used appropriately, it is a feature.

New term but old idea.

Eventually consistent does not mean that the system doesn’t care about consistency. Most data is consistent virtually all of the time. There is just a business-tolerable delay before updates fully propagate. During this delay, different users may see different versions of the data.

Examples

ATM, check depositing, DNS etc


RDBMS vs NoSQL

RDBMS: ACID easy for single node

Atomicity: all operations complete or none do
Consistency: the database moves from one valid state to another
Isolation: transactions competing to change the same data are applied sequentially
Durability: committed data is not lost.

NoSQL: not only SQL

BASE:
Basically available: responds, even if with stale data
Soft state: state might not be consistent and might be corrected
Eventually consistent: allows for a delay before any individual data change completely propagates through the system.

Usually designed to support easy sharding

Configurable consistency is available if you want to trade off performance in some areas.

Misc

Dealing with eventual consistency is simplified if all writes go to a single location. This approach is adopted by Couchbase and MongoDB, which only accept writes to the master node; other nodes are read-only.


Tuesday, February 03, 2015

Read a Thin Book - Cloud Architecture Patterns - 4 - Using Queues

Queue-Centric workflow pattern

A queue is placed between the web tier and the service tier, to decouple them and to give users a consistent response.

Enqueue and dequeue are the operations performed by the sender and the receiver.

A challenge is handling failures in message consumption.

Repeated messages are best handled as idempotent operations (same result no matter how many times the message is processed). An excessively repeated message is a poison message.

The dequeue count on a message can help to identify a poison message.
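A minimal sketch of the idea using a SQL Server table as the queue (real cloud queues expose a per-message dequeue count directly; the table and column names here are hypothetical):

;with next_msg as (
    select top (1) id, payload, dequeue_count
    from dbo.work_queue with (rowlock, readpast, updlock)
    order by enqueued_at
)
update next_msg
set dequeue_count += 1
output inserted.id, inserted.payload, inserted.dequeue_count;
--a consumer that sees dequeue_count above a threshold moves the row to a
--dead-letter table instead of processing it again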

The flip side here is that you now need your users to understand that even though the system has acknowledged their action (and a command was posted), processing of that action was not immediately completed.

In cases where the user wants to be notified when their action will be completed, an
email upon completion might do the trick. This is common in ecommerce scenarios
where “your order has shipped” and other notifications of progress are common.

Sometimes users will prefer to wait around while a task completes. This requires either that the user interface layer polls the service tier until the task completes or that the service tier proactively notifies the user interface layer. The proactive notification can be implemented using long polling. In long polling, the web client creates an HTTP connection to the server, but the server intentionally does not respond until it has an answer.