Twenty Percent Knowledge: January 2009

Friday, January 30, 2009

In or inner join vs exists

This article is based on Oracle database.

IN is similiar to Inner join, "exists" is similiar to a loop statement.

generally say, "exists" works better in most of situation since it requires you to specify a join condition, which can invoke an INDEX scan.

But it's not always true. when the result set of subquery is small enough, inner join or IN works better. first of all, subquery can also use index scan as no exception. sencondly, using IN or inner join, you're telling the rule-based optimizer that you want the inner query to drive the outer query. the engine will run out the driving query that returns small result set the first and that will make filtering rows in outer query faster.

which way is better? checking the execution plan will give you final answer.

please note that in Oracle, the query parser will always change subquery to be "select 0" whatever value you specify in the select list(common ones are 0,1,X,null).

find parent rows not having a child

1.select * from parent where not exists(select 1 from child where parent.pk=child.fk)
this is better since no join is involved

2.select * from parent where parent.pk not in (select fk from child)
this is most natural query. a little bit worse than first method since join operation is heavier

3.out join
select * from parent left (outer) join child on parent.pk=child.fk where child.fk is null

this utilize the feature of outer join. you make a result from join the first, then filter out the record wanted. might be worst performance.

select 1 vs select * in exists function

in SQL server, select 1 is better than select *.

if you check the execution plan, you will find they have exactly same plan and performance. but during the stage of query plan, * is expanded to bind to all the columns in the underlying table. you can imagine this will have to retieve meta data from database that can be avoided in "select 1" syntax.

so the advantage of "select 1" is to avoid parsing uncessary information that will discarded during the execution.

Tuesday, January 20, 2009

SQLServer refresh

T-SQL With Check view option

it's used to prevent the data from being modified in such a way that it falls out of the view. simply, it forbids update on the column used in the view's where clause.

Table Relationship

one-to-one,one-to-many,many-to-many

for one to one, they can always be merged into one table. they are divided into two or more tables under special concern such as efficient storage,security and temporary data management. to implement this relationship, make PK a FK to each other.

for one to many relationship, it's usually implemented by FK and PK.

For many-to-many relationship, no such constrain can be defined. they are implemented by defining a junction table.

When two tables need a many-to-many relationship, create a third table, also know as junction table. this junction table will have a composite PK created from PK from each of the two tables.

Normalization

the process of ensuring that your database complies with normal forms is referred to as normalizing a database.

1st normal form
1NF requires that all column values in a table be divided to their smallest element. on other wards, all non-pk fields are atomic(can not be divided further). additionally, it should have no repeating groups.

2NF applies to table that have composite PKs. a table is said to be in 2NF if it is in 1NF and every non-pk attribute is completely dependent on the whole pk. in other words, if an attribute within the table is dependent on only one column of the composite pk then it is not 2NF.

3NF means fields within the table are only dependent on pk and not dependent on non-pk fields. to ensure that a table complies with 3NF, review each column and ask "is this column dependent on the PK?"

De-normalization

do it more for better performance.

online transaction processing OLTP databases are mostly normalized because they are focused on inserts and modifications, which perform better in a normalized db.

online analytic processing OLAP databases are commonly normalized because they are focused on querying, which perform better in denormalized db.

Generalization

similar to concept in OOD, is a process of combing similar entities together to form a single entity. e.g. all kind of people can be organized in contact table.

Order By

used to guarantee the order of your result set. columns in order by clause don't have to be in the select column list. the exception is if we use distinct clause to eliminate duplicate rows. in this case, the column in order by clause must be in the select column list.

Aggregation Function

NULL value will not be counted in the calculation of aggregation function.
eg. count(*) will count all the rows. count(column) will only count on column with not null values.

Identity column

you can define only one identity column in a table. the position of it can be any where. it's not necessarily put it in the left most or right most of the column list. while insert statement has column list missed, sql server can figure out if there's a value provided for identity column automatically.

be aware that the identity is similar to sequence in Oracle, even an insert statement fails execution, as long as this is not identity column related problemm, the identity will still be increased 1 step forward, and that is irreversible.

N in Front of String

it means the following string is in uni-code

Coalesce vs case statement

Coalesce choose the first non-null value of the list. this is similar with case statement, which choose the first satisfied branch to use.

Case. when..else..End statement is fully fledged statement that evaluate express as condition. Coalesce is much simpler and only tells if the value is null.

Len
Be careful that len function does not count trailing blanks

Tracing the execution time
here's an example code I can use in everyday life while it comes to performance turning.

declare @start datetime,@end datetime
set @start=getdate()
.....do something
set @end=getdate()
print 'time:'+str(datediff(ms,@start,@end))+' ms'

Remove vs Alter
The reason why you do alter instead remove statement is to preserve the permissions on the original object.

Select record with max xxx
use exists that basically makes an inner join to itself.
select * from tbl a where not exists(select 1 from tbl b where a.increamentalCol <>

Storing different language set in MSSQL

if data is to be stored within a column in a different collation than the database, they should be defined as Unicode column. this will double the storage space, but it ensures that data can be moved smoothly between columns.

typically, a computed column can be created on this kind of column with different collation to facilitate data access in different collation without explicit collation conversion.

First Rule on Database Design

Keep the row size as small as possible.

This is regarding to efficient physical storage and also efficient access to the data in a table. the smaller a row is, the more rows can be stored in one physical IO access unit, the faster the data can be found.

Tuesday, January 13, 2009

How to make "No Data Found" in Jasper report

Problem:

There's no record found for report, and you want to print "No Data Found" in the report.

Solution:

Solution 3 is the preferred one.

1. Make a dummy empty record for the report so that report display a line in its detail. you can use one of the column to have value of "No Data Found" to solve the problem. you can also make a field that is printed only when dummy record is used.

2. Define a column footer that contains a static text of "No Data Found". make the column footer print only when report count is equal to zero.
Note: in version 2, the column footer is printed at bottom of page. newer version might make it float with column header if there's no body available.

3. Define a report group that groups on nothing. hide the group header. define a static box of "No Data Found" in group footer. make group footer print only when report count is equal to zero.

Enable SQL Trace in MySQL

In Windows, in the configuration file, for instance, my.ini, add a line like this:
log=c:\mysql.log

same thing in linux or unix, adding same "log" entry in configuration file will turn on the sql trace.

remark the log entry will turn the trace off.