Tuesday, April 28, 2015

Data Warehouse Architectures


Among data warehouse architectures, the two most popular are Inmon's Corporate Information Factory and Kimball's Dimensional Data Warehouse.
  • Corporate Information Factory, also known as hub and spoke. The latter name is easy to understand from the following diagram.
It consolidates information from various data sources throughout the enterprise into a centralized repository (the enterprise data warehouse). The enterprise data warehouse is designed in third normal form and is not queried directly by warehouse applications. Data marts, each tailored to the needs of a particular business group, are built on top of the enterprise data warehouse. These data marts use dimensional design and are queried by data warehouse applications.

  •  Dimensional Data Warehouse, AKA Enterprise Bus Architecture
This looks similar to the Corporate Information Factory in terms of the elements and data flow in the drawing, since both focus on the enterprise context and are driven by the same enterprise analytic needs, but the two differ considerably in implementation.
Bus architecture is a better name for this solution because it expresses the implementation principle this architecture follows to build up a data warehouse system. It makes the most detailed data directly available to end users in dimensional form, organized by business process rather than by department. The model can therefore be implemented step by step, one business process at a time, to serve different groups of users in priority order, and it is likely to deploy earlier than Inmon's approach.
Its characteristics include a data warehouse that is designed according to the principles of dimensional design and accessed directly by analytic systems. The data mart becomes a virtual concept that resides inside the data warehouse.
 In practice, we do not use one exclusively over the other. The right technique should be picked to fit the situation. For example, third normal form (3NF) tables can appear naturally in Kimball's dimensional data warehouse.
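
To make the idea of dimensional design a bit more concrete, here is a minimal sketch of a star schema that is queried directly, using Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), the columns and the sample rows are all made up for illustration; they do not come from either Inmon's or Kimball's material.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (1, '2015-03'), (2, '2015-04');
INSERT INTO dim_product VALUES (10, 'books'), (20, 'music');
INSERT INTO fact_sales VALUES (1, 10, 5.0), (2, 10, 7.5), (2, 20, 3.0);
""")


def sales_by_category(connection):
    """Join the fact table to one dimension and aggregate the measure."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))  # [('books', 12.5), ('music', 3.0)]
```

In the hub-and-spoke approach a query like this would run against a data mart derived from the normalized warehouse; in the bus architecture it would run against the warehouse itself.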

Pictures are from the web and from the book Star Schema.

Saturday, April 11, 2015

Some of Maven

It's a plugin framework.

It has three lifecycles: clean, default and site.
Each lifecycle has multiple phases. For example, the clean lifecycle has the pre-clean, clean and post-clean phases.

Plugins contain goals, and goals are bound to phases. mvn clean runs the clean lifecycle up to and including the clean phase; here clean is the phase defined in the clean lifecycle. If the clean plugin is present and its clean goal is bound to the clean phase, that goal is executed. mvn clean:clean invokes the clean plugin's clean goal directly, without running any lifecycle phase.
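
The relationship between phases and goals can be sketched as a small model. This is only an illustration in Python, not how Maven is implemented: the names CLEAN_LIFECYCLE, BINDINGS and goals_for_phase are invented here, and the single binding shown stands in for the clean plugin's clean goal being attached to the clean phase.

```python
# Phases of the clean lifecycle, in execution order.
CLEAN_LIFECYCLE = ["pre-clean", "clean", "post-clean"]

# Simplified stand-in for plugin configuration: phase -> goals bound to it.
BINDINGS = {"clean": ["clean:clean"]}


def goals_for_phase(target_phase, lifecycle=CLEAN_LIFECYCLE, bindings=BINDINGS):
    """Return the goals run when a phase is requested.

    Every phase up to and including the target is visited in order,
    and each one contributes whatever goals are bound to it.
    """
    if target_phase not in lifecycle:
        raise ValueError(f"unknown phase: {target_phase}")
    last = lifecycle.index(target_phase)
    goals = []
    for phase in lifecycle[: last + 1]:
        goals.extend(bindings.get(phase, []))
    return goals


print(goals_for_phase("clean"))      # ['clean:clean']
print(goals_for_phase("pre-clean"))  # []
```

Requesting a phase walks the lifecycle, while a plugin:goal invocation such as mvn clean:clean corresponds to running one entry from the bindings without walking any phases.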

Every POM inherits from the super POM, which resides in a jar shipped with Maven. The root POM is in the root project folder, and child POMs are the local POMs defined in the subprojects.
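
A rough way to picture this inheritance is "the nearest definition wins". The sketch below uses Python's ChainMap over flat dictionaries. This is a deliberate simplification, since Maven merges nested XML elements with its own rules, and the keys and values (com.example, demo-core and so on) are hypothetical.

```python
from collections import ChainMap

# Hypothetical, flattened stand-ins for three POM levels.
super_pom = {"build.directory": "target", "build.sourceDirectory": "src/main/java"}
root_pom = {"groupId": "com.example", "version": "1.0-SNAPSHOT"}
child_pom = {"artifactId": "demo-core", "build.directory": "out"}


def effective_pom(*poms):
    """Merge POM-like dicts; earlier arguments override later ones."""
    return dict(ChainMap(*poms))


effective = effective_pom(child_pom, root_pom, super_pom)
print(effective["groupId"])          # com.example (inherited from the root POM)
print(effective["build.directory"])  # out (child overrides the super POM)
```

For a real project, mvn help:effective-pom prints the merged result that Maven itself computes.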

Maven's settings.xml can be located in two places: the conf directory of the Maven installation (global settings) and the user's .m2 directory (user settings).

Maven coordinates are groupId, artifactId and version. You use them to find components in a Maven repository. Each component is treated as a project and therefore has its own coordinates defined.
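
As an illustration of how coordinates locate a component, the sketch below maps a groupId:artifactId:version string to the relative path used by the default repository layout. The function name artifact_path and the coordinates com.example:demo-app:1.0.0 are made up for this example, and the jar packaging is an assumption.

```python
from pathlib import PurePosixPath


def artifact_path(coordinates, packaging="jar"):
    """Map 'groupId:artifactId:version' to a path relative to a repository root.

    The dots in the groupId become directory separators, followed by the
    artifactId, the version, and the file '<artifactId>-<version>.<packaging>'.
    """
    group_id, artifact_id, version = coordinates.split(":")
    file_name = f"{artifact_id}-{version}.{packaging}"
    return str(PurePosixPath(*group_id.split("."), artifact_id, version, file_name))


print(artifact_path("com.example:demo-app:1.0.0"))
# com/example/demo-app/1.0.0/demo-app-1.0.0.jar
```

The same relative path applies under the local repository and under a remote repository URL.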

user_home/.m2/repository is the default local repository. You can redefine it in settings.xml with <localRepository>/path/to/local/repo</localRepository>.

A repository is where Maven finds the plugins and components a build depends on, as well as where release candidates are deployed.

The scm element is where the release plugin finds the source code to prepare the release. The release plugin does not allow uncommitted changes in the working folder (I am bothered by this). It copies the trunk to a tag location, named with the tag you give during the release, and then performs the release from that tag.

The repository in distributionManagement is where the artifacts are released to.

mvn release:prepare -DautoVersionSubmodules=true -DscmCommentPrefix=xxxx -Darguments="-DskipTests -Dmaven.test.skip" -Dresume=false -DdryRun=true

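mvn release:perform checks out the tagged source and runs the deploy goals on it, so it does build again. The arguments given to release:prepare are recorded in release.properties, which is likely why the skip-test flags do not need to be repeated here.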