Tuesday, January 13, 2015

Read a Thin Book - Cloud Architecture Patterns - 3 - Database Sharding

Database Sharding


It is a technique to scale data horizontally.

Impress on this is to start with single database and then divvy up its data across multiple ones (shards). Each one has same schema as original one. most data is distributed so that each row appears in exactly one shard. The combined data from all shards form single logical database.

When to use


-- when query volume, update volume and storage requirements etc exceed capacity of single database
--when you can no longer scale up or scaling up involves unacceptable cost comparing to scaling out.

Implementation


In database integrated sharding support, such as federation in MySQL and SQL Server.
This pushes complexity out of application code and into the database service.

Application layer support. expensive and complex, but more flexible. If you want to shard Oracle database, likely you will have to go through this approach.

Usually specific database column is designed as shard key that determines which shard node data should goes to. the shard key is also needed in order to access data.

Even sharding usually means horizontal split and stores different set of same structured data into different nodes, functional split or vertical split is very often the first step to go before data can be further sharded (horizontal split). Each function area has a shard group, in each shard group, data is horizontally split. For example, for a function area dedicated for customers, a shard group is created for it, with the group, customer data is split and distributed according to last name.

Well designed sharding may only shard certain tables, commonly used reference tables can simply be replicated entirely.

In order to be efficient, single shard should be able to satisfy most of common database operations without having to resort to another shard. (share nothing)

As a common practice, all the shard nodes should have UTC as their time,

Related Concept or Areas


Scalability, eventual consistency, shared nothing/autonomous

Challenges


Aggregation usually for report usually is a challenge and has to resort to application, NoSQL database such as CouchBase support MapResuce that span shards, but not common to most of databases.

Adding nodes and design shard key is also a challenge. poorly designed sharding plan may introduce huge volume of data movement.

Materials

http://www.google.com/url?q=http%3A%2F%2Fintermediatesql.com%2Fsharding%2Fioug-collaborate-2012-the-art-of-database-sharding%2F&sa=D&sntz=1&usg=AFQjCNFxJ6PTrdYTsV1Q-TRieXl4K6z-7Q



No comments: