Tuesday, August 26, 2014

Scaling Out

For big web systems, scalability is requirement #0. It is a real requirement, even though it is an architectural concern that is conventionally classified as non-functional.

A machine can always be scaled up, but real scalability comes from being able to scale out.

Key: partition by function and by data, and go asynchronous everywhere.

Best practices summarized from Randy Shoup's article, which can be found here: http://www.infoq.com/articles/ebay-scalability-best-practices

Partition by function. For example, separate bidding from order tracking.

Split horizontally. Within the same function, break the workload down into manageable units. For example, among logon servers, one server is dedicated to user IDs 1 through 10000, a second to user IDs 10001 through 20000, and so on.
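The logon-server example can be sketched as a small routing function. This is only an illustration: the name logon_server_for and the zero-based server index are assumptions made for the sketch, and the range size of 10000 comes from the example above.

```python
USERS_PER_SERVER = 10_000  # range size taken from the logon-server example


def logon_server_for(user_id: int) -> int:
    """Return the zero-based index of the server that owns this user ID.

    Server 0 owns IDs 1..10000, server 1 owns IDs 10001..20000, and so on.
    """
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    return (user_id - 1) // USERS_PER_SERVER
```

Adding capacity then means adding a server for the next ID range, without touching the servers that already exist.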

Follow SOA principles and make services stateless. This makes it easy to scale out, since instances of a service are interchangeable, and it makes it easier to set up a load balancer to distribute the workload among them.
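A minimal sketch of why statelessness helps, assuming a simple round-robin policy. RoundRobinBalancer and make_service are hypothetical names invented for this illustration, not part of any real load balancer.

```python
import itertools


def make_service(name):
    """Build a stateless handler: the result depends only on the request."""
    def handle(request):
        return f"{name} handled {request}"
    return handle


class RoundRobinBalancer:
    """Hand each request to the next backend in turn (hypothetical helper)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def route(self, request):
        backend = next(self._cycle)
        return backend(request)
```

Because no instance keeps per-user state, the balancer is free to send any request to any instance, and scaling out is just a matter of adding another backend to the list.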

Decouple functions asynchronously. If A depends on B, then scaling A means also scaling B, and a failure in B is also a failure in A. Methods include queues, batch processes such as ETL, multicast messaging, or other means of relaying or staging requests. After decoupling, A can move forward even if B is down.

This principle can also be used inside software itself. At every level, decomposing the processing into stages or phases, and connecting them up asynchronously, is critical to scaling.
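The decoupling of A from B can be sketched with an in-process queue from the Python standard library. The names service_a, service_b_worker, and run_demo are made up for this illustration; a real system would more likely use a message broker, which is outside the scope of this sketch.

```python
import queue
import threading

pending = queue.Queue()  # staging area between A and B


def service_a(order_id):
    """A only enqueues the request; it never calls B directly."""
    pending.put(order_id)
    return "accepted"


def service_b_worker(processed):
    """B drains the queue whenever it happens to be running."""
    while True:
        order_id = pending.get()
        if order_id is None:  # sentinel value: stop the worker
            break
        processed.append(order_id)


def run_demo():
    processed = []
    # B is not running yet, but A still accepts work.
    statuses = [service_a(i) for i in (1, 2, 3)]
    # B comes back and catches up on the backlog.
    worker = threading.Thread(target=service_b_worker, args=(processed,))
    worker.start()
    pending.put(None)
    worker.join()
    return statuses, processed
```

In run_demo, A accepts three requests while B is not running, and B works through the backlog once it starts.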

Anything that can wait should wait. To give the end user a responsive experience, some back-end processes should not be part of the front-end request. For example, transactions in a banking system are typically processed overnight: users can submit their requests quickly, and they can tolerate waiting a couple of days for the transactions to be synchronized throughout the banking system.

Abstraction everywhere. For example, define simple, long-lasting interfaces between components. Parameters are typically defined as key-value pairs or XML to support extensibility.
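A small sketch of a key-value interface, assuming a hypothetical place_bid operation with made-up parameter names (item_id, amount, currency). The point is only that a new optional key can be added later without breaking existing callers.

```python
from typing import Mapping


def place_bid(params: Mapping[str, str]) -> dict:
    """Accept a bid described as key-value pairs (hypothetical interface).

    Unknown keys are ignored and optional keys have defaults, so the
    signature does not change when new parameters are introduced.
    """
    return {
        "item_id": params["item_id"],
        "amount": float(params["amount"]),
        # Added later in this sketch; older callers simply omit it.
        "currency": params.get("currency", "USD"),
    }
```

An older caller that sends only item_id and amount keeps working, while a newer caller can pass currency as well.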

"
The virtual machine in many modern languages abstracts the operating system. Object-relational mapping layers abstract the database. Load-balancers and virtual IPs abstract network endpoints. As we scale our infrastructure through partitioning by function and data, an additional level of virtualization of those partitions becomes critical.
"

Use caches appropriately.
Avoid distributed transactions.
Databases are split into shards that serve different functions.
