10.0_xtreme 10.0 - dog walking - CSDN Blog

Topic 10.0

1. Scalability questions can be among the easiest questions.

2. System design problems: use the following step-by-step approach.

1) Make Believe: Pretend that all the data fits on one machine and that there are no memory limitations.

2) Get Real:

-->Think about how much data can fit on one machine, and what problems will occur when you split the data up.

-->Figure out how to logically divide the data up, and how one machine would identify where to look up a different piece of data.

3) Solve Problems: Usually you can continue with the approach from Step 1), but occasionally you may need to fundamentally alter the whole approach. An iterative approach is typically useful.

-->Demonstrating that you can analyze and solve problems is enough. You do not need to re-architect a complex system that companies have spent millions of dollars building.
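The first question in the "Get Real" step (how much data fits on one machine) comes down to simple arithmetic. A minimal sketch, assuming made-up sizes; the function name `machines_needed` and all the numbers are hypothetical and only illustrate the estimate:

```python
def machines_needed(total_bytes: int, bytes_per_machine: int) -> int:
    """Smallest number of machines whose combined capacity holds the data."""
    # Integer ceiling division avoids floating-point rounding on large values.
    return (total_bytes + bytes_per_machine - 1) // bytes_per_machine


TB = 10**12

# Hypothetical figures: 50 TB of data, 4 TB of usable storage per machine.
total_data = 50 * TB
per_machine = 4 * TB

print(machines_needed(total_data, per_machine))  # 13
```

Once the answer is more than one machine, the remaining questions are how to divide the data and how to find it again.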

3. Though we can sometimes increase hard drive space in a computer, there comes a point where data simply must be divided up across machines. Strategies for deciding what data belongs on which machine:

1) By Order of Appearance:

-->As new data comes in, wait for the current machine to fill up before adding a new machine.

-->Good: never uses more machines than necessary.

-->Bad: the lookup table may be very complex and potentially very large.
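The order-of-appearance strategy can be sketched as follows. This is only an illustration: the class `AppendOnlyCluster` is hypothetical, each "machine" is a plain dict, and capacity is counted in items rather than bytes.

```python
class AppendOnlyCluster:
    """Fill the current machine completely before adding a new one."""

    def __init__(self, capacity_per_machine: int):
        self.capacity = capacity_per_machine
        self.machines = [{}]  # each machine is modeled as a dict: key -> value
        self.lookup = {}      # key -> machine index; gains one entry per key

    def put(self, key, value):
        if key in self.lookup:
            # Existing key: overwrite it where it already lives.
            self.machines[self.lookup[key]][key] = value
            return
        if len(self.machines[-1]) >= self.capacity:
            # Current machine is full, so bring a new one online.
            self.machines.append({})
        index = len(self.machines) - 1
        self.machines[index][key] = value
        self.lookup[key] = index

    def get(self, key):
        # Without the lookup table we would have to ask every machine.
        return self.machines[self.lookup[key]][key]


cluster = AppendOnlyCluster(capacity_per_machine=2)
for i in range(5):
    cluster.put(f"k{i}", f"v{i}")

print(len(cluster.machines))  # 3
print(cluster.lookup["k4"])   # 2
```

The sketch shows both sides of the trade-off: only three machines are used for five items, but the lookup table needs an entry for every key.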

2) By Hash Value:

-->Store the data on the machine corresponding to the hash value of the data. A. Pick some key relating to the data. B. Hash the key. C. Mod the hash value by the number of machines. D. Store the data on the machine with that value → #[mod(hash(key), N)]

-->Good: No need for a lookup table. Every machine will know where a piece of data is.

-->Bad: a machine may receive more data than the others and exceed its capacity. We then have to either shift data to the other machines (expensive) or split that machine’s data across two machines, causing a tree-like structure of machines.
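Steps A to D above can be sketched in a few lines. This is a minimal illustration, not a production scheme: the function name `machine_for` is hypothetical, and SHA-256 is just one stable hash choice (Python's built-in `hash()` for strings is randomized per process by default, so it would not give every machine the same answer).

```python
import hashlib


def machine_for(key: str, num_machines: int) -> int:
    """Return the machine index mod(hash(key), N) for a given key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer hash value.
    hash_value = int.from_bytes(digest[:8], "big")
    return hash_value % num_machines


keys = [f"user{i}" for i in range(1000)]

# Every machine computes the same answer, so no lookup table is needed.
print(machine_for("user42", 4) == machine_for("user42", 4))  # True

# Changing N reassigns many keys, which is why shifting data is expensive.
moved = sum(1 for k in keys if machine_for(k, 4) != machine_for(k, 5))
print(f"{moved} of {len(keys)} keys change machines when N goes from 4 to 5")
```

The second print makes the cost of rebalancing visible: with plain modulo placement, adding one machine changes the destination of most keys.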

3) By Actual Value:

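-->Dividing by hash value is essentially arbitrary: there is no relationship between what the data represents and which machine stores it. Dividing by actual value instead uses what the data represents to choose the machine. Example: when designing a social network, a person living in Mexico probably has more friends in Mexico than elsewhere, so we can store “similar” data on the same machine so that looking up the Mexican person’s friends requires fewer machine hops.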
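The social network example can be sketched by placing users according to their country. Everything here is an assumption made for illustration: the mapping `REGION_TO_MACHINE`, the fallback `DEFAULT_MACHINE`, and both function names are hypothetical.

```python
# Hypothetical placement rule: users from the same country share a machine.
REGION_TO_MACHINE = {"MX": 0, "RU": 1}
DEFAULT_MACHINE = 2  # any country not listed above


def machine_for_user(country: str) -> int:
    """Pick a machine from what the data represents (the user's country)."""
    return REGION_TO_MACHINE.get(country, DEFAULT_MACHINE)


def machines_touched(friend_countries) -> set:
    """Set of machines we must contact to load a list of friends."""
    return {machine_for_user(country) for country in friend_countries}


# A user whose friends are mostly in Mexico needs very few machine hops.
print(machines_touched(["MX", "MX", "MX"]))        # {0}
print(machines_touched(["MX", "MX", "RU", "MX"]))  # {0, 1}
```

Under hash placement the same friend list could be spread over every machine in the cluster.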

4) Arbitrarily:

-->Frequently, data gets arbitrarily broken up and we implement a lookup table to identify which machine holds a piece of data. While this does necessitate a potentially large lookup table, it simplifies some aspects of system design and can enable better load balancing.
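One way to see the load-balancing benefit is a sketch in which each new item goes to whichever machine currently holds the least data. The class `LookupTableCluster` and its least-loaded policy are assumptions chosen for illustration; the text above does not prescribe a specific placement rule.

```python
class LookupTableCluster:
    """Arbitrary placement: any item may live on any machine."""

    def __init__(self, num_machines: int):
        self.load = [0] * num_machines  # amount of data stored per machine
        self.table = {}                 # key -> machine index

    def place(self, key, size: int = 1) -> int:
        if key in self.table:
            return self.table[key]
        # Because placement is arbitrary, we are free to pick the emptiest
        # machine; ties go to the lowest index.
        index = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[index] += size
        self.table[key] = index
        return index

    def locate(self, key) -> int:
        # The lookup table is the only record of where a key lives.
        return self.table[key]


cluster = LookupTableCluster(num_machines=3)
cluster.place("a", size=5)
cluster.place("b")
cluster.place("c")
cluster.place("d")
cluster.place("e")

print(cluster.locate("d"))  # 1
print(cluster.load)         # [5, 2, 2]
```

The large item "a" does not force later items onto the same machine, at the cost of one table entry per key.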