Cheap and Secure Web Hosting Provider : See Now

[Solved]: Why does appending permutations of servers at the end of hash table avoid bottlenecks?

, , No Comments
Problem Detail: 

I was reading the following FDS paper:

The paper has a TLT (tract locator table) for identifying where to write in each server. The table is basically a hash table where the index i maps to some (tract) server to do writes to or reads to.

Say that there are n (tract) servers. It says that if instead of just having index i mapping to server i, if instead we appended n more indexes mapping to some permutation of the servers (and having a table of size 2n in this example), we can avoid having a bottleneck at any specific server when doing reads or writes. Basically it claims that as we append more permutations the better. Why is this?

Why would this be?

Just for reference the index mapping function is:

$$i = (hash(g) + t) \pmod n$$


g = is the global UID for a blob

t = tract number. (tract is the measure/units of reads and writes to a blob).

n = is the number of tract servers or a multiple of the tract servers.

hash = SHA-1 (according to the paper)

Paper Reference:

Title: Flat Datacenter Storage

Author(s): Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, Yutaka Suzue

Institution(s): Microsoft Research, Univeristy of Texas at Austin

Asked By : Pinocchio

Answered By : Kyle Jones

Adding permutations isn't about preventing slow servers from becoming bottlenecks, rather it's about dispersing a convoy once one forms behind a slow server. Because of the way tract locations are hashed, sequential reads of any blob always walk the tract locator table in the same order. Suppose you have six tract servers and your tract locator table looks like this


and server 3 is slower than the others. Since sequential readers walk the table in the same order the slowness of server 3 will cause a convoy of sequential clients to form waiting for server 3. Once server 3 serves them, they all move on server 4 which may also become overloaded. Then 5, then 6, and before long the convoy is big enough to hammer each server in turn instead of spreading the load evenly across all of them.

If your tract locator table looks like this


i.e. four permutations of 123456, events transpire differently if there is a slow server. If server 3 is slow, satisfied sequential readers will move from server 3 to servers 5, 1 or 6 depending on where they are in the table. This instead of all sequential I/O clients turning their attention to the same server after visiting server 3. The extra permutations have the effect of breaking up convoys if they form and lessening the likelihood of convoys forming at all.

Best Answer from StackOverflow

Question Source :

3.2K people like this

 Download Related Notes/Documents


Post a Comment

Let us know your responses and feedback