from Hacker News

When Simple Wins: Power of 2 Load Balancing

by mattdennewitz on 6/26/17, 10:11 PM with 45 comments

  • by alxv on 6/26/17, 11:12 PM

    The method is called "Power of Two Random Choices" (http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20...). And the two-choices paradigm is widely applicable beyond load balancing. In particular, it applies to hash table design (e.g. cuckoo hashing) and cache eviction schemes (https://danluu.com/2choices-eviction/).
  • by zubspace on 6/27/17, 7:58 AM

    I'm not an expert in this field, but an engineer of vimeo went into detail, why this approach did not work for them. [1]

    Problem with consistent hashing:

      However, consistent hashing comes with its own problem: uneven distribution of requests.
      Because of its mathematical properties, consistent hashing only balances loads about as
      well as choosing a random server for each request, when the distribution of requests is
      equal. But if some content is much more popular than others (as usual for the internet),
      it can be worse than that.
    
    Problem with Power of 2 Load Balancing:

      Why wasn’t there a way to say “use consistent hashing, but please don’t overload any
      servers”? As early as August 2015, I had tried to come up with an algorithm based on
      the power of two random choices that would do just that, but a bit of  simulation said
      that it didn’t work. Too many requests were sent to non-ideal servers to be worthwhile.
    
    Instead, he used something called Consistent Hashing with Bounded Loads.

    [1] https://medium.com/vimeo-engineering-blog/improving-load-bal...

  • by throwaway13337 on 6/26/17, 10:48 PM

    The simplest load balancing I've done is modulo the user ID by the number of servers then point at that server.

    This solves caching too since you are only ever receiving and caching user data on a single server. No cache communication required. You can enforce it on the server side for security as well.

    Doesn't require a load balance server - just an extra line of code.

    Keep it simple.

  • by euph0ria on 6/26/17, 10:43 PM

    Regarding the math section, could someone please describe it like you were talking to a 5 year old?

    1) Θ( log n = log / log n )

    2) Θ(log log n)

  • by gopalv on 6/26/17, 11:03 PM

    "Power of 2 Random Choices" ... has nothing to do with the "Power of 2" directly.

    I like 2Choice because it is not dependent on hash function design & is temporal, but I have a positive aversion to the 2^n hash distributions when it comes to data, specifically for distributed systems which need to flex up/down [1].

    [1] - http://notmysock.org/blog/hacks/1440

  • by scame on 6/27/17, 9:54 AM

    I've seen a paper doing the same thing directly at the network layer using IPv6 extension headers: http://www.thomasclausen.net/wp-content/uploads/2017/06/2017...
  • by adrianratnapala on 6/27/17, 3:06 AM

    Can someone expand on the maths that the OP elided? What is the thing that comes out to O(log n / log log n)?