Load Balancing
Load balancing is the process of spreading requests across multiple resources according to some metric (random, round-robin, random weighted by machine capacity, etc.) and the resources' current status (accepting requests, not responding, elevated error rate, etc.).
Load needs to be balanced between user requests and your web servers, but it must also be balanced at every stage of the stack to achieve full scalability and redundancy for your system. A moderately large system may balance load at three layers:
- users to your web servers,
- web servers to an internal platform layer,
- internal platform layer to your database.

There are a number of ways to implement load balancing.
Smart LBs provide several benefits:
- Predictive analytics that identify traffic bottlenecks before they happen.
- Actionable insights that drive automation and business decisions.
- LBs can be organized in clusters, avoiding a single point of failure; the LBs monitor each other and take over if one fails.
Types of LBs
Smart Clients
Adding load-balancing functionality into your database (cache, service, etc.) client is a common pitfall: writing a client that takes a pool of service hosts, balances load across them, detects downed hosts, and avoids sending requests their way may be seductive, but it is hard to get right.
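A minimal sketch of what such a smart client might look like, assuming simple round-robin rotation with passive failure detection; the host addresses, cooldown period, and class name are illustrative assumptions, not a production design:

```python
import itertools
import time

class SmartClient:
    """Client-side load balancer: rotates across a pool of hosts,
    marks hosts that fail as down, and retries them after a cooldown."""

    def __init__(self, hosts, cooldown_seconds=30):
        self.hosts = list(hosts)
        self.cooldown_seconds = cooldown_seconds
        self.downed = {}  # host -> time it was marked down
        self._cycle = itertools.cycle(self.hosts)

    def _is_available(self, host):
        downed_at = self.downed.get(host)
        if downed_at is None:
            return True
        # Return the host to rotation once the cooldown has elapsed.
        if time.monotonic() - downed_at > self.cooldown_seconds:
            del self.downed[host]
            return True
        return False

    def pick_host(self):
        # Round-robin over the pool, skipping hosts marked as down.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if self._is_available(host):
                return host
        raise RuntimeError("no healthy hosts available")

    def mark_down(self, host):
        # Callers report failures so subsequent requests avoid this host.
        self.downed[host] = time.monotonic()


# Illustrative usage: the caller performs the actual request and reports failures.
client = SmartClient(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
host = client.pick_host()
```

Even this small sketch has to handle rotation state, failure reporting, and recovery timing; keeping that logic correct and consistent across every client in every language is what makes the approach hard to get right.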
Hardware LB
The most expensive option but also the highest performing, built on specialized hardware; it is generally used only as the first point of contact between user requests and the infrastructure.
Software Load Balancers
Software LBs such as HAProxy run locally on the box containing each service; they manage health checks, remove machines from the pools when they fail and return them when they recover (according to your configuration), and balance requests across all the machines in those pools.
For most systems, you start with a software LB and move to smart clients or hardware LBs only when there is a deliberate need.
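A minimal sketch of the pool-management behavior described above (periodically probe each backend, remove it from rotation on failure, return it on recovery), assuming plain TCP connect checks; the class name, addresses, interval, and timeout are illustrative, not HAProxy's actual mechanism or configuration:

```python
import socket
import threading
import time

class BackendPool:
    """Active health checking in the spirit of a software LB:
    probe each backend periodically, remove it from rotation when
    the probe fails, and return it when it recovers."""

    def __init__(self, backends, interval=5.0, timeout=1.0):
        self.backends = backends      # list of (host, port) tuples
        self.interval = interval
        self.timeout = timeout
        self.healthy = set(backends)  # start optimistic
        self._lock = threading.Lock()

    def _probe(self, backend):
        # A TCP connect is the simplest check; real LBs also support
        # HTTP checks, expected status codes, rise/fall thresholds, etc.
        try:
            with socket.create_connection(backend, timeout=self.timeout):
                return True
        except OSError:
            return False

    def run_checks_forever(self):
        while True:
            for backend in self.backends:
                ok = self._probe(backend)
                with self._lock:
                    if ok:
                        self.healthy.add(backend)      # return to pool
                    else:
                        self.healthy.discard(backend)  # remove from pool
            time.sleep(self.interval)

    def healthy_backends(self):
        with self._lock:
            return list(self.healthy)


# Illustrative usage: run the checks in a background thread.
pool = BackendPool([("10.0.0.1", 8080), ("10.0.0.2", 8080)])
threading.Thread(target=pool.run_checks_forever, daemon=True).start()
```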
LB Algorithms
Load balancers consider two factors before forwarding a request to a backend server:
- the server must be responding appropriately to requests (verified through periodic health checks);
- an algorithm is used to select one server from the set of healthy servers (several common algorithms are listed below, with a sketch after the list).
- Least Connected: Direct to the server with the fewest active connections.
  - Works well when there are a large number of persistent client connections unevenly distributed between the servers.
- Least Response Time: Direct to the server with the fewest active connections and the lowest average response time.
- Least Bandwidth: Select the server currently serving the least amount of traffic.
- Round Robin: Cycle through a list and send each new request to the next server.
  - Most useful when the servers are of equal specification and there are not many persistent connections.
- Weighted Round Robin: Servers with higher weights receive and maintain more connections than those with lower weights.
- IP Hash: Use a hash of the client's IP address to decide which server receives the request, so a given client is consistently directed to the same server.
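A minimal sketch of a few of these selection strategies, assuming a simple in-memory view of the backend set; the Server fields, names, and weights are illustrative assumptions:

```python
import hashlib
import itertools
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int = 1              # used by weighted round robin
    active_connections: int = 0  # used by least connected

class Balancer:
    def __init__(self, servers):
        self.servers = servers
        # Round robin: cycle through the list in order.
        self._rr = itertools.cycle(servers)
        # Weighted round robin: repeat each server proportionally to its weight.
        weighted = [s for s in servers for _ in range(s.weight)]
        self._wrr = itertools.cycle(weighted)

    def round_robin(self):
        return next(self._rr)

    def weighted_round_robin(self):
        return next(self._wrr)

    def least_connected(self):
        # Pick the server with the fewest active connections right now.
        return min(self.servers, key=lambda s: s.active_connections)

    def ip_hash(self, client_ip):
        # Hash the client IP so the same client maps to the same server.
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]


# Illustrative usage
servers = [Server("web1", weight=3), Server("web2", weight=1)]
lb = Balancer(servers)
print(lb.round_robin().name)           # web1, then web2, then web1, ...
print(lb.ip_hash("203.0.113.7").name)  # same server for this client every time
```

Least Response Time and Least Bandwidth follow the same pattern but require the balancer to track response-time and throughput statistics per server, which is why they are usually left to a dedicated LB rather than a client-side sketch like this one.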