The primary function of a load balancer is to distribute incoming traffic across a set of backend worker nodes. This set of worker nodes can be either statically configured or dynamically discovered. Traditionally, load balancers are configured with a static set of nodes, which means that nodes outside this set cannot be added at runtime. Dynamic load balancers support the addition & removal of worker nodes at runtime, and the load balancer does not need to know the IP addresses & other connection details of the backend nodes in advance.
The load balancing algorithm is a central part of a load balancer. It defines the load balancing policy, that is, how the load is distributed across the backend worker nodes. Generally, all worker nodes have identical hardware & software configurations and host identical copies of the deployment artifacts. The round-robin algorithm, which simply hands each new request to the next node in turn, is therefore well suited to & widely used for such deployments.
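To make round-robin over a dynamic membership concrete, here is a minimal sketch in Python. It assumes worker nodes are identified by plain strings (for example, host:port) and that discovery calls `add_node`/`remove_node` when members join or leave. The class and method names are hypothetical and do not belong to any particular load balancer.

```python
import threading
from typing import List, Optional


class RoundRobinBalancer:
    """Round-robin selection over worker nodes that may change at runtime (illustrative sketch)."""

    def __init__(self, nodes: Optional[List[str]] = None) -> None:
        self._nodes: List[str] = list(nodes or [])
        self._next = 0  # index of the node that receives the next request
        self._lock = threading.Lock()  # requests and membership changes may race

    def add_node(self, node: str) -> None:
        # Dynamic load balancing: a newly discovered worker joins the rotation.
        with self._lock:
            if node not in self._nodes:
                self._nodes.append(node)

    def remove_node(self, node: str) -> None:
        with self._lock:
            if node not in self._nodes:
                return
            idx = self._nodes.index(node)
            self._nodes.pop(idx)
            # Keep the cursor on the same "next" node after the list shifts left.
            if idx < self._next:
                self._next -= 1
            if self._next >= len(self._nodes):
                self._next = 0

    def next_node(self) -> str:
        with self._lock:
            if not self._nodes:
                raise RuntimeError("no worker nodes available")
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            return node
```

Because every node is assumed to be identical, the algorithm needs no knowledge of node capacity. It only has to remember whose turn is next, which is why it stays cheap even as members come and go.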
Most modern load balancers support session affinity. This means that if the client sends a session ID, the load balancer forwards all requests containing that session ID to the same backend worker node, irrespective of the configured load balancing algorithm. This may look like it defeats the purpose of load balancing. However, before a session exists, the first request is dispatched to whichever worker node is next in line according to the load balancing algorithm, and the session is established with that node, so new sessions are still spread across the cluster. We also have to keep in mind that stateful applications inherently do not scale well, and state replication can have huge overheads, so it is best to minimize server-side state if you want your application to be massively scalable. Session-affinity based load balancing is therefore a compromise solution to the problem of deploying stateful applications in clusters.
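The routing rule for session affinity can be sketched in a few lines. In this sketch, a known session ID short-circuits the algorithm, and anything else falls through to round-robin. Several details are assumptions for illustration: how the session ID is extracted from the request (cookie, URL parameter, etc.), the `bind` hook that records which node created a session, and the fallback when a bound node has been removed.

```python
from typing import Dict, List, Optional


class StickyBalancer:
    """Round-robin with session affinity (a minimal sketch, not a specific product's API)."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)
        self._next = 0
        self._sessions: Dict[str, str] = {}  # session ID -> node that created the session

    def _round_robin(self) -> str:
        node = self._nodes[self._next]
        self._next = (self._next + 1) % len(self._nodes)
        return node

    def route(self, session_id: Optional[str]) -> str:
        # A known session is pinned to its node, irrespective of the algorithm.
        if session_id is not None:
            node = self._sessions.get(session_id)
            if node in self._nodes:
                return node
        # No session yet (or its node is gone): use the normal algorithm.
        return self._round_robin()

    def bind(self, session_id: str, node: str) -> None:
        # Hypothetical hook: called when a worker's response establishes a session.
        self._sessions[session_id] = node
```

Only requests without a session advance the round-robin cursor, which is why new sessions still spread evenly while existing ones stay put.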
Elastic Load Balancer
An Elastic Load Balancer (ELB), in addition to balancing the load, is also responsible for monitoring the load & starting new worker nodes or terminating existing ones as the load changes. This behavior of scaling the system up as the load increases & down as the load decreases is known as autoscaling.
In a typical architecture, load balancing & autoscaling are handled by two logically distinct components, and the load balancer and the autoscaler may even be deployed separately.
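Keeping the autoscaler logically separate means its core can be a pure decision function that is fed load readings and returns how many nodes to start or stop. The sketch below assumes the load metric is the number of in-flight requests and uses a hypothetical per-node target with minimum and maximum cluster sizes. Real autoscalers use their own metrics and parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical policy knobs, chosen only for illustration.
    target_requests_per_node: float = 50.0
    min_nodes: int = 1
    max_nodes: int = 10


def desired_node_count(in_flight_requests: int, policy: ScalingPolicy) -> int:
    """How many worker nodes the cluster should have for the observed load."""
    needed = math.ceil(in_flight_requests / policy.target_requests_per_node)
    # Clamp to the configured cluster size limits.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_decision(current_nodes: int, in_flight_requests: int, policy: ScalingPolicy) -> int:
    """Positive: start that many nodes; negative: terminate that many; zero: no change."""
    return desired_node_count(in_flight_requests, policy) - current_nodes
```

For example, with the defaults, 120 in-flight requests on 2 nodes yields a decision of +1. In practice, readings are usually averaged over a window and a cool-down period is applied so that the cluster does not oscillate on short spikes.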
Cloud-nativity & Load Balancing
Load balancing is key to Cloud-based deployment architectures. The Elastic Load Balancer is an essential component of the deployment architecture when it comes to realizing the Cloud-native attributes of multi-tenancy, elasticity, distributed & dynamic wiring, and incremental deployment & testability.
Fronting Multiple Clusters - Service-aware Load Balancing
In production deployments, a load balancer does not do much of the real work; the real work is done by the backend worker nodes. Having load balancers therefore introduces additional cost. On the other hand, because a load balancer does so little real work, the load on the load balancer itself should be very small, so a single load balancer is generally capable of fronting a large number of backend worker nodes. In a traditional deployment, one LB fronts a single cluster of homogeneous worker nodes. However, a load balancer is generally capable of handling multiple clusters. The important point is that traffic has to be routed to the correct cluster, and the load has to be balanced according to the load balancing algorithm specified for that cluster. In Cloud deployments, a cluster of homogeneous worker nodes is called a Cloud Service, so a load balancer that fronts multiple Cloud Services is typically called a Service-aware load balancer.
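A Service-aware load balancer does two steps per request: pick the cluster, then apply that cluster's own algorithm. The sketch below assumes that each Cloud Service is identified by the domain in the HTTP `Host` header. The domains, node names and the naive weighted round-robin are all illustrative assumptions.

```python
import itertools
from typing import Dict, Iterator, List


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    # Naive weighted round-robin: each node appears `weight` times per cycle.
    expanded: List[str] = [node for node, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


class ServiceAwareBalancer:
    """Routes each request to the right cluster, then uses that cluster's algorithm."""

    def __init__(self) -> None:
        # service domain -> iterator yielding the next node of that cluster
        self._clusters: Dict[str, Iterator[str]] = {}

    def register_cluster(self, domain: str, picker: Iterator[str]) -> None:
        self._clusters[domain.lower()] = picker

    def route(self, host: str) -> str:
        # Strip an optional port, e.g. "esb.example.com:8280" -> "esb.example.com".
        domain = host.split(":", 1)[0].lower()
        picker = self._clusters.get(domain)
        if picker is None:
            raise LookupError(f"no cluster serves {domain!r}")
        return next(picker)


lb = ServiceAwareBalancer()
lb.register_cluster("appserver.example.com", itertools.cycle(["as1", "as2"]))
lb.register_cluster("esb.example.com", weighted_round_robin({"esb1": 2, "esb2": 1}))
```

Each cluster keeps its own selection state, so traffic for one Cloud Service never disturbs the rotation of another, even though both are fronted by the same LB.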
Multi-tenancy - Tenant-aware Load Balancing
If a Cloud deployment has to scale to thousands, hundreds of thousands, or millions of tenants, we need tenant partitioning. This means that a single Cloud Service will have multiple clusters, each handling a subset of the tenants in the system. Creating dynamic tenant clusters & tenant partitioning strategies are ongoing research areas. In such a tenant-partitioned deployment, the load balancers must be tenant-aware in order to route requests to the proper tenant cluster. Since it is the Service clusters that are partitioned by tenant, the load balancer has to be Service-aware as well as tenant-aware.
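Tenant-aware routing adds one more lookup on top of Service-aware routing: first the service, then the tenant's partition within that service. The following is only a sketch under stated assumptions. The tenant domain is assumed to appear in the URL as `/t/<tenant-domain>/...`, partitions are assumed to be a simple list of cluster IDs per service, and tenants are assigned with a CRC32 hash modulo the number of clusters. None of these choices is prescribed here, since partitioning strategies are still an open research area.

```python
import zlib
from typing import Dict, List, Optional


def tenant_from_path(path: str) -> Optional[str]:
    # Assumed URL convention: /t/<tenant-domain>/<rest-of-path>
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "t" and parts[1]:
        return parts[1].lower()
    return None


class TenantAwareBalancer:
    """Picks the tenant partition (cluster) of a service for each request."""

    def __init__(self, partitions: Dict[str, List[str]]) -> None:
        # service domain -> cluster IDs, each serving a subset of tenants (hypothetical layout)
        self._partitions = partitions

    def cluster_for(self, service: str, path: str) -> str:
        clusters = self._partitions[service]
        tenant = tenant_from_path(path)
        if tenant is None:
            # Requests without a tenant go to a default (first) cluster.
            return clusters[0]
        # A stable hash (unlike Python's salted hash()) lets every LB instance
        # map a given tenant to the same cluster.
        return clusters[zlib.crc32(tenant.encode("utf-8")) % len(clusters)]
```

Once the cluster is chosen, the request would be balanced within it using that cluster's own algorithm. Note that modulo hashing remaps many tenants when the number of clusters changes; consistent hashing or an explicit tenant-to-cluster table avoids that.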
Single Point of Failure?
The load balancer itself can become a single point of failure, defeating the purpose of having clustered deployments. This can be handled by deploying LBs in pairs, in either a hot-hot or a hot-cold configuration. In a hot-hot configuration, both LBs serve traffic, and DNS round-robin can be used to spread clients between them. In a hot-cold setup, the standby LB takes over when the primary fails, for example by remapping the primary's IP address to the standby.
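The hot-cold takeover can be sketched as a standby that listens for heartbeats from the primary and claims the service IP after a timeout. The heartbeat transport, the timeout value and the `take_over_ip` callback are hypothetical. How an IP is actually remapped (for example, reassigning a floating IP) depends on the network or cloud platform. The clock is injectable only so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class StandbyMonitor:
    """Hot-cold failover: the standby LB claims the service IP when the primary goes quiet."""

    def __init__(self, take_over_ip: Callable[[], None], timeout_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._take_over_ip = take_over_ip  # hypothetical hook, e.g. remap a floating IP
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_heartbeat = clock()
        self.active = False

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat from the primary LB arrives.
        self._last_heartbeat = self._clock()

    def check(self) -> bool:
        """Call periodically; returns True once this LB has become the active one."""
        silent_for = self._clock() - self._last_heartbeat
        if not self.active and silent_for > self._timeout_s:
            self._take_over_ip()  # remap the IP exactly once
            self.active = True
        return self.active
```

A missed heartbeat does not prove that the primary is down; a broken link between the pair can cause both LBs to claim the IP. Production setups therefore typically add fencing or a third-party arbiter.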
WSO2 Elastic Load Balancer
The WSO2 Load Balancer (LB) is a load balancer based on Apache Synapse & WSO2 Carbon. It is also an Elastic Load Balancer and has been deployed on StratosLive, the Platform-as-a-Service from WSO2. It is also available for download as part of the WSO2 Stratos Cloud Middleware Platform. At present, the WSO2 Elastic Load Balancer (ELB) is only Service-aware, not tenant-aware, and it can itself be deployed as a load balancer cluster.