- In client-side load balancing, the client somehow obtains a list of possible server it may use, and it implements the smart to decide how to distribute its own requests among the list of servers it has. Round-robin DNS is such an example, where the client receives a list of servers over DNS query. Very little additional infrastructure is needed other than configuration changes.
- In server-side load balancing, the client reaches a server-side load balancer that distributes requests to backend servers. For what the client cares, the load balancer is simply one gigantic server. Some examples are IP Virtual Server and HTTP Reverse Proxy. They require heavier infrastructure investment.
In production, a mix of client and server side load balancing are often used. It's not hard to see why, if we consider the impact of load balancing at the client and at the server. Here we consider just latency and throughput.
Client-side load balancing tends to suffer worse latency than server-side load balancing. The reason is because the client often doesn't have visibility on server availability or load. If a server is overloaded or is offline, then the client may have to wait for a timeout before it tries another server. Propagating server load information to the client does not improve latency because the propagation delay only adds to the overall service time. Server availability is also highly dynamic, so a client cannot reuse this information for an extended period of time. On the other hand, a server-side load balancer is much closer to the backend servers, which means it could obtain server availability much faster. Also, the balancer can amortize the cost of querying server availability over many requests. Server-side load balancing, therefore, incurs no latency penalty to the client.
Client-side load balancing tends to enjoy greater throughput than server-side load balancing. That's because the network path for each client-server path could potentially take different routes. Of course, if a client tries to contact many servers, it would saturate its own uplink first, but if a server's uplink is saturated, we can easily add more servers and more uplinks. On the other hand, a server-side load balancer becomes a unique destination for many clients, so its uplink can be easily saturated. If it is the case that requests are smaller than the responses (which is often the case but not always), then Direct Routing can alleviate some of the load-balancer throughput bottleneck by having the backends return the response directly to the client, but it doesn't scale as easily as client-side load balancing.
The solution often used in production is to deploy both client and server side load balancing. A cluster in server-side load balancing makes a destination highly available, which guarantees the best service time for a given client. But client-side load balancing can send a client to multiple such clusters for better throughput.