Friday, February 26, 2010

How Generational Garbage Collector chokes on high request rate

Who doesn't want their server to be able to handle a large amount of requests gracefully? So you studied the programming languages shootout and picked one that performs the best—but not C/C++ because you don't want to bother with memory management—then wrote your server application, carefully multi-threaded your code, and tuned the code to your satisfaction. Only you find that the more concurrent requests you handle, the more time it takes to service each request. You profiled the running time and found that your server spends more time garbage collecting when the number of concurrent requests is higher.

Generational garbage collector assumes that most objects die young, so they would perform quick, minor collection on a nursery (also called "young" or "new object" heap). The assumption is that the nursery would contain few live objects, so the collection would be quick. The survivors are moved to the tenure generation (also called "old" heap), which is garbage collected less frequently. When the tenure heap runs out, the garbage collector engages a full collection. This heuristic addresses the problem that garbage collectors spend a lot of time traversing and moving reachable objects in order to get rid of the abandoned ones.

The lifetime of an object is measured by the number of bytes being allocated, not by the number of elapsed CPU cycles. A minor collection is triggered when the nursery is full. If the nursery is too small, the probability that a short-lived object gets promoted to tenure status increases. This increases the chance that we need a full collection later. We want the nursery to be large enough to give most short-lived objects a chance to die.

The "most objects die young" heuristic works well for single threaded programs. However, in a server application, when the server is handling multiple concurrent requests, the rate of allocation also increases proportionally—by virtue that multiple threads are allocating concurrently, making the object lifetime as seen by one thread of execution seem longer relative to the amount of objects the nursery could hold. The more concurrent requests your server handles, the more short-lived objects make it to tenure, so the tenure heap runs out more quickly. Then the garbage collector must spend more time doing a full collection.

One solution is to increase the nursery size. If a 4MB nursery is sufficient for one request, you should increase the nursery size to 400MB if you want to handle 100 concurrent requests. The tenure generation should not be increased—its size should be proportional to the amount of live objects you plan to keep in memory throughout the lifetime of the server, not to the number of concurrent requests. If your garbage collector gives you the option to specify the size of a per-thread nursery, that would be ideal.

Another solution is to limit the number of concurrent requests per server process. If you want better concurrency, you can always spawn new server processes. To one extreme, you can assign one worker process per request. The processes can all run on different processors, but each process is single threaded. When the request is finished, the process would make itself available for the next request (without quitting and restarting). A middle ground is to have several processors run a low number (< 20) of concurrent threads.

Further reading:

No comments: