Case Study::why batching requests is faster than making many individual requests

Vince comments: This case is studied in Node.js, but it appears to apply to any system that allocates resources on a per-request basis.

TLDR highlight - So by batching requests, we are allowing them adequate time to complete before allocating new resources for the next group, which in the latter case can drastically speed up execution times, and in the former case can prevent our process from being killed partway through its execution.

Whole comment -

Mikey Sleevi

There are many records that need to move from a NoSQL instance into a SQL instance. I am going to ignore the database components of this transaction, although I suspect this would have gone much faster with a few UPDATEs handling 100K records at a time. We are going to talk about why it is a problem to have too many Promises at once. I will try to reference as much material as I can going through this, but if you need further clarification just ask.

We have to start with a bit of background here, just so we are all on the same playing field. Node.js, at its core, is a combination of a few different things. The two main things we are going to talk about are the engine (in this case, V8) and the event loop (in this case, libuv). In general, most of the stuff you read about the event loop is wrong. The event loop has 7 phases (not 4!), however 2 of them are ignored and one is more of a background process in Node.js. Those phases are (ref: https://github.com/libuv/libuv/blob/v1.x/src/unix/core.c#L369-426)...

   ┌───────────────────────────┐
┌─>│           timers          │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │     pending callbacks     │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │       idle, prepare       │
│  └─────────────┬─────────────┘      ┌───────────────┐
│  ┌─────────────┴─────────────┐      │   incoming:   │
│  │           poll            │<─────┤  connections, │
│  └─────────────┬─────────────┘      │   data, etc.  │
│  ┌─────────────┴─────────────┐      └───────────────┘
│  │           check           │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
└──┤      close callbacks      │
   └───────────────────────────┘

However, Node.js will "ignore" Idle & Prepare, and Pending is used for handling some callbacks that aren't scheduled by the programmer (like ECONNREFUSED). Those phases can be more or less ignored; the remaining phases and their uses are (ref: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/):

  • Timers: setTimeout() and setInterval()

  • I/O Poll: retrieves new I/O events and executes I/O related callbacks

  • Check: setImmediate()

  • Close: closing callbacks (e.g. socket.on('close'))

I won't dive in too much more here, since we are keeping this away from things like epoll, fds and other OS-specific semantics. When you complete all seven of these phases, you will have executed what Node.js calls a tick. (This is a drastic oversimplification, but the definition works for most cases. For those interested in the actual definition, a tick is the process between socket receives by a Native API that requires synchronization with a JavaScript function execution.) There are a lot of metrics around the Node.js VM, but one of the more important ones is tick execution time. This is a measurement of how long it takes to complete a cycle, and if that value is high, it means there is work that is "blocking the event loop". I think we all know that "blocking the event loop" is bad and "things need to be async". We are going to talk about what it means to block the event loop with too much async work though, which might be counterintuitive to some. This can manifest in some particularly fun ways, one of which is an Out of Memory error.
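To make the phase ordering concrete, here is a minimal sketch (mine, not from the original comment). Inside an I/O callback we are sitting in the poll phase, so a check-phase callback (setImmediate()) always runs before a timers-phase callback (setTimeout()), even with a 0 ms delay:

```js
const fs = require('fs');

// Inside an I/O callback the loop is in the poll phase, so check-phase
// callbacks are guaranteed to run before timers-phase callbacks.
fs.readFile(__filename, () => {
  setTimeout(() => console.log('timers phase: setTimeout'), 0);
  setImmediate(() => console.log('check phase: setImmediate'));
});
// Prints "check phase: setImmediate" first, then "timers phase: setTimeout".
```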

So far we have talked a bit about the background of libuv and the Node.js-specific implementation of the event loop produced by that library, but we haven't gotten to the V8 side of things, which encompasses the other half of our background. In order to dive into the V8 side of things, we need to talk about an important concept in Node.js called a Promise.

Node.js was not designed to support Promises from the beginning (they were added 6 years after Node was first introduced), and in fact, libuv has no notion of a Promise and instead still relies on good ol' callbacks. So, they needed a way to execute these Promises, and thus Promises made their way into something called the "Promise Microtask Queue" in V8 (ref: https://github.com/v8/v8/blob/6.8.275.24/src/builtins/builtins-internal-gen.cc#L878; as a note here, if you are unfamiliar with C or C++, there is a lot of "template" or "metatemplate" programming that occurs, which is more or less generating code from other code). From here, we have to get a bit specific. This next section only applies to Node.js 11 or higher; if you want to dig into what changed, here is a reference to start with: https://github.com/nodejs/node/issues/22257.

There are two major microtask queues: one for nextTick() and one for Promises. We are mostly interested in the Promise queue, but everything here applies to the nextTick() queue as well.
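A minimal sketch of the two queues in action (my example, assuming Node.js 11 or higher as discussed above). The nextTick() queue drains first, then the Promise queue, and only then does the event loop reach the timers phase:

```js
setTimeout(() => console.log('macrotask: timers phase'), 0);
Promise.resolve().then(() => console.log('microtask: Promise queue'));
process.nextTick(() => console.log('microtask: nextTick queue'));

// Output (Node.js >= 11):
//   microtask: nextTick queue
//   microtask: Promise queue
//   macrotask: timers phase
```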

These queues are important because they execute outside the context of the normal event loop. The execution for the queues (as of v11.0) occurs partway through the execution of what are called Macrotasks (or Phases). The way these execute is that everything on the queue will be executed before moving on to the rest of the phase. Which brings us to one of the first ways that batching helps. One fun thing that can happen with the microtask queues is that we can actually prevent the process from moving forward by adding too many callbacks into a microtask queue. So this is potentially one way that batching could help performance: if we batch callbacks on the transaction queue, we can prevent ourselves from getting "stuck" executing all the callbacks in the queues. However, I will say this does not happen often, and it usually only happens with the use of nextTick(), hence the Node.js advisory here: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/#process-nexttick

How do we debug this?

  • Check out the Node.js metrics around Async Resources (see dd-trace for details here: https://github.com/DataDog/dd-trace-js)

  • Check out the Node.js metrics around Tick Time (see dd-trace for details here: https://github.com/DataDog/dd-trace-js)
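As a hedged demonstration of that "stuck" behavior (my sketch, echoing the advisory linked above): a callback that re-queues itself with nextTick() keeps the microtask queue non-empty, so the event loop never advances and the timer below never fires:

```js
setTimeout(() => console.log('timers phase: never reached'), 100);

function starve() {
  // Each call schedules another microtask, so the queue never drains
  // and the event loop never gets back to the timers phase.
  process.nextTick(starve);
}
starve();
```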

So, if that's unlikely, what is the likely scenario here? As I mentioned earlier, there are a ton of ways that too much async work can cause problematic behavior. Without diving too much into why, here are some things that can happen:

  • Too many file handles open

  • Too many sockets open (a manifestation of the above)

  • Out of Memory errors

  • A high level of "swappiness"

and certainly more I am not thinking of. For the most part, the first two are unlikely as well, especially in a network context. Although they have been known to happen, you will usually run into the latter half first.

Which brings us to the last part of V8 we need to discuss: Orinoco. Orinoco is the garbage collector in V8. I definitely don't have the time to discuss Orinoco (or memory management in general) in depth, but just know that garbage collectors are responsible for determining when memory is unused and for freeing that memory for other uses.

We do have to talk about a few aspects of Orinoco at a high level, though. Orinoco has two important heap spaces, Young and Old (the Young space is further split into a Nursery and an Intermediate sub-generation, but that is less important). Objects that are long lived are obviously in Old, and shorter lived objects are usually in Young. But how do we decide what's in Old and what's in Young? The really short explanation is that if an object in Young survives multiple garbage collections, it is promoted to Old. And objects survive garbage collections by still being referenced from live execution contexts in the program (like a pending Promise). ref: https://v8.dev/blog/orinoco-parallel-scavenger
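A hypothetical illustration of that promotion (this sketch is mine, not the original author's): each pending Promise below stays referenced until its timer resolves it, so it survives young-generation collections and is eventually promoted to Old space:

```js
const pending = [];
for (let i = 0; i < 1000000; i++) {
  // Each Promise (and its closure) is kept alive until the timer fires,
  // so it survives scavenges of the Young space and gets promoted.
  pending.push(new Promise((resolve) => setTimeout(resolve, 60000)));
}
// Run with `node --trace-gc` to watch scavenges (Young) and mark-sweeps
// (Old) happen as the heap grows while these Promises wait to resolve.
```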

An important part of the discussion is answering when a garbage collection runs, and unfortunately you can't really ever be sure; at least, it isn't predictable. Here is a point where you are just going to have to take my word on something: garbage collection is happening all the time while your program runs. Orinoco in particular has a hybrid approach, using Parallel Sweeps, Concurrent Marking, Incremental Compaction and Evacuation (if these terms don't make sense, just read "stuff is happening while the program is running"). ref: https://v8.dev/blog/trash-talk

So why is garbage collection so important to "Out of Memory" errors? Well, this has to do with closures and object references. We talked about how too many Promises or nextTick() calls can block the Macrotask queue (and therein the event loop). The flip side is that while a Promise has not yet produced a callback to add to the microtask queue, its reference persists. And as that reference persists, these Promises move from Young space to Old space, where they become harder to reclaim. As we start to add millions of objects, these spaces grow to unmanageable levels, which results in two scenarios; to recap:

  • Out of Memory Errors

  • High level of "swappiness"

Out of Memory errors are the easiest to explain. Node.js will "provision" a certain amount of memory as the application starts. This is usually 1/4 the maximum available memory (the numbers change depending on a lot of things). As the process runs, the heap spaces will grow and shrink, but if they continue to grow without subsequently shrinking, this is what is called a "memory leak" (an oversimplification of the term, which tends to be much more specific). From here, there is usually a maximum limit that the process can grow to before it experiences the OOM Killer (ref: https://www.kernel.org/doc/gorman/html/understand/understand016.html) or it terminates itself with an Out of Memory error. In this case, this is because the Promise references haven't scheduled their callbacks and we have been allocating them faster than they can finish.
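If you want to watch this happen, here is a minimal sketch (mine) that samples the heap while your workload runs; a heapUsed value that keeps climbing without ever shrinking is the growth pattern described above:

```js
const sampler = setInterval(() => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.log(
    `heap: ${(heapUsed / 1048576).toFixed(1)} MiB used of ${(heapTotal / 1048576).toFixed(1)} MiB`
  );
}, 1000);
sampler.unref(); // don't let the sampler itself keep the process alive
```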

The next is a high level of "swappiness". This is a little bit hard to explain without diving into swap as a concept. The general idea is that there is a special part of your disk called swap. This special disk space allows the OS to evict pages of memory to disk (for slower retrieval) in order to prevent an Out of Memory error and process termination. In practice, not many places tend to use swap for production systems (especially in containerized environments).

So by batching requests, we are allowing them adequate time to complete before allocating new resources for the next group, which in the latter case can drastically speed up execution times, and in the former case can prevent our process from being killed partway through its execution.
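A minimal batching sketch (a hypothetical helper of mine, not code from the original post; insertIntoSql in the usage note is an assumed worker function). The key property is that at most batchSize Promises, and the sockets and buffers behind them, are live at once:

```js
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for this group to settle before allocating resources
    // (Promises, sockets, buffers) for the next group.
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Usage: move records 500 at a time instead of firing one Promise per record.
// await processInBatches(records, 500, (record) => insertIntoSql(record));
```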

How do we debug this?

  • Check out metrics around Node.js heap allocations (see dd-trace for details: https://github.com/DataDog/dd-trace-js)

  • Check out general instance memory usage and process memory usage metrics (tools like atop are great for local testing; for more in-depth heap analysis use Chrome DevTools)
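For the in-depth heap analysis mentioned above, a minimal sketch (mine): Node.js ships a built-in way to capture a heap snapshot, v8.writeHeapSnapshot() (available since roughly Node.js 11.13), whose output Chrome DevTools can open:

```js
const v8 = require('v8');

// Writes Heap.<timestamp>.heapsnapshot to the working directory; load it
// in the Chrome DevTools Memory tab to inspect what is retaining memory.
const file = v8.writeHeapSnapshot();
console.log(`heap snapshot written to ${file}`);
```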

Just to reiterate, this is likely not the case in the previous example as there are a whole host of things to consider when we get into network and filesystem interactions (which we intentionally ignored). But hopefully this was helpful in understanding how Node.js (and V8) work and some checks you can have to understand the performance characteristics of your system.