Node Timeout - bcgov/common-service-showcase GitHub Wiki
Node Timeout
For the most part, when developing Express.js Node applications, you do not need to worry about network timeout conditions, because Node.js provides reasonable network defaults that cover the majority of expected traffic scenarios. However, there are certain scenarios where timeouts do need to be considered because of the shape and expectations of particular network requests.
Why Timeouts are Important
Timeouts are important in network traffic scenarios because both the client and the server must have reasonable expectations for when a request or response should complete. If a connection stalls or hangs due to unforeseen circumstances, the server needs to know when it should drop that connection. Misconfigured timeouts can leave servers vulnerable to Denial of Service attacks such as Slowloris, or produce network behavior that clients do not expect. A balanced approach to choosing timeout values is important to make sure your application can keep serving requests without choking on potentially malicious traffic.
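As a rough illustration of what explicit timeout values look like on a plain Node.js http server, here is a minimal sketch. The numbers are assumptions chosen for the example, not recommendations and not the values COMS uses.

```javascript
const http = require('node:http');

// A server that is never started; we only configure its timeout properties.
const server = http.createServer((req, res) => {
  res.end('ok');
});

// Illustrative values only (assumptions); tune them to your own traffic.
server.headersTimeout = 60_000;   // time allowed to receive the complete headers
server.requestTimeout = 300_000;  // time allowed to receive the entire request
server.keepAliveTimeout = 5_000;  // idle time before a keep-alive socket is closed

console.log({
  headersTimeout: server.headersTimeout,
  requestTimeout: server.requestTimeout,
  keepAliveTimeout: server.keepAliveTimeout,
});
```

Keeping headersTimeout and requestTimeout non-zero is what limits how long a slow or stalled client can hold a connection open.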
Timeout Override Situations
During one of our deeper dives into COMS, we found reports of uploads failing to complete or exhibiting other kinds of unanticipated behavior. After a lengthy evaluation of our entire infrastructure and codebase, we eventually learned that Node v18 changed the default requestTimeout from 0 to 300000 ms (5 minutes). While this change is important because it limits the potential for DoS attacks, it inadvertently broke COMS' ability to accept large file uploads, because the application would sever the upload socket prematurely.
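One quick way to see which defaults your own Node.js version ships with is to create a server without starting it and read the timeout properties back. This is a small sketch for inspection only; the variable names are ours.

```javascript
const http = require('node:http');

// Creating a server without calling listen() is enough to read its defaults.
const server = http.createServer();

const serverTimeoutDefaults = {
  timeout: server.timeout,                   // socket inactivity timeout
  requestTimeout: server.requestTimeout,     // time allowed to receive the entire request
  headersTimeout: server.headersTimeout,     // time allowed to receive the complete headers
  keepAliveTimeout: server.keepAliveTimeout, // idle time before a keep-alive socket is closed
};

console.log(process.version, serverTimeoutDefaults);
```

Running this under Node v16 and again under Node v18 should make the requestTimeout change described above visible without having to wait for a real upload to fail.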
A Timeout on Timeouts
One primary case for needing to consider timeout manipulation is when you need to support very large file uploads. Since S3 supports objects of up to 5 TB, we needed COMS to be able to support a potentially lengthy request window, as network conditions can vary significantly from client to client.
In our case, we wanted to find a way to avoid timing out incoming file upload connections while still respecting the new defaults that Node v18 introduced for sound security reasons. Much of the documentation across Stack Overflow and other articles and blogs provides examples of how to set timeouts directly. Many examples include things such as server.requestTimeout and req.setTimeout(value, [callback]), but many of them fail to properly describe which aspect of the network connection is being manipulated.
Not all Timeouts are Identical
Unfortunately, while there is Node documentation about timeouts, it is very easy to become confused about which timeout is the right one to consider. In addition, not all timeout values mean the same thing. Below, we list a few general timeouts that can be set within Express middleware, each followed by the properties on the underlying express/http request and response objects that it affects:
req.setTimeout(timeout); // Amount of time to wait before terminating a socket
  req.socket.timeout
  req.socket.Timeout._idleTimeout
  req.client.timeout
  req.client.Timeout._idleTimeout
  req.res.socket.timeout
  req.res.socket.Timeout._idleTimeout

req.socket.setTimeout(timeout); // Amount of time to wait before terminating a socket
  req.socket.timeout
  req.socket.Timeout._idleTimeout
  req.client.timeout
  req.client.Timeout._idleTimeout
  req.res.socket.timeout
  req.res.socket.Timeout._idleTimeout

req.socket.server.setTimeout(timeout); // Time to wait for connection establishment before terminating
  req.socket.server.timeout
  req.socket._server.timeout
  req.client.server.timeout
  req.client._server.timeout

req.socket.server.requestTimeout = timeout; // Time to wait for a request before terminating
  req.socket._server.requestTimeout
  req.socket.server.requestTimeout
  req.client._server.requestTimeout
  req.client.server.requestTimeout
As shown above, while the concept of setting a timeout is relatively straightforward, where you set it can have a significant impact on what you actually observe in practice. We found that req.setTimeout(timeout) and req.socket.setTimeout(timeout) were functional aliases of each other, while req.socket.server.setTimeout(timeout) appeared to affect how long a connection would wait to establish before terminating (the Node.js documentation describes it as setting the inactivity timeout for the server's sockets). Finally, we saw that req.socket.server.requestTimeout = timeout affected how long a request that was still uploading could run before being terminated.
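The aliasing between req.setTimeout and req.socket.setTimeout can be checked without running a real server. The sketch below builds an http.IncomingMessage by hand around an unconnected net.Socket, which is an assumption made purely for demonstration; in Express the request object is created for you.

```javascript
const http = require('node:http');
const net = require('node:net');

// An unconnected socket is enough to observe how the setters propagate.
const socket = new net.Socket();
const req = new http.IncomingMessage(socket);

// Setting the timeout on the request is forwarded to the underlying socket.
req.setTimeout(1500);
const afterReqSetTimeout = socket.timeout;

// Setting it on the socket directly changes the same value.
req.socket.setTimeout(2500);
const afterSocketSetTimeout = req.socket.timeout;

console.log({ afterReqSetTimeout, afterSocketSetTimeout });

// Clear the idle timer again and release the socket.
socket.setTimeout(0);
socket.destroy();
```

Both calls end up writing to the same socket, which matches the behavior we observed in middleware.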
Confounding Factors
Determining which of these timeouts you actually need is difficult, as it is very easy to attribute a certain behavior to the wrong parameter. There are other timeouts we have not even looked at, including idle timeouts, headersTimeout and keepAliveTimeout. Luckily we did not need to worry about them too much, but they also have default values which can affect the outcomes you see while debugging. To keep things brief, we eventually learned the following few nuggets of wisdom.
- Do not use 0x7FFFFFFF (the maximum 32-bit signed integer) as the requestTimeout. When we had requestTimeout set to this value, we encountered a wide variety of timeout behaviors, including quick 408 timeouts, connection resets and random connection breakages without responses. While we do not know the root cause, we suspect that such a high integer may be causing unintended overflows and bugs elsewhere in the Node.js code that have not been properly surfaced yet.
- The Node.js documentation states: "It must be set to a non-zero value (e.g. 120 seconds) to protect against potential Denial-of-Service attacks in case the server is deployed without a reverse proxy in front." This wording is slightly misleading. At first glance, it sounds like requestTimeout no longer accepts a value of 0. What it actually means is that you should not set the value to 0 unless you have a very good reason to do so; nothing prevents you from doing it. Since we were depending on the old Node v16 behavior where this value defaulted to 0, we eventually opted to set requestTimeout to 0 only for the endpoints that required it, in order to limit our attack surface.
- Using setTimeout with short timeout windows can provide a false sense of security, as the network connection may be getting killed for a different reason than the one you are trying to remedy. In our case, when we set the timeouts to be short (a few seconds), we saw network connections lasting for a few minutes instead, contrary to expectations. While we do not know the root cause, it is likely that other timeout values were affecting the connection instead.
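To make the second point above concrete, here is a minimal sketch of an Express-style middleware that sets requestTimeout to 0. The middleware name and the route in the usage comment are hypothetical, and this is not the actual COMS code. Note that requestTimeout is a property of the shared http.Server object, so as far as we understand the assignment stays in effect on that server after the request finishes; verify how that interacts with your other routes.

```javascript
// Hypothetical middleware name; Express calls it as (req, res, next).
function disableRequestTimeout(req, res, next) {
  // requestTimeout lives on the http.Server that accepted this socket.
  // A value of 0 disables the limit on receiving the entire request.
  req.socket.server.requestTimeout = 0;
  next();
}

// Hypothetical wiring in an Express app:
// app.put('/upload', disableRequestTimeout, uploadHandler);

module.exports = { disableRequestTimeout };
```

Attaching the middleware only to the upload routes keeps the intent visible in the routing table, even though the underlying setting is held by the server.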
Below we have a quick observation of the default values that a standard, unmanipulated express request would have as of Node v18.18.0:
Default Value | Relative Location |
---|---|
0 | req.socket.timeout |
??? | req.socket.Timeout._idleTimeout |
null | req.client.timeout |
??? | req.client.Timeout._idleTimeout |
null | req.res.socket.timeout |
??? | req.res.socket.Timeout._idleTimeout |
0 | req.socket.server.timeout |
0 | req.socket._server.timeout |
0 | req.client.server.timeout |
0 | req.client._server.timeout |
300000 | req.socket._server.requestTimeout |
300000 | req.socket.server.requestTimeout |
300000 | req.client._server.requestTimeout |
300000 | req.client.server.requestTimeout |
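A table like the one above can be reproduced on your own Node.js version with a small helper that reads the values from a live request. This is a sketch; the helper name is ours, and headersTimeout and keepAliveTimeout are included as extra fields that the table above does not cover.

```javascript
// Hypothetical helper: collect the timeout-related values reachable from a request.
function snapshotTimeouts(req) {
  const socket = req.socket;
  const server = socket.server;
  return {
    'req.socket.timeout': socket.timeout,
    'req.socket.server.timeout': server.timeout,
    'req.socket.server.requestTimeout': server.requestTimeout,
    'req.socket.server.headersTimeout': server.headersTimeout,
    'req.socket.server.keepAliveTimeout': server.keepAliveTimeout,
  };
}

module.exports = { snapshotTimeouts };
```

Calling console.table(snapshotTimeouts(req)) inside a middleware, before and after changing a timeout, makes it easier to see which values actually moved.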
What we found when playing with some of the other timeouts is that some of the values would change, but others would not. There was nothing obvious about how things were being changed behind the scenes in the express and http layers. However, the one insight we can offer is that in a normal express connection, req.socket, req.client and req.res.socket all refer to the same object. We know this to be the case because when we used the Node.js util.inspect() function to look under the hood, all of those objects had the same reference. So when changing your timeouts or any other more advanced values under the hood, make sure to keep that shared reference in mind.
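Besides reading util.inspect() output, the shared reference can also be checked with strict equality. The helper below is a sketch with a name of our choosing; it assumes the request has the req.client and req.res properties described above.

```javascript
// Hypothetical helper: report whether the socket aliases point at one object.
function describeSocketAliases(req) {
  return {
    clientIsSocket: req.client === req.socket,
    resSocketIsSocket: req.res !== undefined && req.res.socket === req.socket,
  };
}

module.exports = { describeSocketAliases };
```

If both fields come back true inside a middleware, a timeout set through any one of the three paths is being applied to the same socket.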
Concluding Thoughts
Timeouts can be a deep and subtle topic to sift through, as general network behaviors can significantly change with just a few minor changes. It can be very easy to end up going down the wrong rabbit hole when investigating these things. In our case, it took a while for us to realize that Node v18 was the source of our troubles; our team spent quite a few days talking to other teams and our infrastructure folks to try and get a handle on the behavior before we could begin to isolate the situation.
We eventually solved our upload issue with a simple one-liner in the middleware, but the journey to get there was definitely not straightforward. The worst part about all of this is that debugging timeouts is expensive in terms of time, as the only way to verify that a timeout is truly behaving as intended is to actually wait it out. Even with that kind of observational testing, it is easy to misattribute behaviors and outcomes, as there are very few signals to show you whether something worked correctly or not. Our only advice after going through this is to take a timeout, take a breather, and slowly and thoroughly investigate everything you can, as timeouts are never what they seem to be.