Networking rewrite, realtime TCP network support - cryptocode/notes GitHub Wiki
Overview
The Nano realtime network currently runs over UDP. While this enables low communication overhead, there are some downsides. These include packet loss, weak UDP support in some OS kernels and devices (such as small buffer sizes), and the fact that some services block UDP traffic.
UDP support in kernels and network devices is likely to vastly improve with the adoption of QUIC through the HTTP/3 deployment (as QUIC is UDP based), but wide deployment is years away.
As a result, TCP support for the realtime network is currently under development. This page tracks status, issues and areas of concern.
Implementation runway
- Consolidate networking code as this is spread across the code base.
- Endpoint abstraction; endpoint types are currently UDP and TCP specific.
- Client/server types.
- TCP framing.
- TCP discovery and initiation.
- Protocol agnostic connection management.
- Update existing tests and write new ones for new functionality.
During design and implementation, consider the possibility of new transports in the future, such as QUIC and SCTP.
Consolidate networking code
- The socket class was previously in the bootstrap code. This is lifted into a file for generic networking code (see client/server abstractions).
- The network class is currently sited in node.cpp and should be moved out.
- Accepting TCP connections should be handled by a tcp_server type which delegates to realtime or bootstrap handlers based on message type. bootstrap_listener currently owns this responsibility.
Endpoint abstraction
Status: Mostly done (lacks a few tests; the node compiles and runs)
Currently nano::endpoint and nano::tcp_endpoint are used across the code base. These are just aliases to asio's udp and tcp endpoint types.
To make containers and function signatures work with multiple protocols, a nano::net::socket_addr type is introduced. This is a tagged union that can hold either a udp or tcp endpoint type from asio.
Endpoints must behave correctly with respect to containers; hashing and relational operators are thus reimplemented. Logic is forwarded to the underlying asio type whenever possible.
In an ordered container of mixed protocol entries, tcp will sort before udp for the same endpoint address/port.
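The following is a minimal sketch of such a tagged union, not the actual nano::net::socket_addr implementation. To keep it compilable without asio, tcp_endpoint and udp_endpoint are hypothetical stand-ins for the asio endpoint types, and the ordering shown (address, then port, then protocol) is just one ordering consistent with "tcp sorts before udp for the same address/port".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <variant>

// Hypothetical stand-ins for asio's tcp and udp endpoint types.
// Addresses are plain strings here, so they compare lexicographically.
struct tcp_endpoint { std::string address; std::uint16_t port; };
struct udp_endpoint { std::string address; std::uint16_t port; };

// Tagged union over both endpoint types. tcp is listed first so that its
// variant index (0) is lower than udp's (1).
class socket_addr
{
public:
    socket_addr (tcp_endpoint ep) : value (std::move (ep)) {}
    socket_addr (udp_endpoint ep) : value (std::move (ep)) {}

    bool is_tcp () const { return std::holds_alternative<tcp_endpoint> (value); }
    std::string const & address () const
    {
        return std::visit ([] (auto const & ep) -> std::string const & { return ep.address; }, value);
    }
    std::uint16_t port () const
    {
        return std::visit ([] (auto const & ep) { return ep.port; }, value);
    }

    // Order by address, then port, then protocol (tcp before udp).
    friend bool operator< (socket_addr const & a, socket_addr const & b) { return a.key () < b.key (); }
    friend bool operator== (socket_addr const & a, socket_addr const & b) { return a.key () == b.key (); }

private:
    std::tuple<std::string, std::uint16_t, std::size_t> key () const
    {
        return std::make_tuple (address (), port (), value.index ());
    }
    std::variant<tcp_endpoint, udp_endpoint> value;
};

// Hashing so that socket_addr also works in unordered containers.
namespace std
{
template <>
struct hash<socket_addr>
{
    size_t operator() (socket_addr const & a) const
    {
        size_t h = hash<string>{}(a.address ());
        h ^= hash<uint16_t>{}(a.port ()) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= hash<bool>{}(a.is_tcp ()) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};
}
```

With this sketch, inserting a udp and a tcp entry for the same address and port into a std::set<socket_addr> yields two distinct elements, with the tcp entry first.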
Client/server types
Status: Client API mostly done.
The ASIO UDP and TCP sockets have incompatible APIs. For TCP, you follow the async_connect/async_read/write model. UDP is connectionless and the API is a simple async_receive_from/async_send_to model.
In async_receive_from, the remote endpoint is populated through the API. This is important as the buffer class needs to know the sender for allowed_sender filtering.
These differences pose a challenge, since you want to be able to connect and send to/receive from any endpoint type using the same API.
The proposed solution is to introduce a nano::net::client interface with subtypes for TCP and UDP (and in the future, possibly other protocols).
The basic API is async_connect, async_read and async_write, along with some convenience functions and overrides.
- async_connect exists even for UDP. This sets the target endpoint for sending and immediately invokes the callback. For TCP, this forwards to asio's async_connect.
- async_read reads from the socket. In addition to error code and bytes read, the callback receives the remote endpoint. This is important for UDP since the remote endpoint isn't known a priori.
- async_write forwards to the corresponding function for TCP, and to async_send_to for UDP.
Along with convenience functions for obtaining local/remote endpoints, etc, this provides a uniform TCP/UDP client interface.
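The shape of this interface can be sketched as below. This is an illustration under stated assumptions, not the actual nano::net::client: the callback signatures, the endpoint struct (standing in for the socket address type) and fake_udp_client are all hypothetical, and the fake uses an in-memory queue instead of a real socket so the UDP semantics described above can be exercised without asio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical stand-in for the protocol-agnostic endpoint type.
struct endpoint { std::string address; std::uint16_t port; };

// Uniform client interface; signatures are assumptions for illustration.
class client
{
public:
    using connect_callback = std::function<void (std::error_code const &)>;
    using read_callback = std::function<void (std::error_code const &, std::size_t, endpoint const &)>;
    using write_callback = std::function<void (std::error_code const &, std::size_t)>;

    virtual ~client () = default;
    virtual void async_connect (endpoint const & remote, connect_callback callback) = 0;
    virtual void async_read (std::vector<std::uint8_t> & buffer, read_callback callback) = 0;
    virtual void async_write (std::vector<std::uint8_t> const & buffer, write_callback callback) = 0;
};

// UDP-flavoured fake backed by in-memory queues rather than a socket.
class fake_udp_client : public client
{
public:
    struct datagram { endpoint peer; std::vector<std::uint8_t> payload; };
    std::deque<datagram> inbox; // datagrams waiting to be read
    std::vector<datagram> sent; // datagrams "sent" so far
    endpoint target;

    void async_connect (endpoint const & remote, connect_callback callback) override
    {
        target = remote; // no handshake: only remember where to send
        callback (std::error_code ()); // invoked immediately
    }
    void async_read (std::vector<std::uint8_t> & buffer, read_callback callback) override
    {
        if (inbox.empty ())
        {
            callback (std::make_error_code (std::errc::operation_would_block), 0, endpoint{});
            return;
        }
        datagram next = std::move (inbox.front ());
        inbox.pop_front ();
        buffer = std::move (next.payload);
        // The sender is only known once a datagram has arrived.
        callback (std::error_code (), buffer.size (), next.peer);
    }
    void async_write (std::vector<std::uint8_t> const & buffer, write_callback callback) override
    {
        // A real UDP client would forward to asio's async_send_to here.
        sent.push_back (datagram{ target, buffer });
        callback (std::error_code (), buffer.size ());
    }
};
```

A TCP subtype would implement the same three functions by forwarding to the socket's connect, read and write operations, and would report the connected peer as the remote endpoint in the read callback.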
TCP framing
Challenges/factoids/trivia:
- The header does not have room for size; if it had, no framing would be needed. We can potentially use some header-extension bits to indicate payload sizes at the cost of reducing available flags. For instance, with a 3-bit exponent, we could do 2^n*4 for sizes 4,8,16..512. However...
- Some UDP messages have "interesting" design features, such as VBH depending on EOF instead of a count field for the number of hash values. This calls for a precise payload size preamble.
- Another option is to use the max version field. If version-using >= 0x11, reinterpret max-version + 1 extension bit as a 9-bit size field.
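The arithmetic behind the two header-based options can be sketched as follows. The function names are hypothetical, and placing the extra extension bit as the most significant bit of the 9-bit size is an assumption; the notes do not specify the bit order.

```cpp
#include <cassert>
#include <cstdint>

// Option 1: a 3-bit exponent n (0..7) taken from the header extension bits,
// decoded as size = 2^n * 4, giving 4, 8, 16, ..., 512.
constexpr std::uint16_t size_from_exponent (std::uint8_t n)
{
    return static_cast<std::uint16_t> (4u << (n & 0x7));
}

// Option 2: reinterpret the 8-bit max-version field plus one extension bit
// as a 9-bit size (0..511). The extension bit is assumed to be the top bit.
constexpr std::uint16_t size_from_version_max (std::uint8_t version_max, bool extension_bit)
{
    return static_cast<std::uint16_t> ((extension_bit ? 0x100u : 0u) | version_max);
}

static_assert (size_from_exponent (0) == 4, "smallest exponent-encoded size");
static_assert (size_from_exponent (7) == 512, "largest exponent-encoded size");
static_assert (size_from_version_max (0xff, true) == 511, "largest 9-bit size");
```

The exponent scheme only expresses eight sizes, so a payload of, say, 96 bytes has no exact encoding, whereas the 9-bit field is byte-precise but tops out at 511.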
If framing is needed, several options exist. Basically, the UDP messages need to be wrapped inside an envelope. Alternatives include:
- A simple magic+length envelope.
- Repeated headers. Here we'd have the common header as the envelope, but with a special "envelope" message type. If the message type is 0xff ("envelope"), then the max-version field means the number of messages minus one, and the extension bits are the payload length. The message payload is the actual message with the original header intact. This would keep the protocol uniform and simplify tools like dissectors, which want to group traffic by common preamble patterns.
Potential feature: the repeated-header option would allow for 65 kB worth of realtime messages per envelope, allowing us to pack up to 256 messages (as indicated by the retrofitted max-version field).
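A rough sketch of the repeated-header idea follows. Only the 0xff message type, the "count minus one" meaning of max-version and the use of the extension bits as payload length come from the text above; the 8-byte layout, the byte offsets, the little-endian extensions field and all function names are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::uint8_t envelope_type = 0xff; // "envelope" message type from the text
constexpr std::size_t header_size = 8; // assumed header size

// Assumed byte layout (not confirmed by these notes):
// [0..1] magic, [2] max-version, [3] using-version, [4] min-version,
// [5] message type, [6..7] extension bits (little-endian by assumption)
using header_bytes = std::array<std::uint8_t, header_size>;

struct envelope_info
{
    std::uint16_t message_count; // 1..256
    std::uint16_t payload_length; // 0..65535 bytes
};

// Turn a regular header into an envelope header.
// Precondition: 1 <= message_count <= 256.
header_bytes make_envelope_header (header_bytes base, std::uint16_t message_count, std::uint16_t payload_length)
{
    base[2] = static_cast<std::uint8_t> (message_count - 1); // retrofitted max-version
    base[5] = envelope_type;
    base[6] = static_cast<std::uint8_t> (payload_length & 0xff);
    base[7] = static_cast<std::uint8_t> (payload_length >> 8);
    return base;
}

// Returns the envelope fields, or nothing if this is not an envelope header.
std::optional<envelope_info> parse_envelope_header (header_bytes const & header)
{
    if (header[5] != envelope_type)
    {
        return std::nullopt;
    }
    envelope_info info;
    info.message_count = static_cast<std::uint16_t> (header[2] + 1);
    info.payload_length = static_cast<std::uint16_t> (header[6] | (header[7] << 8));
    return info;
}
```

A TCP reader built on this would read header_size bytes, and for an envelope read payload_length further bytes before handing the contained messages, each with its original header, to the existing UDP message parser.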
TCP discovery and initiation
Since UDP and TCP support will co-exist (at least as TCP support is being introduced), the topic of when and to which peers TCP connections are made must be addressed.
Connection upgrade
If we keep the current preconfigured-peers discovery model, we must start with UDP. As incoming messages and inbound bootstrap connections occur, we build knowledge of which peers support TCP (through the using-version field).
Connection downgrade
If TCP connections to a peer fail (such as being blocked by the firewall), we must downgrade to UDP. However, this should probably not be done if a TCP connection has recently succeeded.
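The upgrade and downgrade rules could be combined into a single per-peer decision, sketched below. Everything named here is hypothetical: the peer_state fields, the tcp_min_version threshold (0x11 is used only as a placeholder) and the five-minute definition of "recently" are illustrative assumptions, not decided values.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

enum class transport { udp, tcp };

// Hypothetical per-peer bookkeeping.
struct peer_state
{
    std::uint8_t version_using{ 0 }; // learned from incoming message headers
    std::optional<std::chrono::steady_clock::time_point> last_tcp_success;
    bool last_tcp_attempt_failed{ false };
};

// Placeholder values, chosen for illustration only.
constexpr std::uint8_t tcp_min_version = 0x11;
constexpr std::chrono::minutes recent_success_window{ 5 };

transport choose_transport (peer_state const & peer, std::chrono::steady_clock::time_point now)
{
    // Peers that do not advertise a TCP-capable version stay on UDP.
    if (peer.version_using < tcp_min_version)
    {
        return transport::udp;
    }
    bool recently_ok = peer.last_tcp_success.has_value () && (now - *peer.last_tcp_success) <= recent_success_window;
    // Downgrade only if TCP failed and has not succeeded recently.
    if (peer.last_tcp_attempt_failed && !recently_ok)
    {
        return transport::udp;
    }
    // Upgrade, or keep TCP despite a failure that may be transient.
    return transport::tcp;
}
```

Keeping the decision in one function makes it easy to test the "recent success" exception in isolation and to tune the window later.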
Open questions
- Are some services closing the TCP port to avoid inbound bootstrap requests? If so, they can instead use --disable_bootstrap_listener to deny bootstrap requests. We would have to rework the code to still accept TCP connections, but ignore bootstrap messages. We might want to consider making this a config flag instead of a CLI option.