requirements - nsip/n3 GitHub Wiki

System Requirements

N3 is distributed as a set of self-contained binaries with no external dependencies.

Binary packages are compiled and available for 64-bit Mac, Linux and Windows systems. Download the appropriate binary bundle for your platform here.

N3 is designed to run on low-capacity systems. The two main binaries that form an n3 system are the n3w server and the nats-streaming-server messaging service, over which n3 nodes communicate.

TLDR

In short, n3 services are designed to run efficiently on moderate resources. Across all our testing, including heavy loads (NAPLAN data), a quad-core machine with 8GB of RAM, such as an a1.xlarge Amazon instance, is more than adequate.

n3 is a data-processing environment, and performance will improve with both more cores and more system RAM available.

n3 does store data to disk, and so writeable directories that can store as much data as is sent through the system need to be available.

The system requirements for each component are:

nats-streaming-server

nats-streaming-server connects n3 nodes, and connects the publishing process of n3w to the data-storage mechanism.

nats-streaming-server persists messages in ordered channels (like Kafka). Clients can connect to a stream and consume messages, even if the messages were created/published by another client or process at an earlier time.

n3 uses the durable-queues feature of nats-streaming so that when a client re-connects to a channel to consume messages it will begin consuming from the last message it had reached.
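The resume behaviour described above can be illustrated with a small toy model. This is only a sketch in Python and does not use the NATS Streaming client API; the `Channel` class, the `consume` method and the durable name `"n3w"` are hypothetical names chosen for illustration.

```python
class Channel:
    """Toy in-memory model of an ordered channel (not the NATS Streaming API)."""

    def __init__(self):
        self.messages = []    # ordered, append-only message log
        self.last_acked = {}  # durable name -> index of last message consumed

    def publish(self, msg):
        self.messages.append(msg)

    def consume(self, durable_name, max_msgs=None):
        """Return messages after the last one this durable name consumed."""
        start = self.last_acked.get(durable_name, -1) + 1
        end = len(self.messages)
        if max_msgs is not None:
            end = min(end, start + max_msgs)
        batch = self.messages[start:end]
        if batch:
            # Remember the position so a later reconnect resumes from here.
            self.last_acked[durable_name] = end - 1
        return batch


if __name__ == "__main__":
    ch = Channel()
    for m in ["a", "b", "c"]:
        ch.publish(m)
    print(ch.consume("n3w", max_msgs=2))  # first connection reads two messages
    ch.publish("d")
    print(ch.consume("n3w"))              # reconnect resumes after the last one read
```

A consumer with a different durable name starts from the beginning of the channel, because its position is tracked separately.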

nats-streaming-server requires little in the way of resources when running, especially when configured to use a file-backed message store.

When started from the command line with no other configuration, nats-streaming defaults to a memory-backed store for messages; that is to say, all messages for the various streams are kept in system memory.

Messages are compressed by default, but in memory-backed mode the memory use of the nats-streaming-server will increase over time as the number of messages received by the server increases.

Messages are purged from the system according to limits set in the configuration of the server; for example, messages older than n hours will be dropped in order. For more details on the various configuration options for nats-streaming-server see the configuration docs here.
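As a rough illustration of the age-based limit, the sketch below drops messages from the front of an ordered list once they are older than a maximum age. It is a model of the idea only: the server applies its own limits internally, and the function name `purge_expired` and the `(timestamp, payload)` message shape are assumptions made for this example.

```python
from datetime import datetime, timedelta


def purge_expired(messages, max_age, now):
    """Drop messages older than max_age from the front of an ordered list.

    messages: list of (timestamp, payload) tuples, oldest first.
    max_age:  timedelta, for example timedelta(hours=24).
    now:      the current time as a datetime.
    """
    cutoff = now - max_age
    keep_from = 0
    # Messages are ordered, so stop at the first one that is still fresh.
    while keep_from < len(messages) and messages[keep_from][0] < cutoff:
        keep_from += 1
    return messages[keep_from:]


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0)
    msgs = [(now - timedelta(hours=h), "msg-%d" % h) for h in (5, 3, 1)]
    print(purge_expired(msgs, timedelta(hours=2), now))
```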

Apart from the data storage overhead, the typical memory overhead of the nats-streaming-server is between 6 and 20MB.

The preferred configuration for nats-streaming in production is to use a file-backed message store.

To establish this, run nats-streaming-server with a configuration file. A suitable starting-point config is provided in this main repo as nss.cfg. To invoke the server using this config, copy the config file to the same directory as the streaming server binary, and launch nats-streaming-server, passing the config on the command line:

n3dir $> ./nats-streaming-server -c nss.cfg

This sample config sets the storage mode to file, using an /n3 folder. When running in this mode the memory requirements of the streaming server are very low, typically less than 10MB, as message data is not stored in memory.

The server does, of course, need access to writeable folders/directories where data will be stored for each channel configured. All message data is compressed, and is purged according to the limits set for the server globally or the limits set per channel. Assume that there needs to be enough writeable space to hold around 50% of the message volume being transmitted.
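The 50% rule of thumb can be turned into a quick sizing check. The sketch below is a back-of-the-envelope calculation only; the function names, the example message count and the average message size are illustrative assumptions, not measured n3 figures.

```python
import shutil


def estimate_store_bytes(message_count, avg_message_bytes, stored_fraction=0.5):
    """Estimate disk space for the message store.

    stored_fraction=0.5 reflects the rule of thumb of roughly 50% of the
    transmitted message volume.
    """
    raw_bytes = message_count * avg_message_bytes
    return int(raw_bytes * stored_fraction)


def has_room(path, required_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical workload: one million messages of about 2 KiB each.
    needed = estimate_store_bytes(1_000_000, 2048)
    print("estimated store size: %d bytes" % needed)
    print("enough free space here:", has_room(".", needed))
```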

NOTES:

There is no requirement that the nats-streaming-server must be run in the n3 distribution folder.

You can install the server in any suitable location on your target file-system, and the config file can be anywhere you choose; just adjust the command-line parameter accordingly.

There is actually no requirement to use the instance of nats-streaming-server provided as part of the n3 binary distribution; it's just bundled for convenience.

On Mac and Linux platforms the streaming server can be installed using your local package manager of choice (Homebrew, apt, etc.). These package managers will also offer the facility to install the streaming server as a service using the chosen mechanism of your operating system.

nats-streaming-server can also be installed as a service under Windows by following the instructions here.

n3w

n3w (nias3 web-server) is the main access point for users of an n3 system.

The application is launched from a console/terminal as a single binary:

n3dir $> ./n3w

The n3w server then presents web interfaces for the core functions of an n3 system:

http://localhost:1323/admin/newdemocontext - for creating contexts
http://localhost:1323/publish - to add data to a context
http://localhost:1323/graphql - GraphQL query endpoint to query data in a context

For more detailed information on these endpoints see the 'importing data into n3' guide in this wiki.

Clearly, as a web service, port configuration on the chosen host must allow access to these endpoints. The n3w server will also need access to TCP port 4222 in order to connect to the nats-streaming-server. If this connection is not available the n3w server will issue a warning message and terminate.
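Before launching n3w it can be handy to confirm that the required ports are reachable. The following is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical, and the host and ports in the usage section assume the local defaults mentioned on this page (4222 for the streaming server, 1323 for n3w).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed local defaults; adjust for your own deployment.
    for name, port in (("nats-streaming-server", 4222), ("n3w", 1323)):
        state = "reachable" if can_connect("localhost", port) else "not reachable"
        print("%s on port %d: %s" % (name, port, state))
```

This only checks that something is listening on the port; it does not verify that the listener is actually the expected service.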

In terms of resource requirements, the n3 web-server itself requires very little, around 2MB of RAM when idle.

n3w will spike in terms of RAM when large datasets are published to the node. The nature of the golang garbage collector (GC) is that the application will consume as much RAM as the underlying OS is prepared to make available. Memory will be reclaimed as the GC runs over time, but only if there is sufficient memory pressure from the OS to need to do so.

If, for example, the system has 8GB of RAM and there are multiple publish and ingest processes taking place, the n3w server will use as much RAM as the system can provide to do as much work as possible in memory (query results caching, for instance), and so memory usage can spike to 1 or 2GB.

If the system has significant memory pressure, this memory will be aggressively reclaimed by the GC, but if no such memory pressure is present the resource load will remain high until multiple GC cycles have run.

Over time this will typically mean that resource usage drops back to less than 100MB of RAM in use, but the timing is dependent on the overall load of the system.

Data published into the n3 system will be stored by the n3w node. This means that the n3w server does require access to writeable disks that can hold the volume of data that has been sent to a context.

A context is an arbitrary partition of data created by the user based on business need, such as a partition per school, a partition by class-group, or a partition by data type.

For more details on contexts and their layout on disk, see the section in this wiki on 'importing data into n3'.

Overall, as in the TLDR section, an a1.xlarge type instance with 4 processors and 8GB of RAM should be more than sufficient to support multiple school-size contexts with no performance issues for ingesting data or running queries.