Performance Tuning - e2guardian/wiki GitHub Wiki

Overview

This page provides a basic introduction to tuning your E2G installation. Bear in mind that tuning can get complex depending on the setup, and there are many pieces to this puzzle, including your server's capabilities and your proxy. Tuning those is beyond the scope of this document. Tuning also depends on the use case: if you are using an antivirus scanner or downloading large files, you may need to tune other knobs not covered in this guide.

HTTP connections and clients

Before you can do any performance tuning you need to size your network and expected load. The most important aspect is concurrency, i.e. how many users will be browsing the web at the same time. You don't need an exact number, but it's useful to start with a ballpark.

Second, it's important to keep in mind that loading a website is not one request, but likely several dozen. Think about all the JavaScript, CSS and image files, then the ads and so on: a modern web page can trigger hundreds of requests. If you are using a browser with some kind of prefetching (for example, Chrome will try to prefetch the top results from a Google search), that number can be even higher.

Third, you should bear in mind that one E2G process (for v4 and later, one E2G worker thread) handles one connection at a time. Therefore the number of processes needed to serve in parallel all the requests from a single user viewing a single web page can be much higher than you'd expect (i.e. > 1 :smile:). This is very important when tuning for best performance.
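
The three points above can be combined into a back-of-the-envelope estimate. The sketch below is only an illustration: the function name `estimate_peak_connections` and the figure of 20 parallel connections per active user are assumptions made for the example, not values taken from E2G or from any browser. Measure your own traffic before relying on a number like this.

```python
def estimate_peak_connections(total_users, active_fraction, parallel_per_user):
    """Rough upper bound on simultaneous connections the filter must handle.

    total_users       -- number of users on the network
    active_fraction   -- share of users browsing at the same time (0.0 to 1.0)
    parallel_per_user -- assumed parallel connections one active user opens
    """
    active_users = round(total_users * active_fraction)
    # One process (or worker thread) handles one connection at a time,
    # so every parallel connection needs its own process.
    return active_users * parallel_per_user


# Hypothetical inputs: 100 users, half of them active, 20 connections each.
peak = estimate_peak_connections(100, 0.5, 20)
print(peak)  # 1000
```

The result is a ceiling to plan for rather than a typical load, since not every active user loads a page at the same instant.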

Note: These tuning notes relate to v3 and earlier. Version 4 onwards uses a single E2G process with multiple threads, which simplifies tuning somewhat.

The configuration variables

In the main config file, e2guardian.conf, there are 6 main variables that control the number of processes E2G will use and that have the biggest impact on performance. These are:

maxchildren
minchildren
minsparechildren
preforkchildren
maxsparechildren
maxagechildren

:warning: Please watch out for the comments in the config. They are very old, literally from another era, and the values they suggest for large sites are often insufficient for the modern web.

That said, the descriptions of what those variables do are pretty accurate. In short:

maxchildren sets the ceiling on how many processes you will ever have at any given time. This is in effect a measure to protect your server's resources from being exhausted by E2G, but it is also the biggest limit on your concurrency. Set it too low and there won't be enough processes to serve new requests.
minchildren sets the minimum number of children to spawn when you start E2G. This number doesn't matter that much, especially if your server is busy and requests come in immediately after a restart.
minsparechildren is very important because it sets how many idle processes to keep around ready to handle connections. This affects how quickly new connections will be served, i.e. whether a process is ready to respond or needs to be spawned first.
preforkchildren works in conjunction with minsparechildren to make sure there are processes ready to serve new requests. When fewer than minsparechildren are available, E2G will start preforkchildren new ones.
maxsparechildren is a way to make sure resources are reclaimed. Imagine that a lot of users connect all at once and as a result 100 processes are started, but then they all leave to do something else. That would leave those 100 processes around doing nothing, but taking up resources. With this setting E2G will make sure there are never more than maxsparechildren processes doing nothing. This setting must be higher than minsparechildren or E2G will throw an error.
maxagechildren defines how many requests any process will serve before being terminated. This is a good way to avoid runaway processes and memory leaks.
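
To make the interplay between these variables more concrete, here is a minimal Python model of the behaviour described above. It is not E2G's actual implementation: the names `PoolConfig`, `check` and `adjust` are hypothetical, and the rule that minchildren should not exceed maxchildren is a common-sense assumption rather than something stated in the E2G documentation. The only constraint taken from the text is that maxsparechildren must be higher than minsparechildren.

```python
from dataclasses import dataclass


@dataclass
class PoolConfig:
    maxchildren: int
    minchildren: int
    minsparechildren: int
    preforkchildren: int
    maxsparechildren: int


def check(cfg):
    """Return a list of problems found in the configuration."""
    problems = []
    # Stated in the text: E2G throws an error otherwise.
    if cfg.maxsparechildren <= cfg.minsparechildren:
        problems.append("maxsparechildren must be higher than minsparechildren")
    # Assumption for this sketch, not taken from the E2G docs.
    if cfg.minchildren > cfg.maxchildren:
        problems.append("minchildren should not exceed maxchildren")
    return problems


def adjust(cfg, total, free):
    """Return the change in process count: positive spawns, negative reclaims."""
    if free < cfg.minsparechildren:
        # Too few idle processes: start a batch, but never pass the ceiling.
        return min(cfg.preforkchildren, max(0, cfg.maxchildren - total))
    if free > cfg.maxsparechildren:
        # Too many idle processes: reclaim the surplus.
        return -(free - cfg.maxsparechildren)
    return 0


cfg = PoolConfig(maxchildren=1000, minchildren=30, minsparechildren=20,
                 preforkchildren=10, maxsparechildren=35)
print(check(cfg))                       # []
print(adjust(cfg, total=40, free=5))    # 10
print(adjust(cfg, total=995, free=5))   # 5
print(adjust(cfg, total=120, free=100)) # -65
```

The second `adjust` call shows why maxchildren is the hard limit on concurrency: close to the ceiling, fewer processes can be started than preforkchildren asks for.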

Use case values

So what values should you set? That depends on your environment and on what was mentioned at the beginning: how many concurrent users you have and what kind of websites they visit. That said, it's useful to provide a concrete use case, so here's one. For a network with about 100 users, of which about half may be browsing the web at any time, the settings look like this:

maxchildren = 1000
minchildren = 30
minsparechildren = 20
preforkchildren = 10
maxsparechildren = 35
maxagechildren = 1000

Find your own

E2G provides some good statistics that you can use to figure out what values to set, along with a setting called logchildprocesshandling. If you set logchildprocesshandling = on, E2G will log to syslog when it spawns or removes processes and, more importantly, when it hits maxchildren. By default the stats are stored in /var/log/e2guardian/dstats.log. Every 5 minutes, or whatever interval you configured in dstatinterval, you will see a new line reporting the following fields:

time - a timestamp of when the stats were logged
children - the number of children running at that time
busy - how many children were busy serving requests
free - how many were doing nothing
wait - how many connections were waiting
births - how many children were spawned
deaths - how many were terminated
conx - the number of connections
conx/s - the number of connections per second

Based on all of these you should be able to figure out what to tune and how. Remember, of course, that the available CPU and RAM will be critical to how many concurrent processes you can run.
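
As a rough illustration of reading these fields, the Python sketch below turns one stats line into named values and suggests what to look at. It rests on several assumptions that you should check against your own dstats.log: that each data line holds the nine fields as whitespace-separated numbers in the order listed above, and that non-numeric lines (such as a header) can be skipped. The helper names `parse_dstats_line` and `hints` are hypothetical, and the sample line is made up for the example rather than copied from real E2G output.

```python
FIELDS = ["time", "children", "busy", "free", "wait",
          "births", "deaths", "conx", "conx/s"]


def parse_dstats_line(line):
    """Map one stats line to a dict, or return None if it does not fit.

    Assumes whitespace-separated numeric columns in FIELDS order.
    """
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    try:
        values = [float(p) for p in parts]
    except ValueError:
        return None  # header line or unexpected format
    return dict(zip(FIELDS, values))


def hints(row, maxchildren):
    """Return tuning hints for one parsed row."""
    out = []
    if row["wait"] > 0:
        out.append("connections were waiting: look at minsparechildren and preforkchildren")
    if row["children"] >= maxchildren:
        out.append("maxchildren reached: consider raising it if CPU and RAM allow")
    return out


# Made-up sample values, in the field order listed above.
sample = "1700000000 120 95 25 3 14 9 5400 18"
row = parse_dstats_line(sample)
print(row["busy"], row["free"])       # 95.0 25.0
print(hints(row, maxchildren=1000))
```

Running this over a day of stats lines and counting how often each hint appears gives a first idea of which variable to adjust.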