Optimizing Server Load - RickStrahl/WestWindWebSurge GitHub Wiki

This topic discusses strategies for getting the most performance out of the West Wind WebSurge load tester when running very high load tests against services that handle 20,000-30,000 requests per second or more. These types of test scenarios behave a bit differently than slower requests that tend to wait on IO operations to complete.

More Threads Do Not Always Equal Higher Load

It's important to understand that when you create massive load on a server in order to find application breaking points, you are trying to find the sweet spot where you can send the largest possible number of requests to the server. If you're running against a very high volume service that handles tens of thousands of requests a second, you will start pushing the limits of the stress testing application itself, as the processing for each request ends up overloading the CPU of the test machine.

In these scenarios the testing process itself becomes CPU bound, and adding more threads often doesn't improve CPU-bound operations.

For high volume testing, the optimal number of simultaneous request threads is typically the same as the number of CPU cores on the test machine. So on my i7 laptop with 8 cores, running 8 threads will often yield the highest request count I can push into a server.

The longer requests take, however (that is, the longer the client waits before a response is returned from the server), the more WebSurge can benefit from multiple threads, as the threads essentially go idle while waiting for inbound responses.
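The effect of waiting on thread count can be illustrated with some simple arithmetic. The sketch below is not part of WebSurge; the function name is hypothetical, and it assumes each thread sends one request at a time and that the test machine is not CPU bound, so the result is only an upper bound.

```python
def max_requests_per_second(threads: int, response_time_ms: float) -> float:
    """Upper bound on throughput when each thread sends one request at a time.

    Hypothetical helper for illustration only. It ignores client-side CPU
    cost, which is exactly what dominates in very high volume tests.
    """
    if threads < 1 or response_time_ms <= 0:
        raise ValueError("threads must be >= 1 and response_time_ms must be > 0")
    # A single thread can complete at most this many requests per second.
    per_thread = 1000.0 / response_time_ms
    return threads * per_thread


if __name__ == "__main__":
    # With a 50 ms response time, doubling threads doubles the ceiling.
    for threads in (8, 16, 32):
        print(threads, max_requests_per_second(threads, 50.0))
```

With slow responses the ceiling scales with the thread count because threads spend most of their time waiting. With very fast responses the client CPU becomes the limit well before this ceiling is reached.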

For either of these scenarios, I suggest starting with the same number of threads as CPU cores and then re-running the test, doubling the thread count each time. You should immediately be able to tell whether there's a benefit. For very high volume requests (again, tens of thousands per second) you're likely to see request counts drop. For slower requests that take a few milliseconds of processing on the server you should see improved performance, and you will likely have to try different thread values to find the optimal configuration.
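The doubling approach can be sketched as a small script that prints the command lines to try. This is only an illustration: the helper names are hypothetical, the script does not run the tests or read results, and it assumes -t sets the thread count and -s the test duration in seconds, as the command line examples in this topic suggest.

```python
import os


def thread_counts(start: int, rounds: int) -> list[int]:
    """Return thread counts that begin at `start` and double each round."""
    return [start * (2 ** i) for i in range(rounds)]


def build_command(threads: int, seconds: int, delay_ms: int, target: str) -> str:
    """Build a websurgecli command line using the -t, -s and -d switches.

    Assumption: -t is the thread count and -s the duration in seconds.
    """
    return f"websurgecli -t{threads} -s{seconds} -d{delay_ms} {target}"


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    for threads in thread_counts(cores, 3):
        print(build_command(threads, 60, 0, "http://yoursite.com/test/"))
```

Run each printed command in turn and compare the reported request counts. Stop doubling once the count levels off or drops.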

Request Delays

WebSurge uses dedicated threads to run requests. The threads are pre-allocated, and each one runs requests continuously while the test is active. Between requests, one of three yielding modes is applied, which you can set via the DelayTimeMs Session option or the -d switch of the command line tool:

No delay or yielding at all - requests run one after the other (-1)
Yield after each request - allows thread context switch (0)
Yield and delay after each request - inserts a short delay (>0)

The first mode (no yielding) should be used with caution, as it can ramp up CPU usage drastically. Yielding causes a thread context switch that gives another thread a turn at the CPU. This is typically enough to keep CPU usage reasonable.
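The three modes can be pictured with a rough Python analogue. This is not WebSurge's actual implementation (WebSurge is a .NET application); the function name is hypothetical, and treating every negative value like -1 is an assumption made for the sketch.

```python
import time


def pause_between_requests(delay_ms: int) -> str:
    """Apply the pause that follows a request and report which mode was used.

    Illustrative analogue only, not WebSurge code.
    """
    if delay_ms < 0:
        # -1: no yield at all, the next request starts immediately.
        return "none"
    if delay_ms == 0:
        # 0: give up the rest of the time slice so another thread can run.
        time.sleep(0)
        return "yield"
    # >0: yield and additionally wait for the configured delay.
    time.sleep(delay_ms / 1000.0)
    return "delay"


if __name__ == "__main__":
    for value in (-1, 0, 5):
        print(value, pause_between_requests(value))
```

A worker thread would call something like this after every request, which is why a setting of -1 keeps each thread busy without ever handing the CPU to another thread voluntarily.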

Use the Command Line Tooling

To get roughly a 10-20% improvement in request count for high volume load tests, run your tests from the command line. The command line tool removes the UI overhead of the full application and therefore runs a bit more efficiently, with somewhat lower memory and CPU usage.

Running from the command line is as easy as this using a URL:

 websurgecli -t8 -s60 -d0 http://yoursite.com/test/

or, using a previously created and saved Session file:

 websurgecli -t8 -s60 -d0 c:\temp\TestSession.websurge