Load Test Spike - HearstCorp/rover-wiki GitHub Wiki
Motivation & Background
According to various statistics, web application performance directly affects the bottom line of online businesses: 40% of users will abandon a web page if it takes more than three seconds to load (Akamai, 2009), and 88% of online consumers are less likely to return to a site after a bad experience (Gomez, 2010). The average web page has almost doubled in size since 2010, while mobile web browsing accounted for nearly 50% of traffic in 2015 ([Clickz, 2015](https://www.clickz.com/clickz/column/2388915/why-mobile-web-still-matters-in-2015)). Page load time is clearly a key performance indicator, and load/stress testing is business critical. By building automated load testing into the pipeline, businesses and teams can catch performance issues before they ship.
Requests on Prod
The following data is from Kibana, for the time ranges in the table below.
time range | Jan 1, 2016 - Jan 10, 2016 | Dec 28 - 31, 2015 | Jan 5, 2016 12pm-5pm |
---|---|---|---|
peak hit | 3,543,693 / hour | 3,424,909 / hour | 62,759 / min |
peak miss | 1,270,021 / hour | 892,679 / hour | 27,561 / min |
peak hit cos | 1,093,841 / hour | 795,471 / hour | 16,375 / min |
peak miss cos | 206,445 / hour | 187,659 / hour | 3,960 / min |
The finer the time unit, the higher the peak rate; that makes sense, because requests are not evenly distributed over time. Unfortunately, we don't have much holiday data (only 14 days of data are stored). If we assume real traffic roughly doubles before/around Christmas, requests per minute would peak at:
Req peak | for cosmo | for all |
---|---|---|
no cache | 8000/m = 133/s | 55000/m = 917/s |
all | ~40,700/m = 678/s | ~180,600/m = 3,010/s |
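The doubling arithmetic above can be checked with a few lines. This is a back-of-envelope sketch: the per-minute peaks come from the Jan 5 Kibana column, and the 2x holiday factor is the assumption stated above, not a measured number.

```python
# Estimate holiday-peak request rates from the observed Jan 5 per-minute peaks.
HOLIDAY_FACTOR = 2  # assumption: traffic roughly doubles around Christmas

peaks_per_min = {"hit": 62759, "miss": 27561, "hit_cos": 16375, "miss_cos": 3960}

def estimate(keys):
    """Sum the given per-minute peaks, apply the holiday factor,
    and return (requests/min, requests/sec)."""
    per_min = sum(peaks_per_min[k] for k in keys) * HOLIDAY_FACTOR
    return per_min, round(per_min / 60)

print("cosmo, no cache:", estimate(["miss_cos"]))             # misses only
print("all, no cache:  ", estimate(["miss"]))
print("cosmo, all:     ", estimate(["hit_cos", "miss_cos"]))  # hits + misses
print("all, all:       ", estimate(["hit", "miss"]))
```

The per-second figures land within rounding of the table values above.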
Infrastructure Load Testing
The main goal is to verify that the whole Rover stack functions well under load.
plan 1: RDS testing
Create a few boxes to hit the DB directly, with pure-read, pure-write, and mixed read/write workloads.
This will serve as a baseline for future Rover tests.
Hardware: 1 master + 3 read-only replicas; each box is db.m4.xlarge (4 vCPUs, 13 ECU, 16 GB RAM). After migration, max_connections=528.
config | val | notes |
---|---|---|
shared_buffers | {DBInstanceClassMemory/32768} | 25% |
effective_cache_size | {DBInstanceClassMemory/16384} | 50% |
checkpoint_segments | 16 | |
checkpoint_completion_target | 0.9 | |
default_statistics_target | 100 | |
work_mem | 65536 | or {DBInstanceClassMemory/262144} ? |
maintenance_work_mem | {DBInstanceClassMemory/16384} | 10% |
logging_collector | on | |
log_statement | all | |
log_directory | | |
pgbench test 1: readonly, select * from content order by created_at limit 25;
Results: not good; high latency and low throughput, around 300 tps. It turned out there was no index on created_at yet.
pgbench test 2: readonly, simple selects
With 64 connections on 8 threads, running selects against different tables for 60 s:
test query (table row count) | transactions processed | avg latency (ms) | tps (incl. conn. establishing) | tps (excl. conn. establishing) | bytes per row |
---|---|---|---|---|---|
select * from content limit 25; (51119 total) | 59639 | 64.387 | 990.1 | 990.52 | 5228 |
select * from categories limit 25; (85 total) | 315154 | 12.185 | 5251.4 | 5252.67 | 82 |
select * from images limit 25; (101810 total) | 160126 | 23.981 | 2667.89 | 2668.55 | 1444 |
Answer: per-table performance depends largely on the row size of each table.
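As a quick sanity check on that conclusion (a rough correlation only; buffer hits, TOAST, and plan cost also play a role), the throughput ordering in the table above is exactly the inverse of the row-size ordering:

```python
# Per-table pgbench results from the table above:
# table -> (tps excluding connection establishing, bytes per row)
results = {
    "content":    (990.52, 5228),
    "categories": (5252.67, 82),
    "images":     (2668.55, 1444),
}

by_tps = sorted(results, key=lambda t: results[t][0], reverse=True)  # fastest first
by_size = sorted(results, key=lambda t: results[t][1])               # smallest rows first

print(by_tps)             # ['categories', 'images', 'content']
print(by_tps == by_size)  # True: smaller rows, higher throughput
```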
jmeter remote tests
jmeter script: readonly-read.jmx
postgres hardware: db.m4.xlarge (4 cores, 16GB memory)
jmeter nodes hardware: m4.large (2 cores, 8GB memory)
jmeter nodes | rps |
---|---|
2 | 1294.2 |
2 | 1290.6 |
3 | 1383.7 |
Adding a third jmeter node raised throughput by only about 7%, which suggests the database, not the load generators, was the bottleneck.
plan 2: Rover testing without caching
The goal is to test the Rover infrastructure itself and push it to its limits; no Fastly or other caching in front.
read testing
Send different requests (article, content, galleries, ...), starting from 50 concurrent users, then increase the request rate (#RPS).
Requirements:
- data
- requests distribution (NewRelic, Kibana?)
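A minimal read-test harness along these lines could look like the sketch below. The endpoint paths and request mix are illustrative assumptions, not Rover's real routes; the snippet spins up a throwaway local stub server so it is self-contained, but in practice base_url would point at the staging stack and the weights would come from the NewRelic/Kibana request distribution.

```python
import threading
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the real app: answers every GET with 200."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):
        pass  # keep the output quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_address[1]

# Hypothetical request mix (articles/content/galleries), repeated 20x.
paths = ["/articles/1", "/content/2", "/galleries/3"] * 20  # 60 requests

def hit(path):
    with urllib.request.urlopen(base_url + path, timeout=5) as resp:
        return resp.status

statuses = Counter()
with ThreadPoolExecutor(max_workers=8) as pool:  # bump to 50 for a real run
    for status in pool.map(hit, paths):
        statuses[status] += 1

server.shutdown()
print(statuses)  # Counter({200: 60})
```

A real test would also record per-request latency and ramp the worker count up over time; a dedicated tool (jmeter, Locust) does that bookkeeping for you.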
write testing
Send different insert or update requests (content, article, ...), starting from 50 concurrent users, then increase the load.
- test 1: write into the same table
- test 2: mixed writes into different tables
Requirements:
- write distribution (?)
read/write testing
Use different read/write ratios against one endpoint.
Requirements:
- data
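The ratio-driven plan could be sketched like this; the 80/20 split below is an example value, not a measured Rover ratio.

```python
import random

def plan_requests(total, read_ratio, rng):
    """Return a shuffled list of 'read'/'write' ops honoring read_ratio."""
    reads = round(total * read_ratio)
    ops = ["read"] * reads + ["write"] * (total - reads)
    rng.shuffle(ops)  # interleave so reads and writes contend realistically
    return ops

# 1000 requests at an assumed 80% read / 20% write mix, seeded for repeatability.
ops = plan_requests(1000, 0.8, random.Random(42))
print(ops.count("read"), ops.count("write"))  # 800 200
```

Each worker then pops ops off this plan and issues the corresponding GET or POST/PUT against the endpoint under test.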
App Health Load Testing (no caching)
Integrated into the CI pipeline, focused on app health: if a run fails to meet the baseline, the change won't be allowed to merge.
Requirements: set a baseline (e.g. request count, error rate) from Grafana.
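The merge gate itself can be as simple as comparing a run's metrics to a stored baseline and failing CI on any regression. A sketch, with placeholder metric names and thresholds rather than real Grafana values:

```python
# Placeholder baseline; in practice these numbers would be exported from Grafana.
BASELINE = {"rps_min": 900.0, "error_rate_max": 0.01, "p95_latency_ms_max": 250.0}

def check_run(metrics, baseline=BASELINE):
    """Return a list of regressions; empty list means the run passes the gate."""
    failures = []
    if metrics["rps"] < baseline["rps_min"]:
        failures.append("throughput below baseline")
    if metrics["error_rate"] > baseline["error_rate_max"]:
        failures.append("error rate above baseline")
    if metrics["p95_latency_ms"] > baseline["p95_latency_ms_max"]:
        failures.append("p95 latency above baseline")
    return failures

print(check_run({"rps": 1200, "error_rate": 0.002, "p95_latency_ms": 180}))  # []
print(check_run({"rps": 700, "error_rate": 0.05, "p95_latency_ms": 300}))
```

In CI, a non-empty failure list would exit non-zero and block the merge.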
CDN Load Testing
For the previous two kinds of tests, any load-testing tool could be used (jmeter, Locust, Siege, etc.). For CDN load testing, we want to simulate geographically distributed load; possible tools include BlazeMeter and Blitz.