Load Test Spike

MEDIAOS-1403

Motivation & Background

According to widely cited statistics, web application performance directly affects the bottom line of online businesses: 40% of users will abandon a web page if it takes more than three seconds to load (Akamai, 2009), and 88% of online consumers are less likely to return to a site after a bad experience (Gomez, 2010). The average web page has almost doubled in size since 2010, while mobile web browsing accounted for nearly 50% of traffic in 2015 ([Clickz, 2015](https://www.clickz.com/clickz/column/2388915/why-mobile-web-still-matters-in-2015)). Page load time is therefore a key performance indicator, and load/stress testing is business critical. By building automated load testing into the pipeline, businesses and teams can catch performance problems before they reach users.

Requests on Prod

The following data is from Kibana, covering three time ranges:

| time range | Jan 1 - 10, 2016 | Dec 28 - 31, 2015 | Jan 5, 2016, 12pm-5pm |
|---|---|---|---|
| peak hit | 3,543,693 / hour | 3,424,909 / hour | 62,759 / min |
| peak miss | 1,270,021 / hour | 892,679 / hour | 27,561 / min |
| peak hit cos | 1,093,841 / hour | 795,471 / hour | 16,375 / min |
| peak miss cos | 206,445 / hour | 187,659 / hour | 3,960 / min |

The finer the time bucket, the higher the peak rate appears, which makes sense: requests are not evenly distributed over time. Unfortunately, we don't have much holiday data (only 14 days are retained). Assuming real traffic roughly doubles before/around Christmas, doubling the Jan 5 per-minute peaks gives:

| req peak | for cosmo | for all |
|---|---|---|
| no cache (miss) | ≈8,000/min ≈ 133/s | ≈55,000/min ≈ 917/s |
| all (hit + miss) | ≈40,700/min ≈ 678/s | ≈180,600/min ≈ 3,010/s |

(For example, miss cos peaked at 3,960/min on Jan 5, so doubled that is ≈8,000/min ≈ 133/s; hit + miss across all sites peaked at 90,320/min, doubled ≈ 180,600/min ≈ 3,010/s.)

Infrastructure Load Testing

The main goal is to verify that the whole Rover stack functions well under load.

plan 1: RDS testing

Create a few boxes that hit the DB directly with pure-read, pure-write, and mixed read/write workloads, as sketched below.

This will serve as a baseline for future Rover tests.
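As a sketch of that baseline, pgbench's built-in workloads cover the pure-read and read/write cases (host, user, and database names below are placeholders):

```bash
# One-time setup: create pgbench's own tables (scale 100 ≈ 10M rows in pgbench_accounts)
pgbench -i -s 100 -h rover-db.example.rds.amazonaws.com -U rover roverdb

# Pure read: built-in select-only workload, 64 clients on 8 threads for 60s
pgbench -S -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb

# Read/write: default TPC-B-like workload (SELECT/UPDATE/INSERT mix);
# a pure-write run would need a custom script passed with -f
pgbench -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb
```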

Hardware: 1 master + 3 read-only replicas; each box is a db.m4.xlarge (4 vCPUs, 13 ECU, 16 GB). After migration, max_connections=528.

| config | value | notes |
|---|---|---|
| shared_buffers | {DBInstanceClassMemory/32768} | 25% |
| effective_cache_size | {DBInstanceClassMemory/16384} | 50% |
| checkpoint_segments | 16 | |
| checkpoint_completion_target | 0.9 | |
| default_statistics_target | 100 | |
| work_mem | 65536 or {DBInstanceClassMemory/262144} ? | |
| maintenance_work_mem | {DBInstanceClassMemory/16384} | 10% |
| logging_collector | on | |
| log_statement | all | |
| log_directory | | |
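On RDS these settings live in a DB parameter group rather than postgresql.conf. A hedged sketch of applying two of them with the AWS CLI (the group name is hypothetical; expressions like {DBInstanceClassMemory/32768} are evaluated by RDS):

```bash
# shared_buffers is static (takes effect after reboot);
# checkpoint_completion_target is dynamic (can apply immediately)
aws rds modify-db-parameter-group \
  --db-parameter-group-name rover-pg-params \
  --parameters \
    "ParameterName=shared_buffers,ParameterValue={DBInstanceClassMemory/32768},ApplyMethod=pending-reboot" \
    "ParameterName=checkpoint_completion_target,ParameterValue=0.9,ApplyMethod=immediate"
```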

pgbench test 1: readonly, select * from content order by created_at limit 25;

Results: not good; high latency and low throughput, around 300 tps. It turns out there is no index on created_at yet.
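A likely fix, assuming the column is content.created_at as in the query above (the index name here is ours):

```bash
# CONCURRENTLY builds the index without blocking writes on the live table
psql -h rover-db.example.rds.amazonaws.com -U rover -d roverdb \
  -c 'CREATE INDEX CONCURRENTLY idx_content_created_at ON content (created_at);'
```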

pgbench test 2: readonly, simple selects from different tables

With 64 connections on 8 threads, running selects against different tables for 60s:

| test query | #transactions processed | latency avg (ms) | tps (incl. conn. establishing) | tps (excl. conn. establishing) | bytes per row |
|---|---|---|---|---|---|
| select * from content limit 25; (51,119 rows total) | 59,639 | 64.387 | 990.10 | 990.52 | 5,228 |
| select * from categories limit 25; (85 rows total) | 315,154 | 12.185 | 5,251.40 | 5,252.67 | 82 |
| select * from images limit 25; (101,810 rows total) | 160,126 | 23.981 | 2,667.89 | 2,668.55 | 1,444 |

Conclusion: performance depends on the row size of each table; the wider the rows, the lower the throughput.
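For reference, each row above can be reproduced with a one-line custom script (the file name is ours); -n skips vacuuming pgbench's own tables, which don't exist here:

```bash
echo 'select * from content limit 25;' > select_content.sql

# 64 client connections on 8 worker threads for 60 seconds
pgbench -n -f select_content.sql -c 64 -j 8 -T 60 \
  -h rover-db.example.rds.amazonaws.com -U rover roverdb
```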

jmeter remote tests

jmeter script: readonly-read.jmx

postgres hardware: db.m4.xlarge (4 cores, 16GB memory)

jmeter nodes hardware: m4.large (2 cores, 8GB memory)

| jmeter nodes | rps |
|---|---|
| 2 | 1,294.2 |
| 2 | 1,290.6 |
| 3 | 1,383.7 |
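A distributed run of this shape is launched from the JMeter controller roughly as follows (node addresses are placeholders; each node runs jmeter-server):

```bash
# Non-GUI mode, pushing the plan to the remote nodes and collecting results locally
jmeter -n -t readonly-read.jmx \
  -R jmeter-node-1.internal,jmeter-node-2.internal \
  -l results.jtl
```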

plan 2: Rover testing without caching

The goal is to test the Rover infrastructure itself and push it to its limit, with no Fastly or other caching in front.

read testing

Issue different requests (article, content, galleries, ...), starting from 50 concurrent users, then increase the RPS; a ramp sketch follows.
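A minimal sketch of such a ramp using siege (the URL list file is ours; it would hold the article/content/gallery endpoints):

```bash
# Step concurrency up across runs, two minutes per step, logging each run separately
for c in 50 100 200 400; do
  siege -c "$c" -t 2M -f read-urls.txt --log="siege-c$c.log"
done
```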

Requirements:

  • data
  • request distribution (New Relic, Kibana?)

write testing

Issue different insert/update requests (content, article, ...), starting from 50 concurrent users, then increase:

  • test 1: write into the same table
  • test 2: mixed writes into different tables
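Test 1 could be sketched as a burst of POSTs against a single endpoint (the URL and payload are hypothetical; auth is omitted):

```bash
# 1,000 inserts into the same table via one endpoint, 50 requests in flight at a time
seq 1 1000 | xargs -P 50 -I{} curl -s -o /dev/null -X POST \
  https://rover.example.com/api/content \
  -H 'Content-Type: application/json' \
  -d '{"title": "load-test-{}"}'
```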

Requirements:

  • write distribution (?)

read/write testing

Use different read/write ratios on one endpoint; a pgbench sketch follows.
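At the database level, pgbench can express the ratio directly via script weights if the installed version supports them (9.6+); the script files here are hypothetical:

```bash
# 90/10 read/write mix: read_content.sql runs 9x as often as write_content.sql
pgbench -n -f read_content.sql@9 -f write_content.sql@1 \
  -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb
```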

Requirements:

  • data

App Health Load Testing (no caching)

Integrated into the CI pipeline and focused on app health: if a run produces metrics that fail to meet the baseline, the change is not allowed to merge.

Requirements: set the baseline metrics, e.g. number of requests, error rate, etc. (from Grafana).
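One way to enforce the gate is a CI step that fails when the measured error rate exceeds the baseline; a sketch under the assumption that the load tool writes one line per request with OK/ERR as the first field:

```bash
#!/usr/bin/env bash
set -euo pipefail
# Block the merge if the load-test error rate exceeds a 1% baseline.
total=$(wc -l < results.log)
errors=$(grep -c '^ERR' results.log || true)
if awk -v e="$errors" -v t="$total" 'BEGIN { exit !(e / t > 0.01) }'; then
  echo "error rate $errors/$total exceeds 1% baseline; blocking merge" >&2
  exit 1
fi
```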

CDN Load Testing

For the previous two tests, any load testing tool could be used: JMeter, Locust, Siege, etc. For CDN load testing, we want to simulate geographically distributed load; possible tools include BlazeMeter and Blitz.