Load Test Spike

MEDIAOS-1403

Motivation & Background

According to widely cited statistics, web application performance directly affects the bottom line of online businesses: 40% of users will abandon a web page if it takes more than three seconds to load (Akamai, 2009), and 88% of online consumers are less likely to return to a site after a bad experience (Gomez, 2010). The average web page has almost doubled in size since 2010, while mobile web browsing accounted for nearly 50% of traffic in 2015 ([Clickz, 2015](https://www.clickz.com/clickz/column/2388915/why-mobile-web-still-matters-in-2015)). Page load time is therefore a key performance indicator, and load/stress testing is business critical. By building automated load testing into the pipeline, businesses and teams can catch performance problems before they reach users.

Requests on Prod

The following data is from Kibana, covering three time ranges:

| time range | Jan 1 - 10, 2016 | Dec 28 - 31, 2015 | Jan 5, 2016, 12pm-5pm |
|---|---|---|---|
| peak hit | 3,543,693 / hour | 3,424,909 / hour | 62,759 / min |
| peak miss | 1,270,021 / hour | 892,679 / hour | 27,561 / min |
| peak hit cos | 1,093,841 / hour | 795,471 / hour | 16,375 / min |
| peak miss cos | 206,445 / hour | 187,659 / hour | 3,960 / min |

The finer the time bucket, the higher the peak rate appears, which makes sense: requests are not evenly distributed over time. Unfortunately, we don't have much holiday data (only 14 days are retained). Assuming real traffic roughly doubles before/around Christmas, doubling the Jan 5 per-minute peaks gives:

| req peak | for cosmo | for all |
|---|---|---|
| no cache (miss) | ≈8,000/min ≈ 133/s | ≈55,000/min ≈ 917/s |
| all (hit + miss) | ≈40,700/min ≈ 678/s | ≈180,600/min ≈ 3,010/s |

(For example, miss cos peaked at 3,960/min on Jan 5, so doubled that is ≈8,000/min ≈ 133/s; hit + miss across all sites peaked at 90,320/min, doubled ≈ 180,600/min ≈ 3,010/s.)

Infrastructure Load Testing

The main goal is to verify that the whole Rover stack functions well under load.

plan 1: RDS testing

Create a few boxes that hit the DB directly with pure-read, pure-write, and mixed read/write workloads, as sketched below.

This will serve as a baseline for future Rover tests.
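As a sketch of that baseline, pgbench's built-in workloads cover the pure-read and read/write cases (host, user, and database names below are placeholders):

```bash
# One-time setup: create pgbench's own tables (scale 100 ≈ 10M rows in pgbench_accounts)
pgbench -i -s 100 -h rover-db.example.rds.amazonaws.com -U rover roverdb

# Pure read: built-in select-only workload, 64 clients on 8 threads for 60s
pgbench -S -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb

# Read/write: default TPC-B-like workload (SELECT/UPDATE/INSERT mix);
# a pure-write run would need a custom script passed with -f
pgbench -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb
```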

Hardware: 1 master + 3 read-only replicas; each box is a db.m4.xlarge (4 vCPUs, 13 ECU, 16 GB). After migration, max_connections=528.

| config | value | notes |
|---|---|---|
| shared_buffers | {DBInstanceClassMemory/32768} | 25% |
| effective_cache_size | {DBInstanceClassMemory/16384} | 50% |
| checkpoint_segments | 16 | |
| checkpoint_completion_target | 0.9 | |
| default_statistics_target | 100 | |
| work_mem | 65536 or {DBInstanceClassMemory/262144} ? | |
| maintenance_work_mem | {DBInstanceClassMemory/16384} | 10% |
| logging_collector | on | |
| log_statement | all | |
| log_directory | | |
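On RDS these settings live in a DB parameter group rather than postgresql.conf. A hedged sketch of applying two of them with the AWS CLI (the group name is hypothetical; expressions like {DBInstanceClassMemory/32768} are evaluated by RDS):

```bash
# shared_buffers is static (takes effect after reboot);
# checkpoint_completion_target is dynamic (can apply immediately)
aws rds modify-db-parameter-group \
  --db-parameter-group-name rover-pg-params \
  --parameters \
    "ParameterName=shared_buffers,ParameterValue={DBInstanceClassMemory/32768},ApplyMethod=pending-reboot" \
    "ParameterName=checkpoint_completion_target,ParameterValue=0.9,ApplyMethod=immediate"
```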

pgbench test 1: readonly, select * from content order by created_at limit 25;

Results: not good; high latency and low throughput, around 300 tps. It turns out there is no index on created_at yet.
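A likely fix, assuming the column is content.created_at as in the query above (the index name here is ours):

```bash
# CONCURRENTLY builds the index without blocking writes on the live table
psql -h rover-db.example.rds.amazonaws.com -U rover -d roverdb \
  -c 'CREATE INDEX CONCURRENTLY idx_content_created_at ON content (created_at);'
```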

pgbench test 2: readonly, simple selects from different tables

With 64 connections on 8 threads, running selects against different tables for 60s:

| test query | #transactions processed | latency avg (ms) | tps (incl. conn. establishing) | tps (excl. conn. establishing) | bytes per row |
|---|---|---|---|---|---|
| select * from content limit 25; (51,119 rows total) | 59,639 | 64.387 | 990.10 | 990.52 | 5,228 |
| select * from categories limit 25; (85 rows total) | 315,154 | 12.185 | 5,251.40 | 5,252.67 | 82 |
| select * from images limit 25; (101,810 rows total) | 160,126 | 23.981 | 2,667.89 | 2,668.55 | 1,444 |

Conclusion: performance depends on the row size of each table; the wider the rows, the lower the throughput.
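For reference, each row above can be reproduced with a one-line custom script (the file name is ours); -n skips vacuuming pgbench's own tables, which don't exist here:

```bash
echo 'select * from content limit 25;' > select_content.sql

# 64 client connections on 8 worker threads for 60 seconds
pgbench -n -f select_content.sql -c 64 -j 8 -T 60 \
  -h rover-db.example.rds.amazonaws.com -U rover roverdb
```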

jmeter remote tests

jmeter script: readonly-read.jmx

postgres hardware: db.m4.xlarge (4 cores, 16GB memory)

jmeter nodes hardware: m4.large (2 cores, 8GB memory)

| jmeter nodes | rps |
|---|---|
| 2 | 1,294.2 |
| 2 | 1,290.6 |
| 3 | 1,383.7 |
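A distributed run of this shape is launched from the JMeter controller roughly as follows (node addresses are placeholders; each node runs jmeter-server):

```bash
# Non-GUI mode, pushing the plan to the remote nodes and collecting results locally
jmeter -n -t readonly-read.jmx \
  -R jmeter-node-1.internal,jmeter-node-2.internal \
  -l results.jtl
```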

plan 2: Rover testing without caching

The goal is to test the Rover infrastructure itself and push it to its limit, with no Fastly or other caching in front.

read testing

Issue different requests (article, content, galleries, ...), starting from 50 concurrent users, then increase the RPS; a ramp sketch follows.
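A minimal sketch of such a ramp using siege (the URL list file is ours; it would hold the article/content/gallery endpoints):

```bash
# Step concurrency up across runs, two minutes per step, logging each run separately
for c in 50 100 200 400; do
  siege -c "$c" -t 2M -f read-urls.txt --log="siege-c$c.log"
done
```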

Requirements:

  • data
  • request distribution (New Relic, Kibana?)

write testing

Issue different insert/update requests (content, article, ...), starting from 50 concurrent users, then increase:

  • test 1: write into the same table
  • test 2: mixed writes into different tables
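Test 1 could be sketched as a burst of POSTs against a single endpoint (the URL and payload are hypothetical; auth is omitted):

```bash
# 1,000 inserts into the same table via one endpoint, 50 requests in flight at a time
seq 1 1000 | xargs -P 50 -I{} curl -s -o /dev/null -X POST \
  https://rover.example.com/api/content \
  -H 'Content-Type: application/json' \
  -d '{"title": "load-test-{}"}'
```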

Requirements:

  • write distribution (?)

read/write testing

Use different read/write ratios on one endpoint; a pgbench sketch follows.
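At the database level, pgbench can express the ratio directly via script weights if the installed version supports them (9.6+); the script files here are hypothetical:

```bash
# 90/10 read/write mix: read_content.sql runs 9x as often as write_content.sql
pgbench -n -f read_content.sql@9 -f write_content.sql@1 \
  -c 64 -j 8 -T 60 -h rover-db.example.rds.amazonaws.com -U rover roverdb
```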

Requirements:

  • data

App Health Load Testing (no caching)

Integrated into the CI pipeline and focused on app health: if a run produces metrics that fail to meet the baseline, the change is not allowed to merge.

Requirements: set the baseline metrics, e.g. number of requests, error rate, etc. (from Grafana).
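One way to enforce the gate is a CI step that fails when the measured error rate exceeds the baseline; a sketch under the assumption that the load tool writes one line per request with OK/ERR as the first field:

```bash
#!/usr/bin/env bash
set -euo pipefail
# Block the merge if the load-test error rate exceeds a 1% baseline.
total=$(wc -l < results.log)
errors=$(grep -c '^ERR' results.log || true)
if awk -v e="$errors" -v t="$total" 'BEGIN { exit !(e / t > 0.01) }'; then
  echo "error rate $errors/$total exceeds 1% baseline; blocking merge" >&2
  exit 1
fi
```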

CDN Load Testing

For the previous two tests, any load testing tool could be used: JMeter, Locust, Siege, etc. For CDN load testing, we want to simulate geographically distributed load; possible tools include BlazeMeter and Blitz.