Choosing the BenchBase benchmark

Background on synthetic versus real-world benchmarks

The DBtune service helps identify bottlenecks in PostgreSQL database instances. For DBtune to improve throughput or query runtime, the workload being tuned needs to have some form of bottleneck. This bottleneck is often related to memory or disk and can usually be resolved through a more careful configuration of the database instance. As a consequence, the benchmarks used in this guide should have the following properties:

  • Mimic real-world workloads:
    • Not too simple.
    • Complex queries.
    • Not too many writes, so that the database does not grow in size very quickly (rapid growth is unrealistic in real-world workloads).
  • Manifest a bottleneck:
    • The database size is significantly larger than available RAM memory.
    • Have enough threads (called terminals in TPC-C terminology).
  • Have a noisy but roughly constant performance trend:
    • The workload stays the same, or changes only slightly, during the tuning session. This matches real-world workloads such as day-time or night-time workloads, which are expected to remain roughly unchanged within their respective time windows.

The ResourceStresser, Epinions, and CH-benCHmark benchmarks all satisfy these guidelines. We recommend ResourceStresser because it is the fastest to load.

Edit the BenchBase config file

Each benchmark in BenchBase comes with a benchmark configuration file, which lets the user configure the database size and the workload. Advanced users of this repo can edit the BenchBase config file for the benchmark they want to use:
nano benchbase-postgres/config/postgres/sample_{benchmark}_config.xml

This repo comes pre-configured with config files for the ResourceStresser, Epinions, and CH-benCHmark benchmarks. However, if you are using a different cloud instance than the recommended ones (m5d.xlarge for AWS EC2 and Standard_D4s_v3 for Azure VM), you must adjust the scalefactor parameter so that the database is significantly larger (e.g., at least 2 times larger) than the available RAM; this is important to produce a realistic bottleneck. With the config files provided in this repo, you will generate a 40 GB database (80 GB for Epinions), which works well with m5d.xlarge. The total database size scales linearly with scalefactor.
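For orientation, the parameters discussed above live near the top of each sample config file. The snippet below is a minimal sketch of the relevant elements only; the surrounding connection settings are omitted and the values shown are placeholders, not recommendations:

```xml
<!-- Excerpt from sample_{benchmark}_config.xml (sketch; values are placeholders) -->
<parameters>
    <!-- The database size scales linearly with scalefactor; pick a value so the
         loaded database is at least ~2x the available RAM -->
    <scalefactor>80000</scalefactor>
    <!-- Number of concurrent client threads (terminals in TPC-C terminology) -->
    <terminals>50</terminals>
</parameters>
```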

On the BenchBase benchmarks

We recommend using one of the three pre-configured settings included in this repo for the ResourceStresser, Epinions, and CH-benCHmark benchmarks. When running on an EC2 m5d.xlarge instance, no change is required. If you are running on an Azure Standard_D4s_v3 instance with a P15-tier attached disk, no benchmark configuration changes are needed for ResourceStresser; we did not explore the other benchmarks enough to make a firm recommendation for that setup.

  1. ResourceStresser: scalefactor 80,000, terminals 50, weights [0,0,50,50,0,0] (see the config sketch after this list). This benchmark stresses CPU, I/O, or lock contention depending on how you configure the weights; these settings make it I/O heavy. It loads in about 30 minutes, and the database size is initially 40 GB. We recommend letting it run for one hour before starting the tuning session, so that the TPS stabilizes.
  2. Epinions: scalefactor 10,000, terminals 50. This benchmark takes ~26 hours to load and produces an 80 GB database. It models an old social-network platform and consists of nine different query types.
  3. CH-benCHmark: scalefactor 400, terminals 50. This benchmark takes ~80 minutes to load and produces a 40 GB database. It is a composite benchmark consisting of the TPC-C and TPC-H workloads running in parallel on the same tables. Note that its throughput degrades slightly over time, so it is not entirely stable.
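To make the mapping concrete, the pre-configured ResourceStresser settings above correspond roughly to the following sketch of sample_resourcestresser_config.xml. Connection settings are omitted, the time value shown is illustrative rather than part of the pre-configured settings, and the exact element layout may differ slightly between BenchBase versions:

```xml
<!-- Sketch of the ResourceStresser settings described above (connection settings omitted) -->
<parameters>
    <scalefactor>80000</scalefactor>    <!-- initially ~40 GB database -->
    <terminals>50</terminals>           <!-- concurrent client threads -->
    <works>
        <work>
            <time>86400</time>          <!-- illustrative; long enough to cover the tuning session -->
            <rate>unlimited</rate>      <!-- no rate cap, so a bottleneck can form -->
            <!-- One weight per transaction type, in the order of the file's
                 transactiontypes section; putting all weight on the two I/O
                 procedures is what makes this configuration I/O heavy -->
            <weights>0,0,50,50,0,0</weights>
        </work>
    </works>
</parameters>
```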

All other benchmarks implemented in BenchBase have some issues that make them less suitable to use for tuning configuration knobs. The following is an analysis of the rest of the BenchBase benchmarks and their limitations:

  • Twitter – This benchmark creates a large number of locks, which use up a lot of RAM. DBtune automatically predicts a potential crash and adjusts the instance configuration to avoid it, so this benchmark is technically safe to use. However, it lowers the value of tuning some configuration knobs, since much less RAM is available to allocate to other important PostgreSQL functions.
  • TPC-C – Benefits from DBtune tuning, but its throughput degrades over time, so it does not satisfy the properties of a suitable benchmark described above.
  • Smallbank – Similar behavior to TPC-C.
  • TPC-H – Suitable for tuning query runtime. It takes ~20 hours to load to 40 GB on an m5d.xlarge instance.
  • TATP – Too simple.
  • Hyadapt – Not consistent enough, and no documentation is available.
  • Wikipedia – Too simple. Some headroom is available when tuned for query runtime.
  • YCSB – Too simple.
  • SEATS – Degrades rapidly over time.
  • AuctionMark – Crashes during the loading phase. It has many inserts, so it probably makes the database grow very quickly. Its throughput also degrades quickly over time.
  • Voter – Too simple; it has only one query type, and the database size can't be increased.
  • SIbench – Too simple; it has only two query types, and the database size can't be increased.
  • NoOp – Too simple; it has only one query type.
  • OTMetrics – Too simple; it has only one query type.
  • TPC-DS – No config file is provided in BenchBase.

If you try one of these benchmarks, you need to configure the following benchmark parameters:

  • scalefactor: Sets the size of the database.
  • terminals: Sets the number of threads querying the database.
  • time: Set this to a number of seconds that will cover the entire tuning duration.
  • rate: Set this to unlimited to enable a bottleneck.
  • weights [optional]: The weights determine the relative frequency of the different query types. For example, if you run Twitter, you can set scalefactor to 8500 (which creates a 40 GB database), terminals to 400, time to 86400 (one day in seconds), rate to unlimited, and keep the weights as is (see the sketch below).
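As an illustration of the Twitter example above, the corresponding elements of sample_twitter_config.xml would look roughly like the following sketch (connection settings omitted, and the weights element from the sample file left unchanged):

```xml
<!-- Sketch of the Twitter example above (connection settings omitted) -->
<parameters>
    <scalefactor>8500</scalefactor>   <!-- creates roughly a 40 GB database -->
    <terminals>400</terminals>        <!-- concurrent client threads -->
    <works>
        <work>
            <time>86400</time>        <!-- one day, in seconds -->
            <rate>unlimited</rate>    <!-- no rate cap, so a bottleneck can form -->
            <!-- keep the weights element from the sample file as is -->
        </work>
    </works>
</parameters>
```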