FAQ W - ibmcb/cbtool GitHub Wiki

Workloads

WQ1: I see that CBTOOL can run a variety of different benchmarks/workloads. Will CBTOOL install these applications (e.g., Hadoop, MySQL, DB2, WebSphere) on the VMs automatically for me?

WA1: Yes, CBTOOL can automatically install all the requirements directly on an unconfigured image during deployment. While this is technically possible, it comes with significant drawbacks, and we strongly recommend that you create the images beforehand using the method described here.

  • First, relying on CB to install all the workload requirements directly on top of an unconfigured image can add significant time to the deployment: for the more complex workloads, anywhere from tens of minutes up to an hour.
  • Second, this requires every booted instance to have internet access, given the need to install packages from official apt, yum, pip and docker repositories.
  • Third, any failure to install a particular dependency on any image will cause the whole deployment to fail. Therefore, simple transient network failures while attempting to access an external package repository can cause a multi-VM deployment to fail.

For these reasons, we strongly advise you to build the required images beforehand, using any of the methods discussed in this howto. With this disclaimer, we proceed to explain how to use the "Virtual Application dynamic build" feature. There are two main ways of deploying all the dependencies on top of the image:

a) Directly: a new AI is attached with a CB CLI command like aiattach build:<IMAGE IDENTIFIER FOR AN UNCONFIGURED IMAGE>:<Virtual Application Type> (e.g., aiattach build:bionic-server-cloudimg-amd64.img:iperf). With the API, just change the Virtual Application Type parameter on the call (e.g., api.appattach(<CLOUDNAME>, "build:<IMAGE IDENTIFIER FOR AN UNCONFIGURED IMAGE>:<VIRTUAL APPLICATION TYPE>")). During the initial VApp attachment, CB will attempt to run the automated installer script on each instance (i.e., ~/cloudbench/pre_install.sh; ~/cloudbench/install --role workload --wks <Virtual Application type> ... && ~/cloudbench/scripts/common/cb_post_boot.sh).

b) "Nested" Docker images: TBD
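The type string used in option (a) is a plain concatenation of the prefix, the image identifier, and the Virtual Application type. A minimal sketch, assuming a hypothetical helper named build_type (it is not part of CBTOOL) that only composes the string to be typed on the CLI:

```python
def build_type(image_id: str, vapp_type: str) -> str:
    """Compose the 'build:<image>:<type>' string used for dynamic builds."""
    if not image_id or not vapp_type:
        raise ValueError("image_id and vapp_type must be non-empty")
    return f"build:{image_id}:{vapp_type}"


# Example taken from the text above: an unconfigured Ubuntu image plus iperf.
print("aiattach " + build_type("bionic-server-cloudimg-amd64.img", "iperf"))
```

The same string would be passed as the Virtual Application type argument of the API call.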



WQ2: I know that the Virtual Application Submitters can continuously deploy Virtual Applications with a certain arrival rate. What I want is to deploy a bunch of Virtual Applications at once (i.e., "bulk arrival"). Can I do that with CBTOOL?

WA2: Yes. On the CLI, just type aiattach <vapp type> async=N (where N is the number of Virtual Applications).



WQ3: Can I use CBTOOL to just deploy a Virtual Application, without generating load after it is started?

WA3: Yes. On the CLI, just type appnoload. After this, any VApp deployed with aiattach <vapp type> will just be deployed, without generating any application load. To resume deploying VApps with load generation (this will not affect the already deployed Application Instances), just issue appload on the CLI.



WQ4: I have a Virtual Application Submitter (VAppS) (Application Instance Deployment Request Submitter (AIDRS)) and want it to immediately stop deploying new Virtual Application instances, while still removing the ones already created once their lifetime expires. How can I do it?

WA4: Removing the Virtual Application Submitter would leave the Virtual Application instances created by this particular VAppS dangling perpetually, since no process would be left to make sure that each individual VApp deployed by this Virtual Application Submitter is removed when its lifetime expires. Instead, the VAppS can be put in the "stopped" state. On CBTOOL's CLI, type the command statealter VAppS_NAME stopped. To resume normal operation of the Virtual Application Submitter, use the command statealter VAppS_NAME attached.



WQ5: Can I set SLA targets for Virtual Application provisioning time and performance?

WA5: Yes, CBTOOL now has rudimentary support for SLA tracking. Great care was taken to make sure that this new capability is non-intrusive to users that prefer to have their SLA tracking done from the outside. Two kinds of SLA tracking are possible, provisioning and runtime.

If a VM is deployed with the attribute "sla_provisioning_target" set, then CBTOOL will calculate, at the end of the provisioning, whether the total time taken to provision is smaller than the specified value, adding a new attribute, "sla_provisioning", with the value "ok" or "violated" to the VM. In addition, it will also add the VM to a "view" (i.e., a list of objects that share a common attribute) that can be queried at any time.

Example for SLA provisioning:

  • CLI: aiattach open_daytrader default default none none none mysql_sla_provisioning_target=50,geronimo_sla_provisioning_target=50,client_daytrader_sla_provisioning_target=50 (where the parameters default default none none none are kept as defaults).
  • API: apiconn.appattach(<CLOUDNAME>, "open_daytrader", temp_attr_list = "mysql_sla_provisioning_target=50,geronimo_sla_provisioning_target=50,client_daytrader_sla_provisioning_target=50")

These commands will instruct CBTOOL to automatically check if the total provisioning time for each VM role on this Virtual Application type does not exceed 50 seconds, and then place each VM on a list of VMs with violated or fulfilled SLA provisioning.

These automatically compiled VM lists (known internally in CBTOOL as "views") can be accessed with the command viewshow.

A list of VMs that violated the provisioning SLA can be obtained with viewshow vm sla_provisioning violated, while a list of VMs that fulfilled the SLA can be obtained with viewshow vm sla_provisioning ok. The API equivalent of the latter command is api.viewshow("TESTSIMCLOUD", "VM", "SLA_PROVISIONING", "OK").
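The provisioning check described above reduces to one comparison per VM. A minimal sketch, assuming hypothetical helper names (this is not CBTOOL code), the <role>_sla_provisioning_target naming pattern for per-role overrides, and a strict "smaller than" comparison:

```python
def provisioning_targets(roles, seconds):
    """Build the comma-separated override string with one target per VM role."""
    return ",".join(f"{role}_sla_provisioning_target={seconds}" for role in roles)


def provisioning_verdict(elapsed_seconds, target_seconds):
    """Return the value that the 'sla_provisioning' attribute would take."""
    # Assumption: the SLA is met only when provisioning took strictly less
    # time than the target; how CBTOOL treats exact equality is not stated.
    return "ok" if elapsed_seconds < target_seconds else "violated"


print(provisioning_targets(["mysql", "geronimo", "client_daytrader"], 50))
print(provisioning_verdict(42.0, 50))
```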

Example for SLA runtime:

  • CLI: aiattach iperf default default none none none sla_runtime_target_bandwidth=1000-gt (where the parameters default default none none none are kept as defaults).
  • API: apiconn.appattach(<CLOUDNAME>, "iperf", temp_attr_list = "sla_runtime_target_bandwidth=1000-gt")

This will instruct CBTOOL to automatically compare the reported application bandwidth against 1000 (the direction of the comparison is controlled by the suffix: "gt" for "greater than" or "lt" for "less than"), and then place the VM on a list of VMs that violated or fulfilled the runtime SLA.

A list of VMs that violated the SLA can be obtained with viewshow vm sla_runtime violated, while a list of VMs that fulfilled the SLA can be obtained with viewshow vm sla_runtime ok. The API equivalent of the latter command is api.viewshow("TESTSIMCLOUD", "VM", "SLA_RUNTIME", "OK").
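The runtime target combines a number and a comparison suffix in a single string such as 1000-gt. A minimal sketch of how such a string could be evaluated, assuming a hypothetical helper runtime_verdict (not CBTOOL code) and assuming that "gt" means the measured value must be greater than the target for the SLA to be fulfilled:

```python
import operator

# Assumed meaning of the suffixes: the measured value must satisfy the
# comparison against the target for the SLA to count as fulfilled.
_COMPARATORS = {"gt": operator.gt, "lt": operator.lt}


def runtime_verdict(measured, target_spec):
    """Evaluate a '<number>-<gt|lt>' target against a measured metric value."""
    value, _, suffix = target_spec.rpartition("-")
    if suffix not in _COMPARATORS or not value:
        raise ValueError(f"unsupported target specification: {target_spec!r}")
    return "ok" if _COMPARATORS[suffix](measured, float(value)) else "violated"


print(runtime_verdict(1200, "1000-gt"))
print(runtime_verdict(800, "1000-gt"))
```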

IMPORTANT : Please note that this method of setting SLA parameters dynamically is an alternative that avoids the need to change your private configuration file directly. It is the recommended method for setting SLAs, and is presented in a more complete form here.



WQ6: Can I "scale-out" (i.e., increase the number of VMs) an already deployed Virtual Application instance?

WA6: Yes, CBTOOL has the CLI command (and API call) "airesize" to this end. For instance, the command airesize TESTCLOUD ai_1 hadoopslave +1 would add one additional Hadoop slave node (VM) to the already running VApp ai_1. Conversely, a Hadoop slave node (VM) could be removed with airesize TESTCLOUD ai_1 hadoopslave -1. Please note that, while the airesize command could be applied to any Virtual Application type, it is basically intended to be used with "scale-out" workloads (Vapp types), such as Hadoop and HPCC.
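The last argument of airesize is a signed delta. A minimal sketch, assuming a hypothetical helper airesize_command (not part of CBTOOL) that only formats the CLI line shown above:

```python
def airesize_command(cloud, ai_name, role, delta):
    """Format an airesize CLI line; delta is the signed number of VMs to add."""
    if delta == 0:
        raise ValueError("delta must be a non-zero integer")
    # The '+d' format spec always prints the sign, giving '+1' or '-1'.
    return f"airesize {cloud} {ai_name} {role} {delta:+d}"


print(airesize_command("TESTCLOUD", "ai_1", "hadoopslave", 1))
print(airesize_command("TESTCLOUD", "ai_1", "hadoopslave", -1))
```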



WQ7: Can I attach data volumes to individual VM instances while deploying Virtual Applications?

WA7: Yes, provided that your Cloud (and Cloud Adapter) supports additional data volumes. Using the Virtual Application type "cassandra_ycsb" as an example, add the following section to your private configuration file:

[AI_TEMPLATES : CASSANDRA_YCSB]
YCSB_CLOUD_VV = 10
SEED_CLOUD_VV = 10

This will instruct CB to create (and attach) a 10 GB Virtual Volume for each VM with the role "seed" or "ycsb" while deploying a new "cassandra_ycsb" VApp.

IMPORTANT : keep in mind that this parameter can be changed in any of the 3 ways discussed here and here (i.e., by changing the private configuration file, by issuing the CLI/API typealter <vapp type> <parameter> <value> or by dynamically overriding the parameters during an aiattach CLI/API call).



WQ8: Can I deploy a VM to act as "load balancer" between two application tiers in a given Virtual Application instance?

WA8: Yes, but only for Virtual Application types where the inclusion of a "Load Balancer" instance is applicable. First, check if the parameter load_balancer_supported is True for the considered VApp type, through the use of the command typeshow.

For instance, typeshow open_daytrader will show that this VApp type does support a load balancer instance. If this is the case, you have, as usual, three options to enable it, as discussed here and here:

a) just add a section to your private configuration file:

[AI_TEMPLATES : OPEN_DAYTRADER]
LOAD_BALANCER = $True

b) CB CLI command: aiattach open_daytrader load_balancer=true

c) API call : api.appattach(<CLOUDNAME>, "open_daytrader", temp_attr_list = "load_balancer=true")



WQ9: What are the multiple app columns that I see on the generated csv files (VM_runtime_app_<experiment id>.csv) or on the Web GUI (Application Performance tab on the Dashboard)?

WA9: Those are the Application Performance Metric samples (one of the three categories of metrics) produced during the normal execution of a Virtual Application. These metrics can be roughly divided into two categories: generic and VApp type-specific.

  • Generic Application Performance metrics

    • app_load_id: (Monotonically increasing) Number of executions of a given workload (look at the bottom graph with time as the x-axis here for an illustration on how the same workload keeps being re-executed until the VApp is removed).
    • app_load_profile: For VApp types with multiple profiles - e.g., Hadoop which can execute sort, wordcount, terasort, dfsioe, nutchindexing, pagerank, bayes, kmeans, or hivebench - this field will indicate which profile is currently being executed. In case of applications with a single profile, the value of this field will be default.
    • app_load_level: Numerical value with a meaning that is specific to each Virtual Application type. For instance, for a DayTrader Vapp, it represents the number of simultaneous clients on the load generator, while for Hadoop Vapp it represents the size of dataset to be processed. While it can be set by the user as random distributions (exponential, uniform, gamma, normal), fixed numbers or monotonically increasing/decreasing sequences, this field will always have a numerical value during data collection.
    • app_load_duration: Numerical value, in seconds, that represents how long a given execution of the VApp (i.e., a specific app_load_id) should run. This means that the expected number for app_load_id is the quotient of the total lifetime of a given VApp divided by app_load_duration. Very important: not every VApp type respects this parameter. For instance, while it is perfectly possible to run the Iperf VApp for a specific number of seconds, the same cannot be done with the Giraph VApp, where the application runs until it completes a specific number of tasks.
    • app_datagen_size: Size of the dataset generated prior to the actual execution of a specific VApp type. For VApp types where data generation is not needed (e.g., Netperf), this field will always be zero.
    • app_datagen_time: Time (in seconds) spent generating the dataset to be used on the actual execution of a specific Vapp type. For VApp types where data generation is not needed (e.g., Netperf), this field will always be zero.
    • app_quiescent_time: Time (in seconds) between the end of app_load_id n and the beginning of app_load_id n+1. This allows the experimenter to estimate the "duty cycle" of a given VApp.
    • app_errors: Cumulative number of app_load_ids where the execution of the application ended in error.
  • VApp type-specific (examples)

    • These values are reported by the actual application being executed inside each individual Virtual Application Instance, and collected by CBTOOL
    • app_latency
    • app_throughput
    • app_bandwidth
    • app_loss
    • app_jitter
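The arithmetic behind app_load_duration and app_quiescent_time in the list above can be worked through with a few numbers. A minimal sketch, assuming hypothetical function names and assuming that "duty cycle" means the fraction of each load cycle spent actually running the workload (the FAQ does not define it precisely):

```python
def expected_load_runs(lifetime_s, load_duration_s):
    """Expected app_load_id count: VApp lifetime divided by app_load_duration."""
    # This follows the quotient given in the FAQ and ignores quiescent time,
    # so it is an upper bound when runs are separated by idle gaps.
    return lifetime_s // load_duration_s


def duty_cycle(load_duration_s, quiescent_s):
    """Fraction of one run-plus-idle cycle spent generating load (assumed definition)."""
    return load_duration_s / (load_duration_s + quiescent_s)


# A VApp that lives for one hour, with 60-second runs and 20-second gaps.
print(expected_load_runs(3600, 60))
print(duty_cycle(60, 20))
```

This only applies to VApp types that respect app_load_duration; for types that run until a fixed amount of work is done, the measured completion time would have to be used instead.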



WQ10: Can I deploy multiple "load generator" VMs into a single Virtual Application?

WA10: Yes


WQ11: Can I temporarily stop/suspend load generation on a single Virtual Application?

WA11: Yes, just change the state of the Virtual Application Instance, e.g., statealter ai_1 stopped. The Load Manager process, running in all "load generator" VMs, will stop generating load (the application restart scripts are still executed every 5 seconds). To resume load generation, just issue statealter ai_1 attached.


WQ12: Which "tunable" parameters are available for a given Virtual Application type? How can I alter these?

WA12: First of all, just issue the command typeshow <virtual application type> (the list of available types is obtained via typelist) on the CB CLI. An illustrative example:

(CBWKS0) typeshow cassandra_ycsb
The AI with the type cassandra_ycsb has the following configuration on experiment (Cloud CBWKS0) :

description: Deploys a Cassandra cluster (N seed instances and M data instances),
plus one instance running the YCSB benchmark. This single instance
sends requests to all seed node instances simultaneously.
  - LOAD_PROFILE possible values: workloada,workloadb,workloadc,workloadd,workloade,workloadf (for a proper
    description, consult the section "Core Workloads" on the YCSB
    documentation)
  - LOAD_LEVEL meaning: number of threads on YCSB (parameter -threads).
  - LOAD_DURATION meaning: not used, a run ends when all YCSB
    operations (default is 1000) are finished.
  - COMMENT: One of the "Big Data" Workloads. One of the two
    Virtual Applications types selected for the SPECCloud 2014 v1.0
    benchmark. When new seed nodes are added (after an "airesize")
    the VM running YCSB will automatically direct requests to these
    new seed nodes.

# Attributes MANDATORY for all Virtual Applications:

sut: ycsb->2_x_seed
load_balancer_supported: False
resize_supported: True
regenerate_data: True
role_list: ycsb,seed
load_generator_role: ycsb
load_manager_role: ycsb
metric_aggregator_role: ycsb
capture_role: ycsb
load_balancer: False
load_profile: workloadd
load_level: 1
load_duration: 60
reported_metrics: throughput,latency,datagen_time,datagen_size,completion_time,errors,insert_operations,read_operations,quiescent_time

# Virtual Application-specific MANDATORY attributes:

category: transactional
profiles: workloada,workloadb,workloadc,workloadd,workloade,workloadf
reference: https://github.com/brianfrankcooper/YCSB
license: Apache v2

cassandra_setup1: cb_restart_node.sh
seed_setup1: cb_restart_seed.sh
ycsb_setup2: cb_setup_ycsb.sh
cassandra_reset1: cb_restart_node.sh
seed_reset1: cb_restart_seed.sh
cassandra_resize1: cb_restart_node.sh
seed_resize1: cb_restart_seed.sh
ycsb_start1: cb_run_ycsb.sh

# Virtual Application-specific OPTIONAL attributes:

app_collection: lazy
cassandra_data_dir: /dbstore
cassandra_data_fstyp: ext4
database_size_versus_memory: 0.5
drop_keyspace: 1
input_records: 10000
java_home: auto
java_ver: 8
jvm_stack_size: 1024k
load_generator_sources: 1
load_threads: 8
operation_count: 10000
read_ratio: workloaddefault
record_size: 2.35
replication_factor: 3
seed_data_dir: /dbstore
seed_data_fstyp: ext4
seed_ram_percentage: AUTO
update_ratio: workloaddefault
ycsb_data_fstyp: ext4
ycsb_profile: cassandra-10
yscb_data_dir: /dbstore

Here we can see three groups: Attributes MANDATORY for all Virtual Applications, Virtual Application-specific MANDATORY attributes, and Virtual Application-specific OPTIONAL attributes. Each of these attributes can be changed, either statically (by changing your private configuration file directly) or dynamically. Please note that the values shown here are the default values for each attribute.

As an illustrative example, let's deploy a single application instance of this Virtual Application type (cassandra_ycsb) with a higher thread count (load_threads) and a bigger record size (record_size). On the CB CLI, we issue the command:

aiattach cassandra_ycsb default default none none none load_threads=16,record_size=10

Once this Virtual Application instance is deployed, one can check that these are indeed the new values for the aforementioned parameters by issuing aishow ai_<number obtained from the output of the previous command>.
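When many attributes are involved, the override list for aiattach can be derived from the typeshow output instead of being typed by hand. A minimal sketch, assuming hypothetical helpers parse_typeshow and override_string (neither is part of CBTOOL) and assuming the "key: value" layout shown in the listing above:

```python
def parse_typeshow(text):
    """Collect 'key: value' lines from typeshow-style output into a dict."""
    attrs = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, '#' section headers and the '-' description bullets.
        if not line or line.startswith(("#", "-")):
            continue
        key, sep, value = line.partition(": ")
        if sep and " " not in key:
            attrs[key] = value
    return attrs


def override_string(overrides):
    """Join attribute overrides as 'name=value' pairs separated by commas."""
    return ",".join(f"{name}={value}" for name, value in overrides.items())


sample = """# Virtual Application-specific OPTIONAL attributes:

load_threads: 8
record_size: 2.35
"""
defaults = parse_typeshow(sample)
wanted = {"load_threads": 16, "record_size": 10}
# Keep only the attributes whose desired value differs from the default.
changed = {k: v for k, v in wanted.items() if str(v) != defaults.get(k)}
print("aiattach cassandra_ycsb default default none none none " + override_string(changed))
```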
