shell scripts - acfr/comma GitHub Wiki

Comma provides a set of tools to simplify common tasks in shell programming:

Table of Contents
Parsing command-line arguments
Measuring run-time
Measuring system load
Automating test suites
    quick tutorial
    to run tests in parallel
    to load-balance and/or serialize tests
    to automate resources management
Handling signals
Coordinating resource usage
    Resource descriptions
    Resource management
        Initialize
        Run the resource users
        Coordinate resource usage

Parsing command-line arguments

To use them, add the line source comma-application-util to your script. Then provide a function that describes your command-line options, e.g.

 function description()
 {
     cat <<EOF
 --help,-h; show this help
 --config,-c=<file>; config file (mandatory)
 --db-name,-d=[<dbname>]; overwrites the configuration database name
 EOF
 }

Then use as:

 comma_path_value_to_var --prefix=options < <( description | comma-options-to-name-value "$@" )

The line above creates the variables options_help, options_config and options_db_name. To check for the presence of a specific option, use comma_options_has --help "$@" && do_something. In the description above, --config is mandatory, while --db-name is optional, as indicated by the enclosing [].

Measuring run-time

Wrap the call to your function as comma_progress_named <file> <name> <command> <args>. Here file is the file where timestamps are written before and after executing the user-provided command with args. The argument name is an arbitrary name of the user's choice; it is written to the file to distinguish this entry from other time measurements stored in the same file. At the end of the run, invoke cat <file> | comma-progress --elapsed to obtain the summary. The statistics are stored in CSV format. Use comma-stats to convert the path-value summary statistics accumulated by comma_progress_named calls to CSV format and, optionally, to plot them as PDF.

Measuring system load

Run-times can be deceptive if the system concurrently runs other tasks. To monitor the system load as a function of time, the comma-top utility provides a convenience wrapper that invokes top(1) in batch mode, parses its output, and presents the results in CSV format.

 $ comma-top --num-samples=3
 20140917T125842,8.4,2.7,11.1,16190276,8187732,362836
 20140917T125845,12.0,3.5,15.5,16190368,8187732,362836
 20140917T125848,9.7,2.7,12.4,16193196,8187732,362836

The output is produced every few seconds (the internal default of top is used; the interval can be controlled through the --sampling-interval option of comma-top). The fields in each line are:

 $ comma-top --output-fields
 timestamp,cpu/user,cpu/system,cpu/total,memory/ram,memory/shared,memory/swap

Memory usage is reported in kB and CPU load as a percentage of the total load over all available cores. Shared memory usage is the only quantity not taken from the output of top(1); instead, it is extracted from the df output for the shared memory partition.
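Because the output is plain CSV, it can be summarized with standard tools. The following is a minimal sketch, not part of comma: it feeds the three sample lines shown above to awk and reports the mean cpu/total and the peak memory/ram. The field positions follow the --output-fields listing; the function name summarize_load and the output names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of comma): summarize comma-top CSV read from stdin.
# Field positions follow `comma-top --output-fields`:
#   1=timestamp 2=cpu/user 3=cpu/system 4=cpu/total 5=memory/ram 6=memory/shared 7=memory/swap
summarize_load()
{
    awk -F, '
        { cpu += $4; if( $5 > ram ) ram = $5; n++ }
        END { if( n ) printf "cpu/total/mean=%.1f\nmemory/ram/max=%d\n", cpu / n, ram }
    '
}

# sample lines copied from the comma-top output shown above
summary=$( summarize_load <<EOF
20140917T125842,8.4,2.7,11.1,16190276,8187732,362836
20140917T125845,12.0,3.5,15.5,16190368,8187732,362836
20140917T125848,9.7,2.7,12.4,16193196,8187732,362836
EOF
)
echo "$summary"
```

In practice the heredoc would be replaced by a pipe, e.g. comma-top --num-samples=3 | summarize_load.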

Automating test suites

Assume there is a set of scripts or programs that implement some tests. To run all these tests with one command (though not all at once), use comma-test-run. This script finds subdirectories containing files called test, input or expected. It executes test, then either:

if expected exists, compare its contents to the output of test (in "path=value" format)
otherwise the success of the test depends on the exit status of the test script or program.

The file input, if it exists, is fed to the stdin of the test script or program.

If there is an input or expected file but no test in the same directory, the test script in the closest parent directory is used. If there is a test by itself with no input or expected file, it is only executed if there is no subdirectory containing input, expected or test.

comma-test-run returns 0 if all tests succeed, and non-zero if any test fails.
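To make the layout concrete, here is a minimal sketch that builds one such test directory in a temporary location and then emulates by hand what the text above describes: input is fed to test on stdin, and the output is compared with expected. The directory name, the test body, and the use of diff are illustrative assumptions; comma-test-run performs the comparison of the path=value output in its own way.

```shell
#!/bin/bash
# Build a throw-away test directory (all names below are illustrative)
dir=$( mktemp -d )
mkdir -p "$dir/sum"

# the test script: reads numbers from stdin, prints the result as path=value
cat > "$dir/sum/test" <<'EOF'
#!/bin/bash
total=0
while read -r x ; do (( total += x )) ; done
echo "total=$total"
EOF
chmod +x "$dir/sum/test"

# the input fed to the test on stdin, and the expected output
printf '1\n2\n3\n' > "$dir/sum/input"
echo 'total=6' > "$dir/sum/expected"

# emulate a single test run by hand
# (invoked through bash so the sketch also works where the temporary directory is mounted noexec)
if diff <( cd "$dir/sum" && bash ./test < input ) "$dir/sum/expected" > /dev/null ; then
    result=passed
else
    result=failed
fi
echo "sum: $result"
rm -rf "$dir"
```

With comma installed, running comma-test-run from the parent of such a directory would pick the test up instead of the hand-written comparison.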

Options:

--help,-h; show help (mostly corresponding to this text)
--debug; much more debug output
--quiet,-q; minimize output
--until-first-failure,-f; exit after the first failure (works only when tests run serially)
--path=[<dir>]; data-storage directory for tests, default: none, let the tests define it
--parallel; run tests in parallel
--max-parallel=[<N>]; run up to N tests in parallel; default: 8
--max-memory-shared=[<N>]; limit on shared memory, in MB; default: 8192
--max-wait=[<time>]; maximal time to wait for available CPUs before failing a test, default: forever

To disable tests in a directory and all its subdirectories, create an empty file named disabled in that directory.

quick tutorial

Seeing once is supposedly better than reading a hundred times. You may start with this tutorial.

to run tests in parallel

Parallel/serial execution is subject to the following rules:

If neither the --parallel nor the --max-parallel option is given, tests are run sequentially. This is the default.
If --max-parallel is given, e.g., as in --max-parallel=4, up to 4 tests are run at once; but see below for load-balancing.
If --parallel is given without --max-parallel, the script attempts to use as many CPUs/cores as are available on the system. This is currently not recommended for resource-heavy tests as explained below.

to load-balance and/or serialize tests

To describe a resource-heavy test, create a file named config in the test directory and specify the number of CPUs and the amount of shared memory to be used by the test. You may also specify serial="true" to serialize the test.

Examples of config files:

 resources/cpus=4
 resources/memory/shared=3200

With this configuration, the test will not run until at least 4 CPUs (or all CPUs, counted out of the --max-parallel number) are available, and until 3200 MB of shared memory (counted out of the --max-memory-shared total) is available.

 resources/serial="true"

With this configuration, the test will run only when no other test is running.

You may explicitly specify limits for arbitrary resources; see the tutorial for examples.

to automate resources management

Comma provides a script for semi-automatic generation of resources configuration files. First, run a test stand-alone instructing comma-test-run to accumulate system load data:

 comma-test-run --estimate-resources

The --estimate-resources option instructs comma-test-run to start comma-top in the background before running the actual tests. The accumulated measurements of CPU and memory utilization are stored in the file output/performance.csv. If the test is too short, this file may be empty; such short tests most likely do not need any resource management anyway. This run of the test shall be done on an otherwise unloaded system handling a minimal number of other tasks.

Once the performance measurements have been accumulated, run the auxiliary script comma-test-resources. It shall be invoked in the same test directory (or can read the name of the test directory or multiple directories on standard input) and has two usage modes:

recommend: calculates the difference between the minimal and maximal memory usage (and between the minimal and mean CPU usage) during the test; these differences are attributed to the extra load caused by the test and printed to standard output in the format of a resources configuration (the user can redirect this output straight into a file)
analyse: compares the output of the recommend step above with an existing configuration file, if any, and suggests how to tune the configuration file; if no file exists, it just recommends creating one if the resource usage is significant.

By default, comma-test-resources ignores resources usage below certain thresholds. See --help for the list of options that change these settings.

Handling signals

Comma provides a shell function comma_execute_and_wait for automating the common usage pattern:

 run_something long long long &
 wait $!
 # if a signal arrives kill run_something and whatever has been invoked from it

The problem is deceptively simple. The complications occur because:

signals are handled after completion of the foreground process; to handle a signal immediately you have to wait (as above)
a signal sent to run_something above is delivered to that process itself, which is likely to be running something long; if that "something" is a foreground process, the signal will not be handled until it is over; thus, the grandchild process has to be run in the background with run_something waiting for it, which reproduces the original problem recursively.
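The first point can be seen in plain bash, without comma. In the minimal sketch below, sleep 10 stands in for run_something, and the script sends SIGTERM to itself after one second purely for demonstration. Because the shell is blocked in wait rather than in a foreground command, the trap fires right away and kills the child. The variable names are illustrative.

```shell
#!/bin/bash
self=$BASHPID                 # pid of this shell, valid even inside a subshell
interrupted=0

# the trap runs as soon as the signal interrupts `wait` below
trap 'interrupted=1 ; kill "$child" 2>/dev/null' TERM

sleep 10 &                    # stand-in for run_something
child=$!

# for demonstration only: deliver SIGTERM to this shell after one second
( sleep 1 ; kill -TERM "$self" ) &

wait "$child"                 # returns with status > 128 once the trapped signal arrives
status=$?
trap - TERM
echo "interrupted=$interrupted wait_status=$status"
```

This kills only the direct child. If the child starts long-running processes of its own, the second complication above applies.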

comma_execute_and_wait is a convenience wrapper for resolving such issues. The function invokes an arbitrary shell function, script, or routine (given as an argument) in the background and waits for it to complete. It also sets up signal handling so that the invoked function or routine is killed immediately if a signal arrives:

  source comma-application-util
  comma_execute_and_wait "find /hay/drive -name needle"
  comma_execute_and_wait "bash -c my_long_function arg1 arg2"

The immediate termination is achieved by running the given command in a separate process group and setting up a custom trap to terminate that group. (As a special usage mode, comma_execute_and_wait --process runs the given command as a process, not a process group; the command shall then handle signals intelligently by itself.) If given multiple commands, comma_execute_and_wait starts them all in the background and waits until all are done. By default, the overall exit status is success if all commands succeeded. This can be changed with the --any option, in which case the status is success if at least one command succeeded:

  comma_execute_and_wait --any "find /hay/drive1 -name needle" "find /hay/drive2 -name needle"
  success=$?   # is 0 if at least one find was successful

Finally, on exit, comma_execute_and_wait restores the previously set traps. If it is terminated by a signal, it restores the traps and then sends the signal to itself, possibly triggering one of the newly restored traps. Please review carefully whether those restored traps are relevant in the process that runs comma_execute_and_wait, and unset them (trap - SIGTERM SIGINT ...) if necessary. A common mistake was to use comma_execute_and_wait in a background process while the traps had been set up for use in the main process.

Note that any shell function to be invoked under comma_execute_and_wait must be exported with export -f. Similarly, all environment variables used inside that function or script must be exported with export. See the comments in comma-application-util for a description of all capabilities and features of comma_execute_and_wait.
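The export requirement is ordinary bash behaviour and can be checked without comma. In the minimal sketch below, my_long_function and GREETING are placeholder names. A child bash sees a function or a variable only once it has been exported.

```shell
#!/bin/bash
# placeholder function and variable, for illustration only
my_long_function() { echo "$GREETING $1 $2" ; }
GREETING=hello

# not exported yet: the child bash knows neither the function nor the variable
before=$( bash -c 'my_long_function "$@"' _ arg1 arg2 2>&1 ) || echo "before export: call failed"

export -f my_long_function
export GREETING

# exported: the child bash can now run the function and read the variable
after=$( bash -c 'my_long_function "$@"' _ arg1 arg2 )
echo "after export: $after"
```

The extra _ after the command string fills $0 of the child shell, so that arg1 and arg2 arrive as $1 and $2.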

Coordinating resource usage

Assume that you have to run several processes that either heavily use limited system resources (e.g., shared memory) or use an exclusive resource (a device attached to the system). Generally speaking, you may want to run multiple potential users of these resources, but need to coordinate resource access.

The script comma-resources-util provides a set of convenience shell functions to allow external coordination of resource usage across multiple user processes. The main functions are:

comma_initialize_resources
comma_acquire_resources
comma_release_resources
comma_total_system_resources

These functions operate on resource descriptions stored in JSON-formatted files. The scripts do not allocate any resources as such; that is done directly by the resource users. Instead, the scripts help the users to coordinate resource allocation along the lines of:

user 1 marks some resource as being in use
user 2 wants to use the same resource, but sees it as occupied; user 2 waits
user 1 releases the resource
user 2 may now proceed

This coordination is cooperative. It is assumed that in the above example user 2 explicitly volunteers to wait until the resource is available. There is no enforcement of this rule.

Resource descriptions

Resource coordination involves the use of four files:

the resource counter file that looks like:

 {
     "resources":
     {
         "cpus": "0",
         "memory":
         {
             "main": "0",
             "shared": "0"
         },
         "serial": "0"
     },
     "processes":
     {
         "queue": "[]",
         "users": "[]"
     }
 }

The exact sections under the resources path may differ in your case. E.g., you may have entries like

 "robots":
 {
     "queen": "1",
     "working_bee": "100"
 }

if you have two types of robots, one unique (queen) and the other in multiple replicas (working bees). The specific resource names do not matter as long as they are used consistently in all the input files and are expressed as either integer or boolean values. Booleans are essentially used as integers with the range 0 to 1. Floating-point values are not advised: they may work, but sooner or later they may fail due to round-off errors, which can cause counters never to return to zero and similar issues.

The entries under the processes path are the list of current resource users (empty on initialization) and the queue of users waiting for the resources (also empty on start).

The counter file is generated once, before any resource use, and shall be initialized to zeros; the comma_initialize_resources function produces such a file for the most commonly used resource types (shown above). The resource counter is read and written by all the resource-management utilities described here.

the resource limits file that looks like:

 {
     "resources":
     {
         "cpus": "8",
         "memory":
         {
             "main": "16768479232",
             "shared": "8384237568"
         },
         "serial": "true"
     }
 }

This file describes the maximal resources that can be allocated by all users together. The serial entry is an example of a unique resource: only one user may occupy it at a time. This file is read-only; it is never changed by the resource-management functions.

the resource request file that looks like:

 {
     "resources":
     {
         "cpus": "2",
         "memory":
         {
             "main": "1600000",
             "shared": "800000"
         }
     }
 }

This file describes the intended resource usage by the user. An instance is created for each new request for resource acquisition. This file can be overwritten by the resource-management utilities as described below. It is critical to use the same file (name) when acquiring and releasing the resources so that the counters are incremented and decremented by the same values.

the resource lock file; as the resources can be requested and released from multiple user processes running in parallel independently of each other, it is important to synchronize parallel accesses to the resource counter file. The script comma_sync provides a convenience function comma_locked that serializes access through the flock(1) mechanism. The usage is like:

 comma_locked resource-lock-file comma_acquire_resources <arguments>

Here resource-lock-file is an arbitrary file provided by the user. The file will be overwritten on access (it stays empty; its contents are never used). The use of comma_locked is not mandatory, and the user may provide her own implementation for serializing all accesses to the resource counter file.
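For readers who want to see what such serialization amounts to, the sketch below uses flock(1) directly rather than comma_locked, whose implementation is not shown here. Several background workers increment a shared counter file, and each read-modify-write happens while holding an exclusive lock on a separate lock file. The file names and the worker function are illustrative assumptions.

```shell
#!/bin/bash
dir=$( mktemp -d )
lock="$dir/lock"
counter="$dir/counter"
echo 0 > "$counter"

# illustrative worker: increments the counter 20 times, each time under the lock
worker()
{
    local i value
    for (( i = 0 ; i < 20 ; i++ )) ; do
        (
            flock 9                          # blocks until the exclusive lock is ours
            value=$( < "$counter" )
            echo $(( value + 1 )) > "$counter"
        ) 9> "$lock"                         # the lock is released when fd 9 is closed
    done
}

# four workers run in parallel and compete for the same counter
for n in 1 2 3 4 ; do worker & done
wait

total=$( < "$counter" )
echo "total=$total"
rm -rf "$dir"
```

Without the flock call, concurrent workers could read the same value and lose increments. Note also that opening the lock file with 9> truncates it, so it stays empty.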

Resource management

Coordinated resource usage shall implement the following pattern:

Initialize

initialize the resource limits file; e.g., one may use the convenience function comma_total_system_resources, which measures and prints the total resources available on the system:

 comma_total_system_resources | name-value-convert --from path-value --to json > limits.json

The user may also manually create a JSON file with the resource descriptors relevant to her problem, such as the numbers of various robots, network interfaces, disk space, etc. This tutorial assumes the name limits.json for this file.

initialize the resource counter file; to create a counter file that matches the limits file created in the previous step, one may run

 cat limits.json | name-value-convert --from json --to path-value | comma_initialize_resources counter.json

The newly-created counter.json will have the same resources entries as the limits.json file. This tutorial assumes that the counter is named counter.json.

create a lock file; assuming you are going to use comma_locked synchronization, just run

 touch lock

Below we assume that the lock file is simply called lock.

Run the resource users

Each user is an independent process that has some unique id, which can be a process ID, or just a counter, such as worker number; ids are assumed to be integers; below the variable $id is used as the unique user id. All the users must use the same resource limits and counter files, and the same lock file (if comma_locked is used).

Coordinate resource usage

Whenever a user needs to acquire some of the managed resources, it shall generate a resource request file (called request.json below) in the format described above. Then the user shall wrap its resource allocation in the following sequence of calls, typically done in a loop:

 while true ; do
   outcome=$( comma_locked lock comma_acquire_resources request.json counter.json limits.json $id )

This command checks if sufficient resources are still available to satisfy the request (roughly, this is equivalent to the mathematical statement counter + request <= limits) and if anyone else (or ourselves) is already waiting for resources:

If the resource request alone exceeds the resource limits, the request.json file is overwritten with the maximal resource settings taken from the limits.json file.
If resources are available and either no other user is waiting for the resources or this user id is the first in the queue waiting for resources, the function above increments the counter file and returns with the exit status of 0 (success). The user id is appended to the users list stored in the resource counter. If the user was waiting (its id was in the queue), the user id is removed from the queue.

   if [[ $? == 0 ]] ; then
     # run your activity
     ...

Once done, mark the resources as available again:

     comma_locked lock comma_release_resources request.json counter.json $id
     break # all done, may go to the next task out of the resource-waiting loop
   fi

Otherwise, i.e., if the resources are not available or another user is ahead in the waiting queue, the comma_acquire_resources call above returns 1 (failure). The user id is appended to the waiting queue. The command outputs the word wait to standard output (therefore, the outcome variable above would have the value wait).
It is assumed that the user is well behaved and would wait:

   if [[ "$outcome" == "wait" ]] ; then
     sleep some_time
     continue # will try to acquire resources again
   fi
 done # end of the resource-waiting loop

Note that the request is considered to exceed the limits whenever one of the limits is exceeded, e.g., we ask for more CPUs than are available in total, even if we ask for little (or no) memory. In this case, the scripts optimistically allow resource acquisition hoping that users know what they are doing.
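The admission rule quoted above, counter + request <= limits, together with the handling of oversized requests, can be sketched in plain bash arithmetic. This is only an illustration of the rule for a single integer resource. It does not read JSON, does not model the waiting queue, and is not how comma_acquire_resources is implemented. The function name can_acquire is made up.

```shell
#!/bin/bash
# Hypothetical helper: decide whether a request for one integer resource fits.
# usage: can_acquire <counter> <request> <limit>
can_acquire()
{
    local counter=$1 request=$2 limit=$3
    # a request larger than the limit is clipped to the limit, as described above
    (( request > limit )) && request=$limit
    # succeed (exit status 0) only if the request still fits under the limit
    (( counter + request <= limit ))
}

can_acquire 2 4 8 && echo "2 in use, 4 requested, limit 8: run"
can_acquire 6 4 8 || echo "6 in use, 4 requested, limit 8: wait"
can_acquire 0 16 8 && echo "0 in use, 16 requested, limit 8: run (request clipped to 8)"
```

In this simplified model a clipped request can proceed only when nothing else is in use, which makes it behave like a serialized task.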