Scripting Layer - giffordlabcvr/Hepadnaviridae-GLUE GitHub Wiki

The GLUE scripting layer allows you to add custom logic to your GLUE projects in the form of JavaScript programs. These programs run within the GLUE engine. They may run GLUE commands in the context of other computations. They can also be encapsulated within modules, allowing your custom logic to be invoked from other parts of the project. The main uses of the scripting layer are:

  • To execute a step in your project build. Using the scripting layer here means the build can use more complex, dynamic logic.
  • To perform some analysis of project data. The scripting layer may access any part of the GLUE project, using any GLUE command, so it can be used to execute code associated with a research question.

The key aspects of the scripting layer are covered below.

JavaScript background

JavaScript is a high-level, dynamic, general-purpose interpreted programming language. It is widely used in web browsers, for server-side development and elsewhere. Consequently, there are plenty of books and online resources available, and these provide the best way to learn the language.

The version of JavaScript which is available for use within the GLUE scripting layer is a standardised version called ECMAScript 5.1. The implementation which is used is the Nashorn engine, although GLUE project developers should not need to understand the workings at this level.

Contrary to what you might expect, the GLUE engine is not written in JavaScript; it is actually written in Java, which is a different language altogether. JavaScript programs within GLUE projects interact with the GLUE engine purely via the GLUE command layer.

Your first JavaScript program: logging to the console

We will create some JavaScript programs and run them within GLUE. Please ensure you have the GLUE example project in place, as some of the examples rely on this. For your first JavaScript program, use a text editor to create a file within the exampleProject directory called helloWorld.js. Paste the following code into the file, and save it:

glue.logInfo("Hello world!");

Start GLUE within the same directory, and use the run script command to run your program:

GLUE> run script helloWorld.js
12:42:28.832 NashornJsScript INFO: Hello world!
OK

The glue object, which is always available, contains a special set of utility functions provided by GLUE to JavaScript programs operating within the scripting layer. The glue.logInfo function, provided by the glue object, will output a log message (at log level INFO) to the console.

The glue.logInfo function may optionally take a second argument. If this second argument is a JavaScript object, it will be logged to the console in JSON format. Since your program will often be operating on objects, this is very useful for debugging. To log a JavaScript object, you can modify your program like so:

var object = { hello: 1 };
object.world = 2;

glue.logInfo("Hello world!", object);

GLUE> run script helloWorld.js
12:55:09.427 NashornJsScript INFO: Hello world!
{
  "hello": 1,
  "world": 2
}
OK
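The JSON layout in the console output above resembles what the standard JSON.stringify function produces with a two-space indent. The sketch below is plain ECMAScript 5.1 and does not call GLUE. The helper name formatForLog is hypothetical, and whether GLUE formats objects this way internally is an assumption.

```javascript
// Hypothetical helper: builds a label followed by a pretty-printed object,
// roughly mirroring the console layout shown above.
// Assumption: two-space indentation, as in the example output.
function formatForLog(label, object) {
    return label + "\n" + JSON.stringify(object, null, 2);
}

var object = { hello: 1 };
object.world = 2;

// A single string holding the label and the JSON text.
var formatted = formatForLog("Hello world!", object);
```

Within GLUE, a string built this way could be passed as the single argument to glue.logInfo, which may be handy when you want the label and the object text in one message.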

Invoking GLUE commands from JavaScript

The glue object also provides functions for invoking GLUE commands. In your text editor, create a file within the exampleProject directory called listProjects.js, containing this code:

var listProjectsResult = glue.command(["list", "project"]);
glue.logInfo("listProjectsResult", listProjectsResult);

Use run script to run this program from root command mode (path "/") in GLUE:

Mode path: /
GLUE> run script listProjects.js
16:59:08.905 NashornJsScript INFO: listProjectsResult
{
  "listResult": {
    "column": [
      "name",
      "description"
    ],
    "row": [
      {
        "value": [
          "example",
          "An example GLUE project based on hepatitis E virus"
        ]
      }
    ],
    "objectType": "Project"
  }
}
OK

The list project command was run from within the JavaScript program. Note that glue.command was passed an array with one element per word or argument. So for example the command:

GLUE> list sequence --whereClause "length >= 500"

would be run from JavaScript using:

glue.command(["list", "sequence", "--whereClause", "length >= 500"]);
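Because the command is an ordinary JavaScript array, it can be assembled from variables before being passed to glue.command. The following is a minimal sketch; the helper name listSequencesCommand is hypothetical and only builds the array, so it can be tried outside GLUE.

```javascript
// Hypothetical helper: builds the argument array for
// "list sequence --whereClause <clause>" with a minimum length filter.
function listSequencesCommand(minLength) {
    // The whole where clause is ONE array element, even though it contains spaces.
    var whereClause = "length >= " + minLength;
    return ["list", "sequence", "--whereClause", whereClause];
}

var commandWords = listSequencesCommand(500);
// Inside GLUE, commandWords could then be passed to glue.command(commandWords).
```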

In listProjects.js, the result of invoking the command was a JavaScript object, which was stored in the variable listProjectsResult. This object was then logged to the console in JSON format.

Any GLUE command which is invoked from JavaScript will always return a JavaScript object (even commands which just produce "OK" on the console). For commands such as list project, which produce tabular output, the structure of the object follows a certain convention, with the result type (in this case listResult) appearing at the outer layer of the object, and then column headers, row values and a row object type appearing at the next layer in.

The glue object contains some utilities for transforming these tabular result objects into more convenient forms. We can modify the first line of listProjects.js as follows:

var listProjectsResult = glue.getTableColumn(glue.command(["list", "project"]), "name");

By applying the glue.getTableColumn function to the tabular result object, we can extract the "name" column as an array, so that the logged result is simply:

Mode path: /
GLUE> run script listProjects.js
16:59:08.905 NashornJsScript INFO: listProjectsResult
[
  "example"
]
OK
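To make the tabular convention concrete, the sketch below extracts a column by hand from a result object copied from the list project output above. The function extractColumn is a hypothetical stand-in written for illustration; it is not GLUE's implementation of glue.getTableColumn, and it assumes the result type is the only key at the outer layer.

```javascript
// Sample tabular result, copied from the "list project" output above.
var sampleResult = {
    listResult: {
        column: ["name", "description"],
        row: [
            { value: ["example", "An example GLUE project based on hepatitis E virus"] }
        ],
        objectType: "Project"
    }
};

// Hypothetical stand-in for glue.getTableColumn, for illustration only.
function extractColumn(tableResult, columnName) {
    // Assumption: the result type (e.g. "listResult") is the single outer key.
    var resultType = Object.keys(tableResult)[0];
    var table = tableResult[resultType];
    var index = table.column.indexOf(columnName);
    if (index < 0) {
        throw new Error("No such column: " + columnName);
    }
    // Pick the value at the column's position from every row.
    return table.row.map(function(row) {
        return row.value[index];
    });
}

var names = extractColumn(sampleResult, "name");
```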

Alternatively, we could apply the glue.tableToObjects function to the result object:

var listProjectsResult = glue.tableToObjects(glue.command(["list", "project"]));

This converts it to an array of objects, one per row, with fields named according to the column headers:

Mode path: /
GLUE> run script listProjects.js
17:28:08.758 NashornJsScript INFO: listProjectsResult
[
  {
    "name": "example",
    "description": "An example GLUE project based on hepatitis E virus"
  }
]
OK
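The row-to-object transformation can be sketched in plain ECMAScript 5.1 as follows. The function rowsToObjects is hypothetical and written only to show the idea; it is not GLUE's implementation of glue.tableToObjects, and it assumes the result type is the only key at the outer layer of the result object.

```javascript
// Sample tabular result, copied from the "list project" output above.
var listProjectResult = {
    listResult: {
        column: ["name", "description"],
        row: [
            { value: ["example", "An example GLUE project based on hepatitis E virus"] }
        ],
        objectType: "Project"
    }
};

// Hypothetical stand-in for glue.tableToObjects, for illustration only.
function rowsToObjects(tableResult) {
    // Assumption: the result type (e.g. "listResult") is the single outer key.
    var resultType = Object.keys(tableResult)[0];
    var table = tableResult[resultType];
    return table.row.map(function(row) {
        var obj = {};
        // Pair each column header with the value at the same position.
        table.column.forEach(function(columnName, i) {
            obj[columnName] = row.value[i];
        });
        return obj;
    });
}

var projects = rowsToObjects(listProjectResult);
```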

When you are using the command line interpreter, the commands you can invoke depend on the current command mode. The same applies when commands are invoked from JavaScript programs. When you use run script to run JavaScript, the program inherits the command mode that was in place when run script was executed. The list project command is only available in root command mode, so listProjects.js will only work if invoked from root mode. If you try to run listProjects.js from within project mode, for example, you will get an error.

You can, however, change mode within a JavaScript program. This is done by using the glue.inMode utility function. Paste the following code into a new file listFeatures.js, in the exampleProject directory:

var features;
glue.inMode("/project/example", function() {
    features = glue.tableToObjects(glue.command(["list", "feature"]));
});
glue.logInfo("features", features);

Then run listFeatures.js from root mode:

Mode path: /
GLUE> run script listFeatures.js
17:45:50.389 NashornJsScript INFO: features
[
  {
    "name": "ORF3",
    "parent.name": null,
    "description": "ORF 3"
  },
  {
    "name": "ORF2",
    "parent.name": null,
    "description": "ORF 2"
  },
  {
    "name": "ORF1",
    "parent.name": null,
    "description": "ORF 1"
  },
  {
    "name": "Y",
    "parent.name": "ORF1",
    "description": "Y domain"
  },
  {
    "name": "X",
    "parent.name": "ORF1",
    "description": "Macro domain"
  },
  {
    "name": "RdRp",
    "parent.name": "ORF1",
    "description": "RNA-dependent RNA polymerase"
  },
  {
    "name": "PPR",
    "parent.name": "ORF1",
    "description": "Polyproline hypervariable region"
  },
  {
    "name": "PCP",
    "parent.name": "ORF1",
    "description": "Papain-like cysteine protease"
  },
  {
    "name": "MT",
    "parent.name": "ORF1",
    "description": "Methyltransferase"
  },
  {
    "name": "Hel",
    "parent.name": "ORF1",
    "description": "Helicase"
  }
]
OK

Note that the mode path string "/project/example" supplied to glue.inMode is the same as the path displayed in the interactive interpreter when in project mode. The glue.inMode function temporarily changed the mode path from root ("/") to project mode ("/project/example"). Within project mode, the list feature command was run, and the result was transformed to an array of objects and logged to the console.
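The listFeatures.js program captures the command result in a variable declared outside the callback. That pattern can be wrapped in a small helper, sketched below. The names runInMode and listFeatures are hypothetical, and the glue object is passed in as a parameter (glueObj) purely so that the functions can be exercised outside GLUE with a stand-in object.

```javascript
// Hypothetical helper: runs callback in the given mode and returns its result,
// using the same outer-variable pattern as listFeatures.js.
function runInMode(glueObj, modePath, callback) {
    var result;
    glueObj.inMode(modePath, function() {
        result = callback();
    });
    return result;
}

// Hypothetical function: returns the features of a project as an array of objects.
// Assumption: it is called from root mode, as in the example above.
function listFeatures(glueObj, projectName) {
    return runInMode(glueObj, "/project/" + projectName, function() {
        return glueObj.tableToObjects(glueObj.command(["list", "feature"]));
    });
}
```

Inside GLUE, a call such as listFeatures(glue, "example") from root mode would be expected to produce the same array that listFeatures.js logs.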

Note that a callback function must be supplied to glue.inMode as its second argument. In the example we used an anonymous function. The supplied function is invoked within the specified mode. Calls to glue.inMode may be nested by running glue.inMode again within the callback function: the supplied mode path string is relative to the current mode; it is effectively appended to the current mode path string to produce the new mode path.

// assume we are in project mode
// switch to reference mode
glue.inMode("/reference/REF_MASTER_M73218", function() {
    // within this switch to feature-location mode
    glue.inMode("/feature-location/ORF1", function() {
        // execute something in feature-location mode.
    });
    // back in reference mode
});
// back in project mode

Mode path strings can also be constructed dynamically and composed together so that multiple nested mode changes are executed at once, for example:

// assume we are in project mode, and variables refSeqName and featureName are defined
// switch to reference mode, then within that feature-location mode
glue.inMode("/reference/"+refSeqName+"/feature-location/"+featureName, function() {
    // execute something in feature-location mode.
});
// at this point we are back in project mode

Note that the mode-wrapping feature that is available in the command line interpreter cannot be used within the scripting layer; glue.inMode must be used to change modes.
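When mode paths are composed from several variables, a small helper can keep the string concatenation in one place. This is a minimal sketch: the helper name buildModePath is hypothetical, and it assumes that none of the segments contains a "/" character.

```javascript
// Hypothetical helper: joins mode path segments into a single path string
// with a leading "/", following the form used in the examples above.
// Assumption: no segment contains a "/" character.
function buildModePath(segments) {
    return "/" + segments.join("/");
}

var refSeqName = "REF_MASTER_M73218";
var featureName = "ORF1";
var modePath = buildModePath(["reference", refSeqName, "feature-location", featureName]);
// modePath is "/reference/REF_MASTER_M73218/feature-location/ORF1"
```

The resulting string could then be supplied as the first argument to glue.inMode.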

Using underscore.js

Underscore.js is a general-purpose library containing a set of functional programming utilities which are absent from ECMAScript 5.1. It is not strictly necessary to use underscore.js in the GLUE scripting layer, but we have found it to be useful, and so we have built underscore.js into GLUE so that it is always available. For example, here we use the _.each utility from underscore.js to iterate over a list using a function:

var features;
glue.inMode("/project/example", function() {
    features = glue.tableToObjects(glue.command(["list", "feature"]));
});
_.each(features, function(feature) {
    glue.logInfo("feature.name", feature.name);
});
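Iteration like this is often the first step in reshaping a result. As a sketch, the function below groups feature names by their parent feature, using a subset of the feature objects logged earlier. It is written with the built-in Array.prototype.forEach rather than underscore.js so that it can be tried outside GLUE; underscore's _.groupBy with a function iteratee could produce a comparable grouping of the feature objects. The name childNamesByParent is hypothetical.

```javascript
// A subset of the feature objects logged by listFeatures.js above.
var features = [
    { "name": "ORF3", "parent.name": null, "description": "ORF 3" },
    { "name": "ORF1", "parent.name": null, "description": "ORF 1" },
    { "name": "X", "parent.name": "ORF1", "description": "Macro domain" },
    { "name": "Hel", "parent.name": "ORF1", "description": "Helicase" }
];

// Hypothetical helper: maps each parent feature name to the names of its children.
// Top-level features (where parent.name is null) are skipped.
function childNamesByParent(featureList) {
    var byParent = {};
    featureList.forEach(function(feature) {
        var parentName = feature["parent.name"];
        if (parentName === null) {
            return;
        }
        if (!byParent.hasOwnProperty(parentName)) {
            byParent[parentName] = [];
        }
        byParent[parentName].push(feature.name);
    });
    return byParent;
}

var children = childNamesByParent(features);
// children.ORF1 holds ["X", "Hel"]
```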

Encapsulating JavaScript programs as modules

As we have seen, the run script command can be used to run JavaScript programs directly from a file. The ecmaFunctionInvoker module type provides an additional mechanism for running JavaScript programs, encapsulated as GLUE modules. This has several benefits:

  • The JavaScript program is stored in the database, so the project does not rely on the file system in order to run it.
  • Functions within the program can accept parameters as input.
  • Functions can return tabular or other structured results, just like built-in GLUE commands.

JavaScript programs encapsulated in this way effectively provide a set of custom GLUE commands to be used alongside the built-in commands. Using ecmaFunctionInvoker modules can therefore be considered a means for GLUE project developers to extend GLUE functionality on a project-specific basis.

The concept is illustrated by the exampleEcmaFunctionInvoker module within the example GLUE project. This adds a new function hostWithGenomeRegion which takes a (coding) genome feature and start and end codon positions, and outputs a table. The table contains a row for each sequence in the example set, listing its sequence ID, host species and the amino acid translation and underlying nucleotides for the specified region.

The exampleEcmaFunctionInvoker module essentially consists of two files. The main logic is specified in a JavaScript file, exampleEcmaFunctionInvoker.js. Note that JavaScript code executed within an ecmaFunctionInvoker module always starts its execution in project command mode.


// Given a (coding) genome feature and start and end codon positions
// For each example sequence, list its sequence ID, host species
// and the amino acid translation and underlying nucleotides for that region
function hostWithGenomeRegion(feature, startCodon, endCodon) {
    // where clause used at various points to select the example sequence AlignmentMembers
    var whereClause = "sequence.source.name = 'ncbi-hev-examples'";

    // object used as an associative map from sequenceID to result row
    var resultRowMap = {};

    glue.inMode("alignment/AL_MASTER", function() {
        // list the alignment members, retrieving sequence ID and host species
        var listMemberResults = glue.tableToObjects(glue.command(["list", "member", "--recursive", "--whereClause", whereClause,
                      "sequence.sequenceID", "sequence.host_species"]));
        // for each alignment member, create a result row object, with sequenceID and hostSpecies fields
        // and add it to the result row map
        _.each(listMemberResults, function(listMemberResult) {
            var sequenceID = listMemberResult["sequence.sequenceID"];
            var memberObj = {
                sequenceID: sequenceID,
                hostSpecies: listMemberResult["sequence.host_species"]
            };
            resultRowMap[sequenceID] = memberObj;
        });
    });

    // run the protein alignment exporter to generate the specified genome region for each example sequence
    var aaRegionColumnHeader = "aminoAcids_"+feature+"_"+startCodon+"_to_"+endCodon;
    glue.inMode("module/exampleEcmaProteinAlignmentExporter", function() {
        var aaGenomeRegions = glue.command(["export", "AL_MASTER", "--relRefName", "REF_MASTER_M73218",
                                            "--featureName", feature, "--labelledCodon", startCodon, endCodon,
                                            "--recursive", "--whereClause", whereClause, "--preview"]);
        // add each region to the appropriate result row
        _.each(aaGenomeRegions.aminoAcidFasta.sequences, function(region) {
            resultRowMap[region.id][aaRegionColumnHeader] = region.sequence;
        });
    });

    // run the nucleotide alignment exporter to generate the specified genome region for each example sequence
    var ntRegionColumnHeader = "nucleotides_"+feature+"_"+startCodon+"_to_"+endCodon;
    glue.inMode("module/exampleEcmaNtAlignmentExporter", function() {
        var ntGenomeRegions = glue.command(["export", "AL_MASTER", "--relRefName", "REF_MASTER_M73218",
                                            "--featureName", feature, "--labelledCodon", startCodon, endCodon,
                                            "--recursive", "--whereClause", whereClause, "--preview"]);
        // add each region to the appropriate result row
        _.each(ntGenomeRegions.nucleotideFasta.sequences, function(region) {
            resultRowMap[region.id][ntRegionColumnHeader] = region.sequence;
        });
    });

    // return result row objects as a list.
    return _.values(resultRowMap);
}
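The function above builds its result in three passes: rows are first stored in a map keyed by sequence ID, columns are then added to those rows, and the map is finally flattened into an array. The sketch below isolates that pattern in plain ECMAScript 5.1, using two rows taken from the example data. The helper names addColumn and rowsFromMap are hypothetical, and the "UNKNOWN_ID" entry is invented to show a guard that the original function does not include.

```javascript
// Hypothetical helper: adds one column to rows held in a map keyed by ID.
// entries is an array of { id: ..., sequence: ... } objects, like the
// regions iterated over in hostWithGenomeRegion.
function addColumn(rowMap, columnHeader, entries) {
    entries.forEach(function(entry) {
        var row = rowMap[entry.id];
        // Skip IDs with no existing row rather than failing.
        if (row !== undefined) {
            row[columnHeader] = entry.sequence;
        }
    });
}

// Hypothetical helper: flattens the map into an array of row objects,
// playing the role of _.values in the original function.
function rowsFromMap(rowMap) {
    return Object.keys(rowMap).map(function(id) {
        return rowMap[id];
    });
}

var rowMap = {
    AB591734: { sequenceID: "AB591734", hostSpecies: "Herpestes javanicus" },
    FJ705359: { sequenceID: "FJ705359", hostSpecies: "Sus scrofa" }
};
addColumn(rowMap, "aminoAcids_ORF2_70_to_75", [
    { id: "AB591734", sequence: "SGAGAR" },
    { id: "FJ705359", sequence: "SGAGAR" },
    { id: "UNKNOWN_ID", sequence: "XXXXXX" } // invented entry, ignored by the guard
]);
var rows = rowsFromMap(rowMap);
```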

This is then encapsulated in a module, as specified by the exampleEcmaFunctionInvoker.xml file:

<ecmaFunctionInvoker>
    <!-- One or more JavaScript files are referenced here. These will be loaded into the
         database along with the module config by the loadResources option -->
    <scriptFileName>exampleEcmaFunctionInvoker.js</scriptFileName>
    <!-- each <function> element specifies a JavaScript function which will be
         made available via the invoke-function module command. The function name
         must match a function defined in one of the JavaScript files -->
    <function>
        <name>hostWithGenomeRegion</name>
        <!-- The parameter names must be specified -->
        <parameter>
            <name>feature</name>
        </parameter>
        <parameter>
            <name>startCodon</name>
        </parameter>
        <parameter>
            <name>endCodon</name>
        </parameter>
        <!-- This resultType element indicates that the function will return an array of objects
             and that GLUE should construct a table result from these. -->
        <tableFromObjectsResultType/>
    </function>
</ecmaFunctionInvoker>

We can invoke the hostWithGenomeRegion function from the command line; it produces an interactive table, like built-in GLUE commands:

Mode path: /project/example
GLUE> module exampleEcmaFunctionInvoker invoke-function hostWithGenomeRegion ORF2 70 75
+============+=======================+==========================+===========================+
| sequenceID |      hostSpecies      | aminoAcids_ORF2_70_to_75 | nucleotides_ORF2_70_to_75 |
+============+=======================+==========================+===========================+
| AB481226   | -                     | PGAGAR                   | CCCGGGGCTGGAGCTCGC        |
| AB591734   | Herpestes javanicus   | SGAGAR                   | TCCGGGGCTGGAGCTCGC        |
| AF444003   | -                     | AGAGPR                   | GCCGGGGCTGGACCTCGT        |
| FJ705359   | Sus scrofa            | SGAGAR                   | TCCGGGGCTGGAGCTCGC        |
| FJ763142   | Homo sapiens          | AGAGAR                   | GCCGGGGCTGGAGCTCGC        |
| FJ998015   | Sus scrofa            | SGAGAR                   | TCCGGGGCTGGAGCTCGC        |
| JF443717   | Homo sapiens          | AGAGPR                   | GCCGGGGCTGGACCTCGC        |
| JQ013791   | Oryctolagus cuniculus | SGSGAR                   | TCCGGGTCTGGAGCCCGT        |
| JX855794   | Sus scrofa            | AGAGAR                   | GCCGGGGCTGGAGCTCGC        |
| KP294371   | Sus scrofa            | SGAGAR                   | TCCGGGGCTGGAGCTCGC        |
+============+=======================+==========================+===========================+

Mode path: /project/example
GLUE>
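Since mode-wrapping is not available in the scripting layer, invoking the same function from another JavaScript program would involve glue.inMode followed by the invoke-function command. The sketch below shows one way this might look. The wrapper name invokeHostWithGenomeRegion is hypothetical, the glue object is passed in as a parameter (glueObj) so the function can be exercised outside GLUE, and it is an assumption that the codon positions should be passed as strings. The structure of the returned object is not shown in this document, so the sketch returns it unchanged.

```javascript
// Hypothetical wrapper: invokes hostWithGenomeRegion via the module's
// invoke-function command.
// Assumptions: the caller is in project mode, and arguments are passed as strings.
function invokeHostWithGenomeRegion(glueObj, feature, startCodon, endCodon) {
    var result;
    // Relative mode path, in the same style as the module paths used in
    // exampleEcmaFunctionInvoker.js.
    glueObj.inMode("module/exampleEcmaFunctionInvoker", function() {
        result = glueObj.command(["invoke-function", "hostWithGenomeRegion",
                                  feature, String(startCodon), String(endCodon)]);
    });
    return result;
}
```

Inside GLUE, this might be called as invokeHostWithGenomeRegion(glue, "ORF2", 70, 75) from project mode.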

