Algorithm Execution Container - aegisbigdata/documentation GitHub Wiki

Jupyter version Usage

The Algorithm Execution Container is implemented with Python 3.6 as a Jupyter notebook. Here, we describe how the tool can be utilized from the AEGIS platform.

Deployment - Initialisation

STEP 1 - Install the notebook (optional)

Note that in the final AEGIS version the Algorithm Execution Container is loaded by default in every new project, so step 1 is only needed if a user chooses not to use the existing notebook.

If the Algorithm Execution Container notebook does not already exist in a specific project, one should first download the Algorithm Execution Container-Jupyter Version.ipynb file.

Once downloaded, one should upload the file to an empty Jupyter notebook.

Once this is done, a new "AEGIS" notebook is created in your project's folder.

STEP 2 - Open and initialize the notebook

Open the downloaded 'AEC notebook'. Select and run the first paragraph of the notebook. An initialisation button (labelled 'Initialise') should appear as an output of the paragraph's execution. Press the initialisation button and wait for the following paragraphs to run. You can monitor the kernel state shown in the upper right part of the notebook to know when the execution is finished. Once all paragraphs have finished running, you should see the main AEC UI, which includes five tabs.

Usage

Upon launch of the corresponding Algorithm Execution Container (hereinafter container), the user has to initialise it through a dedicated button. The initialisation ensures that the Spark interpreter of the AEGIS platform is started and also creates the basic UI of the container, which is a simple five-tab window:

  • An input file selection tab
  • An algorithm selection and configuration tab
  • An output folder selection tab
  • An overview tab showing the current user selections and, when ready, the execution results
  • A model application tab, to perform regression or classification with existing trained models on other datasets

Apart from the overview and model application tabs, the other three correspond to the basic steps of applying an algorithm through the container. The first step is to select the file that contains the data to be used by the algorithm, through the interface shown in the next figure, which also brings up a preview of the selected file.

The next step is to select the algorithm to be applied. The provided algorithms are grouped under five categories (algorithm families). Once an algorithm is selected, a form for configuring its parameters is shown to the user. Basic form validation checks that each selected parameter value lies within the specified boundaries. At this point, the final version of the Algorithm Execution Container includes the option to set a grid of parameters, so that the algorithm runs over those ranges and the best model is selected automatically based on the results produced. A preview of the configuration of the grid parameters is shown in the next figure.
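The idea behind the boundary validation and the parameter grid can be sketched in plain Python. This is only an illustration under stated assumptions, not the container's actual code: the parameter names, the bounds in PARAM_BOUNDS, and the toy_score function are hypothetical placeholders for a real algorithm and its evaluation metric.

```python
from itertools import product

# Hypothetical parameter bounds, standing in for the limits the form validates against.
PARAM_BOUNDS = {"max_depth": (1, 30), "learning_rate": (0.0, 1.0)}


def validate(params, bounds=PARAM_BOUNDS):
    """Return a list of error messages for values outside their bounds."""
    errors = []
    for name, value in params.items():
        low, high = bounds[name]
        if not low <= value <= high:
            errors.append("{} = {} is outside [{}, {}]".format(name, value, low, high))
    return errors


def grid_search(grid, score_fn):
    """Try every combination in the grid and return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        if validate(params):
            continue  # skip combinations that fail validation
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Placeholder score (higher is better), peaking at max_depth=5, learning_rate=0.1."""
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


# Hypothetical grid: each parameter maps to the list of values to try.
grid = {"max_depth": [3, 5, 10], "learning_rate": [0.01, 0.1, 0.5]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In a real run the score function would train the selected algorithm with the given parameters and return its evaluation metric, which is what makes the automatic choice of the best model possible.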

The model that will be created when an algorithm is applied is saved in the user’s datasets, in a folder specified in the last container tab, as shown below:

Once this last step is concluded, by pressing the “Apply” button, the user is taken to the Overview Tab:

There, by pressing Done, the algorithm is applied and, when the execution is completed, some algorithm-dependent results are provided to the user in the same tab (next figure). The final output of the analysis is stored back in the AEGIS Data Store, and the model that was used for the analysis is stored alongside the analysis results.
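As a rough illustration of keeping a model next to its results in one output folder, the following sketch writes both into the same directory. It is an assumption-laden example and not how the container persists its output: the file names model.pkl and results.json, the save_run helper, and the stand-in model and results values are all hypothetical, and a temporary local directory plays the role of the selected dataset folder.

```python
import json
import os
import pickle
import tempfile


def save_run(output_dir, model, results):
    """Write the model and its results side by side in output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, "model.pkl")        # hypothetical file name
    results_path = os.path.join(output_dir, "results.json")   # hypothetical file name
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    with open(results_path, "w") as fh:
        json.dump(results, fh, indent=2)
    return model_path, results_path


# Stand-in values: a plain dict plays the role of the trained model,
# and the metric below is a made-up placeholder, not a real result.
output_dir = os.path.join(tempfile.mkdtemp(), "aec_output")
model = {"algorithm": "linear_regression", "coefficients": [0.5, 1.5]}
results = {"placeholder_metric": 0.0}
model_path, results_path = save_run(output_dir, model, results)
```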

If the selected analysis is a classification or a regression, the model created in the previous steps can be re-applied to new datasets using the “Apply Model” tab, provided that the format of the input file is the same as that of the file used to generate the model. This feature is shown in the next figure, where the user is able to select from existing files and re-apply the model.
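The requirement that the new input file has the same format as the training file can be checked before a model is re-applied. The sketch below is a minimal example that assumes CSV input with a header row and treats "same format" as "same feature columns in the same order"; the helper names and the sample data are hypothetical and are not part of the container.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of CSV text (empty list if none)."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def same_format(training_csv, new_csv, target_column=None):
    """True when both files have the same feature columns in the same order.

    The target column is ignored, since a new dataset may not contain it.
    """
    train_cols = [c for c in read_header(training_csv) if c != target_column]
    new_cols = [c for c in read_header(new_csv) if c != target_column]
    return train_cols == new_cols


# Hypothetical sample data: the training file includes the target column "price".
training_csv = "rooms,area,price\n3,70,200\n"
compatible_csv = "rooms,area\n2,55\n"
incompatible_csv = "area,rooms\n55,2\n"

print(same_format(training_csv, compatible_csv, target_column="price"))
print(same_format(training_csv, incompatible_csv, target_column="price"))
```

Comparing column order as well as column names is a deliberately strict choice here; a looser check could compare the names as sets when the model looks features up by name.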

API

No API is available, since the Algorithm Execution Container is a Jupyter notebook.


Deprecated Zeppelin version Deployment

The Algorithm Execution Container is implemented with Python 2.7 as a Zeppelin notebook. Here, we describe how the tool can be utilized from the AEGIS platform.

STEP 1 - Install the notebook

If the Algorithm Execution Container Zeppelin notebook does not already exist in a specific project, one should first download its JSON description.

Once downloaded, one should import the .JSON file to an empty Zeppelin notebook (by selecting the "Import Note" option).

Once this is done, a new "AEGIS" notebook is created in your project's folder.

STEP 2 - Create a DataSet folder

You need to create a dataset folder with public permissions in your AEGIS project, in order for the AEC to be able to export its results there.

STEP 3 - Open the notebook

Open the notebook by clicking on the new "AEGIS" notebook that has appeared in your Notebooks folder in the AEGIS platform.

STEP 4 - Select the Spark interpreter

Click the Settings icon (gear) in the top right of the notebook and drag the Spark interpreter to the top of the list, so that it is the first to be triggered.

STEP 5 - Initialise the notebook (only once)

To initialise the notebook, run the paragraph called "Algorithm Initialisation".

API

No API is available, since the Algorithm Execution Container is a Zeppelin notebook.