Using the BDC Workflow CLI Tool - lbl-anp/berkeley-data-cloud GitHub Wiki

Workflow CLI Get Started Guide

Overview

The BDC Workflow CLI is a command-line interface that lets users work with BDC workflows.

Instructions for Using the BDC Workflow CLI

Set up environment

Run the bdc-auth tool to set up authentication for the bdc-workflow tool.

sudo bdc-auth --username superuser --export /vagrant/bdc-auth_script.sh --host localhost
       (use your superuser password when prompted)
sudo chmod 777 /vagrant/bdc-auth_script.sh
source /vagrant/bdc-auth_script.sh
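The last step uses source rather than executing the script directly because the script must run in your current shell for its exported variables to persist. A minimal sketch of the difference, using a stand-in script and variable (BDC_DEMO_VAR and /tmp/demo_auth.sh are illustrative only, not part of bdc-auth):

```shell
# 'source' runs the script in the current shell, so variables it exports
# remain set afterwards; running it as a child process would not.
echo 'export BDC_DEMO_VAR=authenticated' > /tmp/demo_auth.sh
source /tmp/demo_auth.sh
echo "$BDC_DEMO_VAR"   # prints "authenticated"
```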

View workflows or processes

You may view a list of directories, workflows, processes or datasets accessible to you via the following commands.

bdc-workflow list directory 
bdc-workflow list workflow

The next two commands produce a large amount of output, because processes and datasets used for internal testing are still in the VM's database.

bdc-workflow list process
bdc-workflow list dataset

To display group or user access on a BDC entity:

bdc-workflow access --action list --type user --id /Global/LBL/KUT workflow
bdc-workflow access --action list --type group --id /Global/LBL/KUT workflow

Show workflows, processes, or datasets:

bdc-workflow show --id /Global/LBL/KUT workflow

ID :                	/Global/LBL/KUT
Owner :             	superuser
Description :       	Sample RSL workflow.
Directory :         	/Global/LBL
Processes :         	AnomSensor<0.1>
                    	KUTRates<0.1>

bdc-workflow show --id /Global/LBL directory

ID :               	/Global/LBL
Owner :            	superuser
Directory Name :   	LBL
Parent Directory : 	/Global 

For the next command, the "--id" value must be quoted so the shell does not interpret "<" and ">" as redirection operators.

bdc-workflow show --id "/Global/LBL/KUT/KUTRates<0.1>" process

ID : /Global/LBL/KUT/KUTRates<0.1>
Owner : superuser
Inputs : DATASET                                      PARENT PROCESS
         /ARES/RSIRadDetector/System/Spectrum         /Global/Global/rad_to_hdf5<0.0>
         /ARES/RSIRadDetector/System/Timestamps       /Global/Global/rad_to_hdf5<0.0>
Outputs :
         /ARES/LBL/KUT/KGross
         /ARES/LBL/KUT/KNet
         /ARES/LBL/KUT/Timestamps
         /ARES/LBL/KUT/TlGross
         /ARES/LBL/KUT/TlNet
         /ARES/LBL/KUT/TotalGrossCounts
         /ARES/LBL/KUT/UGross
         /ARES/LBL/KUT/UNet   
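The quoting requirement can be demonstrated with plain shell, independent of the CLI itself. A quoted string passes the angle brackets through literally, while an unquoted one would be parsed as redirection:

```shell
# Quoting keeps '<' and '>' literal; without quotes the shell would treat
# them as redirection operators and break the command line.
id="/Global/LBL/KUT/KUTRates<0.1>"
echo "$id"   # prints /Global/LBL/KUT/KUTRates<0.1>
```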

Create workflows and workflow directories

Attempt to create a new directory:

bdc-workflow create directory --parent /Global/LBL --name TestDirectory
*******************************************
	Directory created successfully.
*******************************************

bdc-workflow create workflow --workflow_directory /Global/LBL/TestDirectory --name TestWorkflow --description "My Test Workflow"
*******************************************
	Workflow created successfully.
*******************************************

Now try to delete the created directory and see what happens:

bdc-workflow delete --id /Global/LBL/TestDirectory directory
*******************************************

Data service encountered an error:
	Operation not allowed.

Possible reasons for this failure :
	-The entity you are attempting to delete has child entities. Recursive deletes are not supported.
	-The entity you are attempting to delete has registered data associated with it.
	 Deletion will cause data corruption.
	-You are not allowed write access on the parent object and may not delete a child object.

*******************************************

Notice that this fails because the directory contains a child object (the test workflow we created). We need to delete the workflow (and any permissions set on it) first:

bdc-workflow delete --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
	Workflow deleted successfully.
*******************************************

And then successfully delete the directory:

bdc-workflow delete --id /Global/LBL/TestDirectory directory
*******************************************
	Directory deleted successfully.
*******************************************

Setting and deleting permissions on a BDC entity:

When processes and workflows are created, they have no permissions. To set user/group permissions, use the action 'set' and pass a legal permissions string, for example:

 
bdc-workflow access --action set --type user --target test_lbnl --permissions rw --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
	Workflow permission set successfully.
*******************************************

bdc-workflow access --action set --type group --target SAIC2 --permissions rw --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
	Workflow permission set successfully.
*******************************************

Listing the effects of the previous 2 commands, we can see:

bdc-workflow access --action list --type user --id /Global/LBL/TestDirectory/TestWorkflow workflow

ID : /Global/LBL/TestWorkflow                               
Type : User                                                  
Permissions : NAME             	PERMISSIONS                  
              superuser        	rwx                          
              test_lbnl        	rw    

bdc-workflow access --action list --type group --id /Global/LBL/TestDirectory/TestWorkflow workflow

ID : /Global/LBL/TestWorkflow                               
Type : Group                                                 
Permissions : NAME             	PERMISSIONS                  
              SAIC2            	rw   

To delete access, use the access command and specify the entity type as in:

[vagrant@vagrant vagrant]$ bdc-workflow access --id "/Global/LBL/SourceInjectSample/SourceInjectSampleProcess<2.0>" --type g --action delete --target GOVTEAM process
*******************************************
	Process permissions deleted successfully.
*******************************************

Manipulate user groups via the workflow CLI:

To view a group, use the show command:

[vagrant@vagrant vagrant]$ bdc-workflow show --id LBNL group

ID          : LBNL
Description : LBNL Users
Users       : test_lbnl

You may update a group by modifying its description or by removing and adding users. To remove a user, add '~' before their username. To add a user, simply include their username in the list.

The following example removes the test_saic user, adds superuser to the group, and updates the description:

[vagrant@vagrant unit]$ bdc-workflow update-group --id SAIC2 --description "This New Description" --users ~test_saic superuser
*******************************************
	Group updated successfully.
*******************************************
[vagrant@vagrant unit]$ bdc-workflow show --id SAIC2 group

ID          : SAIC2                                                 
Description : This New Description                                   
Users       : superuser   

To create a group, use the create command:

[vagrant@vagrant unit]$ bdc-workflow create group --id LBNL-And-Admin --description "LBNL and admin sample group." --users superuser test_lbnl
*******************************************
	Group created successfully.
*******************************************

[vagrant@vagrant unit]$ bdc-workflow show --id LBNL-And-Admin group

ID          : LBNL-And-Admin                                                        
Description : LBNL and admin sample group.                                           
Users       : superuser                                                              
              test_lbnl                                                              

To delete a group, use the delete command. If your group is already associated with other entities (e.g., it has been granted process/workflow/directory access), you will need to delete those associations first.

[vagrant@vagrant unit]$ bdc-workflow delete --id LBNL-And-Admin group
*******************************************
	Group deleted successfully.
*******************************************

Creating and deleting datasets:

Whenever you create datasets, you must create the time-sync datasets first; otherwise the workflow service will throw an error.

When creating datasets in single mode (i.e., one at a time), you must manage this ordering manually. In batch mode (i.e., using a CSV file), the CLI manages it for you by creating all time-sync datasets first.

Create datasets in single mode:

To create datasets in single mode, use the create command as follows:

[vagrant@vagrant unit]$ bdc-workflow create dataset --group "/ExampleGroup" --name "/ExampleTimeSyncDataset" --source "/ARES/LBL" --description "An example timesync dataset." --type double --dimensions 1 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 1
*******************************************
	Dataset created successfully.
*******************************************

bdc-workflow show --id "/ARES/LBL/ExampleGroup/ExampleTimeSyncDataset" dataset

ID :                                         	/ARES/LBL/ExampleGroup/ExampleTimeSyncDataset
Creator :                                    	superuser
Description :                                	An example timesync dataset.
Keywords :                                   	Some,Sample,Keywords
HDF5 Location :                              	/ARES/LBL/ExampleGroup/ExampleTimeSyncDataset
Data Type :                                  	double
Data Dimensions :                            	1
Indexable :                                  	True
Data Source :                                	/ARES/LBL

Note that we can now create other datasets associated with this time-sync dataset. For example, creating a 1024x92 2-D dataset:

[vagrant@vagrant vagrant]$ bdc-workflow create dataset --group "/ExampleGroup" --name "/ExampleDataset" --source "/ARES/LBL" --description "An example dataset." --type int --dimensions 1024 92 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 0
*******************************************
	Dataset created successfully.
*******************************************
[vagrant@vagrant vagrant]$ bdc-workflow show --id "/ARES/LBL/ExampleGroup/ExampleDataset" dataset

ID :                                 	/ARES/LBL/ExampleGroup/ExampleDataset
Creator :                            	superuser
Description :                        	An example dataset.
Keywords :                           	Some,Sample,Keywords
HDF5 Location :                      	/ARES/LBL/ExampleGroup/ExampleDataset
Data Type :                          	int
Data Dimensions :                    	1024x92
Indexable :                          	True
Data Source :                        	/ARES/LBL

To create a spatially synced dataset, make sure you associate it with a time-sync dataset that is itself spatially synced (i.e., one belonging to a spatially synced data group). For example:

[vagrant@vagrant unit]$ bdc-workflow create dataset --group "/ExampleGroupWithSpatial" --name "/ExampleSpatialSyncDataset" --source "/ARES/HeliSORDS/LBL" --description "An example of a spatially synced dataset." --type int --dimensions 10 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 1 --xsync "/ARES/RSIRadDetector/MCSC/Alarms/Longitude" --ysync "/ARES/RSIRadDetector/MCSC/Alarms/Latitude" --zsync "/ARES/RSIRadDetector/MCSC/Alarms/Altitude"
*******************************************
	Dataset created successfully.
*******************************************

Create datasets in batch mode:

Batch-mode dataset creation requires a CSV file in the following format:

name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
/TotalGrossCounts,/KUT,/ARES/LBL,1,3,double,0,,,,,
/KGross,/KUT,/ARES/LBL,1,3,double,0,,,,,
/KNet,/KUT,/ARES/LBL,1,3,double,0,,,,,
/UGross,/KUT,/ARES/LBL,1,3,double,0,,,,,
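Because time-sync datasets must exist before the datasets that reference them, batch mode registers rows with is_time_sync=1 first. The ordering can be sketched with awk on a small sample file (the rows and the /tmp path below are illustrative, not the real example CSV):

```shell
# Write a tiny sample in the documented column order, then print the header,
# the time-sync rows (column 7 == 1), and finally the remaining rows.
cat > /tmp/sample_datasets.csv <<'EOF'
name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/TotalGrossCounts,/KUT,/ARES/LBL,1,3,double,0,,,,,
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
EOF
head -n 1 /tmp/sample_datasets.csv
awk -F, 'NR > 1 && $7 == 1' /tmp/sample_datasets.csv   # /Timestamps row first
awk -F, 'NR > 1 && $7 == 0' /tmp/sample_datasets.csv   # then the rest
```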

A full example file may be viewed on the VM at /vagrant/resources/workflow/ares/RSLExamples_dataset.csv.

To create datasets in batch mode, run the following command:

[vagrant@vagrant unit]$ bdc-workflow create dataset-batch --csv /vagrant/resources/workflow/ares/RSLExamples_file_path.csv
Processing dataset /ARES/LBL/KUT/Timestamps from csv
Processing dataset /ARES/LBL/AnomSensor/Timestamps from csv
Processing dataset /ARES/LBL/KUT/TlGross from csv
Processing dataset /ARES/LBL/KUT/TlNet from csv
Processing dataset /ARES/LBL/KUT/UNet from csv
...
Created dataset /Timestamps successfully
Created dataset /Timestamps successfully
Created dataset /TlGross successfully
Created dataset /TlNet successfully
Created dataset /UNet successfully
...

The dataset definitions are first processed (i.e., checked for obvious errors), then registered one at a time. If there is a problem, the CLI will NOT roll back. You will need to update the CSV file to remove the already-created datasets and re-run.
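Since the CLI does not roll back, one way to prepare the CSV for a re-run is to filter out the rows already reported as created. A rough sketch (sample rows and /tmp paths are hypothetical; here /Timestamps was already created):

```shell
# Keep the header and the rows still to be registered; drop the row for
# the already-created /Timestamps dataset.
cat > /tmp/retry_datasets.csv <<'EOF'
name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
/TlNet,/KUT,/ARES/LBL,1,3,double,0,,,,,
EOF
grep -v '^/Timestamps,' /tmp/retry_datasets.csv > /tmp/retry_remaining.csv
cat /tmp/retry_remaining.csv   # header plus the /TlNet row only
```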

Deleting a dataset:

To delete a dataset, use the delete command and pass in the full dataset path. Note that you cannot delete a time-sync dataset if other datasets depend on it, and you may not delete a dataset that is associated with a process or with data. Because the dataset we just created was not yet associated with any process, we can delete it as follows:

[vagrant@vagrant vagrant]$ bdc-workflow delete --id "/ARES/HeliSORDS/LBL/ExampleGroupWithSpatial/ExampleSpatialSyncDataset" dataset
*******************************************
	Dataset deleted successfully.
*******************************************

Creating and deleting processes:

Create process in single mode:

Make sure you pass the correct number of inputs, outputs, and parent processes (one parent for each input). The version (1.0 in the example below) must be included in the ID field, as it is required.

[vagrant@vagrant unit]$ bdc-workflow create process --id "/Global/LBL/KUT/SampleProcess<1.0>" --inputs /ARES/LBL/KUT/Timestamps /ARES/LBL/KUT/UGross --outputs /ARES/LBL/AnomSensor/SampleDataset /ARES/LBL/AnomSensor/Timestamps --parents "/Global/LBL/KUT/KUTRates<0.1>" "/Global/LBL/KUT/KUTRates<0.1>" 
*******************************************
	Process created successfully.
*******************************************
[vagrant@vagrant unit]$ bdc-workflow show --id "/Global/LBL/KUT/SampleProcess<1.0>" process

ID : /Global/LBL/KUT/SampleProcess<1.0>
Owner : superuser
Inputs : DATASET                            PARENT PROCESS
         /ARES/LBL/KUT/Timestamps           /Global/LBL/KUT/KUTRates<0.1>
         /ARES/LBL/KUT/UGross               /Global/LBL/KUT/KUTRates<0.1>
Outputs :
         /ARES/LBL/AnomSensor/SampleDataset
         /ARES/LBL/AnomSensor/Timestamps                                                                        
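Since every input must be paired with exactly one parent process, it can help to sanity-check the two lists before assembling the command. A small illustrative check (the IDs are taken from the example above; the check itself is not part of the CLI):

```shell
# Each --inputs entry needs exactly one matching --parents entry.
inputs=("/ARES/LBL/KUT/Timestamps" "/ARES/LBL/KUT/UGross")
parents=("/Global/LBL/KUT/KUTRates<0.1>" "/Global/LBL/KUT/KUTRates<0.1>")
if [ "${#inputs[@]}" -eq "${#parents[@]}" ]; then
  echo "ok: ${#inputs[@]} input/parent pairs"
else
  echo "error: each input needs a parent process" >&2
fi
```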

Create processes in batch mode:

To create processes in batch mode, arrange the processes so that parent processes come FIRST in the file, followed by any processes that depend on them.

The workflow CLI will not figure this ordering out on its own. An example CSV batch file for process creation can be found on the VM at /vagrant/resources/workflow/ares/RSLExamples_process_1.csv. To register processes via batch mode, use the create process-batch command. A slightly modified version of the RSLExamples_process_1.csv file (where all process versions are 0.2 instead of 0.1) is shown below as an example:

[vagrant@vagrant vagrant]$ bdc-workflow create process-batch --csv resources/workflow/ares/RSLExamples_process_1.csv
Attempting batch process registration from csv file.
Created process /Global/LBL/KUT/KUTRates<0.2> successfully
Created process /Global/LBL/KUT/AnomSensor<0.2> successfully
[vagrant@vagrant vagrant]$ bdc-workflow show --id "/Global/LBL/KUT/KUTRates<0.2>" process

ID : /Global/LBL/KUT/KUTRates<0.2>
Owner : superuser
Inputs : DATASET                                      PARENT PROCESS
         /ARES/RSIRadDetector/System/Path/GPSAltitude /Global/Global/rad_to_hdf5<0.0>
         /ARES/RSIRadDetector/System/Path/Latitude    /Global/Global/rad_to_hdf5<0.0>
         /ARES/RSIRadDetector/System/Path/Longitude   /Global/Global/rad_to_hdf5<0.0>
         /ARES/RSIRadDetector/System/Spectrum         /Global/Global/rad_to_hdf5<0.0>
         /ARES/RSIRadDetector/System/Timestamps       /Global/Global/rad_to_hdf5<0.0>
Outputs :
         /ARES/LBL/KUT/KGross
         /ARES/LBL/KUT/KNet
         /ARES/LBL/KUT/Timestamps
         /ARES/LBL/KUT/TlGross
         /ARES/LBL/KUT/TlNet
         /ARES/LBL/KUT/TotalGrossCounts
         /ARES/LBL/KUT/UGross
         /ARES/LBL/KUT/UNet

[vagrant@vagrant vagrant]$ bdc-workflow show --id "/Global/LBL/KUT/AnomSensor<0.2>" process

ID : /Global/LBL/KUT/AnomSensor<0.2>
Owner : superuser
Inputs : DATASET                            PARENT PROCESS
         /ARES/LBL/KUT/Timestamps           /Global/LBL/KUT/KUTRates<0.1>
         /ARES/LBL/KUT/UGross               /Global/LBL/KUT/KUTRates<0.1>
Outputs :
         /ARES/LBL/AnomSensor/SampleDataset
         /ARES/LBL/AnomSensor/Timestamps

Again, the mechanism here is the same as in dataset batch mode: the CLI attempts to create the processes one at a time, aborting and reporting an error if one fails. If that happens, update the CSV file accordingly (removing any already-registered processes) and re-run.
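If you need to compute a parents-first ordering for a larger batch file, the standard tsort(1) utility can derive one from "parent child" pairs. A sketch using the two example processes above (the pair list would come from your own dependency data, not from the CLI):

```shell
# tsort prints a topological order: each parent appears before its children.
# Here KUTRates<0.2> is printed before AnomSensor<0.2>, which depends on it.
printf '%s %s\n' \
  '/Global/LBL/KUT/KUTRates<0.2>' '/Global/LBL/KUT/AnomSensor<0.2>' | tsort
```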

Deleting a process:

Make sure the process is not a parent of any other process and is not associated with data; otherwise you will get an error.

[vagrant@vagrant unit]$ bdc-workflow delete --id "/Global/LBL/KUT/AnomSensor<0.2>" process
*******************************************
	Process deleted successfully.
*******************************************