# Using the BDC Workflow CLI Tool (lbl-anp/berkeley-data-cloud wiki)
The BDC Workflow CLI is a command-line interface for working with BDC workflows.
Run the bdc-auth tool to set up authentication for the bdc-workflow tool (enter your superuser password when prompted):

```
sudo bdc-auth --username superuser --export /vagrant/bdc-auth_script.sh --host localhost
sudo chmod 777 /vagrant/bdc-auth_script.sh
source /vagrant/bdc-auth_script.sh
```
You may view a list of directories, workflows, processes or datasets accessible to you via the following commands.
```
bdc-workflow list directory
bdc-workflow list workflow
```
The next two commands will produce a large amount of output, since the VM's database still contains processes and datasets used for internal testing.
```
bdc-workflow list process
bdc-workflow list dataset
```
To display group or user access on a BDC entity:
```
bdc-workflow access --action list --type user --id /Global/LBL/KUT workflow
bdc-workflow access --action list --type group --id /Global/LBL/KUT workflow
```
To display the details of an entity, use the show command:

```
bdc-workflow show --id /Global/LBL/KUT workflow
ID          : /Global/LBL/KUT
Owner       : superuser
Description : Sample RSL workflow.
Directory   : /Global/LBL
Processes   : AnomSensor<0.1>
              KUTRates<0.1>

bdc-workflow show --id /Global/LBL directory
ID               : /Global/LBL
Owner            : superuser
Directory Name   : LBL
Parent Directory : /Global
```
For the next command, the "--id" value must be quoted so that the shell does not interpret "<" and ">" as redirection operators.
```
bdc-workflow show --id "/Global/LBL/KUT/KUTRates<0.1>" process
ID      : /Global/LBL/KUT/KUTRates<0.1>
Owner   : superuser
Inputs  :
DATASET                                  PARENT PROCESS
/ARES/RSIRadDetector/System/Spectrum     /Global/Global/rad_to_hdf5<0.0>
/ARES/RSIRadDetector/System/Timestamps   /Global/Global/rad_to_hdf5<0.0>
Outputs :
/ARES/LBL/KUT/KGross
/ARES/LBL/KUT/KNet
/ARES/LBL/KUT/Timestamps
/ARES/LBL/KUT/TlGross
/ARES/LBL/KUT/TlNet
/ARES/LBL/KUT/TotalGrossCounts
/ARES/LBL/KUT/UGross
/ARES/LBL/KUT/UNet
```
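The quoting requirement is plain shell behavior rather than anything specific to the CLI: an unquoted `<0.1>` would be parsed as an input redirection from a file named `0.1`, and the command would fail before bdc-workflow even ran. A minimal sketch showing that quoting passes the brackets through literally:

```shell
# Quoting (single or double) preserves "<" and ">" in the argument;
# without quotes the shell would treat "<0.1>" as a redirection.
id="/Global/LBL/KUT/KUTRates<0.1>"
echo "$id"
```

Escaping also works (`\<0.1\>`), but quoting the whole ID is easier to read.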
Now create a new directory, and a workflow inside it:
```
bdc-workflow create directory --parent /Global/LBL --name TestDirectory
*******************************************
Directory created successfully.
*******************************************

bdc-workflow create workflow --workflow_directory /Global/LBL/TestDirectory --name TestWorkflow --description "My Test Workflow"
*******************************************
Workflow created successfully.
*******************************************
```
Now try to delete the created directory and see what happens:
```
bdc-workflow delete --id /Global/LBL/TestDirectory directory
*******************************************
Data service encountered an error: Operation not allowed.
Possible reasons for this failure :
-The entity you are attempting to delete has child entities. Recursive deletes are not supported.
-The entity you are attempting to delete has registered data associated with it. Deletion will cause data corruption.
-You are not allowed write access on the parent object and may not delete a child object.
*******************************************
```
Notice how this fails because the directory contains a child object (the test workflow we created). We need to delete the workflow (and any permissions set on it) first:
```
bdc-workflow delete --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
Workflow deleted successfully.
*******************************************
```
And then successfully delete the directory:
```
bdc-workflow delete --id /Global/LBL/TestDirectory directory
*******************************************
Directory deleted successfully.
*******************************************
```
Processes and workflows are created without any user or group permissions. To set them, use the action 'set' and pass a legal permissions string, for example:
```
bdc-workflow access --action set --type user --target test_lbnl --permissions rw --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
Workflow permission set successfully.
*******************************************

bdc-workflow access --action set --type group --target SAIC2 --permissions rw --id /Global/LBL/TestDirectory/TestWorkflow workflow
*******************************************
Workflow permission set successfully.
*******************************************
```
Listing the results of the previous two commands shows:
```
bdc-workflow access --action list --type user --id /Global/LBL/TestDirectory/TestWorkflow workflow
ID          : /Global/LBL/TestWorkflow
Type        : User
Permissions :
NAME        PERMISSIONS
superuser   rwx
test_lbnl   rw

bdc-workflow access --action list --type group --id /Global/LBL/TestDirectory/TestWorkflow workflow
ID          : /Global/LBL/TestWorkflow
Type        : Group
Permissions :
NAME    PERMISSIONS
SAIC2   rw
```
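When the same permissions must be granted to several users, the set action can be scripted in a loop. A dry-run sketch (it only prints the commands it would run; drop the `echo` to actually execute them once authenticated; the usernames are the examples from this page):

```shell
# Build one "access --action set" command per user and print it.
cmds=$(for user in test_lbnl test_saic; do
  echo bdc-workflow access --action set --type user --target "$user" \
    --permissions rw --id /Global/LBL/TestDirectory/TestWorkflow workflow
done)
printf '%s\n' "$cmds"
```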
To delete access, use the access command with "--action delete" and specify the entity type, as in:
```
[vagrant@vagrant vagrant]$ bdc-workflow access --id "/Global/LBL/SourceInjectSample/SourceInjectSampleProcess<2.0>" --type g --action delete --target GOVTEAM process
*******************************************
Process permissions deleted successfully.
*******************************************
```
To view a group, use the show command:
```
[vagrant@vagrant vagrant]$ bdc-workflow show --id LBNL group
ID          : LBNL
Description : LBNL Users
Users       : test_lbnl
```
You may update a group by modifying its description or by adding and removing users. To remove a user, prefix their username with '~'. To add a user, simply include their username in the list.
The following example removes the test_saic user, adds the superuser, and updates the description:
```
[vagrant@vagrant unit]$ bdc-workflow update-group --id SAIC2 --description "This New Description" --users ~test_saic superuser
*******************************************
Group updated successfully.
*******************************************
[vagrant@vagrant unit]$ bdc-workflow show --id SAIC2 group
ID          : SAIC2
Description : This New Description
Users       : superuser
```
To create a group, use the create command:
```
[vagrant@vagrant unit]$ bdc-workflow create group --id LBNL-And-Admin --description "LBNL and admin sample group." --users superuser test_lbnl
*******************************************
Group created successfully.
*******************************************
[vagrant@vagrant unit]$ bdc-workflow show --id LBNL-And-Admin group
ID          : LBNL-And-Admin
Description : LBNL and admin sample group.
Users       : superuser
              test_lbnl
```
To delete a group, use the delete command. If the group is already associated with other entities (e.g., it has been granted process, workflow, or directory access), you will need to delete those associations first.
```
[vagrant@vagrant unit]$ bdc-workflow delete --id LBNL-And-Admin group
*******************************************
Group deleted successfully.
*******************************************
```
Whenever you create datasets, you must create the time-sync datasets first; otherwise the workflow service will throw an error. In single mode (i.e., one dataset at a time) you have to manage this ordering yourself. In batch mode (i.e., using a csv file) the CLI manages it for you by creating all time-sync datasets first.
To create datasets in single mode, use the create command as follows:
```
[vagrant@vagrant unit]$ bdc-workflow create dataset --group "/ExampleGroup" --name "/ExampleTimeSyncDataset" --source "/ARES/LBL" --description "An example timesync dataset." --type double --dimensions 1 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 1
*******************************************
Dataset created successfully.
*******************************************
bdc-workflow show --id "/ARES/LBL/ExampleGroup/ExampleTimeSyncDataset" dataset
ID              : /ARES/LBL/ExampleGroup/ExampleTimeSyncDataset
Creator         : superuser
Description     : An example timesync dataset.
Keywords        : Some,Sample,Keywords
HDF5 Location   : /ARES/LBL/ExampleGroup/ExampleTimeSyncDataset
Data Type       : double
Data Dimensions : 1
Indexable       : True
Data Source     : /ARES/LBL
```
Note that we can now create other datasets associated with this time-sync dataset. For example, creating a 1024x92 2-D dataset:
```
[vagrant@vagrant vagrant]$ bdc-workflow create dataset --group "/ExampleGroup" --name "/ExampleDataset" --source "/ARES/LBL" --description "An example dataset." --type int --dimensions 1024 92 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 0
*******************************************
Dataset created successfully.
*******************************************
[vagrant@vagrant vagrant]$ bdc-workflow show --id "/ARES/LBL/ExampleGroup/ExampleDataset" dataset
ID              : /ARES/LBL/ExampleGroup/ExampleDataset
Creator         : superuser
Description     : An example dataset.
Keywords        : Some,Sample,Keywords
HDF5 Location   : /ARES/LBL/ExampleGroup/ExampleDataset
Data Type       : int
Data Dimensions : 1024x92
Indexable       : True
Data Source     : /ARES/LBL
```
To create a spatially synced dataset, make sure you associate it with a time-sync dataset that is itself spatially synced (i.e., one belonging to a spatially synced data group). For example:
```
[vagrant@vagrant unit]$ bdc-workflow create dataset --group "/ExampleGroupWithSpatial" --name "/ExampleSpatialSyncDataset" --source "/ARES/HeliSORDS/LBL" --description "An example of a spatially synced dataset." --type int --dimensions 10 --level 3 --keywords "Some,Sample,Keywords" --is_time_sync 1 --xsync "/ARES/RSIRadDetector/MCSC/Alarms/Longitude" --ysync "/ARES/RSIRadDetector/MCSC/Alarms/Latitude" --zsync "/ARES/RSIRadDetector/MCSC/Alarms/Altitude"
*******************************************
Dataset created successfully.
*******************************************
```
Batch mode dataset creation requires a csv input file in the following format:
```
name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
/TotalGrossCounts,/KUT,/ARES/LBL,1,3,double,0,,,,,
/KGross,/KUT,/ARES/LBL,1,3,double,0,,,,,
/KNet,/KUT,/ARES/LBL,1,3,double,0,,,,,
/UGross,/KUT,/ARES/LBL,1,3,double,0,,,,,
```
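A batch file in this format can be generated straight from the shell with a here-document; the /tmp path below is an arbitrary choice for illustration:

```shell
# Write a minimal dataset batch file with one time-sync row
# (is_time_sync=1) and one dependent row.
cat > /tmp/example_datasets.csv <<'EOF'
name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
/TotalGrossCounts,/KUT,/ARES/LBL,1,3,double,0,,,,,
EOF
head -n 1 /tmp/example_datasets.csv
```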
A full example file may be viewed on the VM at /vagrant/resources/workflow/ares/RSLExamples_dataset.csv.
To create datasets in batch mode, run the following command:
```
[vagrant@vagrant unit]$ bdc-workflow create dataset-batch --csv /vagrant/resources/workflow/ares/RSLExamples_file_path.csv
Processing dataset /ARES/LBL/KUT/Timestamps from csv
Processing dataset /ARES/LBL/AnomSensor/Timestamps from csv
Processing dataset /ARES/LBL/KUT/TlGross from csv
Processing dataset /ARES/LBL/KUT/TlNet from csv
Processing dataset /ARES/LBL/KUT/UNet from csv
...
Created dataset /Timestamps successfully
Created dataset /Timestamps successfully
Created dataset /TlGross successfully
Created dataset /TlNet successfully
Created dataset /UNet successfully
...
```
The dataset definitions are first processed (i.e., checked for obvious errors) and then registered one at a time. If there is a problem, the CLI will NOT roll back; you will need to update the csv file to remove the already created datasets and re-run.
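Since there is no rollback, recovering from a partial failure means trimming the already registered rows before re-running. A self-contained sketch of that cleanup step, assuming /TotalGrossCounts was created before the failure (the /tmp paths are arbitrary):

```shell
# Build a sample batch file, then drop the row for the dataset
# that was already registered before the failure.
cat > /tmp/datasets.csv <<'EOF'
name,group,source,dimensions,processing_level,type,is_time_sync,xsync,ysync,zsync,keywords,description
/Timestamps,/KUT,/ARES/LBL,1,3,double,1,,,,,
/TotalGrossCounts,/KUT,/ARES/LBL,1,3,double,0,,,,,
/KGross,/KUT,/ARES/LBL,1,3,double,0,,,,,
EOF
grep -v '^/TotalGrossCounts,' /tmp/datasets.csv > /tmp/datasets_retry.csv
wc -l < /tmp/datasets_retry.csv
```

After trimming, re-run create dataset-batch against the new file.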
To delete a dataset, use the delete command and pass in the full dataset path. Note that you cannot delete a time-sync dataset that other datasets depend on, nor a dataset that is associated with a process or registered data. Because the dataset we just created is not yet associated with any process, we can delete it as follows:
```
[vagrant@vagrant vagrant]$ bdc-workflow delete --id "/ARES/HeliSORDS/LBL/ExampleGroupWithSpatial/ExampleSpatialSyncDataset" dataset
*******************************************
Dataset deleted successfully.
*******************************************
```
To create a process in single mode, use the create command. Make sure you pass in the correct number of inputs, outputs, and parent processes (one parent for each input). Ensure the version (1.0 in the example below) is included in the ID field, as it is required.
```
[vagrant@vagrant unit]$ bdc-workflow create process --id "/Global/LBL/KUT/SampleProcess<1.0>" --inputs /ARES/LBL/KUT/Timestamps /ARES/LBL/KUT/UGross --outputs /ARES/LBL/AnomSensor/SampleDataset /ARES/LBL/AnomSensor/Timestamps --parents "/Global/LBL/KUT/KUTRates<0.1>" "/Global/LBL/KUT/KUTRates<0.1>"
*******************************************
Process created successfully.
*******************************************
[vagrant@vagrant unit]$ bdc-workflow show --id "/Global/LBL/KUT/SampleProcess<1.0>" process
ID      : /Global/LBL/KUT/SampleProcess<1.0>
Owner   : superuser
Inputs  :
DATASET                    PARENT PROCESS
/ARES/LBL/KUT/Timestamps   /Global/LBL/KUT/KUTRates<0.1>
/ARES/LBL/KUT/UGross       /Global/LBL/KUT/KUTRates<0.1>
Outputs :
/ARES/LBL/AnomSensor/SampleDataset
/ARES/LBL/AnomSensor/Timestamps
```
To create processes in batch mode, order the csv file so that parent processes come FIRST, followed by the processes that depend on them. The workflow CLI will not figure this ordering out on its own. An example csv batch file for process creation can be found on the VM at /vagrant/resources/workflow/ares/RSLExamples_process_1.csv. To register processes via batch mode, use the create process-batch command. A slightly modified version of the RSLExamples_process_1.csv file (with all process versions at 0.2 instead of 0.1) is shown below as an example:
```
[vagrant@vagrant vagrant]$ bdc-workflow create process-batch --csv resources/workflow/ares/RSLExamples_process_1.csv
Attempting batch process registration from csv file.
Created process /Global/LBL/KUT/KUTRates<0.2> successfully
Created process /Global/LBL/KUT/AnomSensor<0.2> successfully

[vagrant@vagrant vagrant]$ bdc-workflow show --id "/Global/LBL/KUT/KUTRates<0.2>" process
ID      : /Global/LBL/KUT/KUTRates<0.2>
Owner   : superuser
Inputs  :
DATASET                                        PARENT PROCESS
/ARES/RSIRadDetector/System/Path/GPSAltitude   /Global/Global/rad_to_hdf5<0.0>
/ARES/RSIRadDetector/System/Path/Latitude      /Global/Global/rad_to_hdf5<0.0>
/ARES/RSIRadDetector/System/Path/Longitude     /Global/Global/rad_to_hdf5<0.0>
/ARES/RSIRadDetector/System/Spectrum           /Global/Global/rad_to_hdf5<0.0>
/ARES/RSIRadDetector/System/Timestamps         /Global/Global/rad_to_hdf5<0.0>
Outputs :
/ARES/LBL/KUT/KGross
/ARES/LBL/KUT/KNet
/ARES/LBL/KUT/Timestamps
/ARES/LBL/KUT/TlGross
/ARES/LBL/KUT/TlNet
/ARES/LBL/KUT/TotalGrossCounts
/ARES/LBL/KUT/UGross
/ARES/LBL/KUT/UNet

[vagrant@vagrant vagrant]$ bdc-workflow show --id "/Global/LBL/KUT/AnomSensor<0.2>" process
ID      : /Global/LBL/KUT/AnomSensor<0.2>
Owner   : superuser
Inputs  :
DATASET                    PARENT PROCESS
/ARES/LBL/KUT/Timestamps   /Global/LBL/KUT/KUTRates<0.1>
/ARES/LBL/KUT/UGross       /Global/LBL/KUT/KUTRates<0.1>
Outputs :
/ARES/LBL/AnomSensor/SampleDataset
/ARES/LBL/AnomSensor/Timestamps
```
Again, the mechanism is the same as in dataset batch mode: the CLI attempts to create the processes one at a time, aborting and reporting if one fails. If that happens, update the csv file accordingly (removing any already registered processes) and re-run.
To delete a process, use the delete command. Make sure the process is not a parent to any other process and is not associated with data; otherwise you will get an error.
```
[vagrant@vagrant unit]$ bdc-workflow delete --id "/Global/LBL/KUT/AnomSensor<0.2>" process
*******************************************
Process deleted successfully.
*******************************************
```