VIP executions source code - fli-iam/shanoir-ng GitHub Wiki
This page presents the Shanoir source code related to VIP execution management.
The code is mostly located in shanoir-ng-datasets/src/main/java/org/shanoir/ng/vip. The exception is shanoir-ng-datasets/src/main/java/org/shanoir/ng/processing, which holds the Shanoir processing object named DatasetProcessing (it is linked to other Shanoir objects such as datasets, it is stored in the DB, it can be displayed, etc.). The outputs returned by VIP are handled in the import-ms, under shanoir-ng-import/src/main/java/org/shanoir/ng/import/vip.
The text below is split by Java object. Notes on methods and directories are not exhaustive, as methods are mostly commented in the source code and are fairly intuitive. Only the methods and concepts that seemed non-intuitive to me are presented. Some other points:
- DatasetProcessing and Execution refer to one and the same concept. If a distinction has to be made, Processing relates to the front end and Execution to the back end of the Shanoir app.
- Execution HTTP requests may mention a Service Account. VIP requires execution HTTP requests to be authenticated; they are not authenticated as Shanoir users, but through a single Keycloak Service Account that is shared by all users.
- ExecutionMonitoring classes are stored in the /vip directory although they inherit from DatasetProcessing. This structure is inherited from the first version. As long as an execution has not finished successfully (a killed execution, for example, does not count as finished successfully), it has an ExecutionMonitoring as parent. That parent is removed as soon as the execution finishes successfully (the child is removed and created again).
- Executions run in parallel, up to a specific max_thread number. It has been determined with VIP that 3 is a suitable number.
- Currently, if the execution environment did not crash, VIP systematically returns a .zip archive, which can sometimes be empty, and the execution status is 'Finished', whatever happened in the pipeline. If there was an issue outside the pipeline, the VIP environment returns nothing, and the execution is never considered 'Finished' on the Shanoir side (so the parent ExecutionMonitoring is not removed).
- A future development will standardize what VIP returns. Whatever the reason for an execution failure, inside or outside a pipeline, VIP will return an 'Execution_Failed' status and no .zip archive. A short message explaining the reason for the crash will be included in the VIP HTTP response. In every case, the execution logs will be requested and stored in Shanoir.
- Be aware that a Python client for serializing executions is available in the Pynoir git repository.
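The max_thread limit from the list above can be illustrated with a fixed-size thread pool. This is only a minimal sketch under assumptions: the class BoundedExecutionLauncher, the method runAll and the constant MAX_THREADS are hypothetical names, and the actual Shanoir code may bound parallelism differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not the Shanoir implementation. */
final class BoundedExecutionLauncher {

    /** Value reported in the notes above as agreed with VIP. */
    static final int MAX_THREADS = 3;

    /**
     * Runs every task on a pool of at most maxThreads workers and returns
     * the highest number of tasks observed running at the same time.
     */
    static int runAll(List<Runnable> tasks, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> {
                    // Record how many tasks are active right now.
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        task.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for all submitted tasks to complete.
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for executions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An execution task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    /** Simulates a slow execution request. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            tasks.add(() -> pause(50));
        }
        int peak = runAll(tasks, MAX_THREADS);
        System.out.println("Peak concurrency never exceeds " + MAX_THREADS + ": " + (peak <= MAX_THREADS));
    }
}
```

With a fixed pool, extra execution requests simply wait in the pool's queue until one of the three workers is free, which matches the behaviour described in the bullet.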
Dataset-ms
DatasetProcessing (Displayable Shanoir object)
Displayable Shanoir object, accessible through the Shanoir UI, which gathers all the data about an execution that is useful to a user.
- Controller
Usual CRUD methods, plus the ability to download processing inputs and outputs. Some methods return DTOs for the front end.
- DTO
ParameterResourceDTO objects describe the parameters needed to launch executions. They can be chosen in the UI when launching an execution, and in JSON_generator for the Python client, but they are not stored in the DB. GroupByEnum describes a grouping parameter.
- Model
DatasetProcessingType lists all the possible kinds of execution. I do not know where this list comes from.
- Service
DatasetProcessingService.removeDatasetFromAllProcessingInput(): required for dataset deletion, to avoid DB relation issues. ProcessingDownloaderService.manageResultOnly(): filters what gets downloaded, according to the parameter value. "false": no filtering. "true": inputs are removed from the download. Any regex: inputs are removed and only the outputs whose name matches the regex are kept.
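The three cases of the filtering value can be sketched as follows. This is an illustration only, not the body of manageResultOnly(): the class ResultOnlyFilter and its select method are hypothetical, files are reduced to their names, and whether the real code matches the regex against the whole name or part of it is an assumption (full match here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "false" / "true" / regex filtering described above. */
final class ResultOnlyFilter {

    /**
     * Returns the names of the files to download.
     * "false": inputs and outputs; "true": outputs only;
     * anything else: outputs whose full name matches the value, used as a regex.
     */
    static List<String> select(List<String> inputs, List<String> outputs, String resultOnly) {
        List<String> selected = new ArrayList<>();
        if ("false".equalsIgnoreCase(resultOnly)) {
            selected.addAll(inputs);
            selected.addAll(outputs);
            return selected;
        }
        if ("true".equalsIgnoreCase(resultOnly)) {
            selected.addAll(outputs);
            return selected;
        }
        // Assumption: the value is a regex that must match the whole output name.
        Pattern pattern = Pattern.compile(resultOnly);
        for (String name : outputs) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("input.nii");
        List<String> outputs = List.of("segmentation.nii", "log.txt");
        System.out.println(select(inputs, outputs, "false"));
        System.out.println(select(inputs, outputs, "true"));
        System.out.println(select(inputs, outputs, ".*\\.nii"));
    }
}
```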
Execution
Gathers the back-end methods for communicating with VIP about launching and tracking executions.
- DTO
The difference between ExecutionCandidateDTO and VipExecutionDTO is that an ExecutionCandidate describes a processing that has not been launched in VIP yet, whereas a VipExecution has already been launched in VIP and is returned by the VIP API.
- Service
Some service methods work on a tracking file. There is one tracking file per type of execution, stored at /var/datasets-data/vip-data/ in the datasets container. Tracking files reference all execution requests, from the front end as well as from the Python script, and are updated as the executions progress in VIP.
Execution Monitoring
Used to monitor an execution by periodically querying its VIP status, and to process the output returned by VIP if there is one.
- Model
ExecutionStatus lists all the statuses that VIP can return.
- Service
ExecutionMonitoringResumptionRunner is a Spring component that runs when the Shanoir dataset microservice starts and revives all executions that were interrupted from Shanoir's point of view. These executions have often not been interrupted in the VIP infrastructure, but their status and output still need to be requested by Shanoir.
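The periodic status query can be pictured as a simple polling loop. The sketch below makes several assumptions: the Status enum is a hypothetical stand-in for ExecutionStatus (its real values are not listed on this page), the status source is an arbitrary Supplier instead of an HTTP call to VIP, and the interval and attempt limit are illustrative.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of a status polling loop, not the Shanoir implementation. */
final class ExecutionPoller {

    /** Stand-in for ExecutionStatus; the actual values may differ. */
    enum Status {
        RUNNING, FINISHED, KILLED, EXECUTION_FAILED;

        boolean isTerminal() {
            return this != RUNNING;
        }
    }

    /**
     * Queries the status source until a terminal status is seen or the
     * attempt limit is reached, and returns the last status observed.
     */
    static Status pollUntilTerminal(Supplier<Status> statusSource, long intervalMillis, int maxAttempts) {
        Status last = Status.RUNNING;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            last = statusSource.get();
            if (last.isTerminal()) {
                return last;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Stop polling; a resumption step can pick the execution up later.
                Thread.currentThread().interrupt();
                return last;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated VIP answers: two RUNNING responses, then FINISHED.
        Iterator<Status> answers = List.of(Status.RUNNING, Status.RUNNING, Status.FINISHED).iterator();
        Status result = pollUntilTerminal(answers::next, 10, 5);
        System.out.println("Last status: " + result);
    }
}
```

Because the loop only needs the execution identifier and a way to ask for its status, it can be restarted after a service restart, which is the situation the resumption runner handles.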
Output
These classes are called by ExecutionMonitoring and are used to manage execution outputs.
- Handler
Handlers apply post-processing to execution outputs, such as renaming objects.
DefaultHandler is used whatever the execution type and stores the output files on the Shanoir server, under /var/datasets-data/processed-dataset. Specific handlers, such as OFSEPSeqIdHandler, apply processing to Shanoir data depending on the execution type. There can be one handler per execution type.
- Service
OutputService extracts the output files from the .zip archive returned by VIP and calls the appropriate handlers.
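The extract-then-dispatch flow can be sketched with the standard java.util.zip classes. Everything named here is hypothetical (OutputDispatchSketch, the OutputHandler interface, CountingHandler): the real handlers such as DefaultHandler and OFSEPSeqIdHandler have their own signatures, which are not reproduced. The archive is built in memory so the example runs without VIP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch; names do not come from the Shanoir code base. */
final class OutputDispatchSketch {

    /** Hypothetical handler contract: a handler declares which execution types it serves. */
    interface OutputHandler {
        boolean accepts(String executionType);
        void handle(Map<String, byte[]> files);
    }

    /** Example handler that only counts the files it receives. */
    static final class CountingHandler implements OutputHandler {
        private final String type; // null means "any execution type"
        int filesSeen = 0;

        CountingHandler(String type) {
            this.type = type;
        }

        public boolean accepts(String executionType) {
            return type == null || type.equals(executionType);
        }

        public void handle(Map<String, byte[]> files) {
            filesSeen += files.size();
        }
    }

    /** Reads every regular file of a .zip archive into memory, keyed by entry name. */
    static Map<String, byte[]> readZip(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // readAllBytes stops at the end of the current entry.
                    files.put(entry.getName(), zip.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    /** Calls every handler that accepts the execution type and returns how many ran. */
    static int dispatch(String executionType, Map<String, byte[]> files, List<OutputHandler> handlers) {
        int called = 0;
        for (OutputHandler handler : handlers) {
            if (handler.accepts(executionType)) {
                handler.handle(files);
                called++;
            }
        }
        return called;
    }

    /** Builds a one-file archive in memory, standing in for a VIP result. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("result.json"));
            zip.write("{}".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = readZip(sampleArchive());
        List<OutputHandler> handlers = List.of(new CountingHandler(null), new CountingHandler("typeA"));
        System.out.println("Extracted: " + files.keySet());
        System.out.println("Handlers called for typeA: " + dispatch("typeA", files, handlers));
    }
}
```

Since VIP can return an empty archive, a caller would probably want to check whether the extracted map is empty before dispatching; that check is left out here.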
Path
This is an API used by VIP's Carmin API to download dataset resources into the VIP environment. I do not know why the different cases of the switch statement exist.
Pipeline
This is an API used to find out which pipelines are available in the VIP environment.
ProcessingResource
These files relate to the generation of ProcessingResource objects. Codes tied to specific datasets are created, which makes it easier to download datasets into the VIP environment.
Shared
Shared resources that are useful to several objects or that do not deserve a whole directory.
Import-ms
Controller
These methods are only called by the VIP Carmin API. They are more or less the counterpart of the dataset-ms/path directory: they are used to import results into Shanoir, or to delete the storage of those results.
Model
I do not really know what these models are for.