User Tutorial : distpy on linux - Schlumberger/distpy GitHub Wiki

Creating thumbnail plots on a Linux Cluster

Python can be installed in many different ways on Linux clusters, and your local IT team can probably advise if your system differs from this example.

In this example we assume that custom environments can be created using conda, and that job submission to the cluster is managed by the Slurm workload manager.

To create a Python environment (adjusting directory names as needed):

conda create -p /scratch/myusername/python_for_distpy python=3.7

Activate the environment on the head node using bash, so that distpy can be installed:

source activate /scratch/myusername/python_for_distpy
pip install distpy
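Before submitting any jobs, it can be worth confirming that the activated environment really sees the package. The following is a minimal sketch using only the Python standard library; the helper name `is_installed` is hypothetical and is not part of distpy.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # sys.prefix should point at the conda environment created above
    print("Interpreter prefix:", sys.prefix)
    print("distpy importable:", is_installed("distpy"))
```

If the prefix printed is not the directory passed to `conda create -p`, the environment was probably not activated in the current shell.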

This environment can now be activated from within our Slurm script, which contains something like:

#!/bin/bash
#SBATCH -n 64

source activate /scratch/myusername/python_for_distpy
python CASE00.py

Here the control script is again CASE00.py. To adjust it for our case, we first set the data archive and project drives to be appropriate for Linux (your own drive locations will differ from those shown here):

### Windows example
#ARCHIVE_DRIVE = "D:\\Archive"
#PROJECT_DRIVE = "C:\\NotBackedUp"
### Linux example
ARCHIVE_DRIVE = "/archive/projects/"
PROJECT_DRIVE = "/scratch/username/"
### Azure example
#ARCHIVE_DRIVE = "/dbfs/mnt/segy/"
#PROJECT_DRIVE = "/dbfs/tmp/segy/"
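If the same control script is shared between a workstation and the cluster, the commenting and uncommenting above could be replaced by a lookup on the operating system. This is only a sketch: the `DRIVES` table and the `select_drives` helper are hypothetical names, the paths are the placeholders from this tutorial, and the Azure case is left out because it also reports itself as Linux.

```python
import platform

# Hypothetical lookup table; the paths are the placeholders used in this tutorial
DRIVES = {
    "Windows": ("D:\\Archive", "C:\\NotBackedUp"),
    "Linux": ("/archive/projects/", "/scratch/username/"),
}


def select_drives(system_name=None):
    """Return (archive_drive, project_drive) for the given or current OS."""
    if system_name is None:
        # platform.system() returns e.g. "Linux" or "Windows"
        system_name = platform.system()
    try:
        return DRIVES[system_name]
    except KeyError:
        raise ValueError("No drive locations configured for " + repr(system_name))


if __name__ == "__main__":
    ARCHIVE_DRIVE, PROJECT_DRIVE = select_drives("Linux")
    print(ARCHIVE_DRIVE, PROJECT_DRIVE)
```

Calling `select_drives()` with no argument uses the operating system of the machine running the script, and raises a `ValueError` on any system that has no entry in the table.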

This example produces thumbnail pictures of the strain-rate data for the 5-200 Hz band, which corresponds to the approximate bandwidth of many borehole seismic geophone and accelerometer tools.

The JSON configuration for the signal processing is in the strainrate2thumbnail.json example:

{
"command_list" :
[
 { "name" : "butter",         "uid" :  1, "in_uid" : 0, "type" : "lowpass", "freq" : 200 },
 { "name" : "butter",         "uid" :  2, "in_uid" : 1, "type" : "highpass", "freq" : 5 },
 { "name" : "thumbnail",      "uid" :  3, "in_uid" : 2, "directory_out" : "png" }
]
}

We first apply a low-pass Butterworth filter at 200 Hz, then a high-pass at 5 Hz, to create band-limited strain-rate data. The final step writes a thumbnail plot of the data to a results/png/ directory.
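The way the three commands are linked through "uid" and "in_uid" can be traced with a few lines of standard-library Python. This is an illustrative sketch, not how distpy itself parses the file: the `describe_chain` helper is hypothetical, and it assumes that uid 0 denotes the input strain-rate data and that each command may only read a result produced earlier in the list.

```python
import json

# The command list from strainrate2thumbnail.json, embedded for the example
CONFIG_TEXT = """
{
"command_list" :
[
 { "name" : "butter",    "uid" : 1, "in_uid" : 0, "type" : "lowpass",  "freq" : 200 },
 { "name" : "butter",    "uid" : 2, "in_uid" : 1, "type" : "highpass", "freq" : 5 },
 { "name" : "thumbnail", "uid" : 3, "in_uid" : 2, "directory_out" : "png" }
]
}
"""


def describe_chain(config):
    """Return one 'input -> command -> output' line per command."""
    available = {0}  # assumption: uid 0 is the input strain-rate data
    lines = []
    for command in config["command_list"]:
        uid, in_uid = command["uid"], command["in_uid"]
        if in_uid not in available:
            raise ValueError("command %d reads uid %d before it is produced" % (uid, in_uid))
        if uid in available:
            raise ValueError("duplicate uid %d" % uid)
        available.add(uid)
        lines.append("%d -> %s -> %d" % (in_uid, command["name"], uid))
    return lines


chain = describe_chain(json.loads(CONFIG_TEXT))
print("\n".join(chain))
```

Running a check of this kind on the head node catches a mistyped "in_uid" before the job spends time in the queue.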

The "config" attribute in the JSON configuration for the strain-rate-to-attributes step must point to this file. See segy2witsmlConfig.json for an example.
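Updating that attribute can be scripted rather than edited by hand. The sketch below only touches the "config" key named above. The `point_config_at` helper is hypothetical, the other contents of the settings file are not shown, and whether distpy expects an absolute or a relative path here is an assumption to check against the example file.

```python
import json


def point_config_at(settings, processing_json_path):
    """Return a copy of the settings with "config" set to the given path."""
    updated = dict(settings)  # shallow copy, so the caller's dict is untouched
    updated["config"] = processing_json_path
    return updated


if __name__ == "__main__":
    # An empty dict stands in for the contents of the real settings file
    settings = point_config_at({}, "strainrate2thumbnail.json")
    print(json.dumps(settings, indent=1))
```

In practice the settings dictionary would be loaded with `json.load` from the existing configuration file and written back with `json.dump` after the update.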

This workflow does not require any post-processing, so in CASE00.py comment out steps 3 and 4 as shown:

    #STEP 3: ingest WITSML FBE
    #distpy.ingesters.parallel_ingest_witsml.main(configWITSMLIngest)

    #STEP 4: Generate default plots - available from version 1.1.0
    #distpy.controllers.parallel_plots.main(configPlots)

Finally, make sure that the ingestion JSON configuration file has the correct input and output directories (see sgyConfig.json for an example). More information on this file is given in the Cloud tutorial.