Dev Notes : GPU - Schlumberger/distpy GitHub Wiki

GPU using cupy

The strategy for supporting GPU in distpy is to rely on the numpy compatible cupy package.

Currently only a subset of the commands in the signal-processing directed graphs is available on the GPU. The to_gpu command turns the input stream into data on the GPU, and the from_gpu command brings results from the GPU back into numpy arrays.

When data is on the GPU, subsequent commands will attempt to execute on the GPU. When no GPU exists, the to_gpu and from_gpu commands have no effect, which allows flows designed for a GPU environment to be tested on a CPU without modification.

Currently it is left to the creator of the command JSON to ensure that only GPU-enabled commands are sent GPU data. Your script will fail if you attempt to run a non-GPU-enabled command on data that is resident on the GPU.

End-User Perspective

As an example, here is the strainrate2noiselog.json example which has GPU enabled:


{
  "document" : 0,
  "name" : "Standard 3-band Noise Log, GPU-enabled",
  "description" : "Creates total energy outputs for the 200*, 600* and 1000* bands of a conventional Noise Log. If you have GPU it is used.",
  "command_list" :
  [
    { "name" : "to_gpu",         "uid" :  1, "in_uid" :  0 },
    { "name" : "fft",            "uid" :  2, "in_uid" :  1 },
    { "name" : "rms_from_fft",   "uid" :  3, "in_uid" :  2, "low_freq" : 0, "high_freq" : -1 },
    { "name" : "multiple_calcs", "uid" :  4, "in_uid" :  2, "func" : "te_from_fft", "low_freq" : [200,600,1000], "high_freq" : [-1,-1,-1] },
    { "name" : "from_gpu",       "uid" :  5, "in_uid" :  3 },
    { "name" : "from_gpu",       "uid" :  6, "in_uid" :  4 },
    { "name" : "write_witsml",   "uid" :  7, "in_uid" :  6, "directory_out" : "NoiseLog", "low_freq" : [200,600,1000], "high_freq" : [-1,-1,-1], "gather_uids" : [5], "data_style" : "NoiseLog" }
  ]
}

The initial data read (from a *.npy file into memory) is a CPU action; this is command 0, which is always the head node of the directed graph.

to_gpu moves that data onto the GPU if you are running on a CUDA device and have installed cupy. If cupy is not found, this becomes a null command.

With the data on the GPU, the next 3 commands all execute on the GPU.

fft applies a Fourier transform. Your hardware configuration specifies a BOX_SIZE, which is taken into account here: if the GPU has limited memory and you hit out-of-memory issues, you can try reducing BOX_SIZE. The choice of BOX_SIZE can have a big impact on performance.
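To illustrate why BOX_SIZE bounds memory use, here is a minimal numpy-only sketch of processing an FFT in BOX_SIZE-sized blocks. This is not distpy's actual implementation; the chunked_fft function and the specific BOX_SIZE value are illustrative only.

```python
import numpy as np

BOX_SIZE = 4096  # hypothetical block size from the hardware configuration

def chunked_fft(data, box_size=BOX_SIZE):
    """Apply an FFT in blocks of box_size traces so that only one block
    (not the whole dataset) needs to fit in device memory at a time."""
    results = []
    for start in range(0, data.shape[0], box_size):
        block = data[start:start + box_size]
        results.append(np.fft.rfft(block, axis=-1))
    return np.concatenate(results, axis=0)

signal = np.random.randn(10000, 256)
spectrum = chunked_fft(signal)
```

A smaller box_size lowers the peak memory footprint at the cost of more kernel launches, which is the performance trade-off mentioned above.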

rms_from_fft calculates the RMS noise from the Fourier-transformed data.

multiple_calcs calculates several total-noise-energy results in the described bands using 'te_from_fft'.

The first from_gpu command takes the result of command 3 (rms_from_fft) and brings it to the CPU as a numpy array. The second from_gpu does the same for the result of command 4 (multiple_calcs).

The write_witsml command is not available on the GPU and is used to write results out in the WITSML FBE format. It takes the two results, which are now on the CPU, and performs the write operation.

Developer perspective

From a development perspective, GPU support is provided by agnostic.py. Since cupy can only be installed on a CUDA installation, we need to cater for situations where cupy is not available.

The following code enables this behaviour:

try:
    import cupy as cp
except ImportError:
    # Fall back to a numpy-only facade when cupy is not installed
    import distpy.calc.cupy_facade as cp

cupy_facade.py implements only cupy's get_array_module() function, returning numpy, and provides null implementations of the asarray() and asnumpy() operations.
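Based on that description, a facade of this kind can be sketched as follows. This is a sketch consistent with the behaviour described above, not the verbatim contents of distpy.calc.cupy_facade.

```python
import numpy

def get_array_module(*args):
    # Mirror cupy.get_array_module(): on a CPU-only system the
    # answer is always numpy.
    return numpy

def asarray(a):
    # Null implementation: on the CPU the data is (or becomes)
    # an ordinary numpy array.
    return numpy.asarray(a)

def asnumpy(a):
    # Null implementation: there is no device to copy back from.
    return numpy.asarray(a)
```

Because the facade exposes the same names cupy does, code written against `cp` runs unchanged whether the import at the top of agnostic.py picked up cupy or the facade.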

Extending the pub_command_set with a GPU-compatible command

The steps for extending the command set are:

  1. Add a function to extra_numpy.py to do the new calculation.
  2. Add a class inheriting from BasicCommand to pub_command_set.py

extra_numpy.py agnostic functions

Refer to the cupy comparison table to determine whether your algorithm can offer GPU support.

If this is a straightforward port of a numpy function, the naming convention in distpy is to call it agnostic_{original_name}. For new algorithms the convention is to give them unique names that reflect the algorithm's purpose.

A global instantiation, GPU_CPU = agnostic.agnostic(), sets up the system so that GPU_CPU can be used as one would use cupy when writing agnostic functions.
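A minimal sketch of what such a dispatch helper can look like is shown below. This assumes get_numpy() simply delegates to cupy's get_array_module() when cupy is present; the real agnostic.py may differ in detail.

```python
import numpy

try:
    import cupy as cp
except ImportError:
    cp = None

class agnostic:
    """Return the array module (numpy or cupy) that owns a given array,
    so one code path serves both CPU and GPU data."""
    def get_numpy(self, x):
        if cp is not None:
            # cupy.get_array_module() returns cupy for device arrays
            # and numpy for host arrays.
            return cp.get_array_module(x)
        return numpy

GPU_CPU = agnostic()
```

With this in place, a function that calls GPU_CPU.get_numpy(x) and then uses the returned module stays agnostic about where x lives.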

The following example shows how extra_numpy.py implements a CPU/GPU agnostic version of abs():

def agnostic_abs(x):
    xp = GPU_CPU.get_numpy(x)
    return xp.abs(x)

The command in pub_command_set.py should implement isGPU() to return True. Currently this just affects the documentation, but in the future this could also be used to validate JSON directed-graph schemes.

class AbsCommand(BasicCommand):
    def __init__(self,command, jsonArgs):
        super().__init__(command, jsonArgs)

    def docs(self):
        docs={}
        docs['one_liner']="Take the absolute value of the input"
        return docs

    def isGPU(self):
        return True

    def execute(self):
        self._result = extra_numpy.agnostic_abs(self._previous.result())

The final step is to add the command to KnownCommands():

def KnownCommands(knownList):
    knownList['NONE']           = BasicCommand
    knownList['data']           = DataLoadCommand
    knownList['abs']            = AbsCommand
    knownList['argmax']         = ArgmaxCommand
    ...
    knownList['write_witsml']   = WriteWITSMLCommand
    return knownList