Dev Notes : GPU - Schlumberger/distpy GitHub Wiki
## GPU using cupy
The strategy for supporting GPUs in distpy is to rely on the numpy-compatible cupy package.
Currently only a subset of the commands in the signal-processing directed graphs are available for GPU. The `to_gpu` command turns the input stream into data on the GPU, and the `from_gpu` command takes results from the GPU back into numpy arrays. While data is on the GPU, subsequent commands attempt to execute on the GPU. When no GPU exists, the `to_gpu` and `from_gpu` commands have no effect, which allows flows designed for a GPU environment to be tested on a CPU without modification.
Currently it is left to the creator of the command JSON to ensure that only GPU-enabled commands receive GPU data; your script will fail if you attempt to run a non-GPU-enabled command on GPU data.
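The null-command behaviour can be pictured with a small sketch. This is not the actual distpy source; the function names here simply mirror the command names:

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # fails unless a CUDA device is actually usable
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False

def to_gpu(data):
    """Move a numpy array onto the GPU; a no-op when cupy is absent."""
    return cp.asarray(data) if HAVE_GPU else data

def from_gpu(data):
    """Bring data back as a numpy array; a no-op when cupy is absent."""
    return cp.asnumpy(data) if HAVE_GPU else data

# The same flow runs unmodified on a CPU-only machine:
signal = np.array([1.0, -2.0, 3.0])
result = from_gpu(np.abs(to_gpu(signal)))
```

On a CUDA machine the intermediate array lives on the device; on a CPU-only machine both calls pass the data through untouched.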
## End-User Perspective
As an example, here is the `strainrate2noiselog.json` example with GPU enabled:
```json
{
  "document" : 0,
  "name" : "Standard 3-band Noise Log, GPU-enabled",
  "description" : "Creates total energy outputs for the 200*, 600* and 1000* bands of a conventional Noise Log. If you have GPU it is used.",
  "command_list" :
  [
    { "name" : "to_gpu", "uid" : 1, "in_uid" : 0 },
    { "name" : "fft", "uid" : 2, "in_uid" : 1 },
    { "name" : "rms_from_fft", "uid" : 3, "in_uid" : 2, "low_freq" : 0, "high_freq" : -1 },
    { "name" : "multiple_calcs", "uid" : 4, "in_uid" : 2, "func" : "te_from_fft", "low_freq" : [200,600,1000], "high_freq" : [-1,-1,-1] },
    { "name" : "from_gpu", "uid" : 5, "in_uid" : 3 },
    { "name" : "from_gpu", "uid" : 6, "in_uid" : 4 },
    { "name" : "write_witsml", "uid" : 7, "in_uid" : 6, "directory_out" : "NoiseLog", "low_freq" : [200,600,1000], "high_freq" : [-1,-1,-1], "gather_uids" : [5], "data_style" : "NoiseLog" }
  ]
}
```
The initial data read (from a `*.npy` file into memory) is a CPU action; this is command 0 and is always the head node of the directed graph.
`to_gpu` moves that data onto the GPU, provided you are running on a CUDA device and have installed cupy. If cupy is not found, this becomes a null command.
With the data on the GPU, the next three commands all execute on the GPU. `fft` applies a Fourier transform. Your hardware configuration specifies a `BOX_SIZE`, which is taken into account here, so if the GPU has limited memory and you hit out-of-memory issues you can try reducing `BOX_SIZE`. The choice of `BOX_SIZE` can have a big impact on performance.
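The memory trade-off behind `BOX_SIZE` can be illustrated with a plain numpy sketch (hypothetical, not distpy's actual fft command): transforming the traces in boxes bounds the peak working set, at the cost of more, smaller transforms, while producing the same result as one large transform:

```python
import numpy as np

def fft_in_boxes(traces, box_size):
    """Apply an FFT to rows of `traces` in chunks of `box_size` rows.

    Smaller boxes mean a smaller peak memory footprint (useful on a
    memory-limited GPU) but more individual transform calls.
    """
    out = []
    for start in range(0, traces.shape[0], box_size):
        out.append(np.fft.rfft(traces[start:start + box_size], axis=-1))
    return np.concatenate(out, axis=0)

data = np.random.default_rng(0).normal(size=(100, 256))
boxed = fft_in_boxes(data, box_size=16)  # lower peak memory
full = np.fft.rfft(data, axis=-1)        # one large transform
```

Both paths give identical spectra; only the peak memory and the number of kernel launches differ.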
`rms_from_fft` calculates the RMS noise from the Fourier-transformed data. `multiple_calcs` calculates several examples of total noise energy in the described bands using `te_from_fft`. The first `from_gpu` command takes the result of command 3 (`rms_from_fft`) and brings it to the CPU as a numpy array; the second `from_gpu` does the same for the result of command 4 (`multiple_calcs`).
The `write_witsml` command is not available on GPU and is used to write results out in the WITSML FBE format. It takes the two results, which are now on the CPU, and performs the write operation.
## Developer Perspective
From a development perspective, GPU support is provided by `agnostic.py`. Since cupy can only be installed on a CUDA installation, we need to cater for situations where cupy is not available. The following code enables this behaviour:
```python
try:
    import cupy as cp
except ImportError:
    import distpy.calc.cupy_facade as cp
```
`cupy_facade.py` implements only cupy's `get_array_module()` function, returning numpy, and provides null implementations of the `asarray()` and `asnumpy()` operations.
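A facade along those lines might look like the following sketch (illustrative only, not the actual `cupy_facade.py` source):

```python
import numpy

def get_array_module(*args):
    """Analogue of cupy.get_array_module(): on a CPU-only machine the
    answer is always the numpy module itself."""
    return numpy

def asarray(a):
    """Null implementation: the data is already a host-side array."""
    return numpy.asarray(a)

def asnumpy(a):
    """Null implementation: there is no GPU to copy data back from."""
    return numpy.asarray(a)
```

Because the facade exposes the same three names, the `import ... as cp` fallback shown above lets the rest of the code call `cp.get_array_module()`, `cp.asarray()` and `cp.asnumpy()` without caring which module it got.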
## Extending the pub_command_set with a GPU-compatible command

The steps for extending the command set are:
1. Add a function to `extra_numpy.py` to do the new calculation.
2. Add a class inheriting from `BasicCommand` to `pub_command_set.py`.
### extra_numpy.py agnostic functions
Refer to the cupy comparison table to determine whether your algorithm can offer GPU support. If it is a straightforward port of a numpy function, the naming convention in distpy is to call it `agnostic_{original_name}`. For new algorithms the convention is to give them unique names that reflect the algorithm's purpose.
A global instantiation, `GPU_CPU = agnostic.agnostic()`, sets up the system so that `GPU_CPU` can be used as one would use cupy when writing agnostic functions.
The following example shows how `extra_numpy.py` implements a CPU/GPU-agnostic version of `abs()`:

```python
def agnostic_abs(x):
    xp = GPU_CPU.get_numpy(x)
    return xp.abs(x)
```
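On a CPU-only machine the agnostic function simply resolves to numpy. The self-contained sketch below stubs out `GPU_CPU` (the real object comes from `agnostic.agnostic()` in distpy) so the behaviour can be exercised anywhere:

```python
import numpy as np

class _AgnosticStub:
    """Stand-in for distpy's agnostic.agnostic(): with no GPU present,
    the array module for any input is numpy."""
    def get_numpy(self, x):
        return np

GPU_CPU = _AgnosticStub()

def agnostic_abs(x):
    xp = GPU_CPU.get_numpy(x)  # numpy here; cupy when x lives on a GPU
    return xp.abs(x)

result = agnostic_abs(np.array([-1.5, 2.0, -3.0]))
```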
The command in `pub_command_set.py` should implement `isGPU()` to return `True`. Currently this only affects the documentation, but in the future it could also be used to validate JSON directed-graph schemes.
```python
class AbsCommand(BasicCommand):
    def __init__(self, command, jsonArgs):
        super().__init__(command, jsonArgs)

    def docs(self):
        docs = {}
        docs['one_liner'] = "Take the absolute value of the input"
        return docs

    def isGPU(self):
        return True

    def execute(self):
        self._result = extra_numpy.agnostic_abs(self._previous.result())
```
The final step is to add the command to the `KnownCommands()` registry:
```python
def KnownCommands(knownList):
    knownList['NONE'] = BasicCommand
    knownList['data'] = DataLoadCommand
    knownList['abs'] = AbsCommand
    knownList['argmax'] = ArgmaxCommand
    ...
    knownList['write_witsml'] = WriteWITSMLCommand
    return knownList
```
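The registry is what lets the `name` fields in the JSON graph select command classes at run time. The following hypothetical, self-contained harness shows the idea; the stub classes below stand in for distpy's real commands:

```python
import numpy as np

class BasicCommand:
    """Minimal stand-in for distpy's BasicCommand."""
    def __init__(self, command, jsonArgs):
        self._previous = jsonArgs.get('previous')
        self._result = None

    def result(self):
        return self._result

class DataLoadCommand(BasicCommand):
    """Head node of the graph: its result is the loaded data."""
    def __init__(self, command, jsonArgs):
        super().__init__(command, jsonArgs)
        self._result = jsonArgs['data']

    def execute(self):
        pass

class AbsCommand(BasicCommand):
    def execute(self):
        self._result = np.abs(self._previous.result())

def KnownCommands(knownList):
    knownList['data'] = DataLoadCommand
    knownList['abs'] = AbsCommand
    return knownList

# Instantiate commands by the "name" field from the JSON graph:
registry = KnownCommands({})
head = registry['data']('data', {'data': np.array([-4.0, 5.0])})
node = registry['abs']('abs', {'previous': head})
node.execute()
```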